We are used to think that O(log N) is always better than O(N). No matter if you learnt Computer Science in school or in the field there was a moment in your life when you wrote an O(N) routine and was sniffed at by fellow engineers. I even once worked for a team where it was a written rule to avoid anything worse than O(log N). But we forget about the "big N" in the "big O". It is not always you are dealing with large data and when the data is small, the "big N" effect may be against you.
Two years ago I have been tasked on an interview to write a code to find the nearest match in a sorted array. I wrote an O(N) loop and failed the interview. Note that the interviewer did not set the restriction of the array being big. Let's test it on some small data.
Here is the JMH micro benchmark code. I have tested the same thing on arrays of ints, longs and doubles. As you can see, the sizes of array in the test are small - from 1 to 10 elements. And that's how the benchmark result looks like:
In that big mess you can see that binary search is not really winning much over the full array pass through. Quite often it is even slower. There are multiple reasons for that:
-
In a simple Java for loop the range check will be placed outside of the loop by JIT compiler. Binary search has to do the range check every time it gets an element from an array.
-
Sequential access is very CPU cache friendly, especially when the data is small and has a chance to fit in a few cache lines (my CPU has 64 bytes cache line, but note that I have not done anything to align the array in memory - yes, that is possible even in Java).
-
The simpler the function, the easier it is for Java compiler to understand and optimize it. It is not much of a case here, since both functions are small. But some better algorithms are quite complex and may derail the compiler.
There are other advantages the naive code has that I outlined in a previous blog post.
We live in the world of big data, but we still interact with small chunks of it. There are many cases when the size of your input will be limited. I am asking you to think about small data implications when you are dealing with it. And if, for whatever reason, you are not using library function to do X, just maybe, you should write naive and simple to understand function.