I believe the intention of nesting the binary searches like that is to search the most likely candidate skill ranges first. Unfortunately, that intention is misguided when you are doing binary search, and falls right into Knuth's saying: "premature optimization is the root of all evil".
Binary search is an O(log_2 n) operation. That means the time it takes to search the array is proportional to the base-2 logarithm of the size of the array, modulo some constant factors.
Quote: Aside:
Big-O notation is a way of describing algorithm complexity by saying what function the running time is proportional to. O(n) means that the running time increases linearly as the input grows; O(1) means that the algorithm always takes the same time no matter how big the input is (e.g., deleting the first element of a linked list). And so forth. It's a very powerful way of thinking about how an algorithm's running time will grow as your input size grows.
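For instance, here's a tiny sketch of my own (not from the code under discussion) of why removing the head of a singly linked list is O(1): the amount of work is the same no matter how many nodes follow.

#include <stdlib.h>

struct node
{
    int          value;
    struct node *next;
};

/* Remove the first element of the list: a constant amount of work,
 * regardless of how long the list is. */
struct node *pop_front( struct node *head )
{
    struct node *rest;

    if ( head == NULL )
        return NULL;

    rest = head->next;
    free( head );
    return rest;
}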
Let's look at some case studies:
Array of size 16: binary search will need ~log_2(16) = ~4 lookups to find the element.
Array of size 64: ~log_2(64) = ~6 lookups.
Array of size 128: ~7 lookups.
Array of size 256: ~8 lookups.
Array of size 1024: ~10 lookups.
Array of size 65536: ~16 lookups.
As you can see, the time it takes to look something up increases very slowly even as the input array is multiplied by factors of 2.
(You can also see that for Nick's example of size 100, we needed 7 lookups, which is consistent with the numbers above.)
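If you want to reproduce those numbers yourself, here's a quick throwaway program (mine, not from the MUD code) that counts how many times you can double 1 before reaching n, which is roughly how many probes a binary search needs:

#include <stdio.h>

/* Roughly how many probes a binary search needs for an array of size n:
 * the number of doublings of 1 needed to reach n, i.e. ceil(log2(n)). */
static int lookups( unsigned int n )
{
    int count = 0;

    while ( ( 1u << count ) < n )
        count++;
    return count;
}

int main( void )
{
    unsigned int sizes[] = { 16, 64, 100, 128, 256, 1024, 65536 };
    size_t       i;

    for ( i = 0; i < sizeof( sizes ) / sizeof( sizes[0] ); i++ )
        printf( "size %6u -> ~%d lookups\n", sizes[i], lookups( sizes[i] ) );
    return 0;
}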
Therefore, dividing the binary search into several segments shows a lack of understanding of how binary search behaves. There's really no need to try to search the most likely segment before others, unless (a) the segments are huge and (b) the probabilities of being in the various segments are _very_ disparate.
Quote: Aside:
Consider a case where you have an array that can be divided into two sub-categories, and you know that a search will find a result from either one or the other. Further imagine that you know (somehow) that a request has a 99% chance of being in one segment and 1% chance of being in the other. Further assume that searching the arrays will take you a fair amount of time, perhaps because the arrays are very, very large (several million elements). And finally, perhaps the binary search itself is expensive, involving complex comparisons at every step of the way.
In a case like this, it might be worth your time to divide the search, trying first in the more likely segment.
But even so, you shouldn't go to this trouble until you know concretely that you are paying for this. You shouldn't go out of your way to optimize something without actually timing it first. It might turn out that improving your "really expensive search" will in fact only gain you a few milliseconds per second. (That is, you only gain a few nanoseconds per individual call to the function, and results only become "significant" when you sample over a large duration of time, like a second.)
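To make the aside concrete, the kind of split being described would look something like this. This is purely illustrative: the names and data are made up, and a plain linear scan stands in for whatever (expensive) search you would actually use.

#include <string.h>

/* Purely illustrative: two segments, one of which is expected to receive
 * ~99% of the lookups, so it is tried first. */
static const char *likely_names[]   = { "acid blast", "armor", "bless" };
static const char *unlikely_names[] = { "zap", "zombie breath" };

/* Any search routine would do here; a linear scan keeps the sketch short. */
static int search_segment( const char **names, int count, const char *name )
{
    int i;

    for ( i = 0; i < count; i++ )
        if ( strcmp( names[i], name ) == 0 )
            return i;
    return -1;
}

int find_name( const char *name )
{
    int idx;

    /* Try the segment that almost always holds the answer... */
    idx = search_segment( likely_names, 3, name );
    if ( idx >= 0 )
        return idx;

    /* ...and only rarely pay for the second search. */
    return search_segment( unlikely_names, 2, name );
}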
The algorithm is an iterative version of binary search. It's a little less intuitive than the recursive version (to me, at least) but it works. The original author was probably trying to avoid the "overhead" (gasp) of a function call. If the recursion were to go several hundred levels deep, it would start mattering (sort of... to be frank, even then, not really, because in the bigger picture this function is called relatively rarely); but in this situation the difference is basically noise.
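For comparison, the recursive shape looks roughly like this. This is a sketch from memory, not the actual code; the struct layout and names are assumptions based on the quoted skill_table[sn]->name.

#include <string.h>

struct skill_type                      /* assumed element type */
{
    const char *name;
    /* ...other fields... */
};

/* Recursive binary search over entries [first..top] of a table sorted
 * by name.  Returns the index of the match, or -1 if not found. */
int skill_search_recursive( struct skill_type **table, int first, int top,
                            const char *name )
{
    int sn, cmp;

    if ( first > top )
        return -1;                     /* empty range: not found */

    sn  = ( first + top ) >> 1;        /* midpoint */
    cmp = strcmp( name, table[sn]->name );

    if ( cmp == 0 )
        return sn;                     /* exact hit */
    if ( cmp < 0 )
        return skill_search_recursive( table, first, sn - 1, name );
    return skill_search_recursive( table, sn + 1, top, name );
}

Either way, both versions compute the midpoint the same way.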
This line:
sn = ( first + top ) >> 1;
is a "clever" way of dividing by two. Shifting right by 1 bit is the same thing as an integer divide by 2. For example:
101 binary is 5 decimal.
Shift right: 10; that's 2 decimal.
(Conversely, shifting left is the same as multiplying by 2.)
Shifting is indeed more efficient than actual division. Even a modestly intelligent compiler, however, knows that dividing by 2 is the same thing as a shift, and will optimize that all on its own if you have optimization turned on.
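If you want to convince yourself, a throwaway test like this (mine, not from the code) shows that, for non-negative integers, shifting and dividing/multiplying by 2 agree:

#include <stdio.h>

int main( void )
{
    int n;

    /* For non-negative integers: n >> 1 == n / 2 and n << 1 == n * 2.
     * For example, 5 (binary 101) >> 1 is 2 (binary 10). */
    for ( n = 0; n <= 8; n++ )
        printf( "%d: >>1 = %d, /2 = %d, <<1 = %d, *2 = %d\n",
                n, n >> 1, n / 2, n << 1, n * 2 );
    return 0;
}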
The point is that it's not useful or time-efficient to killer-optimize something that won't be called very often. Something called even 50 times a second that is basically fast doesn't need to have every last assembly instruction squeezed out of it. Now, if you had a loop that ran, say, 100 times a second, with lots of sub-loops -- as is commonly seen in 3d games -- it would matter. But in this case, it really falls into Knuth's expression.
Now, on to your problem.
This line:
if( strcmp( name, skill_table[sn]->name ) < 1 )
is using an exact, case-sensitive string comparison.
I don't know if this is exactly the source of your problem, but it means the search chooses the next sub-area of the array based on case-sensitive ordering. That can spell trouble because, in ASCII, every uppercase letter sorts before every lowercase letter: 'B' < 'a', for example, even though 'a' comes before 'b' if you ignore case. I would suggest using str_cmp.
I would also suggest caching the value of str_cmp by storing it into an integer after the first calculation. That way, if you find that you need it again, you don't need to scan the string again; you already have the comparison result.
But be careful about Nick's point; if you use case-insensitive searching you *must* make sure that the values are loaded (or later sorted) into case-insensitive order.
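Putting the last two suggestions together, the loop might end up looking something like this. This is only a sketch: I'm using POSIX strcasecmp as a stand-in for whatever three-way, case-insensitive comparison your codebase provides (if your str_cmp only tells you "same or different", you'll need a strcasecmp-style function instead), and the type and function names are my assumptions, not your code.

#include <strings.h>                   /* strcasecmp (POSIX) */

struct skill_type                      /* same assumed shape as above */
{
    const char *name;
    /* ...other fields... */
};

/* Case-insensitive binary search over entries [first..top] of a table
 * sorted by name in case-insensitive order.  The comparison result is
 * computed once per iteration, cached in cmp, and reused for both tests. */
int skill_search_ci( struct skill_type **table, int first, int top,
                     const char *name )
{
    while ( first <= top )
    {
        int sn  = ( first + top ) >> 1;
        int cmp = strcasecmp( name, table[sn]->name );   /* cached result */

        if ( cmp == 0 )
            return sn;
        else if ( cmp < 0 )
            top = sn - 1;
        else
            first = sn + 1;
    }
    return -1;
}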
Also note:
Some of their checks are exact and some are prefix-based. It looks like it checks everything for an exact match first, and then checks everything by prefix. The best fix would be to have two binary searches: one exact over the whole range, and one prefix-based over the whole range.
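One way to structure that, sticking with the assumed names from the sketches above, is a single search routine that takes a "prefix" flag, called twice. Again, this is a sketch, not your code. Note that the prefix pass returns some matching entry, not necessarily the alphabetically first one; if that matters, you'd want a lower-bound variant.

#include <string.h>
#include <strings.h>                   /* strcasecmp, strncasecmp (POSIX) */

struct skill_type                      /* same assumed shape as above */
{
    const char *name;
    /* ...other fields... */
};

/* Binary search over entries [0..count-1] of a table sorted by name,
 * case-insensitively.  When prefix is nonzero, only the first strlen(name)
 * characters are compared, so "fire" matches "fireball". */
static int skill_search( struct skill_type **table, int count,
                         const char *name, int prefix )
{
    size_t len   = strlen( name );
    int    first = 0;
    int    top   = count - 1;

    while ( first <= top )
    {
        int sn  = ( first + top ) >> 1;
        int cmp = prefix ? strncasecmp( name, table[sn]->name, len )
                         : strcasecmp( name, table[sn]->name );

        if ( cmp == 0 )
            return sn;
        else if ( cmp < 0 )
            top = sn - 1;
        else
            first = sn + 1;
    }
    return -1;
}

/* Exact pass over the whole table first; only if that fails, prefix pass. */
int skill_lookup( struct skill_type **table, int count, const char *name )
{
    int sn = skill_search( table, count, name, 0 );

    if ( sn >= 0 )
        return sn;
    return skill_search( table, count, name, 1 );
}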