Register forum user name Search FAQ

Gammon Forum

Notice: Any messages purporting to come from this site telling you that your password has expired, or that you need to verify your details, confirm your email, resolve issues, making threats, or asking for money, are spam. We do not email users with any such messages. If you have lost your password you can obtain a new one by using the password reset link.

Due to spam on this forum, all posts now need moderator approval.

 Entire forum ➜ SMAUG ➜ Running the server ➜ Case sensitive skill/spell lookups

Case sensitive skill/spell lookups

It is now over 60 days since the last post. This thread is closed.     Refresh page


Pages: 1 2  3  

Posted by Samson   USA  (683 posts)  Bio
Date Sun 28 Jan 2007 01:27 AM (UTC)
Message
An interesting problem has developed at some point with the current SmaugFUSS. It seems that skill/spell lookups for pretty much any purpose have turned case sensitive and we can't find the reason for why.

The lookup problem appears to be at lest somewhat related to the various bsearch_skill_* functions. In trying to track down the cause, I find myself confused and wondering just what these things do.


int bsearch_skill_exact( const char *name, int first, int top )
{
   int sn;

   for( ;; )
   {
      sn = ( first + top ) >> 1;

      if( !IS_VALID_SN( sn ) )
         return -1;
      if( !str_cmp( name, skill_table[sn]->name ) )
         return sn;
      if( first >= top )
         return -1;
      if( strcmp( name, skill_table[sn]->name ) < 1 )
         top = sn - 1;
      else
         first = sn + 1;
   }
}


All that looks like for the most part is a cryptic mess that someone wrote to make it look like they were really smart or something :P

It sure seems to me that if all this is doing is looking for a name match against a certain skill name that one could just blast their way through the skill array looking for it rather than doing all this obfuscated mess.

Could someone who knows what this does please explain it in a way that most of us will understand? And perhaps suggest an alternative method to do this that's easier to figure out?
Top

Posted by Nick Gammon   Australia  (23,158 posts)  Bio   Forum Administrator
Date Reply #1 on Sun 28 Jan 2007 02:26 AM (UTC)
Message
Judging by the name "bsearch" it is doing a binary search. This is more efficient for large tables, as the search time is proportional to the log of the number of items in the table, rather than the actual number.

However on fast machines, the difference may be a bit academic (for 100 skills or so).

Basically it splits the table into 2 (say for 100 items, it looks at position 50). Then if the desired item is below 50 it looks between item 1 and 50, otherwise 51 to 100.

By repeating this operation, it can narrow down the comparisons by halving the search space on each iteration.

So, say the wanted item was at 99 (in a table of 100), you would quickly get there like this:


  • test item 50 (greater) (increment is 50)
  • test item 75 (greater) (increment is 25)
  • test item 87 (greater) (increment is 12)
  • test item 93 (greater) (increment is 6)
  • test item 96 (greater) (increment is 3)
  • test item 98 (greater) (increment is 2)
  • test item 99 (greater) (increment is 1)


This would be the worst-case scenario for a linear search, but it has found it in 7 steps.

In your case, possibly making the test not case-sensitive would solve the problem, but you must make sure they are loaded in correctly (that is, in not case-sensitive order).

Binary searches only work if the table is in sequence.

- Nick Gammon

www.gammon.com.au, www.mushclient.com
Top

Posted by Samson   USA  (683 posts)  Bio
Date Reply #2 on Sun 28 Jan 2007 02:44 AM (UTC)
Message

/*
 * Lookup a skill by name.
 */
int skill_lookup( const char *name )
{
   int sn;

   if( ( sn = bsearch_skill_exact( name, gsn_first_spell, gsn_first_skill - 1 ) ) == -1 )
      if( ( sn = bsearch_skill_exact( name, gsn_first_skill, gsn_first_combat - 1 ) ) == -1 )
         if( ( sn = bsearch_skill_exact( name, gsn_first_combat, gsn_first_tongue - 1 ) ) == -1 )
            if( ( sn = bsearch_skill_exact( name, gsn_first_tongue, gsn_first_ability - 1 ) ) == -1 )
               if( ( sn = bsearch_skill_exact( name, gsn_first_ability, gsn_first_lore - 1 ) ) == -1 )
                  if( ( sn = bsearch_skill_exact( name, gsn_first_lore, gsn_top_sn - 1 ) ) == -1 )
                     if( ( sn = bsearch_skill_prefix( name, gsn_first_spell, gsn_first_skill - 1 ) ) == -1 )
                        if( ( sn = bsearch_skill_prefix( name, gsn_first_skill, gsn_first_combat - 1 ) ) == -1 )
                           if( ( sn = bsearch_skill_prefix( name, gsn_first_combat, gsn_first_tongue - 1 ) ) == -1 )
                              if( ( sn = bsearch_skill_prefix( name, gsn_first_tongue, gsn_first_ability - 1 ) ) == -1 )
                                 if( ( sn = bsearch_skill_prefix( name, gsn_first_ability, gsn_first_lore - 1 ) ) == -1 )
                                    if( ( sn = bsearch_skill_prefix( name, gsn_first_lore, gsn_top_sn - 1 ) ) == -1
                                        && gsn_top_sn < top_sn )
                                    {
                                       for( sn = gsn_top_sn; sn < top_sn; ++sn )
                                       {
                                          if( !skill_table[sn] || !skill_table[sn]->name )
                                             return -1;
                                          if( LOWER( name[0] ) == LOWER( skill_table[sn]->name[0] )
                                              && !str_prefix( name, skill_table[sn]->name ) )
                                             return sn;
                                       }
                                       return -1;
                                    }
   return sn;
}


Ok. So given the bsearch_skill_exact function I posted before, and this skill_lookup function taken from Smaug, can you tell us what the purpose is? I'm assuming all it wants to do is find a skill in the table by its name, correct?

I'm not sure I grasp this properly, but it seems to me it would be faster to just run the table than perform 12 separate binary searches. Perhaps there is a better way this code can be written and still maintain the efficiency desired?
Top

Posted by Nick Gammon   Australia  (23,158 posts)  Bio   Forum Administrator
Date Reply #3 on Sun 28 Jan 2007 04:26 AM (UTC)
Message
Yes, well doing it 12 times seems to me to be throwing away the efficiency the binary search gives you.

Probably a simple linear search would be just as quick in this case.

Also, if you are using C++ by now, then you could make it into an STL map, which permits a direct lookup.

- Nick Gammon

www.gammon.com.au, www.mushclient.com
Top

Posted by Conner   USA  (381 posts)  Bio
Date Reply #4 on Sun 28 Jan 2007 05:07 AM (UTC)
Message
Nah, this was about SmaugFUSS, it's still C, so can't use the STL stuff.

-=Conner=-
--
Come test your mettle in the Land of Legends at telnet://tcdbbs.zapto.org:4000
or, for a little family oriented medieval fun, come join us at The Castle's Dungeon BBS at telnet://tcdbbs.zapto.org
or, if you just want information about either, check our web page at http://tcdbbs.zapto.org
Top

Posted by David Haley   USA  (3,881 posts)  Bio
Date Reply #5 on Sun 28 Jan 2007 06:06 AM (UTC)

Amended on Sun 28 Jan 2007 08:08 AM (UTC) by David Haley

Message
(Edit note: fixed various small things.)

I believe the intention of nesting the binary searches like that is to search first in the most likely candidate skill ranges. Unfortunately, that intention is misguided when you are doing binary search, and falls right into Knuth's saying: "premature optimization is the root of all evils".

Binary search is an O(log_2 n) operation. What that means is that the time it will take to search the array is proportional to the 2-logarithm of the size of the array, modulo some constant factors.
Quote:
Aside:
Big-O notation is a way of measuring algorithm complexity by saying what function the time will be proportional. O(n) means that the function duration increases linearly as the input increases; O(1) means that the algorithm always takes the same time no matter how big the input is (e.g., deleting the first element of a linked list). And so forth. It's a very powerful way of thinking about how an algorithm will worsen with respect to time as your input size grows.

Let's look at some case studies:

Array of size 16: binary search will need ~log_2(16) = ~4 lookups to find the element.

Array of size 64: ~log_2(64) = ~6 lookups.

Array of size 128: ~7 lookups.

Array of size 256: ~8 lookups.

Array of size 1024: ~10 lookups.

Array of size 65536: ~16 lookups.

As you can see, the time it takes to look something up increases very slowly even as the input array is multiplied by factors of 2.

(You can also see that for Nick's example of size 100, we needed 7 lookups which is consistent with the numbers above.)

Therefore, dividing the binary search into several segments shows a lack of understanding of how binary search behaves. There's really no need to try to search the most likely segment before others, unless (a) the segments are huge and (b) the probabilities of being in the various segments are _very_ disparate.
Quote:
Aside:
Consider a case where you have an array that can be divided into two sub-categories, and you know that a search will find a result from either one or the other. Further imagine that you know (somehow) that a request has a 99% chance of being in one segment and 1% chance of being in the other. Further assume that searching the arrays will take you a fair amount of time, perhaps because the arrays are very, very large (several million elements). And finally, perhaps the binary search itself is expensive, involving complex comparisons at every step of the way.
In a case like this, it might be worth your time to divide the search, trying first in the more likely segment.
But even so, you shouldn't go to this trouble until you know concretely that you are paying for this. You shouldn't go out of your way to optimize something without actually timing it first. It might turn out that improving your "really expensive search" will in fact only gain you a few milliseconds per second. (That is, you only gain a few nanoseconds per individual call to the function, and results only become "significant" when you sample over a large duration of time, like a second.)

The algorithm is an iterative version of binary search. It's a little less intuitive than the recursive version (to me, at least) but it works. The original author was probably trying to avoid the "overhead" (gasp) of a function call. If the recursion depth were to be to several hundred deep, it would start mattering (sort of... to be frank, even then, not really, because in the bigger picture this function is called relatively rarely); but in this situation the difference is basically noise.

This line:
sn = ( first + top ) >> 1;
is a "clever" way of dividing by two. Shifting right by 1 bit is the same thing as an integer divide by 2. For example:
101 binary is 5 decimal.
Shift right: 10; that's 2 decimal.

(Conversely, shifting left is the same as multiplying by 2.)

Shifting is indeed more efficient than actual division. Even a modestly intelligent compiler, however, knows that dividing by 2 is the same thing as a shift, and will optimize that all on its own if you have optimization turned on.

The point is that it's not useful or time-efficient to killer-optimize something that won't be called very often. Something called even 50 times a second that is basically fast doesn't need to have every last assembly instruction squeezed out of it. Now, if you had a loop that ran, say, 100 times a second, with lots of sub-loops -- as is commonly seen in 3d games -- it would matter. But in this case, it really falls into Knuth's expression.


Now, on to your problem.

This line:
if( strcmp( name, skill_table[sn]->name ) < 1 )
is using an exact string comparison.
I don't know if this is exactly the source of your problem, but what that means is that it chooses the next sub-area of the array to search based on a case-sensitive comparison. That can spell trouble because A < b but b > a. I would suggest using str_cmp.

I would also suggest caching the value of str_cmp by storing it into an integer after the first calculation. That way, if you find that you need it again, you don't need to scan the string again; you already have the comparison result.

But be careful about Nick's point; if you use case-insensitive searching you *must* make sure that the values are loaded (or later sorted) into case-insensitive order.

Also note:
Some of their checks are exact, some of them are prefix-based. It looks like it does everything exactly first, and then does everything by prefix. The best fix would be to have two binary searches; one exact over the whole range and one prefix-based over the whole range.

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org
Top

Posted by Conner   USA  (381 posts)  Bio
Date Reply #6 on Sun 28 Jan 2007 07:45 AM (UTC)
Message
Wow, Ksilyan, that was really indepth while remaining surprisingly lucid. Well done. In this case, we opted to utilize strcasecmp and went with the solution posted at http://www.fussproject.org/index.php?a=topic&t=1093&p=3997#p3997 but I loved your write up here, I think I may have learned several things from it. :)

-=Conner=-
--
Come test your mettle in the Land of Legends at telnet://tcdbbs.zapto.org:4000
or, for a little family oriented medieval fun, come join us at The Castle's Dungeon BBS at telnet://tcdbbs.zapto.org
or, if you just want information about either, check our web page at http://tcdbbs.zapto.org
Top

Posted by David Haley   USA  (3,881 posts)  Bio
Date Reply #7 on Sun 28 Jan 2007 08:01 AM (UTC)
Message
Thanks. :-) I am about to edit my post and fix a few small things, and clarify some others.

I have a quick question about the post you referred to: what does strcasecmp do? Is it a case-insensitive version of strcmp? Isn't that the same thing as str_cmp?


(str_cmp is a good example of how to not name a function. Who is to guess that the difference between strcmp and str_cmp is that the former is case-sensitive but the second isn't? Grr...)

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org
Top

Posted by Samson   USA  (683 posts)  Bio
Date Reply #8 on Sun 28 Jan 2007 08:30 AM (UTC)
Message
Yes, strcasecmp is the case insensitive version of strcmp. I'm not sure why str_cmp wasn't chosen, perhaps Keberus didn't realize it was case insensitive either. But his fix as posted does work.

Of course, I'm one to tinker, and I decided to try and see if your idea to only do 2 binary searches would work. One for exact, one for prefix, both over the entire range. The results were surprisingly a dismal failure. Here's how I converted skill_lookup:


/*
 * Lookup a skill by name.
 */
int skill_lookup( const char *name )
{
   int sn;

   if( ( sn = bsearch_skill_exact( name, gsn_first_spell, gsn_top_sn - 1 ) ) == -1 )
      if( ( sn = bsearch_skill_prefix( name, gsn_first_spell, gsn_top_sn - 1 ) ) == -1 && gsn_top_sn < top_sn )
      {
         for( sn = gsn_top_sn; sn < top_sn; ++sn )
         {
            if( !skill_table[sn] || !skill_table[sn]->name )
               return -1;
            if( LOWER( name[0] ) == LOWER( skill_table[sn]->name[0] )
             && !str_prefix( name, skill_table[sn]->name ) )
               return sn;
         }
         return -1;
     }
   return sn;
}


I'm puzzled by the result, since this looks valid. But upon bootup, none of the skills from the class or race files would match up. The log filled with spam on every last lookup. It then barfed up on all of the ASSIGN_GSN calls. So it does in fact appear as though doing all of the individual lookups was necessary. But I'd sure love to know why that is.
Top

Posted by David Haley   USA  (3,881 posts)  Bio
Date Reply #9 on Sun 28 Jan 2007 09:11 AM (UTC)
Message
Ah-ha!!

For some reason, str_cmp only returns true/false. It doesn't actually return -1, 0 or 1!

I ran the program through gdb and that's how I discovered this. The binary search was going into the second half of the array, when looking for acetum primus -- it was going forward when it should have been going backwards!



Are you using strcasecmp, or str_cmp?

It would appear that strcasecmp is a C library function that behaves like strcmp, in the sense that it returns -1/0/1.


Now, when I use strcasecmp, I still get some weirdness. It doesn't work.

I will investigate this further tomorrow. Something very fishy is going on; I suspect it has to do with how the array is formed.

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org
Top

Posted by Samson   USA  (683 posts)  Bio
Date Reply #10 on Sun 28 Jan 2007 05:47 PM (UTC)
Message
Heh. See, we always find some of the strangest little mysteries :)

I'm aware that str_cmp only gives a true/false response, but it just hit me as to why that was significant since strcmp and strcasecmp will also give a negative number as a response.

But our fix is using strcasecmp, so I don't see why that's causing a problem with the modified skill_lookup I tried. Obviously it has something to do with how the skill array is constructed but it makes pretty much no sense that you can't just do the binary search on the entire range instead of in 12 little chunks like I have.

It would be nice to get FUSS trimmed down to only use what it needs. Not only is it more efficient, but it's easier to read and maintain when adding new types of skills if you don't have to fiddle with so much code to accomplish it.

Personally I would much rather use a std::map for my own code, but I haven't wrapped my head around how exactly to do that with the skill table and still prevent the stupidity of checking if a value has a valid entry without it creating a blank default. I'm probably over thinking it.
Top

Posted by David Haley   USA  (3,881 posts)  Bio
Date Reply #11 on Sun 28 Jan 2007 11:35 PM (UTC)
Message
Quote:
but I haven't wrapped my head around how exactly to do that with the skill table and still prevent the stupidity of checking if a value has a valid entry without it creating a blank default

What's wrong with the way I showed you using the find method?

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org
Top

Posted by Nick Gammon   Australia  (23,158 posts)  Bio   Forum Administrator
Date Reply #12 on Sun 28 Jan 2007 11:58 PM (UTC)

Amended on Sun 28 Jan 2007 11:59 PM (UTC) by Nick Gammon

Message
Quote:

I haven't wrapped my head around how exactly to do that with the skill table and still prevent the stupidity of checking if a value has a valid entry without it creating a blank default.


It isn't really stupid.

The header for the map file indicates that the operator[] returns a reference to the item. There is no concept in C++ for a reference to a non-existant object. (You can have a pointer to something that doesn't exist, that is NULL).

Thus, the only way STL can return a reference to a map object using operator[] is for it to create one, if necessary, if it doesn't exist.

In the book "The C++ Standard Library" by Nicolai Josuttis (an excellent book), he comments in the chapter on maps that the operation:



m[key] - Returns a reference to the value of the element with key key.

Inserts an element with key if it does not yet exist.


That is pretty explicit, and as David says, you can use 'find' to see if the element exists, if that is what you want to know.

- Nick Gammon

www.gammon.com.au, www.mushclient.com
Top

Posted by Nick Gammon   Australia  (23,158 posts)  Bio   Forum Administrator
Date Reply #13 on Mon 29 Jan 2007 12:07 AM (UTC)

Amended on Mon 29 Jan 2007 12:10 AM (UTC) by Nick Gammon

Message
This small test program shows how you might make a map of skills, and see if one exists:


#include <map>
#include <string>
#include <iostream>

using namespace std;

int main ()
  {
  map<string, string> skills;

  // add a couple of test cases

  skills ["fireball"] = "test1";
  skills ["iceball"] = "test2";

  // iterator for looking up stuff

  map<string, string>::const_iterator it;

  // do lookup 

  it = skills.find ("fireball");

  if (it == skills.end ())
    cout << "not found" << endl;
  else
    {
    cout << "found" << endl;
    cout << "value is " << it->second << endl;
    }

  }


Now if you change the lookup (the find line) to something that isn't there, it echoes "not found".

If it is found, then we can use the iterator "second" value (the value associated with the key) to see what the value is (in this case, "test1").

Maps are very powerful like that, and have a fast lookup. The map doesn't have to be a string-to-string map, the types are specified in the template. For example, you could have a string key (the spell name) but the item being stored could be a structure, or a pointer.

- Nick Gammon

www.gammon.com.au, www.mushclient.com
Top

Posted by Nick Gammon   Australia  (23,158 posts)  Bio   Forum Administrator
Date Reply #14 on Mon 29 Jan 2007 12:14 AM (UTC)

Amended on Mon 29 Jan 2007 12:16 AM (UTC) by Nick Gammon

Message
Quote:

(You can also see that for Nick's example of size 100, we needed 7 lookups which is consistent with the numbers above.)


That is gratifying, to see it work. :)

You can check this with an ordinary calculator. Key in:

100 log / 2 log

You will get 6.644.

Round that up to 7, and that agrees with the fact that I found the answer in 7 steps.

Similarly for a table of 1000:

1000 log / 2 log = 9.966

This means that only 10 iterations are needed to find an item out of 1000.

Interestingly, it doesn't matter what base logarithms you use, so using the "ln" key (natural logs) or "log" key (logs to the base 10) give the same answer. As you are dividing one by the other, the base effectively cancels out.

- Nick Gammon

www.gammon.com.au, www.mushclient.com
Top

The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).

To show them in your local time you can join the forum, and then set the 'time correction' field in your profile to the number of hours difference between your location and UTC time.


89,337 views.

This is page 1, subject is 3 pages long: 1 2  3  [Next page]

It is now over 60 days since the last post. This thread is closed.     Refresh page

Go to topic:           Search the forum


[Go to top] top

Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.