
Gammon Forum


game_loop(), select(), and dual core CPUs

It is now over 60 days since the last post. This thread is closed.



Posted by Zeno   USA  (2,871 posts)
Date Reply #15 on Sun 11 Feb 2007 07:39 PM (UTC)

Amended on Sun 11 Feb 2007 07:40 PM (UTC) by Zeno

Message
Strange, here are my results from testing the pulse speed.

Normal (32bit) server:
Quote:
Log: [*****] BUG: Sun Feb 11 16:24:57 2007: pulse update (time)

Log: [*****] BUG: Sun Feb 11 16:26:08 2007: pulse update (time)

Log: [*****] BUG: Sun Feb 11 16:27:23 2007: pulse update (time)

Log: [*****] BUG: Sun Feb 11 16:28:34 2007: pulse update (time)

Log: [*****] BUG: Sun Feb 11 16:29:55 2007: pulse update (time)


Samson's (64bit) server:
Quote:
Log: [*****] BUG: Sun Feb 11 12:33:42 2007: pulse update (time)

Log: [*****] BUG: Sun Feb 11 12:34:42 2007: pulse update (time)

Log: [*****] BUG: Sun Feb 11 12:35:54 2007: pulse update (time)

Log: [*****] BUG: Sun Feb 11 12:37:01 2007: pulse update (time)

Log: [*****] BUG: Sun Feb 11 12:38:04 2007: pulse update (time)


There should be exactly 1 minute between each of those. (They were taken at the same moments, so the first log on the normal server corresponds to the first log on Samson's; the timestamps differ only because of different timezones and clocks that are a little off.)
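For reference, the gaps between consecutive log lines can be worked out with a small helper. This is a hypothetical sketch, not part of SMAUG; it simply parses the hh:mm:ss part of each timestamp:

```c
#include <stdio.h>

/* Parse an "hh:mm:ss" wall-clock string into seconds since midnight.
   Returns -1 if the string does not match that format. */
int clock_to_seconds(const char *hms)
{
   int h, m, s;

   if (sscanf(hms, "%d:%d:%d", &h, &m, &s) != 3)
      return -1;
   return h * 3600 + m * 60 + s;
}

/* Seconds elapsed between two "pulse update" log times on the same day. */
int log_gap(const char *earlier, const char *later)
{
   return clock_to_seconds(later) - clock_to_seconds(earlier);
}
```

Fed the timestamps quoted above, it gives gaps of 71, 75, 71 and 81 seconds on the normal server and 60, 72, 67 and 63 seconds on the 64-bit one, so neither log shows a steady 60-second spacing.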

Zeno McDohl,
Owner of Bleached InuYasha Galaxy
http://www.biyg.org

Posted by Nick Gammon   Australia  (23,158 posts)   Forum Administrator
Date Reply #16 on Sun 11 Feb 2007 09:20 PM (UTC)
Message
I stick by my earlier explanation. The select function is intended to wait for asynchronous IO to complete, with a timeout, so that you can do things if the IO does not complete in a reasonable time (e.g. your fighting loops).

However if you supply an empty list of IO ports to test, it is possible that the operating system decides to optimize away the wait, on the grounds that the IO will never complete (as there is none to test for), and thus it may as well return immediately.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Samson   USA  (683 posts)
Date Reply #17 on Sun 11 Feb 2007 09:42 PM (UTC)
Message
Ok, so. Not to be a prick or anything, but how would you explain it then that this select() strategy that's been in use in every MERC derivative since 1993 has pretty much stuck to the 1/4 second pulse value it gets? The code I showed you is all stock Smaug, but I've checked all the Merc derivatives I have and they all have it too. And nobody has ever complained about this until it happened on my server.

But this is what's really going to bake everyone's noodle:

The server popped a kernel panic, had to be rebooted, and afterward the problem seems to have gone away. The mud has returned to the usual timeout period. Load averages on the box have returned to normal. So if we were going to find a reason for this, that time has probably passed. :P

I'm verifying now, just in case I'm on crack.

Posted by Nick Gammon   Australia  (23,158 posts)   Forum Administrator
Date Reply #18 on Sun 11 Feb 2007 09:54 PM (UTC)
Message
The kernel realized it was wrong and panicked? Oh well. :)

Quote:

... how would you explain it then that this select() strategy that's been in use in every MERC derivative since 1993 has pretty much stuck to the 1/4 second pulse value it gets ...


If the operating system writers decided to optimize select, so that it does not wait, if there are no ports to test, then that would explain it. I presume you have changed operating system versions, in the upgrade to the new hardware?

Anyway, this looks strange, this is from smaug17fuss.tgz:


void accept_new( int ctrl )
{
   static struct timeval null_time;
   DESCRIPTOR_DATA *d;

/// ... stuff here

   if( select( maxdesc + 1, &in_set, &out_set, &exc_set, &null_time ) < 0 )
   {
      perror( "accept_new: select: poll" );
      exit( 1 );
   }


Unless I am missing something, null_time is never initialized. That looks wrong straight away.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by David Haley   USA  (3,881 posts)
Date Reply #19 on Sun 11 Feb 2007 10:18 PM (UTC)
Message
Nick's explanation makes much more sense to me. It is quite possible that the implementation of select changed slightly while still respecting the man page. Still, the man page also says that select is a good way to do a portable sub-second precision sleep, so such a change would be odd.

Of course, if your kernel rebooting fixed the problem, it's possible that something was screwed up to begin with.

From a processor architecture standpoint, I can assure you that it makes absolutely no sense whatsoever that passing from 32 to 64 bits, and from single-core to dual-core, would double the passing of time. Seriously.

And yes, I am running 64 bit Ubuntu Edgy Eft.


I rewrote my networking code some time ago to better handle timing and stuff like that. I don't know if it's portable to SMAUGfuss, but I'll see what I can do. (Gee, I already have about 5 other patches pending... :P)

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

Posted by Samson   USA  (683 posts)
Date Reply #20 on Sun 11 Feb 2007 10:47 PM (UTC)
Message
void accept_new( int ctrl )
{
   static struct timeval null_time;
   DESCRIPTOR_DATA *d;

/// ... stuff here

   if( select( maxdesc + 1, &in_set, &out_set, &exc_set, &null_time ) < 0 )
   {
      perror( "accept_new: select: poll" );
      exit( 1 );
   }


Again, this code here has been around forever. We haven't done anything to any of the networking core. If it's busted, it's been that way for 10 years now. Certainly wouldn't surprise me either. It was my understanding though that if you declare something static, it gets filled with zeros. So perhaps when it sees &null_time, it got set to NULL?

And I'm a bit baffled at what the system was doing, but yes, after it rebooted, it seems to have righted itself and the game_loop has gone back to being its normal self again.

Posted by Samson   USA  (683 posts)
Date Reply #21 on Mon 12 Feb 2007 03:37 AM (UTC)
Message
http://bugzilla.kernel.org/show_bug.cgi?id=5105

Something Tyche posted on TMC led me to this bug report which does seem to describe the problem, though the bug itself is closed now. So it's possible this may not be the code's fault, but rather a fault in the kernel itself.

Posted by Jon Lambert   USA  (26 posts)
Date Reply #22 on Mon 12 Feb 2007 04:26 AM (UTC)
Message
> Unless I am missing something, null_time is never initialized.

It has static storage, so it's zero-initialized.
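To make the zero-initialization point concrete: a pointer to a zeroed struct timeval is not the same thing as NULL. A minimal sketch, assuming a POSIX system; poll_readable is a hypothetical helper modelled on the select() call in accept_new():

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>   /* pipe(), write() for the usage example */

/* Non-blocking "is there input?" check, written the way accept_new()
   does it. null_time has static storage duration, so it starts out as
   { 0, 0 }: &null_time is a valid, non-NULL pointer to a zero timeout,
   which makes select() poll and return immediately. Passing NULL
   instead would mean "no timeout" and block until a descriptor is ready.
   Returns 1 if fd is readable, 0 if not, -1 on error. */
int poll_readable(int fd)
{
   static struct timeval null_time;   /* zero-initialized, never assigned */
   fd_set in_set;

   FD_ZERO(&in_set);
   FD_SET(fd, &in_set);
   return select(fd + 1, &in_set, NULL, NULL, &null_time);
}
```

On an empty pipe it returns 0 at once; after one byte is written to the pipe it returns 1. Had the call been given NULL instead of &null_time, the first case would block indefinitely.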

On Windows you have to fool select() with a dummy FD_SET to get it to act as a timer, but the POSIX select() is well documented as a timer.

One thing to note is that the timeout parameter MAY be modified during a select call! POSIX doesn't guarantee it is left unchanged (Linux, for one, writes the remaining time back into it), so one should always reinitialize the timeout parameter between calls.

Posted by Nick Gammon   Australia  (23,158 posts)   Forum Administrator
Date Reply #23 on Mon 12 Feb 2007 05:01 AM (UTC)
Message
OK, I see that the uninitialized static variable will be zero, although I think it is poor documentation to make it implicit.


Quote:

One thing to note is that the timeout parameter MAY be modified during a select call ...


This is correct, and was going to be my next point. According to the man page for select:


The select function may update the timeout parameter to indicate how much time was left.


Thus the uninitialized variable null_time may be zero the first time, but not necessarily subsequent times. I think it would be neater to make it explicit, each time through the loop.

If the problem has gone away, well and good, maybe the kernel had some obscure timing issue internally.

However I still think that having two selects in each main loop is redundant and looks bad. After all, you are basically waiting for IO to complete, and want to do stuff anyway each game tick. A single select statement, with the timeout adjusted so that it completes when a game tick would be up, should do the trick.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by David Haley   USA  (3,881 posts)
Date Reply #24 on Mon 12 Feb 2007 05:12 AM (UTC)

Amended on Mon 12 Feb 2007 05:13 AM (UTC) by David Haley

Message
Nick, that will not quite work, because select will exit as soon as it has data. If you do your tick after select, you will be doing it potentially far too early.

(EDIT: after rereading what you wrote, it looks like you might actually have said what I just said... oops. :P)

However, having two selects isn't useful; in addition to being redundant, it has a perhaps undesirable consequence:
After any socket is ready for, say, input, the MUD will process that and then sleep until the next tick.
What this means is that, during the entire remaining time, absolutely nothing will happen. The remaining time is somewhere close to a quarter second, which, while not that much, is still an awful lot of wasted time.

What I feel is the better way of doing this is the following:


main loop {
  time = get current time
  next tick time = time + tick delay

  while (time < next tick time) {
    select(with timeout of next tick time - time);
    process network data
    update time to get current time
  }

  run tick update
}


This way, you process as much data as there is to be processed, waiting until the next tick if nothing shows up, but not wasting time sleeping, doing nothing at all. Of course, if nothing is there to be done, the process will still just sit there, waiting for something to happen (or for the timeout to elapse).

Incidentally, for those who were following the discussion on client time, this will get rid of that silly lag due to only sending 512b at a time (per tick!), because the MUD will keep sending as much as it can.
Perhaps now it is clear why there appears to be a delay: it sends out 512b, then waits an entire tick (~250ms) before sending out another 512b. So to get 2k, you need to wait a whole second, just for that data to be sent from the server.
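The arithmetic behind that delay, as a tiny sketch: assuming one 512-byte write per pulse and a 250 ms pulse, as described above, with each chunk (including the first) costing a full pulse. flush_time_ms is a hypothetical helper:

```c
/* Milliseconds needed to deliver `bytes` of output when at most `chunk`
   bytes leave per pulse and pulses are `pulse_ms` apart. */
long flush_time_ms(long bytes, long chunk, long pulse_ms)
{
   long pulses = (bytes + chunk - 1) / chunk;   /* round up */

   return pulses * pulse_ms;
}
```

For 2048 bytes in 512-byte chunks that is 4 pulses, i.e. 1000 ms.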

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

Posted by Jon Lambert   USA  (26 posts)
Date Reply #25 on Mon 12 Feb 2007 05:30 AM (UTC)

Amended on Mon 12 Feb 2007 05:37 AM (UTC) by Jon Lambert

Message
> However I still think that having two selects in each main
> loop is redundant and looks bad. After all, you are
> basically waiting for IO to complete, and want to do stuff
> anyway each game tick. A single select statement, with the
> timeout adjusted so that it completes when a game tick would
> be up, should do the trick.

Firstly, the reason select() was used as a timer is that a reliable sub-second timer call wasn't around in the early days of unix. I believe the timing desired in the original mud was 4 pulses per second. One could probably substitute in something like usleep() today and it might have wider support.

Second the above won't work because the timeout parameter is the maximum time to wait. select() can return earlier if there is i/o. That's why one has to use null or empty FD_SETs if one wants to use select() as a timer, because that guarantees select won't return early.

Pulses/Ticks are handled differently in other muds. In TinyMud and LPMud the alarm signal handler is used as a reliable timer. In my Windows ports of those muds I use a timer thread that uses sleep().

Posted by Nick Gammon   Australia  (23,158 posts)   Forum Administrator
Date Reply #26 on Mon 12 Feb 2007 05:32 AM (UTC)
Message
Quote:

What this means is that, during the entire remaining time, absolutely nothing will happen. The remaining time is somewhere close to a quarter second, which, while not that much, is still an awful lot of wasted time.


Exactly. I agree with David here 100%.

Having two select statements is a fundamental design flaw.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by David Haley   USA  (3,881 posts)
Date Reply #27 on Mon 12 Feb 2007 05:36 AM (UTC)
Message
Quote:
Second the above won't work because the timeout parameter is the maximum time to wait. select() can return earlier if there is i/o.

Jon, that's precisely the point of the pseudo-code I proposed: it keeps looping, selecting, getting input, and will only exit that loop when the tick time has elapsed.

The whole point is that we really do not want to just sleep. We might as well be servicing our sockets until the next game tick comes around; there's no point sitting around just idling.

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

Posted by Nick Gammon   Australia  (23,158 posts)   Forum Administrator
Date Reply #28 on Mon 12 Feb 2007 05:41 AM (UTC)
Message
I think a reasonable approach would be to have a single select, and have it time out after about 1/2 of a pulse, in this case about 1/8 of a second.

Each time through the loop you check to see if the pulse time is up (that is, if the next 1/4 second has elapsed). If so, you do your pulsing stuff. If not, do nothing. Now the worst case is that you go through the loop just before the pulse was due to arrive (say 1/100 of a second before), and you only wait 1/8 of a second before going through the loop again.

Now to stop "pulse creep" you make the pulse time a fixed amount from the previous (theoretical) time, not from when it actually fired. That is, the pulses should occur every 0.25 seconds from server startup. If a particular pulse takes longer to process than another this won't matter, as the next pulse time is fixed -- not 1/4 of a second from when the previous one actually fired. To put it another way, you basically want your pulses to occur when the time of day in milliseconds is an exact multiple of 250 (that is, the time modulo 250 is zero).

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by David Haley   USA  (3,881 posts)
Date Reply #29 on Mon 12 Feb 2007 05:46 AM (UTC)
Message
That's a good idea (both parts), and it's actually what my code does anyway (the part about tick creep, not the 1/8 second part). My pseudo-code should actually be:

next tick time = last tick time + tick delay

instead of

next tick time = time + tick delay

This way, if somehow your tick takes a fair amount of time to process, the next one happens on schedule, even if that means that there isn't as much time to process network data.
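The effect of that one-line change can be shown with a simulated clock instead of real sleeps. In this sketch (hypothetical function names) every pulse takes cost_ms of processing and the tick delay is delay_ms:

```c
/* Start time (ms) of pulse n when the next tick is scheduled from "now",
   i.e. after the previous pulse's cost_ms of processing has been spent:
   next tick time = time + tick delay. */
long pulse_start_from_now(int n, long delay_ms, long cost_ms)
{
   long start = 0;

   for (int k = 0; k < n; k++)
      start = start + cost_ms + delay_ms;
   return start;
}

/* Start time (ms) of pulse n when the next tick is scheduled from the
   previous scheduled time: next tick time = last tick time + tick delay.
   A pulse still cannot start before the previous one has finished. */
long pulse_start_from_last(int n, long delay_ms, long cost_ms)
{
   long scheduled = 0, start = 0;

   for (int k = 0; k < n; k++) {
      long finished = start + cost_ms;

      scheduled += delay_ms;
      start = finished > scheduled ? finished : scheduled;
   }
   return start;
}
```

With a 250 ms delay and 30 ms of work per pulse, the fifth pulse (n = 4) starts at 1120 ms under the first rule but at 1000 ms under the second. When the work itself exceeds the delay, both rules fall behind, because a pulse cannot start before the previous one has finished.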

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).


This is page 2; the subject is 5 pages long.


Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.