Losing 25% of messages using UDP

Bill Graham

8 Apr 2003

1:02 a.m.

I have syslog-ng-1.6.0rc1 compiled on Solaris 9. I intend to use the system as a central log server, and I need to log every message that reaches it. I am testing with the Kiwi Syslog Message Generator, sending bursts of 100 messages every 10 seconds, and syslog-ng is only logging about 20-25% of them. I have verified that the messages are getting to the system. Is there some additional tuning that I have to do to get this to work?

Here is a copy of my syslog-ng configuration:

    options {
        use_fqdn(yes);
        keep_hostname(yes);
        use_dns(no);
        long_hostnames(off);
        sync(0);
        gc_idle_threshold(5000);
        gc_busy_threshold(1000);
        log_fifo_size(10240);
    };

    source local { sun-streams("/dev/log" door("/etc/.syslog_door")); internal(); };
    source network { udp(); };

    destination all { file("/var/log/messages"); };

    log { source(local); destination(all); };
    log { source(network); destination(all); };

Thanks,
Bill

Balazs Scheidler

8 Apr

1:18 p.m.

UDP messages might be dropped at several places:

* at the sender side (please check that messages are indeed sent to the network)
* on the network itself (this is not common, only when the link is saturated)
* on the receiver side, if the receiving program does not issue recv() requests fast enough

You can use netstat to check buffer space and/or truss to check whether syslog-ng really receives messages. You have to identify the point where you are losing messages, because syslog-ng is probably not the culprit.

--
Bazsi
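The receiver-side case can be reproduced on one machine. The following is a minimal Python sketch, not taken from syslog-ng: it sends a burst to a loopback UDP socket that is deliberately not read during the burst, then counts how many datagrams are still waiting. The function name `burst_loss`, the payload size, and the buffer size are assumptions chosen for illustration, and the number that survives depends on the operating system.

```python
import socket

def burst_loss(burst=100, payload_size=512, rcvbuf=4096):
    """Send a burst to an idle loopback UDP receiver and count what survives."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Ask for a small receive buffer; the kernel may round or clamp this value.
    receiver.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    receiver.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    receiver.settimeout(0.2)

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = b"x" * payload_size
    for _ in range(burst):
        # The receiver is not reading yet, so its buffer can only fill up.
        sender.sendto(payload, receiver.getsockname())

    received = 0
    try:
        while True:
            receiver.recvfrom(65535)
            received += 1
    except socket.timeout:
        pass                             # nothing left in the queue
    finally:
        sender.close()
        receiver.close()
    return burst, received

if __name__ == "__main__":
    sent, got = burst_loss()
    print("sent", sent, "received", got)
```

Every `sendto()` call succeeds from the sender's point of view, which is why the loss has to be located by comparing counts on both sides, as suggested above.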

Bill Graham

9 Apr

1:03 a.m.

OK, I have checked whether all of the messages are being sent over the network from the source. I have also checked the receiving end to see whether all of the packets are reaching this system. I used the snoop command to find this out. When I sent a burst of 100 messages, I saw 100 packets from the source system. When I ran truss on the syslog-ng process, I only saw around 75 recvfrom() calls. It looks like the third option is what is happening. Is there a way to speed up the issuing of recv() calls?

Bill

Balazs Scheidler

10 Apr

9:27 a.m.

syslog-ng uses a poll() loop to check whether a given source (e.g. a UDP socket) is readable, and once it is, it issues a single recvfrom() and then returns to the main loop.

So your host is not fast enough to keep up with the message rate (at least not when reading one message per poll iteration).

The following options are available:

1) upgrade the hardware
2) increase the default UDP sockbuf size to keep up with bursts
3) implement issuing several recvfrom() calls when poll() indicates readability

Option 3) involves adding a loop to the sources.c:do_read_line() function, which would keep calling recvfrom() until it reports that nothing more is available. An upper limit would probably be needed to avoid starving other sources (say, read while anything is available, but no more than 10 messages).

--
Bazsi
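Option 3) can be sketched outside syslog-ng. The Python code below is an illustration of the idea only, not the change to sources.c: the names `drain`, `receive_burst`, and `MAX_PER_WAKEUP` are hypothetical, and the standard `selectors` module stands in for the poll() loop. Each time the socket is reported readable, up to 10 datagrams are read before control returns to the loop.

```python
import selectors
import socket

MAX_PER_WAKEUP = 10   # cap per readiness event, so other sources are not starved

def drain(sock, limit=MAX_PER_WAKEUP):
    """Read up to `limit` datagrams from a non-blocking socket; stop early if empty."""
    messages = []
    for _ in range(limit):
        try:
            data, _addr = sock.recvfrom(65535)
        except BlockingIOError:
            break                      # nothing more queued right now
        messages.append(data)
    return messages

def receive_burst(expected, timeout=1.0):
    """Send `expected` datagrams over loopback and collect them batch by batch."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    receiver.setblocking(False)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for i in range(expected):
        sender.sendto(b"<13>test message %d" % i, receiver.getsockname())

    selector = selectors.DefaultSelector()
    selector.register(receiver, selectors.EVENT_READ)
    batches = []
    total = 0
    # One loop iteration corresponds to one pass through the main loop.
    while total < expected and selector.select(timeout):
        batch = drain(receiver)
        batches.append(len(batch))
        total += len(batch)
    selector.close()
    sender.close()
    receiver.close()
    return batches

if __name__ == "__main__":
    print(receive_burst(25))
```

With a limit of 1, this degenerates to the one-message-per-poll behaviour described above; a larger limit reduces the number of loop iterations needed to empty the socket buffer.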

Bill Graham

11 Apr

3:49 a.m.

I increased my default UDP sockbuf and it seems to have solved the problem.

Balazs Scheidler

2:44 p.m.

Although it might solve the problem for the bursts you are testing with, you might run into limits again with greater loads.

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1

Bill Graham

6:47 p.m.

I agree that I might run into the problem again, but I have tested the system with bursts of 500 messages and it seems to handle the load. Upgrading the system probably wouldn't solve the problem and would most likely be a waste of hardware. I am running on an E250 with two 450 MHz processors and 4 GB of RAM. I am also not an experienced enough programmer to rewrite the program. Is this something that you could address in a future release of the software?

Bill

Gregor Binder

7:19 p.m.

Bill Graham on Fri, Apr 11, 2003 at 09:47:32AM -0700: Bill,

...

Upgrading the system probably wouldn't solve the problem and would most likely be a waste of hardware. I am running on a E250 w/2 450MHz processors and 4Gb of RAM.

assuming this host is doing logging only, I would think you would need more disks than the E250 fits before this kind of CPU and memory resources are exhausted.

Have you tried monitoring performance on the box using vmstat, iostat and sar? I can't see (judging by the figures you mention from your testing) how a machine like this could be bound by CPU or RAM when it does not recv() fast enough.

If you haven't already, check if vmstat shows processes waiting for I/O, and check iostat for busy disks. If you find you're asking too much from these devices, RAID-0 might help. And maybe you would find out you can remove one CPU and about 3GB of RAM :)

Regards,
--
Gregor Binder <gb@(rootnexus.net|sysfive.com)>
Id: 0xE2F31C4B Fp: 8B8A 5CE3 B79B FBF1 5518 8871 0EFB AFA3 E2F3 1C4B

Gregor Binder

7:50 p.m.

Gregor Binder on Fri, Apr 11, 2003 at 07:19:44PM +0200: Have to reply to myself, sorry ...

...

If you haven't already, check if vmstat shows processes waiting for I/O, and check iostat for busy disks. If you find you're asking too much from these devices, Raid-0 might help. And maybe you would find out you can remove one CPU and about 3GB of RAM :)

if you find the disk(s) to be too busy, you may also want to try playing with sync(), which might also take some load off your disks (if you're waiting for disks, it's probably not because of insufficient throughput). You might have to use small sampling intervals with the monitoring tools, given the way you're testing it ...

Greetings,
--
Gregor Binder <gb@(rootnexus.net|sysfive.com)>
Id: 0xE2F31C4B Fp: 8B8A 5CE3 B79B FBF1 5518 8871 0EFB AFA3 E2F3 1C4B

Bill Graham

11:19 p.m.

I guess I should have included the rest of my configuration... I am also running two A1000 disk arrays on separate SCSI channels, using RAID 5, and a fiber Gigabit Ethernet card. I know this system is a little overkill, but we are also going to put some additional services on the box. I have checked iostat and vmstat and everything looks fine. It just seems that the program can't keep up when the UDP buffer is set to the default of 8K. Once you bring this number up to the system maximum of 64K, the system can handle bursts of around 750 syslog messages.

Bill
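The thread does not say how the default UDP buffer was raised on Solaris. As a rough per-socket equivalent, a receiving program can request a larger buffer itself with the standard SO_RCVBUF socket option. The Python sketch below assumes a hypothetical helper `open_udp_listener` and uses the 64K figure from the message above; the kernel is free to round or cap the request, so the granted value is read back rather than assumed.

```python
import socket

def open_udp_listener(port=0, rcvbuf=65536):
    """Open a UDP socket on loopback and request a larger receive buffer."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Size the OS gives the socket before any tuning.
    default = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    # Request the new size; system-wide limits may still apply.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    # Read back what was actually granted.
    granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    sock.bind(("127.0.0.1", port))       # port 0: let the OS pick a free port
    return sock, default, granted

if __name__ == "__main__":
    sock, default, granted = open_udp_listener()
    print("default:", default, "granted:", granted)
    sock.close()
```

A larger buffer only absorbs longer bursts; if the sustained message rate exceeds what the reader can drain, the buffer still fills and datagrams are dropped.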

nate

12 Apr

4:01 a.m.

Bill Graham said:

...

checked iostat and vmstat and everything looks fine. It just seems that the program can't keep up when the udp buffer is set to the default of 8K. Once you bring this number to the system max of 64K the system can handle bursts of around 750 syslog messages.

I haven't followed this whole thread, since I've only been on the list a couple of days, but I have read all the messages since then. I was curious whether you had considered running syslog-ng on the clients and having them use TCP instead of UDP for the syslog traffic. Maybe it is faster or more efficient? I'm currently only running one syslog-ng client using TCP, but it seems to work fine...

nate
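For comparison with the UDP case, here is a small Python sketch of the same kind of burst sent over TCP on loopback. It is not a syslog-ng client or server: the function name `send_over_tcp` and the newline-terminated framing are assumptions made for the example. With TCP, a sender that gets ahead of the reader is made to wait by flow control instead of having its data discarded from a full receive buffer.

```python
import socket
import threading

def send_over_tcp(count=500):
    """Send `count` newline-terminated messages over loopback TCP and count arrivals."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    listener.listen(1)
    received = []

    def serve():
        conn, _addr = listener.accept()
        # Read line by line until the client closes the connection.
        with conn, conn.makefile("rb") as stream:
            for line in stream:
                received.append(line)

    worker = threading.Thread(target=serve)
    worker.start()

    with socket.create_connection(listener.getsockname()) as client:
        for i in range(count):
            # sendall() blocks if the peer is slow, rather than dropping data.
            client.sendall(b"<13>test message %d\n" % i)

    worker.join()                        # wait until the reader has seen end-of-stream
    listener.close()
    return len(received)

if __name__ == "__main__":
    print(send_over_tcp())
```

This says nothing about whether TCP is faster; it only illustrates that an intact connection does not silently lose messages to a slow reader. Messages can still be lost if the connection breaks.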

Gregor Binder

3:52 p.m.

Bill Graham on Fri, Apr 11, 2003 at 02:19:48PM -0700: Hi Bill,

...

I guess I should have included the rest of my configuration...

you're right, this sounds like one happy syslog server :)

...

It just seems that the program can't keep up when the udp buffer is set to the default of 8K. Once you bring this number to the system max of 64K the system can handle bursts of around 750 syslog messages.

Do you see message loss again at higher rates? Increasing sockbuf certainly does make sense, but it still wouldn't explain why a userland program can't keep up if your system resources are not exhausted.

You could maybe try to avoid interrupts/context switches by using async disk access, and see if that improves performance. Binding syslog-ng to one CPU is probably not reasonable, since you plan to deploy other applications. Then, obviously, you will have to expect performance to drop again if hardware (unlikely, I admit) or kernel resources are the real limitation you're looking at.

Regards,
--
Gregor Binder <gb@(rootnexus.net|sysfive.com)>
Id: 0xE2F31C4B Fp: 8B8A 5CE3 B79B FBF1 5518 8871 0EFB AFA3 E2F3 1C4B
