[syslog-ng] udp drops

Balazs Scheidler bazsi at balabit.hu
Sat Jun 6 10:28:56 CEST 2009


On Wed, 2009-06-03 at 14:20 -0400, Jan Schaumann wrote:
> Balazs Scheidler <bazsi at balabit.hu> wrote:
>  
> > Hmm.. one possible problem is that syslog-ng wakes up too often,
> > processes a small number of messages and goes back to sleep. Since the
> > poll iteration has its overhead, this might add up to be significant.
> > 
> > You could perhaps play with time_sleep(), I'd go for 30 msec, which would
> > limit syslog-ng to waking up at most about 33 times per second.
> 
> That actually makes things a lot worse, as the buffers immediately fill
> up and aren't drained quickly enough.

Hmm, if you size your input buffer large enough, it shouldn't be an
issue, and the CPU usage of syslog-ng should go down significantly.
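
A minimal sketch of the sizing arithmetic, assuming the reader can stall
for one time_sleep() interval while messages keep arriving at a steady
rate. The function name, the 512-byte average message size and the
safety factor are illustrative assumptions, not figures from syslog-ng;
the kernel also charges per-datagram overhead against the buffer, so
treat the result as a lower bound.

```python
import math


def required_rcvbuf_bytes(msgs_per_sec, stall_ms, avg_msg_bytes, safety_factor=2.0):
    """Estimate the receive buffer needed to absorb one reader stall.

    All parameters are assumptions supplied by the caller; nothing here
    is measured from a running syslog-ng.
    """
    # Messages that pile up while the reader is asleep.
    backlog_msgs = msgs_per_sec * stall_ms / 1000
    # Payload bytes for that backlog, padded by the safety factor.
    return math.ceil(backlog_msgs * avg_msg_bytes * safety_factor)


if __name__ == "__main__":
    # 15k msg/sec, 30 ms sleep, assumed 512-byte messages.
    print(required_rcvbuf_bytes(15000, 30, 512))
```

With these inputs the backlog is 450 messages, or 460800 bytes once the
assumed safety factor of 2 is applied.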

> 
> > Then, make sure that you actually have a large enough UDP receive
> > buffer. so_rcvbuf() might not be enough, as systems usually add further
> > limits on the maximum per-socket receive buffer size.
> 
> Yeah, that helps a lot.  I had initially resisted making those changes
> as I was trying to see how/if I can tune syslog-ng to get the same
> performance as regular syslog without any outside changes.

syslog-ng has larger latencies than stock syslogd, since it watches
several input file descriptors whereas syslogd only has to care about
one UDP socket. Also, syslog-ng uses a generic I/O framework for
managing all its I/O related events, whereas syslogd probably uses a
plain simple select() to query its inputs.

That poll() iteration is what needs more CPU, especially if it runs
several thousand times per second.
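
To illustrate how the number of poll iterations depends on how many
messages are read per wake-up, here is a small sketch using plain
select() on a loopback UDP socket. It is not syslog-ng's I/O framework;
drain(), demo() and their fetch_limit parameter are hypothetical names
chosen for this example.

```python
import select
import socket


def drain(sock, fetch_limit, idle_timeout=0.2):
    """Read until the socket stays idle; return (messages, poll_iterations)."""
    messages = 0
    iterations = 0
    while True:
        readable, _, _ = select.select([sock], [], [], idle_timeout)
        if not readable:
            return messages, iterations
        iterations += 1
        # Read at most fetch_limit datagrams per wake-up.
        for _ in range(fetch_limit):
            try:
                sock.recv(65535)
            except BlockingIOError:
                break  # queue is empty, go back to select()
            messages += 1


def demo(total=40, fetch_limit=10):
    """Queue `total` datagrams on loopback, then drain them in batches."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        rx.bind(("127.0.0.1", 0))
        rx.setblocking(False)
        for i in range(total):
            tx.sendto(b"<13>test message %d" % i, rx.getsockname())
        return drain(rx, fetch_limit)
    finally:
        rx.close()
        tx.close()


if __name__ == "__main__":
    for limit in (1, 10):
        print(limit, demo(fetch_limit=limit))
```

Reading one message per wake-up costs one select() call per message,
while a larger batch spreads that cost over several messages.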

The output part of syslog-ng is also non-blocking, whereas syslogd
usually sends the message to its output in blocking mode (since UDP
sockets never block and files cannot be used in non-blocking mode).

All in all, we have CPU overhead in the poll() iteration, and more
latency because of the non-blocking I/O. I wouldn't think that the
message parsing would be a culprit here; I made a serious effort to
optimize that (although that was more than a year ago, so cruft might
have gathered since then).

And latency is what causes the udp() source to drop messages, especially
if the input socket buffer is not large enough.
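
A quick way to see whether the system caps the receive buffer is to
request a size and read back what the kernel actually granted. This is a
minimal sketch using the standard SO_RCVBUF socket option;
granted_rcvbuf() is a hypothetical helper name. On Linux the value read
back is typically double the request (the kernel reserves room for
bookkeeping) and is capped by the net.core.rmem_max sysctl; other
systems have their own limits and may reject oversized requests.

```python
import socket


def granted_rcvbuf(requested_bytes):
    """Request a UDP receive buffer size and return what the kernel reports."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    for requested in (64 * 1024, 1024 * 1024, 16 * 1024 * 1024):
        try:
            print(requested, granted_rcvbuf(requested))
        except OSError as exc:
            # Some systems refuse requests above their per-socket limit.
            print(requested, "rejected:", exc)
```

If the granted size stops growing as the request grows, the system-wide
limit has to be raised before a larger so_rcvbuf() can take effect.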

>  
> > fetch_limit() might be related: if you have only a small number of
> > sources, you could increase that, but don't forget to adjust the
> > destination window size, as described in the documentation:
> 
> That helps, too.
> 
> > syslog-ng core can do about 130k msg/sec without writing things to
> > files, and about 70k/sec if you have a single destination file. However,
> > it might have a latency that causes the udp() receive buffer to fill up.
> > If you carefully size your udp() receive buffer you can probably achieve
> > no message losses for about 15k msg/sec.
> 
> With the above changes (and the fix for Bug #49, thank you very much),
> I'm now getting syslog-ng down to between 2% and 4% UDP drops, which is
> about the same as stock syslog was.
> 
> Well, it's slightly worse, since syslog-ng is now dropping
> (intentionally) a large number of messages that stock syslog is
> dutifully writing to disk.  Also, the increased buffer size of course
> makes stock syslog perform better as well, but for now the above should be
> acceptable until I have load balancing added.



-- 
Bazsi



