On Wed, 2009-06-03 at 14:20 -0400, Jan Schaumann wrote:
Balazs Scheidler <bazsi@balabit.hu> wrote:
Hmm... one possible problem is that syslog-ng wakes up too often, processes a small number of messages, and goes back to sleep. Since each poll iteration has its overhead, this might add up to be significant.
You could perhaps play with time_sleep(). I'd go for 30 msec, which would limit syslog-ng to waking up at most about 33 times per second.
That actually makes things a lot worse, as the buffers immediately fill up and aren't drained quickly enough.
Hmm, if you size your input buffer large enough, it shouldn't be an issue, and the CPU usage of syslog-ng should go down significantly.
Then, make sure that you actually have a large enough UDP receive buffer. so_rcvbuf() alone might not be enough, as systems usually impose a further limit on the maximum per-socket receive buffer size.
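One way to check whether the system-wide limit is getting in the way is to request a receive buffer size and read back what the kernel actually granted. The following is a minimal sketch using plain sockets, not syslog-ng itself; the function name `effective_rcvbuf` and the 8 MiB request are illustrative assumptions. On Linux the cap is typically controlled by the `net.core.rmem_max` sysctl, and the value read back is double the granted size because the kernel includes its bookkeeping overhead.

```python
import socket


def effective_rcvbuf(requested_bytes):
    """Request a UDP receive buffer size and return what the kernel reports.

    Hypothetical helper for illustration; not part of syslog-ng.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        try:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
        except OSError:
            # Some systems reject an oversized request outright instead of
            # silently capping it; fall through and report the current size.
            pass
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    finally:
        sock.close()


if __name__ == "__main__":
    requested = 8 * 1024 * 1024  # assumed example value
    granted = effective_rcvbuf(requested)
    print(f"requested {requested} bytes, kernel reports {granted} bytes")
```

If the reported value is far below the request, the per-socket maximum probably needs raising before so_rcvbuf() can take full effect.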
Yeah, that helps a lot. I had initially resisted making those changes as I was trying to see how/if I can tune syslog-ng to get the same performance as regular syslog without any outside changes.
syslog-ng has larger latencies than stock syslogd, since it watches several input file descriptors whereas syslogd only has to care about one UDP socket. Also, syslog-ng uses a generic I/O framework for managing all its I/O-related events, whereas syslogd probably uses a plain, simple select() to query its inputs. That poll() iteration is what needs more CPU, especially if it runs several thousand times per second.

The output side of syslog-ng is also non-blocking, whereas syslogd usually sends the message to its output in blocking mode (since UDP sockets never block and files cannot be used in non-blocking mode). All in all, we have CPU overhead in the poll() iteration, and more latency because of the non-blocking I/O.

I wouldn't think that message parsing is the culprit here; I made a serious effort to optimize that (although that was more than a year ago, so cruft might have gathered since then). And latency is what causes the udp() source to drop messages, especially if the input socket buffer is not large enough.
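The trade-off between wake-ups and messages handled per wake-up can be illustrated with a small sketch. This is not syslog-ng's actual I/O framework; it is a simplified model, assuming a single non-blocking UDP socket, in which each select() wake-up drains at most `fetch_limit` datagrams. The names `drain` and `serve_once` are hypothetical.

```python
import select
import socket


def drain(sock, fetch_limit):
    """Read up to fetch_limit datagrams from a non-blocking socket."""
    messages = []
    for _ in range(fetch_limit):
        try:
            messages.append(sock.recv(65535))
        except BlockingIOError:
            break  # socket buffer is empty for now
    return messages


def serve_once(sock, fetch_limit, timeout):
    """One poll iteration: wait for readability, then drain a bounded batch."""
    readable, _, _ = select.select([sock], [], [], timeout)
    return drain(sock, fetch_limit) if readable else []


if __name__ == "__main__":
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))  # let the OS pick a free port
    rx.setblocking(False)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for i in range(5):
        tx.sendto(b"message %d" % i, rx.getsockname())
    batch = serve_once(rx, fetch_limit=3, timeout=1.0)
    print(f"drained {len(batch)} messages in one iteration")
    tx.close()
    rx.close()
```

With a small limit, every iteration pays the select() overhead for only a few messages; with a larger one, the same overhead is spread over a bigger batch, at the cost of other sources waiting longer for their turn.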
fetch_limit() might be related: if you have only a small number of sources, you could increase it, but don't forget to adjust the destination window size, as described in the documentation:
That helps, too.
The syslog-ng core can do about 130k msg/sec without writing things to files, and about 70k msg/sec if you have a single destination file. However, it might have a latency that causes the udp() receive buffer to fill up. If you carefully size your udp() receive buffer, you can probably achieve no message loss at about 15k msg/sec.
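As a rough back-of-the-envelope sketch of that sizing: the receive buffer has to hold everything that arrives while syslog-ng is not reading. The 15k msg/sec rate and the 30 msec sleep come from this thread; the 300-byte average message size and the safety factor of 2 are assumptions for illustration only, and the kernel also charges per-packet overhead beyond the payload, so real requirements are likely higher.

```python
def backlog_messages(rate_per_sec, stall_ms):
    """Messages that arrive while the reader is stalled for stall_ms."""
    return rate_per_sec * stall_ms / 1000.0


def rcvbuf_bytes(rate_per_sec, stall_ms, avg_msg_bytes, safety_factor=2.0):
    """Estimated receive buffer size in bytes (payload only, assumed inputs)."""
    backlog = backlog_messages(rate_per_sec, stall_ms)
    return int(backlog * avg_msg_bytes * safety_factor)


if __name__ == "__main__":
    rate = 15000      # msg/sec, figure quoted in this thread
    stall = 30        # msec, matching a time_sleep(30) setting
    avg_size = 300    # bytes per message, assumed
    print(backlog_messages(rate, stall))        # 450.0 messages per stall
    print(rcvbuf_bytes(rate, stall, avg_size))  # 270000 bytes
```

Measure your own average message size and worst-case stall before settling on a value; the numbers here only show the shape of the calculation.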
With the above changes (and the fix for Bug #49, thank you very much), I'm now getting syslog-ng down to between 2% and 4% UDP drops, which is about the same as stock syslog was.
Well, it's slightly worse, since syslog-ng is now (intentionally) dropping a large number of messages that stock syslog is dutifully writing to disk. Also, the increased buffer size of course makes stock syslog perform better too, but for now the above should be acceptable until I have load balancing added.
-- Bazsi