[syslog-ng] Profiling syslog-ng

Mon Feb 13 16:57:20 CET 2006

On Sun, Feb 12, 2006 at 08:07:14PM +0100, Balazs Scheidler wrote:
> Can you check if the attached patch decreases your load? It avoids
> getting back to the poll loop after every log message and loops on the
> currently processed fd until it either returns EAGAIN or it reaches a
> hard-coded number. (which is currently 30)

Unfortunately, it didn't help that much. I strace(1)ed it for a few minutes;
here are some read(2) statistics. It shows how many reads were done before
encountering an EAGAIN:

  Count - # Reads
  44268 - 1 reads
    370 - 2 reads
    279 - 3 reads
      9 - 4 reads
      6 - 5 reads
      1 - 6 reads
    [tail removed - ~34 lines with a count < 10]

Repeated reads might not make sense for our load pattern, since there are
many, many processes all syslogging a line or two at once. Few processes (as
you can see from the stats) syslog several messages in immediate succession.

On Sun, Feb 12, 2006 at 02:53:55PM +0100, Balazs Scheidler wrote:
> Strange, it's been a while since I last profiled syslog-ng. What were
> your build options?

They're pretty stock. We used the Debian package for 1.6.9, adding 'debug'
to DEB_BUILD_OPTIONS, which seems to use:

./configure --prefix=/ --mandir=/usr/share/man --sysconfdir=/etc \
	--enable-debug

and '-g -Wall' for CFLAGS. We added -pg manually.

> It would be nice to know how blocking affects this, io_iter is the
> central poll loop and unless syslog-ng is spinning for some reason it
> should be really low in the profile. Hm... as I see you have
> max-connections set to 2000, how many connections are there in reality?
> Can you check lsof -p <syslog-ng pid> how many open fds syslog-ng is
> polling?

Currently, it has about 1,000 file descriptors open. We had to increase it
since we were hitting the stock limit (1024?) at peak times.

> io_iter basically traverses all registered fds and constructs a "struct
> pollfd" array every time it is invoked. If there are a lot of fds, this
> might be a problem.

I agree; poll(2) is fairly cheap(?), so regenerating the struct pollfd is a
likely suspect. Could the struct pollfd be cached, regenerated every n msec?
That way latency for new processes isn't too bad, but it keeps syslog-ng
from spinning so much?

Or perhaps it could keep a struct pollfd that is only regenerated when file
descriptors are opened/closed. Our processes are fairly long-lived, so this
too could keep the struct from being regenerated so often. Even if logging
processes are not long-lived, I doubt it would be regenerated more often
than once every second or few.

Here are some poll(2) stats. It seems that over half of poll(2) calls return
a single file descriptor with activity.

  Count - # of descriptors returned by poll(2)
  20140 - 1
   8372 - 2
   3941 - 3
   1601 - 4
    635 - 5
    260 - 6
    103 - 7
     62 - 8
     30 - 9
     15 - 10
    [tail removed - ~40 lines with a count < 10]

Thanks for your help!

john
-- 
John Morrissey          _o            /\         ----  __o
jwm at horde.net        _-< \_          /  \       ----  <  \,
www.horde.net/    __(_)/_(_)________/    \_______(_) /_(_)__