time_sleep() and inadvertent admission control
About a year ago, I noticed heavy CPU consumption during certain workloads (many processes sending a small number of log messages per process, such as a busy Postfix machine) due to io_iter spinning very rapidly on poll(). We kludged around it by adding the time_sleep() directive, which adds an artificial delay at the end of the io_iter loop and prevents the loop from rolling over too quickly:

http://marc.info/?l=syslog-ng&m=114009552929622&w=2

We started using time_sleep(30) across all of our machines, since that delay value didn't seem to cause any problems for our workloads and we wanted to keep the configuration uniform.

We noticed recently that time_sleep() exhibits some inadvertent admission control behavior. When poll() indicates that the listener socket has activity (new connections), syslog-ng seems to accept() only once on it, allowing one new connection per poll(). As a result, it only allows:

    1000 / time_sleep()

connections per second. Accordingly, with time_sleep(30), only 33 connections would be allowed every second.

This became problematic when we added a logger to our qmail-pop3d chain. Normally, qmail-pop3d is invoked thusly:

    qmail-popup $HOSTNAME auth_pop qmail-pop3d ./Maildir/

qmail-popup starts, pops its args, and then exec()s what's left over. By default, qmail's POP3 daemon doesn't log user information, so we added a very simple syslogging script to this exec chain, like so:

    qmail-popup $HOSTNAME auth_pop qmail-loguser qmail-pop3d ./Maildir/

    [jwm@periscope:pts/0 ~> cat qmail-loguser
    #!/bin/sh
    logger -t $(basename $0) -- "${HOME##*/} from $TCPREMOTEIP on $TCPLOCALIP"
    exec "$@"

This was disastrous under load: the admission control behavior caused by time_sleep() made logger(1) block, quickly causing the machine to run out of memory.

I realize time_sleep() is a kludge, but more advanced dynamic tuning to replace it seems like it would be error-prone and difficult to troubleshoot when things go wrong. I would also argue that time_sleep() is meant to prevent spinning on poll(), and its effect on *new connections* was not intended, since new connections did not cause the original problem of excessive CPU consumption. time_sleep() unnecessarily throttles new connections, which in workloads like qmail's are critical to scalability. For example, syslog-ng still consumes only ~2% of a dual 2.8GHz Xeon machine's CPUs even with this rapid-connection qmail workload.

Instead, would it make sense for syslog-ng to accept() on the listener FD (it would need to be marked as non-blocking; I haven't looked into the ramifications if it isn't already) until there are no more new connections waiting, perhaps with a hard ceiling if starvation of other FDs is a worry?

john
--
John Morrissey
jwm@horde.net
www.horde.net/
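To make the loop shape concrete, here is a rough sketch of the behavior described above (hypothetical names; not the actual syslog-ng source):

    #include <poll.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Sketch only: one accept() per poll() iteration, followed by the
     * time_sleep() delay.  With time_sleep(30) this caps new
     * connections at 1000 / 30 = ~33 per second. */
    static void io_iter_sketch(int listen_fd, struct pollfd *fds,
                               nfds_t nfds, int time_sleep_msec)
    {
        for (;;) {
            if (poll(fds, nfds, -1) > 0 && (fds[0].revents & POLLIN)) {
                /* only one accept() before sleeping again */
                int client = accept(listen_fd, NULL, NULL);
                if (client >= 0)
                    ; /* register the new connection (omitted) */
            }
            /* ... service the established connections here ... */
            usleep(time_sleep_msec * 1000);  /* the time_sleep() delay */
        }
    }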
On Thu, 2008-01-24 at 14:18 -0500, John Morrissey wrote:
[snipped]
We noticed recently that time_sleep() exhibits some inadvertent admission control behavior. When poll() indicates that the listener socket has activity (new connections), syslog-ng seems to accept() only once on it, allowing one new connection per poll(). As a result, it only allows:
1000 / time_sleep()
connections per second. Accordingly, with time_sleep(30), only 33 connections would be allowed every second.
Thanks for the detailed analysis (and for the original idea too :). I think the following solutions exist:

* do multiple accepts per poll loop; or
* increase the I/O priority for the listeners

The first one easily increases the incoming connection rate and is simple to implement; the second is more complex and might cause further unexpected behaviour: if the priority of the listeners is increased, any incoming connection might starve the incoming message stream. E.g., if there's a continuous stream of incoming connections, long-lived connections might be starved.

So I'd choose the first option. What do you think?

--
Bazsi
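For the first option, the accept path could look roughly like this (a sketch only; the listener fd would need to be non-blocking, and the names are hypothetical):

    #include <errno.h>
    #include <sys/socket.h>

    /* Sketch of option 1: drain the listener's backlog each poll
     * iteration instead of accepting once, with a hard ceiling so a
     * flood of connections can't starve the other fds. */
    #define MAX_ACCEPTS_PER_ITER 50   /* arbitrary ceiling */

    static void accept_pending(int listen_fd)
    {
        int i;

        for (i = 0; i < MAX_ACCEPTS_PER_ITER; i++) {
            int client = accept(listen_fd, NULL, NULL);

            if (client < 0) {
                /* EAGAIN/EWOULDBLOCK: backlog drained, we're done */
                if (errno != EAGAIN && errno != EWOULDBLOCK)
                    ; /* real error: log it (omitted) */
                break;
            }
            /* hand the new fd to the reader machinery (omitted) */
        }
    }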
Balazs Scheidler wrote:
On Thu, 2008-01-24 at 14:18 -0500, John Morrissey wrote:
[snipped]
Thanks for the detailed analysis (and for the original idea too :). I think the following solutions exist:

* do multiple accepts per poll loop; or
* increase the I/O priority for the listeners
[snipped]
So I'd choose the first option. What do you think?
Would it be possible to have an acceptor thread that does not use time_sleep(), and let the I/O reader threads honor this setting? That way there would not be any starvation, and there would be no interaction between time_sleep() and accepting connections.

Just throwing this idea out there, as I am not aware of the architecture of the code.

--
Evan Rempel
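Something like this, perhaps (a hypothetical sketch, not based on the actual code): a thread that blocks in accept() with no artificial delay and hands new fds to the reader loop through a queue.

    #include <pthread.h>
    #include <sys/socket.h>

    /* Hypothetical sketch of a dedicated acceptor thread: it blocks in
     * accept() with no time_sleep() delay; only the reader loop sleeps. */
    static void *acceptor_thread(void *arg)
    {
        int listen_fd = *(int *)arg;

        for (;;) {
            int client = accept(listen_fd, NULL, NULL);  /* blocking */
            if (client >= 0) {
                /* push onto a queue that the reader loop drains each
                 * iteration (queue implementation omitted) */
            }
        }
        return NULL;
    }

    /* started with, e.g.:
     *   pthread_t tid;
     *   pthread_create(&tid, NULL, acceptor_thread, &listen_fd);
     */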
On Mon, 2008-01-28 at 10:26 -0800, Evan Rempel wrote:
[snipped]
Would it be possible to have an acceptor thread that does not use time_sleep(), and let the I/O reader threads honor this setting? That way there would not be any starvation, and there would be no interaction between time_sleep() and accepting connections.

Just throwing this idea out there, as I am not aware of the architecture of the code.
The same could be achieved with much simpler changes: I could poll() all listener fds and, if that returns nothing within 10 msec, sleep for time_sleep() minus 10 msec and then poll everything. I'm not sure it's worth it, though; doing multiple accepts in each poll iteration should be just as good.

Syslog-ng is not multithreaded now (this is something I'm planning to change).

--
Bazsi
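A sketch of that two-phase idea, for concreteness (hypothetical names, not actual code):

    #include <poll.h>
    #include <unistd.h>

    /* Sketch: poll only the listener fds for up to 10 msec; if nothing
     * arrives, sleep out the rest of time_sleep(), then poll everything. */
    static void two_phase_iter(struct pollfd *listeners, nfds_t n_listeners,
                               struct pollfd *all_fds, nfds_t n_all,
                               int time_sleep_msec)
    {
        if (poll(listeners, n_listeners, 10) == 0 && time_sleep_msec > 10)
            usleep((time_sleep_msec - 10) * 1000);

        poll(all_fds, n_all, 0);  /* now check everything, no extra wait */
        /* ... accept / read based on revents (omitted) ... */
    }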
Hi,

Balazs Scheidler <bazsi@balabit.hu> [20080128 20:52:36 +0100]:
On Mon, 2008-01-28 at 10:26 -0800, Evan Rempel wrote:
[snipped]
The same could be achieved with much simpler changes: I could poll() all listener fds and, if that returns nothing within 10 msec, sleep for time_sleep() minus 10 msec and then poll everything. I'm not sure it's worth it, though; doing multiple accepts in each poll iteration should be just as good.
Unsure how portable it is -- it might be Linux 2.4+ only, though it looks like BSD and Solaris might also support it -- but the following looks good:

http://www.kegel.com/c10k.html#nb.sigio

It spares you the poll() during idle periods; instead, you use poll() *only* when you know there is something waiting for you. Of course, multiple accepts are still needed, but then there's no need to faff with timers.

For those not aware of C10k, it's a good read.
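The Linux-specific setup looks roughly like this (per the page above; error handling omitted):

    #define _GNU_SOURCE           /* for F_SETSIG */
    #include <fcntl.h>
    #include <signal.h>
    #include <unistd.h>

    /* Arm a descriptor for readiness signals (Linux; see the c10k
     * section linked above).  A real-time signal queues one event per
     * ready fd; if the signal queue overflows, the kernel falls back
     * to a plain SIGIO and you do one recovery poll(). */
    static void arm_sigio(int fd)
    {
        fcntl(fd, F_SETOWN, getpid());      /* deliver signals to us */
        fcntl(fd, F_SETSIG, SIGRTMIN + 1);  /* use a queued RT signal */
        fcntl(fd, F_SETFL,
              fcntl(fd, F_GETFL) | O_ASYNC | O_NONBLOCK);
        /* then block the signal and collect events with sigwaitinfo();
         * siginfo.si_fd says which descriptor is ready */
    }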
Syslog-ng is not multithreaded now (this is something I'm planning to change).
Is this to make use of now-common SMP setups, or to help with I/O throughput? If it's the latter, I think I read somewhere that mmap'ing a file between a single writer (network input, one per input fd) and a pool of writers (disk output, one per file) would give you shiny performance. Of course, this is a completely different architecture from what syslog-ng has... syslog-ng-ng, anyone? :)

Cheers

Alex
On Sat, Jan 26, 2008 at 03:18:56PM +0100, Balazs Scheidler wrote:
On Thu, 2008-01-24 at 14:18 -0500, John Morrissey wrote:
[snipped]
Thanks for the detailed analysis (and for the original idea too :). I think the following solutions exist:

* do multiple accepts per poll loop; or
* increase the I/O priority for the listeners

The first one easily increases the incoming connection rate and is simple to implement; the second is more complex and might cause further unexpected behaviour: if the priority of the listeners is increased, any incoming connection might starve the incoming message stream. E.g., if there's a continuous stream of incoming connections, long-lived connections might be starved.

So I'd choose the first option. What do you think?
I agree; accept(2)ing a number of times in the loop seems to avoid this problem with minimal possibility of collateral damage.

john
--
John Morrissey
jwm@horde.net
www.horde.net/
On Sat, 2008-01-26 at 15:18 +0100, Balazs Scheidler wrote:
[snipped]
Just a quick note to let you know that I've just pushed a patch that implements multiple accepts per poll iteration. It's in the 2.1 branch, but it is probably trivial to backport to 2.0; I'm going to do that when a maintenance release for 2.0 is due.

It seems to work for me, and I think it should solve your performance issues.

http://git.balabit.hu/?p=bazsi/syslog-ng-2.1.git;a=commit;h=33f45d65490c2e8e...

--
Bazsi
On Wed, Feb 20, 2008 at 06:48:09PM +0100, Balazs Scheidler wrote:
[snipped]
Just a quick note to let you know that I've just pushed a patch that implements multiple accepts per poll iteration.
It's in the 2.1 branch, but it is probably trivial to backport to 2.0; I'm going to do that when a maintenance release for 2.0 is due.
It seems to work for me, and I think it should solve your performance issues.
http://git.balabit.hu/?p=bazsi/syslog-ng-2.1.git;a=commit;h=33f45d65490c2e8e...
The backport was trivial; it patched cleanly (save for a small offset in one hunk) and the resulting code looks sane. It's great to have a response like this; I really appreciate your work on syslog-ng.

Running this patched version under the affected workload seems to completely fix the problem. The maximum number of accept() calls per iteration was only 15, and we got over 10 accepts per iteration only about 1% of the time.

john
--
John Morrissey
jwm@horde.net
www.horde.net/
participants (4)

- Alexander Clouter
- Balazs Scheidler
- Evan Rempel
- John Morrissey