time_sleep() and inadvertent admission control
About a year ago, I noticed heavy CPU consumption during certain workloads (many processes sending a small number of log messages per process, such as a busy Postfix machine) due to io_iter spinning very rapidly on poll(). We kludged around it by adding the time_sleep() directive, which adds an artificial delay at the end of the io_iter loop and prevents the loop from rolling over too quickly:

http://marc.info/?l=syslog-ng&m=114009552929622&w=2

We started using time_sleep(30) across all of our machines, since that delay value didn't seem to cause any problems for our workloads and we wanted to keep the configuration uniform.

We noticed recently that time_sleep() exhibits some inadvertent admission control behavior. When poll() indicates that the listener socket has activity (new connections), syslog-ng seems to accept() only once on it, allowing one new connection per poll(). As a result, it only allows:

    1000 / time_sleep()

connections per second. Accordingly, with time_sleep(30), only 33 connections would be allowed every second.

This became problematic when we added a logger to our qmail-pop3d chain. Normally, qmail-pop3d is invoked thusly:

    qmail-popup $HOSTNAME auth_pop qmail-pop3d ./Maildir/

qmail-popup starts, pops its args, and then exec()s what's left over. By default, qmail's POP3 daemon doesn't log user information, so we added a very simple syslogging script to this exec chain, like so:

    qmail-popup $HOSTNAME auth_pop qmail-loguser qmail-pop3d ./Maildir/

    [jwm@periscope:pts/0 ~> cat qmail-loguser
    #!/bin/sh
    logger -t $(basename $0) -- "${HOME##*/} from $TCPREMOTEIP on $TCPLOCALIP"
    exec "$@"

This was disastrous under load: the admission control behavior caused by time_sleep() made logger(1) block, quickly causing the machine to run out of memory.

I realize time_sleep() is a kludge, but more advanced dynamic tuning to replace it seems like it would be error-prone and difficult to troubleshoot when things go wrong. I would also argue that time_sleep() is meant to prevent spinning on poll(), and its effect on *new connections* was not intended, since new connections did not cause the original problem of excessive CPU consumption. time_sleep() unnecessarily throttles new connections, which in workloads like qmail's are critical to scalability. For example, syslog-ng still consumes only ~2% of a dual 2.8GHz Xeon machine's CPUs even with this rapid-connection qmail workload.

Instead, would it make sense for syslog-ng to accept() on the listener FD (it would need to be marked as non-blocking; I haven't looked into the ramifications if it isn't already) until there are no more new connections waiting, perhaps with a hard ceiling if starvation of other FDs is a worry?

john
--
John Morrissey
jwm@horde.net
www.horde.net/
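To make the loop shape concrete, here is a rough sketch of the behavior described above (hypothetical names; not the actual syslog-ng source):

    #include <poll.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Sketch only: one accept() per poll() iteration, followed by the
     * time_sleep() delay.  With time_sleep(30) this caps new
     * connections at 1000 / 30 = ~33 per second. */
    static void io_iter_sketch(int listen_fd, struct pollfd *fds,
                               nfds_t nfds, int time_sleep_msec)
    {
        for (;;) {
            if (poll(fds, nfds, -1) > 0 && (fds[0].revents & POLLIN)) {
                /* only one accept() before sleeping again */
                int client = accept(listen_fd, NULL, NULL);
                if (client >= 0)
                    ; /* register the new connection (omitted) */
            }
            /* ... service the established connections here ... */
            usleep(time_sleep_msec * 1000);  /* the time_sleep() delay */
        }
    }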
On Thu, 2008-01-24 at 14:18 -0500, John Morrissey wrote:
[snipped]
We noticed recently that time_sleep() exhibits some inadvertent admission control behavior. When poll() indicates that the listener socket has activity (new connections), syslog-ng seems to accept() only once on it, allowing one new connection per poll(). As a result, it only allows:
1000 / time_sleep()
connections per second. Accordingly, with time_sleep(30), only 33 connections would be allowed every second.
Thanks for the detailed analysis (and for the original idea too :). I think the following solutions exist:

* do multiple accepts per poll loop; or
* increase the I/O priority for the listeners

The first one easily increases the incoming connection rate and is simple to implement; the second is more complex and might cause further unexpected behaviour: if the priority of the listeners is increased, any incoming connection might starve the incoming message stream. E.g., if there's a continuous stream of incoming connections, long-lived connections might be starved.

So I'd choose the first option. What do you think?

--
Bazsi
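For the first option, the accept path could look roughly like this (a sketch only; the listener fd would need to be non-blocking, and the names are hypothetical):

    #include <errno.h>
    #include <sys/socket.h>

    /* Sketch of option 1: drain the listener's backlog each poll
     * iteration instead of accepting once, with a hard ceiling so a
     * flood of connections can't starve the other fds. */
    #define MAX_ACCEPTS_PER_ITER 50   /* arbitrary ceiling */

    static void accept_pending(int listen_fd)
    {
        int i;

        for (i = 0; i < MAX_ACCEPTS_PER_ITER; i++) {
            int client = accept(listen_fd, NULL, NULL);

            if (client < 0) {
                /* EAGAIN/EWOULDBLOCK: backlog drained, we're done */
                if (errno != EAGAIN && errno != EWOULDBLOCK)
                    ; /* real error: log it (omitted) */
                break;
            }
            /* hand the new fd to the reader machinery (omitted) */
        }
    }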
Balazs Scheidler wrote:
On Thu, 2008-01-24 at 14:18 -0500, John Morrissey wrote:
[snipped]
Thanks for the detailed analysis (and for the original idea too :). I think the following solutions exist:

* do multiple accepts per poll loop; or
* increase the I/O priority for the listeners
[snipped]
So I'd choose the first option. What do you think?
Would it be possible to have an acceptor thread that does not use time_sleep(), and let the I/O reader threads honor this setting? That way there would not be any starvation, and there would be no interaction between time_sleep() and accepting connections.

Just throwing this idea out there, as I am not aware of the architecture of the code.

--
Evan Rempel
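Something like this, perhaps (a hypothetical sketch, not based on the actual code): a thread that blocks in accept() with no artificial delay and hands new fds to the reader loop through a queue.

    #include <pthread.h>
    #include <sys/socket.h>

    /* Hypothetical sketch of a dedicated acceptor thread: it blocks in
     * accept() with no time_sleep() delay; only the reader loop sleeps. */
    static void *acceptor_thread(void *arg)
    {
        int listen_fd = *(int *)arg;

        for (;;) {
            int client = accept(listen_fd, NULL, NULL);  /* blocking */
            if (client >= 0) {
                /* push onto a queue that the reader loop drains each
                 * iteration (queue implementation omitted) */
            }
        }
        return NULL;
    }

    /* started with, e.g.:
     *   pthread_t tid;
     *   pthread_create(&tid, NULL, acceptor_thread, &listen_fd);
     */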
On Mon, 2008-01-28 at 10:26 -0800, Evan Rempel wrote:
[snipped]
Would it be possible to have an acceptor thread that does not use time_sleep(), and let the I/O reader threads honor this setting? That way there would not be any starvation, and there would be no interaction between time_sleep() and accepting connections.

Just throwing this idea out there, as I am not aware of the architecture of the code.
The same could be achieved with much simpler changes: I could poll() all listener fds and, if that returns nothing within 10 msec, sleep for time_sleep() minus 10 msec and then poll everything. I'm not sure it's worth it, though; doing multiple accepts in each poll iteration should be just as good.

Syslog-ng is not multithreaded now (this is something I'm planning to change).

--
Bazsi
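A sketch of that two-phase idea, for concreteness (hypothetical names, not actual code):

    #include <poll.h>
    #include <unistd.h>

    /* Sketch: poll only the listener fds for up to 10 msec; if nothing
     * arrives, sleep out the rest of time_sleep(), then poll everything. */
    static void two_phase_iter(struct pollfd *listeners, nfds_t n_listeners,
                               struct pollfd *all_fds, nfds_t n_all,
                               int time_sleep_msec)
    {
        if (poll(listeners, n_listeners, 10) == 0 && time_sleep_msec > 10)
            usleep((time_sleep_msec - 10) * 1000);

        poll(all_fds, n_all, 0);  /* now check everything, no extra wait */
        /* ... accept / read based on revents (omitted) ... */
    }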
Hi,

Balazs Scheidler <bazsi@balabit.hu> [20080128 20:52:36 +0100]:
On Mon, 2008-01-28 at 10:26 -0800, Evan Rempel wrote:
[snipped]
The same could be achieved with much simpler changes: I could poll() all listener fds and, if that returns nothing within 10 msec, sleep for time_sleep() minus 10 msec and then poll everything. I'm not sure it's worth it, though; doing multiple accepts in each poll iteration should be just as good.
Unsure how portable it is -- it might be Linux 2.4+ only, though it looks like BSD and Solaris might also support it -- but the following looks good:

http://www.kegel.com/c10k.html#nb.sigio

It spares you the poll() during idle periods; instead, you use poll() *only* when you know there is something waiting for you. Of course, multiple accepts are still needed, but then there's no need to faff with timers.

For those not aware of C10k, it's a good read.
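The Linux-specific setup looks roughly like this (per the page above; error handling omitted):

    #define _GNU_SOURCE           /* for F_SETSIG */
    #include <fcntl.h>
    #include <signal.h>
    #include <unistd.h>

    /* Arm a descriptor for readiness signals (Linux; see the c10k
     * section linked above).  A real-time signal queues one event per
     * ready fd; if the signal queue overflows, the kernel falls back
     * to a plain SIGIO and you do one recovery poll(). */
    static void arm_sigio(int fd)
    {
        fcntl(fd, F_SETOWN, getpid());      /* deliver signals to us */
        fcntl(fd, F_SETSIG, SIGRTMIN + 1);  /* use a queued RT signal */
        fcntl(fd, F_SETFL,
              fcntl(fd, F_GETFL) | O_ASYNC | O_NONBLOCK);
        /* then block the signal and collect events with sigwaitinfo();
         * siginfo.si_fd says which descriptor is ready */
    }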
Syslog-ng is not multithreaded now (this is something I'm planning to change).
Is this to make use of now-common SMP setups, or to help with I/O throughput? If it's the latter, I think I read somewhere that mmap'ing a file between a single writer (network input, one per input fd) and a pool of writers (disk output, one per file) would give you shiny performance. Of course, this is a completely different architecture from what syslog-ng has... syslog-ng-ng, anyone? :)

Cheers

Alex
On Sat, Jan 26, 2008 at 03:18:56PM +0100, Balazs Scheidler wrote:
On Thu, 2008-01-24 at 14:18 -0500, John Morrissey wrote:
[snipped]
Thanks for the detailed analysis (and for the original idea too :). I think the following solutions exist:

* do multiple accepts per poll loop; or
* increase the I/O priority for the listeners

The first one easily increases the incoming connection rate and is simple to implement; the second is more complex and might cause further unexpected behaviour: if the priority of the listeners is increased, any incoming connection might starve the incoming message stream. E.g., if there's a continuous stream of incoming connections, long-lived connections might be starved.

So I'd choose the first option. What do you think?
I agree; accept(2)ing a number of times in the loop seems to avoid this problem with minimal possibility of collateral damage.

john
--
John Morrissey
jwm@horde.net
www.horde.net/
On Sat, 2008-01-26 at 15:18 +0100, Balazs Scheidler wrote:
[snipped]
Just a quick note to let you know that I've just pushed a patch that implements multiple accepts per poll iteration. It's in the 2.1 branch, but it is probably trivial to backport to 2.0; I'm going to do that when a maintenance release for 2.0 is due.

It seems to work for me, and I think it should solve your performance issues.

http://git.balabit.hu/?p=bazsi/syslog-ng-2.1.git;a=commit;h=33f45d65490c2e8e...

--
Bazsi
On Wed, Feb 20, 2008 at 06:48:09PM +0100, Balazs Scheidler wrote:
[snipped]
Just a quick note to let you know that I've just pushed a patch that implements multiple accepts per poll iteration.
It's in the 2.1 branch, but it is probably trivial to backport to 2.0; I'm going to do that when a maintenance release for 2.0 is due.
It seems to work for me, and I think it should solve your performance issues.
http://git.balabit.hu/?p=bazsi/syslog-ng-2.1.git;a=commit;h=33f45d65490c2e8e...
The backport was trivial; it patched cleanly (save for a small offset in one hunk) and the resulting code looks sane. It's great to have a response like this; I really appreciate your work on syslog-ng.

Running this patched version under the affected workload seems to completely fix the problem. The maximum number of accept() calls per iteration was only 15, and we got over 10 accepts per iteration only about 1% of the time.

john
--
John Morrissey
jwm@horde.net
www.horde.net/
participants (4)

- Alexander Clouter
- Balazs Scheidler
- Evan Rempel
- John Morrissey