Balazs Scheidler wrote:
On Thu, 2008-01-24 at 14:18 -0500, John Morrissey wrote:
About a year ago, I noticed heavy CPU consumption during certain workloads (many processes sending a small number of log messages per process, such as a busy Postfix machine) due to io_iter spinning very rapidly on poll(). We kludged around it by adding the time_sleep() directive, to add an artificial delay at the end of the io_iter loop and prevent the loop from rolling over too quickly:
http://marc.info/?l=syslog-ng&m=114009552929622&w=2
We started using time_sleep(30) across all of our machines, since that delay value didn't seem to cause any problems for our workloads and we wanted to keep the configuration uniform.
We noticed recently that time_sleep() exhibits some inadvertent admission control behavior. When poll() indicates that the listener socket has activity (new connections), syslog-ng seems to accept() only once on it, allowing one new connection per poll(). As a result, it only allows:
1000 / time_sleep()
connections per second. Accordingly, with time_sleep(30), only 33 connections would be allowed every second.
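As a quick check of that arithmetic, here is a minimal sketch. It assumes the sleep dominates each loop iteration, so the real rate would be somewhat lower. The function name and the accepts_per_iteration parameter are hypothetical and not part of syslog-ng.

```python
def max_accepts_per_second(time_sleep_ms, accepts_per_iteration=1):
    """Upper bound on new connections per second.

    Assumes each poll loop iteration sleeps time_sleep_ms milliseconds
    and accepts at most accepts_per_iteration connections.
    """
    if time_sleep_ms <= 0:
        raise ValueError("time_sleep_ms must be positive")
    return accepts_per_iteration * 1000 / time_sleep_ms


if __name__ == "__main__":
    # With time_sleep(30) and one accept() per iteration: about 33 per second.
    print(int(max_accepts_per_second(30)))
```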
Thanks for the detailed analysis (and for the original idea too :). I think the following solutions exist:
* do multiple accepts per poll loop; or
* increase the I/O priority for the listeners
The first one easily increases the incoming connection rate and is simple to implement. The second is more complex and might cause further unexpected behaviour:
if the priority of the listeners is increased, incoming connections might starve the incoming message stream, e.g. if there's a continuous stream of incoming connections, then long-lived connections might be starved.
So I'd choose the first option, what do you think?
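For illustration, the first option (several accepts per poll wakeup) could look roughly like the sketch below. This is not syslog-ng code: the function name and the cap of 32 are hypothetical, and it assumes the listener socket has been put into non-blocking mode. The cap keeps a flood of new connections from monopolising a single loop iteration.

```python
import socket

MAX_ACCEPTS_PER_WAKEUP = 32  # hypothetical cap, not a syslog-ng setting


def accept_pending(listener, limit=MAX_ACCEPTS_PER_WAKEUP):
    """Accept up to `limit` queued connections from a non-blocking listener.

    Stops early when the accept queue is empty, so it never blocks.
    """
    accepted = []
    while len(accepted) < limit:
        try:
            conn, _addr = listener.accept()
        except BlockingIOError:
            # Nothing left in the accept queue for this wakeup.
            break
        accepted.append(conn)
    return accepted
```

Called once per loop iteration after poll() reports the listener readable, this would raise the ceiling from one connection per iteration to at most `limit` per iteration.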
Would it be possible to have an acceptor thread that does not use time_sleep(), and let the I/O reader threads honor this setting? That way there would not be any starvation, and there would not be any interaction between time_sleep() and accepting connections. Just throwing this idea out there, as I am not aware of the architecture of the code.
-- Evan Rempel
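A rough sketch of what such an acceptor thread might look like follows. It is purely illustrative and says nothing about how syslog-ng is actually structured. The function name, the queue used for hand-off, and the 0.1 second timeout are all assumptions.

```python
import queue
import socket
import threading


def run_acceptor(listener, handoff, stop):
    """Accept connections as they arrive and hand them to the reader side.

    This thread never applies the artificial delay; the short timeout
    exists only so the loop can notice the stop event.
    """
    listener.settimeout(0.1)
    while not stop.is_set():
        try:
            conn, _addr = listener.accept()
        except socket.timeout:
            continue
        handoff.put(conn)
```

The reader loop would keep its delay and pick up new connections from the queue at its own pace, e.g. by calling handoff.get_nowait() once per iteration until queue.Empty is raised.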