On Fri, 2010-10-15 at 16:39 -0400, Lars Kellogg-Stedman wrote:
Hello all,
I'm deploying syslog-ng 3.0.8 on a quad-core 2.4GHz system with 4GB of memory. Using stock kernel settings (e.g., without adjusting net.core.rmem_default), we're not able to handle much more than 100 messages/second (generated from a remote host using the "loggen" tool). At 500 msg/sec (-r 500), we see about 50% packet loss, and at 1000 msg/sec, we see closer to 60% packet loss.
Our configuration looks approximately like this (template definitions elided for brevity):
options {
    time_reap(30);
    mark_freq(10);
    keep_hostname(yes);
    use_fqdn(yes);
    dns_cache(2000);
    dns_cache_expire(86400);
};

source s_network {
    udp();
    tcp(port(514));
};

destination d_syslog {
    file("/srv/syslog/bydate/$YEAR-$MONTH-$DAY/messages"
        template(t_daily_log)
        create_dirs(yes)
    );
    file("/srv/syslog/byhost/$FULLHOST_FROM/$YEAR-$MONTH-$DAY"
        template(t_host_log)
        create_dirs(yes)
    );
};

log { source(s_network); destination(d_syslog); };
I didn't think these message rates were terribly high, so I was surprised at the loss. We've confirmed that the loss is entirely between the kernel and the application -- using wireshark, we've verified that all of the packets are arriving at the host, and using this:
awk '{print}' /inet/udp/514/0/0 > out
our packet loss is < 1%.
If I raise the rmem settings like this:
net.core.rmem_default = 512000
net.core.rmem_max = 1024000
Then it looks like I can support message rates around 1000 msgs/sec. If I try with 2000 msgs/sec, the loss rate jumps up again (to around 30%).
Do these numbers make sense? This is an unloaded server. The only log traffic hitting this system is from my loggen runs. The filesystem is ext3 on top of a hardware RAID5 array. I've tried fiddling with some of the syslog-ng global options (e.g., flush_lines(), log_fetch_limit()), but without having much impact on performance.
I would appreciate any help you can send our way. Thanks!
Hmm, the numbers you are seeing are indeed low. With sufficient buffer sizes I could get up to the 20k messages/sec range with syslog-ng, although it's been a while since I last tested it.

What I'd recommend is to calculate how many _bytes_ per second the message rate you are generating amounts to. If you generate 2000 messages per second, 300 bytes each (loggen default IIRC), that's 600000 bytes every second.

syslog-ng is single threaded, so the latency of writing to the disk applies. This means that it may take some time for syslog-ng to get back to servicing its source if it is busy writing out messages. This is the #1 reason why I want to work on multithreading.

With a flow-controlled source, syslog-ng is able to do about 70-75k msg/sec. But not with UDP.

In order to improve the numbers, I'd:

1) increase the receive buffer so that it can hold 3-5 seconds of traffic (e.g. 3-5MB, not just 0.5MB)

2) increase log_fetch_limit() to a larger value. This controls how many messages syslog-ng fetches in each poll iteration. Increase this to 300-500.

3) increase log_fifo_size() for the destination, by adding up the fetch limit values of all the sources feeding the destination (so if you have two sources, each with a fetch limit of 1000, then the destination queue should be _at least_ 2000, preferably rounded up to the next order of magnitude; e.g. with 2x1000 fetch limits, increase the fifo to 10000)

You haven't included in your email whether syslog-ng itself is dropping messages, or the kernel. netstat drop counts or syslog-ng statistics should help decide that.

-- 
Bazsi
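The byte-rate and queue-sizing arithmetic in the reply above can be sketched in a few lines of Python. This is only an illustration of the rules of thumb as stated there; the function name `udp_sizing` and its parameters are hypothetical and not part of syslog-ng or loggen. The 300-byte message size is the loggen default as recalled in the reply. The kernel probably also charges some per-packet bookkeeping overhead against the socket buffer, so the receive buffer figure is best read as a lower bound.

```python
def udp_sizing(msgs_per_sec, msg_bytes, buffer_seconds, fetch_limits):
    """Apply the sizing rules of thumb from the reply (hypothetical helper).

    Returns (bytes_per_sec, rcvbuf_bytes, fifo_size).
    """
    # Raw payload rate: messages per second times bytes per message.
    bytes_per_sec = msgs_per_sec * msg_bytes

    # Receive buffer large enough to absorb `buffer_seconds` of traffic
    # while syslog-ng is busy writing to disk.
    rcvbuf_bytes = bytes_per_sec * buffer_seconds

    # The destination FIFO must hold at least the sum of the fetch limits
    # of every source feeding it.
    min_fifo = sum(fetch_limits)

    # Round up to a power of ten (e.g. 2000 -> 10000).
    fifo_size = 1
    while fifo_size < min_fifo:
        fifo_size *= 10

    return bytes_per_sec, rcvbuf_bytes, fifo_size


if __name__ == "__main__":
    rate, rcvbuf, fifo = udp_sizing(
        msgs_per_sec=2000,
        msg_bytes=300,
        buffer_seconds=3,
        fetch_limits=[1000, 1000],
    )
    print(f"payload rate:   {rate} bytes/sec")
    print(f"receive buffer: {rcvbuf} bytes")
    print(f"log_fifo_size:  {fifo}")
```

With the figures from this thread (2000 msgs/sec, 300 bytes each, two sources with a fetch limit of 1000), the payload rate works out to 600000 bytes/sec, which is already more than the 512000-byte rmem_default tried above can hold for a single second.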