The changes were successful; we did not drop a single log during the last test. Many thanks, Sandor! On Sat, Nov 13, 2010 at 11:48 AM, Ben Tisdall <ben.tisdall@photobox.com> wrote:
Thanks to both for your contributions - in this case Sandor I think your advice is the most appropriate - I'll feedback to the list after the next test, having applied the changes.
On Fri, Nov 12, 2010 at 1:45 PM, Sandor Geller <Sandor.Geller@morganstanley.com> wrote:
Hi,
On Thu, Nov 11, 2010 at 5:54 PM, Matthew Hall <mhall@mhcomputing.net> wrote:
On Thursday, November 11, 2010 08:51:11 Matthew Hall wrote:
On Thursday, November 11, 2010 08:29:32 Martin Holste wrote:
You should not be having problems with your load. We had a thread earlier this year ("UDP packet loss with syslog-ng") in which Lars identified similar performance issues on RHEL. His problems were solved by setting the net.core.rmem_default to 2MB using sysctl. I would try setting that and then checking your performance.
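For reference, the sysctl change described above would look roughly like this (a sketch; 2 MB shown, but the right value depends on your traffic):

```
# /etc/sysctl.conf -- illustrative value, not a universal recommendation
net.core.rmem_default = 2097152

# Apply immediately without a reboot:
#   sysctl -w net.core.rmem_default=2097152
```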
Make sure to also set so_rcvbuf in syslog-ng on any high-volume socket-based log sources.
You need to have a really big buffer or you will get terrible performance. We've been making some efforts to get this into the documentation.
I think it's small by default so it doesn't consume a ton of RAM on boxes that are not used for log collection.
By really big I mean 16,777,216.
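As a sketch of what that looks like in a syslog-ng config (the source name, address and port are made up), a high-volume UDP source with a large receive buffer would be along these lines:

```
# Hypothetical high-volume UDP source with a 16 MB kernel receive buffer.
# Note: the kernel silently caps so_rcvbuf() at net.core.rmem_max, so
# raise that sysctl as well if you use a value this large.
source s_udp_big {
    udp(ip(0.0.0.0) port(514) so_rcvbuf(16777216));
};
```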
IMHO this is actually *very* bad advice. Don't mix the fire-and-forget UDP logging case with flow-controlled TCP!
Going back to the original mail:
Client:
log_iw_size >= SOURCES_PER_CLIENT * log_fetch_limit
eg 35 * 10 = 350
log_iw_size is used only for flow-controlled log paths. It is a per-source option just like log_fetch_limit, so you shouldn't use the above math: log_iw_size merely has to be >= log_fetch_limit in your case, as each of your file sources uses its own incoming window.
log_fifo_size >= SOURCES_PER_CLIENT * log_fetch_limit
eg 35 * 10 = 350
AND
log_fifo_size >= SOURCES_PER_CLIENT * log_iw_size
eg 35 * 350 = 12250
So it appears to me that setting log_fifo_size to > 12250 would be correct.
log_fifo_size is used for in-memory buffering for a given destination. When the FIFO is full and flow control is enabled, syslog-ng won't read further logs from the sources. Here the math should be
log_fifo_size >= number_of_sources * log_iw_size
so 350 should be the actual setting when log_iw_size is set to 10. Of course increasing log_fifo_size could be useful, but be aware that the contents of the FIFO are lost when syslog-ng OSE is restarted.
Summarising the above: my recommendation would be to enable flow control for the file sources and set log_iw_size >= log_fetch_limit. For your loghost destination, enable flow control and set log_fifo_size to at least the accumulated size of all incoming windows.
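A minimal client-side sketch of that recommendation (file paths, hostname and port are illustrative, and the values follow the numbers from this thread):

```
# Client side: flow-controlled file sources feeding one TCP destination.
source s_files {
    file("/var/log/app1.log" log_fetch_limit(10) log_iw_size(10));
    file("/var/log/app2.log" log_fetch_limit(10) log_iw_size(10));
    # ... one file() entry per monitored log, 35 in total
};

# log_fifo_size >= number_of_sources * log_iw_size = 35 * 10 = 350
destination d_loghost {
    tcp("loghost.example.com" port(514) log_fifo_size(350));
};

log { source(s_files); destination(d_loghost); flags(flow-control); };
```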
Loghost
I'm less sure about this; do I need:
log_iw_size >= NUMBER_OF_CLIENTS * log_fetch_limit ( * SOURCES_PER_CLIENT ? )
eg 40 * 10 * 35 = 14000
Similar math applies here as on the client side. sources_per_client doesn't matter: every client is just a single source from the server's point of view (client-side TCP destinations map 1:1 to server-side TCP sources), so log_iw_size should be just 10, not 14k! See below for a more detailed explanation.
Similarly to the client side, on the server every TCP connection has its own incoming buffer while all the sources share the same destination FIFO. You've got 40 clients and log_iw_size is set to 10 on the syslog-ng server, so at any given moment up to 40 * 10 messages could be read into the destination FIFO. log_fifo_size therefore has to be at least 400 (the default is 1000, so this is definitely met).

When you use flow control (and you definitely should!) and the mysql destination can't handle the load, syslog-ng stops reading from the sources that have reached the log_iw_size limit. This also slows down the syslog clients, but only once the send buffer of the client and the receive buffer of the server are both full; until then the TCP/IP stack keeps sending/receiving logs on the wire. When this happens, depending on the size of the receive and send buffers, a *lot* of messages (tens or hundreds of thousands!) could be in transit, and those are in peril: if syslog-ng gets restarted on either side, these messages are lost :( To increase reliability, every message should get acked by the application layer.
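The server side described above could be sketched like this (the SQL credentials, table layout and port are placeholders; the sql() destination assumes the afsql module is available):

```
# Server side: one TCP source shared by 40 clients, flow-controlled
# into a MySQL destination.
source s_net {
    tcp(ip(0.0.0.0) port(514) log_fetch_limit(10) log_iw_size(10));
};

# 40 clients * log_iw_size(10) = 400 messages can be in flight at once,
# so log_fifo_size must be >= 400 (the default 1000 already covers it).
destination d_mysql {
    sql(type(mysql) host("localhost") username("syslog") password("secret")
        database("logs") table("messages")
        columns("datetime", "host", "message")
        values("${R_DATE}", "${HOST}", "${MESSAGE}"));
};

log { source(s_net); destination(d_mysql); flags(flow-control); };
```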
When you aim for reliability, flow-controlled logging is the way to go, with fairly small receive/send buffers. Of course, depending on your network latency, the buffers should be increased for better performance. There is no generic rule for sizing buffers and incoming windows; everyone has to experiment to find the right balance.
Disclaimer: I'm not an expert in the subject so feel free to correct me :)
Regards,
Sandor
______________________________________________________________________________
Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
FAQ: http://www.campin.net/syslog-ng/faq.html