[syslog-ng] Configuration tuning for reliability

Sat Nov 13 12:48:33 CET 2010

Thanks to both of you for your contributions. Sandor, I think your
advice is the most appropriate in this case. I'll report back to the
list after the next test, once I have applied the changes.

On Fri, Nov 12, 2010 at 1:45 PM, Sandor Geller
<Sandor.Geller at morganstanley.com> wrote:
> Hi,
>
> On Thu, Nov 11, 2010 at 5:54 PM, Matthew Hall <mhall at mhcomputing.net> wrote:
>> On Thursday, November 11, 2010 08:51:11 Matthew Hall wrote:
>>> On Thursday, November 11, 2010 08:29:32 Martin Holste wrote:
>>> > You should not be having problems with your load.  We had a thread
>>> > earlier this year ("UDP packet loss with syslog-ng") in which Lars
>>> > identified similar performance issues on RHEL.  His problems were
>>> > solved by setting the net.core.rmem_default to 2MB using sysctl.  I
>>> > would try setting that and then checking your performance.
>>>
>>> Make sure to also set the so_rcvbuf in syslog-ng on any high volume
>>> socket based log sources.
>>>
>>> You need to have a really big buffer or you will get terrible
>>> performance. We've been making some efforts to get this into the
>>> documentation.
>>>
>>> I think it's small by default so it doesn't consume a ton of RAM on boxes
>>> that are not used for log collection.
>>
>> By really big I mean 16,777,216.
>
> IMHO this is actually *very* bad advice. Don't mix the
> fire-and-forget UDP logging case with flow-controlled TCP!
>
> Going back to the original mail:
>
>> Client:
>>
>> log_iw_size >= SOURCES_PER_CLIENT * log_fetch_limit
>>
>> eg 35 * 10 = 350
>
> log_iw_size is used only for flow-controlled log paths. log_iw_size
> is a per-source option just like log_fetch_limit, so you shouldn't
> use the above math. log_iw_size has to be >= log_fetch_limit in your
> case, as each of your file sources uses its own incoming window.
>
>> log_fifo_size >= SOURCES_PER_CLIENT * log_fetch_limit
>>
>> eg 35 * 10 = 350
>>
>> AND
>>
>> log_fifo_size >= SOURCES_PER_CLIENT * log_iw_size
>>
>> eg 35 * 350 = 12250
>>
>> So it appears to me that setting log_fifo_size to > 12250 would be correct.
>
> log_fifo_size is used for in-memory buffering for a given destination.
> When the FIFO is full and flow control is enabled then syslog-ng won't
> read further logs from the sources. Here the math should be
>
> log_fifo_size >= number_of_sources * log_iw_size
>
> so with 35 sources and log_iw_size set to 10, 350 should be the
> actual setting. Of course increasing log_fifo_size could be useful,
> but you should be aware that the contents of the FIFO are lost when
> you use syslog-ng OSE and restart it.
>
> Summarising the above: my recommendation would be to enable flow
> control for file sources and set log_iw_size >= log_fetch_limit.
> For your loghost destination, enable flow control and set
> log_fifo_size to be at least as big as the combined size of all
> incoming windows.
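
The sizing rules quoted above can be written out as a small calculation.
This is only a sketch of the arithmetic, not anything syslog-ng ships:
the function names are made up for illustration, and the figures (35
file sources, log_fetch_limit 10, log_iw_size 10) are the ones used in
this thread.

```python
def min_log_fifo_size(number_of_sources: int, log_iw_size: int) -> int:
    """Smallest destination FIFO that can absorb every source's full window.

    Hypothetical helper: it only encodes the rule
    log_fifo_size >= number_of_sources * log_iw_size from the thread.
    """
    return number_of_sources * log_iw_size


def source_window_ok(log_iw_size: int, log_fetch_limit: int) -> bool:
    """Per-source check: the incoming window must hold one full fetch."""
    return log_iw_size >= log_fetch_limit


# Client-side figures from this thread.
client_fifo = min_log_fifo_size(number_of_sources=35, log_iw_size=10)
window_ok = source_window_ok(log_iw_size=10, log_fetch_limit=10)
print(client_fifo, window_ok)  # 350 True
```

The same function covers the loghost case if each client connection is
counted as one source.
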
>
>> Loghost
>>
>> Less idea about this, do I need:
>>
>> log_iw_size >= NUMBER_OF_CLIENTS * log_fetch_limit ( * SOURCES_PER_CLIENT ? )
>>
>> eg 40 * 10 * 35 = 14000
>
> Similar math should be used here as on the client side.
> SOURCES_PER_CLIENT doesn't matter: from the server's point of view
> every client is just a single source (client-side TCP destinations
> are mapped 1:1 to server-side TCP sources), so log_iw_size should be
> just 10, not 14k! See below for a more detailed explanation.
>
> As on the client side, every TCP connection on the server has its
> own incoming window, while all of your sources share the same
> destination FIFO. You've got 40 clients and log_iw_size is set to 10
> on the syslog-ng server, so at any given moment up to 40 * 10 = 400
> messages could be read into the destination FIFO. log_fifo_size
> therefore has to be at least 400 (the default is 1000, so this is
> definitely met). When you use flow control (and you definitely
> should!) and the MySQL destination can't handle the load, syslog-ng
> will stop reading from sources which have reached their log_iw_size
> limit. This will also slow down the syslog clients, but only once
> the send buffer of the client and the receive buffer of the server
> are both full; until then the TCP/IP stack still allows logs to be
> sent and received on the wire. When this happens, depending on the
> size of the receive and send buffers, a *lot* of messages (tens or
> hundreds of thousands!) could be in transit, and they are in peril:
> when syslog-ng gets restarted on either side these messages are
> lost :( To increase reliability, every message should get acked at
> the application layer.
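
To put a rough number on "a lot of messages in transit", here is a
back-of-the-envelope sketch. The helper name and the 200-byte average
message size are assumptions made purely for illustration; the
16,777,216-byte buffer is the figure suggested earlier in the thread.
Protocol overhead and kernel bookkeeping are ignored, so treat the
result as an order of magnitude only.

```python
def messages_in_socket_buffers(send_buf_bytes: int,
                               recv_buf_bytes: int,
                               avg_msg_bytes: int) -> int:
    """Estimate how many messages fit in one connection's socket buffers.

    Hypothetical helper: counts the client's send buffer plus the
    server's receive buffer and divides by an assumed message size.
    """
    return (send_buf_bytes + recv_buf_bytes) // avg_msg_bytes


BIG_BUFFER = 16_777_216   # bytes, the "really big" value from the thread
AVG_MSG_BYTES = 200       # assumed average log line length, not measured

at_risk = messages_in_socket_buffers(BIG_BUFFER, BIG_BUFFER, AVG_MSG_BYTES)
print(at_risk)  # 167772
```

With those assumptions a single stalled connection could hold well over
a hundred thousand unacknowledged messages, which is why smaller
buffers lose less on a restart.
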
>
> When you aim for reliability, flow-controlled logging with fairly
> small receive / send buffers is the way to go. Of course, depending
> on your network latency, the buffers may need to be increased for
> better performance. There is no generic rule for sizing buffers /
> incoming windows; everyone has to experiment to find the right
> balance.
>
> Disclaimer: I'm not an expert in the subject so feel free to correct me :)
>
> Regards,
>
> Sandor