[syslog-ng] Configuration tuning for reliability

Tue Nov 16 06:34:15 CET 2010

The changes were successful: we did not drop a single log message during the last test.

Many thanks Sandor!

On Sat, Nov 13, 2010 at 11:48 AM, Ben Tisdall <ben.tisdall at photobox.com> wrote:
> Thanks to both of you for your contributions. In this case, Sandor, I think
> your advice is the most appropriate. I'll report back to the list after
> the next test, once I've applied the changes.
>
> On Fri, Nov 12, 2010 at 1:45 PM, Sandor Geller
> <Sandor.Geller at morganstanley.com> wrote:
>> Hi,
>>
>> On Thu, Nov 11, 2010 at 5:54 PM, Matthew Hall <mhall at mhcomputing.net> wrote:
>>> On Thursday, November 11, 2010 08:51:11 Matthew Hall wrote:
>>>> On Thursday, November 11, 2010 08:29:32 Martin Holste wrote:
>>>> > You should not be having problems with your load. We had a thread
>>>> > earlier this year ("UDP packet loss with syslog-ng") in which Lars
>>>> > identified similar performance issues on RHEL. His problems were
>>>> > solved by setting net.core.rmem_default to 2MB using sysctl. I
>>>> > would try setting that and then checking your performance.
>>>>
>>>> Make sure to also set so_rcvbuf in syslog-ng on any high-volume,
>>>> socket-based log sources.
>>>>
>>>> You need to have a really big buffer or you will get terrible
>>>> performance. We've been making some efforts to get this into the
>>>> documentation.
>>>>
>>>> I think it's small by default so it doesn't consume a ton of RAM on boxes
>>>> that are not used for log collection.
>>>
>>> By really big I mean 16,777,216 bytes (16 MiB).
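A minimal Python sketch of what a receive-buffer setting like this amounts to at the socket level, assuming (this is not confirmed in the thread) that syslog-ng's so_rcvbuf applies the standard SO_RCVBUF socket option. On Linux, socket(7) documents that the kernel doubles the requested value to allow for bookkeeping overhead and caps SO_RCVBUF requests at net.core.rmem_max, so a 16,777,216-byte request only takes full effect after raising that sysctl. The helper name effective_rcvbuf is hypothetical. A UDP socket is used because, as the reply that follows points out, a big receive buffer is mainly a remedy for the fire-and-forget UDP case.

```python
import socket

# 16,777,216 bytes (16 MiB), the figure suggested above.
REQUESTED_RCVBUF = 16 * 1024 * 1024


def effective_rcvbuf(requested: int) -> int:
    """Request a UDP receive buffer and return the size the kernel reports.

    Hypothetical helper for illustration. On Linux the reported value is
    typically twice the (possibly capped) request.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        except OSError:
            # Some kernels (e.g. macOS) reject oversized requests outright
            # instead of silently capping them; keep the default in that case.
            pass
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    granted = effective_rcvbuf(REQUESTED_RCVBUF)
    print(f"requested {REQUESTED_RCVBUF} bytes, kernel reports {granted} bytes")
```

On Linux, a reported value smaller than twice the request means the net.core.rmem_max cap is in effect.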
>>
>> IMHO this is actually *very* bad advice. Don't mix up the
>> fire-and-forget UDP logging case with flow-controlled TCP!
>>
>> Going back to the original mail:
>>
>>> Client:
>>>
>>> log_iw_size >= SOURCES_PER_CLIENT * log_fetch_limit
>>>
>>> eg 35 * 10 = 350
>>
>> log_iw_size is used only for flow-controlled log paths. log_iw_size is
>> a per-source option, just like log_fetch_limit, so you shouldn't use
>> the above math. In your case log_iw_size has to be >= log_fetch_limit,
>> as each of your file sources uses its own incoming window.
>>
>>> log_fifo_size >= SOURCES_PER_CLIENT * log_fetch_limit
>>>
>>> eg 35 * 10 = 350
>>>
>>> AND
>>>
>>> log_fifo_size >= SOURCES_PER_CLIENT * log_iw_size
>>>
>>> eg 35 * 350 = 12250
>>>
>>> So it appears to me that setting log_fifo_size to > 12250 would be correct.
>>
>> log_fifo_size is used for in-memory buffering for a given destination.
>> When the FIFO is full and flow control is enabled then syslog-ng won't
>> read further logs from the sources. Here the math should be:
>>
>> log_fifo_size >= number_of_sources * log_iw_size
>>
>> so with 35 sources and log_iw_size set to 10, log_fifo_size should be
>> at least 350. Of course, increasing log_fifo_size further could be
>> useful, but be aware that if you use syslog-ng OSE, the contents of
>> the FIFO are lost when you restart it.
>>
>> Summarising the above: my recommendation would be to enable flow
>> control for file sources and set log_iw_size >= log_fetch_limit. For
>> your loghost destination, enable flow control and set log_fifo_size
>> to be at least as big as the combined size of all incoming windows.
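The client-side sizing rules in this reply can be written down as a small check. This is a hedged Python sketch, not anything shipped with syslog-ng: SourceConfig, min_fifo_size and check_client are hypothetical names. It covers only the numeric rules (each source's log_iw_size >= its log_fetch_limit, and log_fifo_size >= the sum of all incoming windows), not whether flow control is actually enabled on the log paths.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SourceConfig:
    """Hypothetical per-source settings, named after the syslog-ng options."""
    log_fetch_limit: int
    log_iw_size: int


def min_fifo_size(sources: list[SourceConfig]) -> int:
    # The destination FIFO must be able to hold every source's window at once.
    return sum(src.log_iw_size for src in sources)


def check_client(sources: list[SourceConfig], log_fifo_size: int) -> list[str]:
    """Return a list of violated sizing rules (empty if none)."""
    problems = []
    for i, src in enumerate(sources):
        # log_iw_size is per source, so compare each source on its own.
        if src.log_iw_size < src.log_fetch_limit:
            problems.append(
                f"source {i}: log_iw_size {src.log_iw_size} "
                f"< log_fetch_limit {src.log_fetch_limit}"
            )
    needed = min_fifo_size(sources)
    if log_fifo_size < needed:
        problems.append(f"log_fifo_size {log_fifo_size} < combined windows {needed}")
    return problems


if __name__ == "__main__":
    # The client from the original mail: 35 file sources, fetch limit 10.
    client = [SourceConfig(log_fetch_limit=10, log_iw_size=10) for _ in range(35)]
    print(min_fifo_size(client))      # 350
    print(check_client(client, 350))  # []
    print(check_client(client, 300))
```

The original poster's 12250 also passes this check; it is simply larger than needed, and given the OSE restart caveat above, a bigger in-memory FIFO can mean more messages at risk.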
>>
>>> Loghost
>>>
>>> I'm less sure about this one. Do I need:
>>>
>>> log_iw_size >= NUMBER_OF_CLIENTS * log_fetch_limit ( * SOURCES_PER_CLIENT ? )
>>>
>>> eg 40 * 10 * 35 = 14000
>>
>> Similar math should be used here as on the client side.
>> SOURCES_PER_CLIENT doesn't matter: from the server's point of view
>> every client is just a single source (client-side TCP destinations
>> are mapped 1:1 to server-side TCP sources), so log_iw_size should be
>> just 10, not 14k! See below for a more detailed explanation.
>>
>> As on the client side, on the server every TCP connection has its
>> own incoming window, while all your sources share the same
>> destination FIFO. You've got 40 clients and log_iw_size is set to 10
>> on the syslog-ng server, so at any given moment up to 40 * 10 = 400
>> messages could be read into the destination FIFO. log_fifo_size has
>> to be set to at least 400 (the default is 1000, so this is definitely
>> met).
>>
>> When you use flow control (and you definitely should!) and the MySQL
>> destination can't handle the load, syslog-ng will stop reading from
>> sources that have reached the log_iw_size limit. This will also slow
>> down the syslog clients, but only once both the client's send buffer
>> and the server's receive buffer are full; until then the TCP/IP stack
>> keeps sending and receiving logs on the wire. When this happens,
>> depending on the size of the receive and send buffers, a *lot* of
>> messages (tens or hundreds of thousands!) could be in transit, and
>> they are in peril: when syslog-ng is restarted on either side, these
>> messages are lost :( To increase reliability, every message should be
>> acknowledged at the application layer.
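The server-side arithmetic above, plus a rough estimate of how many messages can sit in kernel socket buffers, as a Python sketch. Everything here is illustrative: the function names are hypothetical, and the buffer sizes and the 200-byte average message size in the example are made-up numbers, not measurements from this thread. The in-transit figure is a crude upper bound that ignores kernel bookkeeping overhead and protocol framing.

```python
DEFAULT_LOG_FIFO_SIZE = 1000  # default quoted in the thread


def server_fifo_minimum(num_clients: int, log_iw_size: int) -> int:
    # One incoming window per TCP connection; all of them feed one FIFO.
    return num_clients * log_iw_size


def messages_in_transit(num_clients: int, client_sndbuf: int,
                        server_rcvbuf: int, avg_msg_bytes: int) -> int:
    """Rough upper bound on messages held in socket buffers.

    Buffer sizes and message size are in bytes. These are the messages the
    thread describes as in peril when syslog-ng is restarted while flow
    control has stalled the connections.
    """
    per_connection = (client_sndbuf + server_rcvbuf) // avg_msg_bytes
    return num_clients * per_connection


if __name__ == "__main__":
    needed = server_fifo_minimum(num_clients=40, log_iw_size=10)
    print(needed, needed <= DEFAULT_LOG_FIFO_SIZE)  # 400 True
    # Hypothetical: 1 MiB send and receive buffers, 200-byte messages.
    print(messages_in_transit(40, 1 << 20, 1 << 20, 200))  # 419400
```

With numbers in this range the in-transit count dwarfs the 400-message FIFO, which is why fairly small buffers are favoured when reliability matters.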
>>
>> When you aim for reliability, flow-controlled logging with fairly
>> small receive/send buffers is the way to go. Of course, depending on
>> your network latency, the buffers may need to be increased for better
>> performance. There is no generic rule for sizing buffers and incoming
>> windows; everyone has to experiment to find the right balance.
>>
>> Disclaimer: I'm not an expert on the subject, so feel free to correct me :)
>>
>> Regards,
>>
>> Sandor
>> ______________________________________________________________________________
>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
>> Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
>> FAQ: http://www.campin.net/syslog-ng/faq.html