[syslog-ng] TCP packet collapse errors

Xuri Nagarin secsubs at gmail.com
Fri May 31 21:41:26 CEST 2013


Thanks Daniel. I think I owe you a six-pack of your favourite beer just for
the script :)

Very interesting stats. For both hosts, lps (log lines per second) ranges
from a few hundred to 30K, but it regularly falls to zero. All the packet
loss happens when lps is peaking. What concerns me is that lps stays at
zero for 15-20 seconds at a time. During this period disk IO also drops to
near zero and the packet hemorrhaging pauses.
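Daniel's script is not reproduced in this thread, so the following is only a hypothetical Python sketch of the same idea. It assumes you already have timestamped snapshots of a cumulative "processed" counter (how they are collected is an assumption), derives lines per second for each interval, and measures the longest stretch at zero. The function names are illustrative.

```python
from typing import List, Tuple

Sample = Tuple[float, int]  # (unix timestamp, cumulative processed-message count)

def lines_per_second(samples: List[Sample]) -> List[float]:
    """Rate for each interval between consecutive counter snapshots."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        if t1 <= t0:
            raise ValueError("timestamps must be strictly increasing")
        rates.append((c1 - c0) / (t1 - t0))
    return rates

def longest_zero_run(samples: List[Sample], rates: List[float]) -> float:
    """Longest stretch, in seconds, during which no messages were processed."""
    longest = current = 0.0
    for (t0, _), (t1, _), rate in zip(samples, samples[1:], rates):
        if rate == 0:
            current += t1 - t0          # extend the current idle stretch
            longest = max(longest, current)
        else:
            current = 0.0               # traffic resumed, reset the stretch
    return longest
```

For samples [(0, 0), (1, 30000), (2, 60000), (3, 60000), (4, 60000), (5, 61000)] the rates are [30000.0, 30000.0, 0.0, 0.0, 1000.0] and the longest zero run is 2.0 seconds.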

All my sources are TCP and there are only four of them, but they are
proprietary products, so they may be sending data in batches/bursts. I will
tweak the tcp source options and see what more I can squeeze out of this
build of syslog-ng and the disks. But you are right - I do need to upgrade
to the multi-threaded version ASAP. At some point, a single-threaded process
is going to stall on disk writes and be unable to drain the TCP receive
buffers during that time.

I don't think the disk scheduler change had any impact: one system has the
deadline scheduler tweak and the other doesn't, and I see no significant
difference in packet loss between them.

Right now, I think the issue is simply that when a burst of data comes in,
the single-threaded syslog-ng blocks on disk IO and cannot switch back to
reading the TCP buffers fast enough to drain them. High network IO, high
disk IO, high lps and packet loss all line up.
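A back-of-the-envelope check of the burst theory, as a Python sketch: while the reader is stalled, a receive buffer of B bytes fills in B / (lps × average message size) seconds. The 300-byte average message size is an assumption, and the calculation ignores the kernel's per-packet bookkeeping overhead (the overhead it tries to reclaim by collapsing packets), so the real headroom is smaller.

```python
def seconds_until_full(rcvbuf_bytes: int, lines_per_sec: float, avg_msg_bytes: float) -> float:
    """Seconds a stalled reader can survive before the receive buffer fills.

    Ignores per-packet kernel overhead, so this is an upper bound.
    """
    inflow = lines_per_sec * avg_msg_bytes  # bytes arriving per second
    return rcvbuf_bytes / inflow

# Assumed figures: 30K lines/s peak, ~300-byte messages.
print(round(seconds_until_full(128 * 1024 * 1024, 30_000, 300), 1))  # 128 MiB -> 14.9
print(round(seconds_until_full(51_200_000, 30_000, 300), 1))         # so_rcvbuf(51200000) -> 5.7
```

Under those assumptions any stall longer than roughly 6-15 seconds at peak rate would overrun the buffer, which is consistent with the loss appearing only during bursts.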

A larger receive buffer for the tcp source in the syslog-ng config will be a
band-aid, I think, until I can build a multi-threaded syslog-ng. Eventually,
given that these are 7200 RPM disks in RAID-1, the disk subsystem will
become a bottleneck, but by then, hopefully, I will have found a host with
better disk throughput :-)

On Fri, May 31, 2013 at 2:41 AM, Daniel Neubacher <daniel.neubacher at xing.com> wrote:

>  I forgot to ask for your syslog source settings.
> Here is my cfg. Maybe it helps.
>
>  tcp(
>         log_fetch_limit(1000)
>         max-connections(5000)
>         so_rcvbuf(51200000)
>         keep_timestamp(yes)
>         port(514)
>         log-iw-size(500000)
> );
>
>  I got it from a great article:
> http://codeascraft.com/2012/08/13/performance-tuning-syslog-ng/
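One caveat worth checking with so_rcvbuf(): on Linux, an SO_RCVBUF request is silently capped at net.core.rmem_max, and getsockopt() reports back double the value actually set (the extra is kernel bookkeeping space, per socket(7)). This minimal Python sketch asks for a buffer size and returns what the kernel granted; it is a standalone test on an unconnected socket, not how syslog-ng itself applies the option.

```python
import socket

def effective_rcvbuf(requested: int) -> int:
    """Request `requested` bytes of receive buffer and return what the kernel reports."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        # On Linux this is min(requested, rmem_max) doubled for bookkeeping.
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
```

On a host where rmem_max is smaller than 51200000, effective_rcvbuf(51_200_000) returns roughly twice rmem_max rather than twice the request, meaning the larger so_rcvbuf() setting never fully takes effect.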
>
>
>
> ------------------------------
> *From:* syslog-ng-bounces at lists.balabit.hu
> [syslog-ng-bounces at lists.balabit.hu] on behalf of Xuri Nagarin
> [secsubs at gmail.com]
> *Sent:* Friday, 31 May 2013 10:12
> *To:* Syslog-ng users' and developers' mailing list
> *Subject:* Re: [syslog-ng] TCP packet collapse errors
>
>   Thanks for the quick response, Daniel.
>
>  I looked at the statistics for an hour before changing flush_lines to
> zero and setting log_fifo_size to 10000. In that period, syslog-ng
> reported processing 7,898,310,589 messages across all destinations and
> dropped 4,200,260.
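For scale, the drop rate in that hour works out to about 0.05% of processed messages. A trivial Python check of the arithmetic, using the figures quoted above:

```python
def drop_ratio(dropped: int, processed: int) -> float:
    """Fraction of processed messages that were dropped."""
    return dropped / processed

print(f"{drop_ratio(4_200_260, 7_898_310_589):.4%}")  # -> 0.0532%
```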
>
>  After making the change (flush_lines set to 0 and log_fifo_size to
> 10000), I looked at three sets of stats covering half an hour (stats are
> logged every 10 minutes by default). Dropped messages are now zero across
> all destinations.
>
>  But the collapsed TCP packets count keeps incrementing. I ran
> 'iostat -xm 5' and "watch -d 'netstat -s | grep collapsed'" in two windows
> side by side. Each time disk IO spikes, the TCP collapsed counter starts
> incrementing. Disk IO stays near zero for about half a minute and then
> spikes to ~4-25 MB/s for half a minute.
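The netstat line being grepped is derived from the TCPRcvCollapsed counter in the TcpExt section of /proc/net/netstat. A minimal Python sketch (Linux only; function names are illustrative) that reads that counter directly and prints its increase per interval, so it can be lined up against iostat output:

```python
import time
from typing import Dict

def parse_netstat(text: str) -> Dict[str, Dict[str, int]]:
    """Parse /proc/net/netstat: each section is a line of counter names
    followed by a line of values carrying the same prefix."""
    lines = text.strip().splitlines()
    sections: Dict[str, Dict[str, int]] = {}
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        _, numbers = values.split(":", 1)
        sections[prefix] = dict(zip(names.split(), (int(n) for n in numbers.split())))
    return sections

def rcv_collapsed(path: str = "/proc/net/netstat") -> int:
    """Current value of the counter behind netstat's 'packets collapsed' line."""
    with open(path) as f:
        return parse_netstat(f.read())["TcpExt"]["TCPRcvCollapsed"]

def watch_collapsed(interval: float = 5.0, rounds: int = 12) -> None:
    """Print how much the collapse counter grew during each interval."""
    previous = rcv_collapsed()
    for _ in range(rounds):
        time.sleep(interval)
        current = rcv_collapsed()
        print(f"{time.strftime('%H:%M:%S')} collapsed +{current - previous}")
        previous = current
```

Calling watch_collapsed(5, 120) samples every 5 seconds for ten minutes, matching the 'iostat -xm 5' cadence.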
>
>  Does this mean I need to bump log_fifo_size even higher? Ideally, I
> think we want the disk to be written to steadily rather than in bursts of
> write activity. Right?
>
>
> On Thu, May 30, 2013 at 10:56 PM, Daniel Neubacher <daniel.neubacher at xing.com> wrote:
>
>>  I don't know how many logs you are receiving, but you should raise
>> "log_fifo_size(1000);" to a higher number. Your flush_lines is really high
>> too. I experimented with flush_lines but ended up setting it to 0 at 50k
>> logs per second. And the greatest tweak of all would be a newer syslog-ng
>> version, because of the multithreading.
>>  ------------------------------
>> *From:* syslog-ng-bounces at lists.balabit.hu
>> [syslog-ng-bounces at lists.balabit.hu] on behalf of Xuri Nagarin
>> [secsubs at gmail.com]
>> *Sent:* Friday, 31 May 2013 07:46
>> *To:* Syslog-ng users' and developers' mailing list
>> *Subject:* [syslog-ng] TCP packet collapse errors
>>
>>    I have a pair of Syslog-NG servers running 3.2.5-3. The hardware
>> specs are - Quad Xeon E5-2680 (32 cores), 32GB RAM, and two 1TB SAS 7200
>> RPM disks in RAID-1.
>>
>> OS is RHEL6.2 - Kernel 2.6.32-279.5.2. Filesystem is ext3.
>>
>>  Global options are set as:
>>  options {
>> flush_lines (1000);
>> time_reopen (10);
>> log_fifo_size (1000);
>> long_hostnames (off);
>> use_dns (no);
>> use_fqdn (no);
>> create_dirs (yes);
>> keep_hostname (yes);
>> keep_timestamp(yes);
>> dir_group("syslog");
>> perm(0640);
>> dir_perm(0750);
>> group("syslog");
>> };
>>
>>  I have already set TCP kernel buffers to 128MB max and set disk
>> scheduler to "deadline".
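"128MB max" can refer to two different ceilings on Linux: net.core.rmem_max, which caps an explicit SO_RCVBUF request such as the one syslog-ng's so_rcvbuf() makes, and the third field of net.ipv4.tcp_rmem, which caps kernel autotuning when no explicit size is set. A small Python sketch (helper names are hypothetical) that reads both so they can be compared:

```python
import os
from typing import Dict, List

def read_sysctl(name: str, root: str = "/proc/sys") -> List[int]:
    """Read a numeric sysctl such as 'net.core.rmem_max' as a list of ints."""
    path = os.path.join(root, *name.split("."))
    with open(path) as f:
        return [int(v) for v in f.read().split()]

def buffer_limits(root: str = "/proc/sys") -> Dict[str, int]:
    """The two ceilings that matter for a TCP listener's receive buffer."""
    return {
        # Cap for an explicit SO_RCVBUF request (setting it disables autotuning).
        "rmem_max": read_sysctl("net.core.rmem_max", root)[0],
        # Cap for autotuned buffers: the 'max' field of tcp_rmem (min, default, max).
        "tcp_rmem_max": read_sysctl("net.ipv4.tcp_rmem", root)[2],
    }
```

If only tcp_rmem was raised to 128MB, an so_rcvbuf() value above rmem_max will still be capped.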
>>
>>  But even under light disk IO load (~8-25 MB/s), I see "1320811067
>> packets collapsed in receive queue due to low socket buffer". I had some
>> other processes on the host writing to disk. Stopping them reduced the
>> packet errors, but the number still keeps incrementing.
>>
>>  To rule out other issues, I temporarily pointed my disk-based
>> destinations to /dev/null and then packet losses/errors stopped. So either
>> Syslog-NG isn't able to write to disk fast enough or there is an underlying
>> OS/hardware issue.
>>
>>  Both hosts have the same issue. Any pointers in troubleshooting it will
>> be appreciated.
>>
>>  TIA.
>>
>> ______________________________________________________________________________
>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
>> Documentation:
>> http://www.balabit.com/support/documentation/?product=syslog-ng
>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
>>