[syslog-ng] TCP packet collapse errors

Fri May 31 10:12:18 CEST 2013

Thanks for the quick response, Daniel.

I looked at statistics for an hour before setting flush_lines to zero and
log_fifo_size to 10000. In that period, syslog-ng reported processing
7,898,310,589 messages across all destinations and dropping 4,200,260.
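
For scale, here is a quick sketch of the drop rate those two counters imply.
It assumes the processed count is the right denominator (i.e. that dropped
messages are counted within it); the variable names are only for illustration.

```python
# Counters reported by syslog-ng for the one-hour window before the change.
processed = 7_898_310_589
dropped = 4_200_260

# Fraction of messages dropped, assuming "processed" is the total seen.
drop_rate = dropped / processed

# Prints the rate as a percentage (on the order of 0.05%).
print(f"drop rate: {drop_rate:.4%}")
```

So the drops were a small fraction of the total, but still millions of
messages per hour.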

After making the change (flush_lines set to 0 and log_fifo_size to 10000),
I looked at three consecutive sets of stats (half an hour in total, at the
default interval of every 10 minutes). The dropped messages are now zero
across all destinations.

But the collapsed TCP packets count keeps incrementing. I ran "iostat -xm
5" and "watch -d 'netstat -s | grep collapsed'" in two windows
side-by-side. Each time disk IO spikes, the TCP collapsed counter
starts incrementing. Disk IO remains almost zero for about half a minute
and then spikes up to ~4-25 Mbytes/sec for half a minute.
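
To log the counter with timestamps instead of eyeballing "watch", something
like the sketch below could work. It assumes the "packets collapsed in receive
queue" line from netstat -s corresponds to the TCPRcvCollapsed field of the
TcpExt section in /proc/net/netstat; the function names (read_tcpext,
collapsed_delta, watch) are made up for this example.

```python
import time


def read_tcpext(text):
    """Parse /proc/net/netstat-style text into a dict of TcpExt counters."""
    rows = [line.split() for line in text.splitlines()
            if line.startswith("TcpExt:")]
    counters = {}
    # TcpExt rows come in pairs: a row of names, then a row of values.
    for names, values in zip(rows[0::2], rows[1::2]):
        counters.update(zip(names[1:], (int(v) for v in values[1:])))
    return counters


def collapsed_delta(before, after, key="TCPRcvCollapsed"):
    """Return how much the (assumed) collapse counter grew between samples."""
    return after.get(key, 0) - before.get(key, 0)


def watch(interval=5, samples=12, path="/proc/net/netstat"):
    """Print the per-interval growth of the counter, one line per sample."""
    with open(path) as f:
        prev = read_tcpext(f.read())
    for _ in range(samples):
        time.sleep(interval)
        with open(path) as f:
            cur = read_tcpext(f.read())
        print(time.strftime("%H:%M:%S"), collapsed_delta(prev, cur))
        prev = cur
```

Calling watch(interval=5) next to "iostat -xm 5" would give two 5-second
series that can be lined up to check whether the increments really track the
write bursts.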

Does this mean I need to bump up log_fifo_size even higher? I think ideally
we want the disk to be written to steadily instead of in bursts of write
activity. Right?

On Thu, May 30, 2013 at 10:56 PM, Daniel Neubacher <
daniel.neubacher at xing.com> wrote:

>  I don't know how many logs you are getting, but you should tweak
> "log_fifo_size (1000);" to a higher number. Your flush_lines is really
> high too. I experimented with flush_lines but ended up setting it to 0 at
> 50k logs per second. And the greatest of all tweaks would be a newer
> syslog-ng version because of the threading.
>  ------------------------------
> *From:* syslog-ng-bounces at lists.balabit.hu [
> syslog-ng-bounces at lists.balabit.hu] on behalf of Xuri Nagarin [
> secsubs at gmail.com]
> *Sent:* Friday, 31 May 2013 07:46
> *To:* Syslog-ng users' and developers' mailing list
> *Subject:* [syslog-ng] TCP packet collapse errors
>
>   I have a pair of Syslog-NG servers running 3.2.5-3. The hardware specs
> are - Quad Xeon E5-2680 (32 cores), 32GB RAM, and two 1TB SAS 7200 RPM
> disks in RAID-1.
>
> OS is RHEL6.2 - Kernel 2.6.32-279.5.2. Filesystem is ext3.
>
>  Global options are set as:
>  options {
> flush_lines (1000);
> time_reopen (10);
> log_fifo_size (1000);
> long_hostnames (off);
> use_dns (no);
> use_fqdn (no);
> create_dirs (yes);
> keep_hostname (yes);
> keep_timestamp(yes);
> dir_group("syslog");
> perm(0640);
> dir_perm(0750);
> group("syslog");
> };
>
>  I have already set TCP kernel buffers to 128MB max and set disk
> scheduler to "deadline".
>
>  But even under light disk IO load, from ~8-25MB, I see "1320811067
> packets collapsed in receive queue due to low socket buffer". I had some
> other processes on the host writing to disk. Stopping them reduced the
> packet errors but this number still keeps incrementing.
>
>  To rule out other issues, I temporarily pointed my disk-based
> destinations to /dev/null and then packet losses/errors stopped. So either
> Syslog-NG isn't able to write to disk fast enough or there is an underlying
> OS/hardware issue.
>
>  Both hosts have the same issue. Any pointers in troubleshooting it will
> be appreciated.
>
>  TIA.