Re: [syslog-ng] TCP packet collapse errors

31 May 2013

Thanks for the quick response, Daniel.

I looked at the statistics for an hour before setting flush_lines to 0 and
log_fifo_size to 10000. In that period, syslog-ng reported processing
7,898,310,589 messages across all destinations and dropping 4,200,260.
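
For scale, the drop rate implied by those two counters can be worked out as below. This is only a minimal sketch: the function name `drop_rate` is made up for illustration, and it assumes the "processed" figure is the right denominator (whether syslog-ng counts dropped messages inside "processed" is not stated here). With the numbers quoted above it comes out at roughly 0.05%.

```python
def drop_rate(processed: int, dropped: int) -> float:
    """Return dropped/processed as a fraction (0.0 if nothing was processed)."""
    if processed <= 0:
        return 0.0
    return dropped / processed


# Counters quoted in the message for the one-hour window before the change.
processed = 7_898_310_589
dropped = 4_200_260

rate = drop_rate(processed, dropped)
print(f"dropped {rate:.4%} of processed messages")
```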

After making the change (flush_lines set to 0 and log_fifo_size to 10000),
I looked at three consecutive sets of stats (emitted at the default interval
of 10 minutes, so half an hour in total). The dropped messages are now zero
across all destinations.

But the collapsed TCP packets count keeps incrementing. I ran 'iostat -xm
5' and "watch -d 'netstat -s | grep collapsed'" in two windows
side by side. Each time disk IO spikes, the TCP collapsed counter
starts incrementing. Disk IO stays at almost zero for about half a minute
and then spikes to ~4-25 MB/s for the next half minute.
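
The same counter can also be sampled programmatically, which makes it easier to line up against the iostat output. On Linux, the "packets collapsed in receive queue due to low socket buffer" line from netstat -s corresponds to the TCPRcvCollapsed field under TcpExt in /proc/net/netstat. The sketch below parses text in that format and reports the increase between two snapshots. The function names and the abbreviated sample snapshots are hypothetical; a real /proc/net/netstat has many more fields, and the "after" value is invented for illustration.

```python
def parse_proc_netstat(text: str) -> dict:
    """Parse /proc/net/netstat-style text into {prefix: {field: value}}.

    The file consists of line pairs: a line of field names followed by a
    line of values, both starting with the same "Prefix:" label.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    stats = {}
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        value_prefix, numbers = values.split(":", 1)
        if prefix != value_prefix:
            raise ValueError(f"mismatched line pair: {prefix!r} vs {value_prefix!r}")
        stats[prefix] = dict(zip(names.split(), map(int, numbers.split())))
    return stats


def collapsed_delta(before: str, after: str) -> int:
    """Return how much TcpExt/TCPRcvCollapsed grew between two snapshots."""
    key = "TCPRcvCollapsed"
    return (parse_proc_netstat(after)["TcpExt"][key]
            - parse_proc_netstat(before)["TcpExt"][key])


# Abbreviated, hypothetical snapshots (only three TcpExt fields shown).
SAMPLE_BEFORE = (
    "TcpExt: PruneCalled RcvPruned TCPRcvCollapsed\n"
    "TcpExt: 10 0 1320811067\n"
)
SAMPLE_AFTER = (
    "TcpExt: PruneCalled RcvPruned TCPRcvCollapsed\n"
    "TcpExt: 12 0 1320811567\n"
)

print(collapsed_delta(SAMPLE_BEFORE, SAMPLE_AFTER))
```

On a live host, the two snapshots would come from reading /proc/net/netstat a few seconds apart, for example at the same 5-second interval used for iostat.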

Does this mean I need to bump up log_fifo_size even higher? I think ideally
we want the disk to be consistently written to instead of bursts of write
activity. Right?

On Thu, May 30, 2013 at 10:56 PM, Daniel Neubacher <
daniel.neubacher@xing.com> wrote:
...
I don't know how many logs you are getting, but you should raise "log_fifo_size
(1000);" to a higher number. Your flush_lines is really high too. I
experimented with flush_lines but ended up setting it to 0 at 50k logs per
second. And the greatest of all tweaks would be a newer syslog-ng version,
because of the threading.
 ------------------------------
*From:* syslog-ng-bounces@lists.balabit.hu [
syslog-ng-bounces@lists.balabit.hu] on behalf of Xuri Nagarin [
secsubs@gmail.com]
*Sent:* Friday, 31 May 2013 07:46
*To:* Syslog-ng users' and developers' mailing list
*Subject:* [syslog-ng] TCP packet collapse errors
I have a pair of Syslog-NG servers running 3.2.5-3. The hardware specs
are - Quad Xeon E5-2680 (32 cores), 32GB RAM, and two 1TB SAS 7200 RPM
disks in RAID-1.
OS is RHEL6.2 - Kernel 2.6.32-279.5.2. Filesystem is ext3.
Global options are set as:
options {
    flush_lines(1000);
    time_reopen(10);
    log_fifo_size(1000);
    long_hostnames(off);
    use_dns(no);
    use_fqdn(no);
    create_dirs(yes);
    keep_hostname(yes);
    keep_timestamp(yes);
    dir_group("syslog");
    perm(0640);
    dir_perm(0750);
    group("syslog");
};
I have already set the TCP kernel buffers to a 128MB maximum and set the disk
scheduler to "deadline".
But even under light disk IO load, ~8-25MB/s, I see "1320811067
packets collapsed in receive queue due to low socket buffer". I had some
other processes on the host writing to disk. Stopping them reduced the
packet errors, but the number still keeps incrementing.
To rule out other issues, I temporarily pointed my disk-based
destinations to /dev/null and then packet losses/errors stopped. So either
Syslog-NG isn't able to write to disk fast enough or there is an underlying
OS/hardware issue.
Both hosts have the same issue. Any pointers in troubleshooting it will
be appreciated.
TIA.