Re: [syslog-ng] TCP packet collapse errors

31 May 2013

I forgot to ask for your syslog source settings.
Here is my config; maybe it helps.

tcp(
        log_fetch_limit(1000)
        max-connections(5000)
        so_rcvbuf(51200000)
        keep_timestamp(yes)
        port(514)
        log-iw-size(500000)
);
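A note on so_rcvbuf(): on Linux, a receive buffer requested by an application is capped by net.core.rmem_max, so a large value in the config only helps if the kernel limit allows it. The following is a minimal Python sketch, not part of syslog-ng, that requests a buffer and prints what the kernel reports back. The function name granted_rcvbuf is made up for illustration, and the behaviour described in the comments assumes Linux.

```python
import socket


def granted_rcvbuf(requested):
    """Request a receive buffer of `requested` bytes and return the size
    the kernel reports for the socket afterwards."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        # On Linux the request is capped at net.core.rmem_max and the
        # reported value is doubled to account for bookkeeping overhead.
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    requested = 51200000  # the so_rcvbuf() value from the config above
    granted = granted_rcvbuf(requested)
    print(f"requested {requested} bytes, kernel reports {granted} bytes")
```

If the reported size is far below twice the requested value, the kernel limit is probably the constraint rather than the syslog-ng setting.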

I got it from a great article: http://codeascraft.com/2012/08/13/performance-tuning-syslog-ng/

________________________________
From: syslog-ng-bounces@lists.balabit.hu [syslog-ng-bounces@lists.balabit.hu] on behalf of Xuri Nagarin [secsubs@gmail.com]
Sent: Friday, 31 May 2013 10:12
To: Syslog-ng users' and developers' mailing list
Subject: Re: [syslog-ng] TCP packet collapse errors

Thanks for the quick response, Daniel.

I looked at the statistics for an hour before setting flush_lines to zero and log_fifo_size to 10000. In that period, syslog-ng reported processing 7,898,310,589 messages across all destinations and dropping 4,200,260.
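To put those two counters in proportion, here is a small Python sketch of the arithmetic. It assumes the processed count already includes the dropped messages; the function name drop_ratio is only for illustration.

```python
def drop_ratio(processed, dropped):
    """Fraction of processed messages that were dropped."""
    return dropped / processed


processed = 7_898_310_589  # messages processed across all destinations
dropped = 4_200_260        # messages dropped in the same period

ratio = drop_ratio(processed, dropped)
print(f"dropped {ratio:.4%} of processed messages")
```

That works out to roughly 0.05% of messages, or about one in every 1,900.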

After making the change (flush_lines set to 0 and log_fifo_size to 10000), I looked at three sets of stats, covering half an hour at the default 10-minute interval. The dropped messages are now zero across all destinations.

But the collapsed TCP packets count keeps incrementing. I ran 'iostat -xm 5' and "watch -d 'netstat -s | grep collapsed'" in two windows side by side. Each time disk IO spikes, the TCP collapsed counter starts incrementing. Disk IO stays at almost zero for about half a minute and then spikes to ~4-25 MB/s for half a minute.
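The same counter can be logged with timestamps so it is easier to line up against the iostat output. This is a minimal Python sketch, assuming a Linux host where the "packets collapsed in receive queue" figure from netstat -s comes from the TCPRcvCollapsed field of the TcpExt lines in /proc/net/netstat. The function names are made up for illustration.

```python
import time


def parse_tcpext(text):
    """Parse /proc/net/netstat-style text into {name: value} for TcpExt.

    The file holds pairs of lines: a header line with counter names and
    a second line with the matching values, both prefixed by 'TcpExt:'.
    """
    rows = [ln.split() for ln in text.splitlines() if ln.startswith("TcpExt:")]
    if len(rows) < 2:
        return {}
    names, values = rows[0][1:], rows[1][1:]
    return {name: int(value) for name, value in zip(names, values)}


def read_collapsed(path="/proc/net/netstat"):
    """Return the current TCPRcvCollapsed counter (0 if it is missing)."""
    with open(path) as f:
        return parse_tcpext(f.read()).get("TCPRcvCollapsed", 0)


def watch(interval=5.0, samples=12):
    """Print how much the counter grew in each interval."""
    prev = read_collapsed()
    for _ in range(samples):
        time.sleep(interval)
        cur = read_collapsed()
        print(f"{time.strftime('%H:%M:%S')} collapsed +{cur - prev} (total {cur})")
        prev = cur


if __name__ == "__main__":
    watch()
```

A 5-second interval matches the 'iostat -xm 5' sampling, so the two outputs can be compared line by line.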

Does this mean I need to bump log_fifo_size up even higher? Ideally we want steady writes to disk rather than bursts of write activity, right?
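One rough way to reason about the log_fifo_size question is to ask how long a queue of that size can absorb a stalled destination. The Python sketch below assumes log_fifo_size counts messages in a destination's output queue and, purely as an illustration, borrows the 50k messages per second figure Daniel mentions below; the real rate on these hosts is not stated in the thread.

```python
def stall_seconds(fifo_size, msgs_per_second):
    """Seconds of incoming traffic a queue of fifo_size messages can hold
    while its destination is not draining at all."""
    return fifo_size / msgs_per_second


fifo_size = 10_000  # log_fifo_size after the change
rate = 50_000       # assumed rate, taken from Daniel's setup, not measured here

print(f"{stall_seconds(fifo_size, rate):.2f} s of buffering")
```

Under those assumptions the queue covers only a fraction of a second, which is short compared with the half-minute write bursts described above.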

On Thu, May 30, 2013 at 10:56 PM, Daniel Neubacher <daniel.neubacher@xing.com> wrote:
I don't know how many logs you are getting, but you should raise "log_fifo_size (1000);" to a higher number. Your flush_lines is really high too. I experimented with flush_lines and ended up setting it to 0 at 50k logs per second. The greatest tweak of all would be a newer syslog-ng version, because of the threading.
________________________________
From: syslog-ng-bounces@lists.balabit.hu [syslog-ng-bounces@lists.balabit.hu] on behalf of Xuri Nagarin [secsubs@gmail.com]
Sent: Friday, 31 May 2013 07:46
To: Syslog-ng users' and developers' mailing list
Subject: [syslog-ng] TCP packet collapse errors

I have a pair of Syslog-NG servers running 3.2.5-3. The hardware specs are: Quad Xeon E5-2680 (32 cores), 32GB RAM, and two 1TB SAS 7200 RPM disks in RAID-1.

OS is RHEL6.2 - Kernel 2.6.32-279.5.2. Filesystem is ext3.

Global options are set as:
options {
flush_lines (1000);
time_reopen (10);
log_fifo_size (1000);
long_hostnames (off);
use_dns (no);
use_fqdn (no);
create_dirs (yes);
keep_hostname (yes);
keep_timestamp(yes);
dir_group("syslog");
perm(0640);
dir_perm(0750);
group("syslog");
};

I have already set TCP kernel buffers to 128MB max and set disk scheduler to "deadline".

But even under light disk IO load, from ~8-25MB, I see "1320811067 packets collapsed in receive queue due to low socket buffer". I had some other processes on the host writing to disk. Stopping them reduced the packet errors, but the counter still keeps incrementing.

To rule out other issues, I temporarily pointed my disk-based destinations to /dev/null and then packet losses/errors stopped. So either Syslog-NG isn't able to write to disk fast enough or there is an underlying OS/hardware issue.

Both hosts have the same issue. Any pointers in troubleshooting it will be appreciated.

TIA.

______________________________________________________________________________
Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
FAQ: http://www.balabit.com/wiki/syslog-ng-faq