Thanks for the quick response, Daniel.

I looked at an hour of statistics before changing flush_lines to 0 and log_fifo_size to 10000. In that period, syslog-ng reported processing 7,898,310,589 messages across all destinations and dropping 4,200,260. After making the change, I looked at three sets of stats (half an hour in total, at the default 10-minute interval). Dropped messages are now zero across all destinations, but the collapsed TCP packets count keeps incrementing.

I ran 'iostat -xm 5' and "watch -d 'netstat -s | grep collapsed'" in two windows side by side. Each time disk IO spikes, the TCP collapsed counter starts incrementing. Disk IO stays at almost zero for about half a minute and then spikes to ~4-25 Mbytes/sec for half a minute.

Does this mean I need to bump log_fifo_size even higher? Ideally we want the disk to be written to steadily instead of in bursts of write activity, right?

On Thu, May 30, 2013 at 10:56 PM, Daniel Neubacher <daniel.neubacher@xing.com> wrote:
I don't know how many logs you are getting, but you should raise "log_fifo_size (1000);" to a higher number. Your flush_lines is really high too. I experimented with flush_lines and ended up setting it to 0 at 50k logs per second. The greatest of all tweaks would be a newer syslog-ng version, because of the threading.

------------------------------
*From:* syslog-ng-bounces@lists.balabit.hu [syslog-ng-bounces@lists.balabit.hu] on behalf of Xuri Nagarin [secsubs@gmail.com]
*Sent:* Friday, 31 May 2013 07:46
*To:* Syslog-ng users' and developers' mailing list
*Subject:* [syslog-ng] TCP packet collapse errors
I have a pair of Syslog-NG servers running 3.2.5-3. The hardware specs are - Quad Xeon E5-2680 (32 cores), 32GB RAM, and two 1TB SAS 7200 RPM disks in RAID-1.
OS is RHEL6.2 - Kernel 2.6.32-279.5.2. Filesystem is ext3.
Global options are set as:

options {
    flush_lines (1000);
    time_reopen (10);
    log_fifo_size (1000);
    long_hostnames (off);
    use_dns (no);
    use_fqdn (no);
    create_dirs (yes);
    keep_hostname (yes);
    keep_timestamp (yes);
    dir_group ("syslog");
    perm (0640);
    dir_perm (0750);
    group ("syslog");
};
I have already set TCP kernel buffers to 128MB max and set disk scheduler to "deadline".
But even under light disk IO load (~8-25 MB/s), I see "1320811067 packets collapsed in receive queue due to low socket buffer". I had some other processes on the host writing to disk. Stopping them reduced the packet errors, but the counter still keeps incrementing.
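A rough sketch of how the counter could be sampled to see how fast it grows, instead of reading an ever-increasing total. The function names collapsed_count and watch_collapsed and the 5-second default interval are illustrative assumptions; the match string is the wording of the 'netstat -s' line quoted above.

```shell
#!/bin/sh
# Extract the "packets collapsed in receive queue" counter from
# netstat -s style text on stdin. Prints 0 if the line is absent.
collapsed_count() {
    awk '/packets collapsed in receive queue/ { n = $1 }
         END { print (n == "" ? 0 : n) }'
}

# Print how much the counter grew in each interval.
# The interval (seconds) is an assumed default, chosen to match 'iostat -xm 5'.
watch_collapsed() {
    interval="${1:-5}"
    prev=$(netstat -s | collapsed_count)
    while sleep "$interval"; do
        cur=$(netstat -s | collapsed_count)
        echo "$(date '+%H:%M:%S') collapsed +$((cur - prev))"
        prev=$cur
    done
}
```

Running 'watch_collapsed 5' next to 'iostat -xm 5' should give per-interval deltas that can be lined up against the disk write bursts.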
To rule out other issues, I temporarily pointed my disk-based destinations to /dev/null and then packet losses/errors stopped. So either Syslog-NG isn't able to write to disk fast enough or there is an underlying OS/hardware issue.
Both hosts have the same issue. Any pointers in troubleshooting it will be appreciated.
TIA.