<div dir="ltr"><div style>Thanks for the quick response, Daniel.</div><div><br></div>I looked at the statistics for an hour before changing flush_lines to zero and setting log_fifo_size to 10000. In that period, syslog-ng reported processing 7,898,310,589 messages across all destinations and dropping 4,200,260.<div>
<br></div><div>After making the change (flush_lines set to 0 and log_fifo_size set to 10000), I looked at three sets of stats covering half an hour (stats are emitted every 10 minutes by default). The dropped-message count is now zero across all destinations.</div>
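<div><br></div><div>For scale, the pre-change drop rate works out to roughly 0.05% of processed messages. A minimal sketch of that arithmetic, assuming both counters cover the same hour-long window; drop_ratio is a hypothetical helper name, not part of syslog-ng:</div>

```python
def drop_ratio(processed: int, dropped: int) -> float:
    """Return dropped messages as a fraction of processed messages."""
    if processed <= 0:
        raise ValueError("processed must be positive")
    return dropped / processed


# Counters reported for the hour before flush_lines/log_fifo_size were changed.
before = drop_ratio(7_898_310_589, 4_200_260)
print(f"{before:.4%}")  # → 0.0532%
```
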
<div><br></div><div>But the collapsed TCP packets counter keeps incrementing. I ran 'iostat -xm 5' and "watch -d 'netstat -s | grep collapsed'" in two windows side by side. Each time disk I/O spikes, the TCP collapsed counter starts incrementing. Disk I/O stays near zero for about half a minute and then spikes to ~4-25 MB/s for the next half minute.</div>
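<div><br></div><div>On Linux, the netstat -s line "packets collapsed in receive queue due to low socket buffer" is, as far as I know, derived from the TCPRcvCollapsed field of the TcpExt section in /proc/net/netstat. A minimal sketch (Linux only) that prints the counter's increase every 5 seconds, matching the iostat interval, so the two outputs can be lined up by timestamp. parse_netstat and watch_collapsed are hypothetical helpers, and the field name is an assumption worth checking against your kernel's /proc/net/netstat:</div>

```python
import time
from pathlib import Path

NETSTAT = Path("/proc/net/netstat")
FIELD = "TcpExt.TCPRcvCollapsed"  # assumed counter behind the netstat -s line


def parse_netstat(text: str) -> dict[str, int]:
    """Parse /proc/net/netstat: each section is a header line of field names
    followed by a line of values, both prefixed with e.g. 'TcpExt:'."""
    counters: dict[str, int] = {}
    lines = text.strip().splitlines()
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        _, nums = values.split(":", 1)
        for name, num in zip(names.split(), nums.split()):
            counters[f"{prefix}.{name}"] = int(num)
    return counters


def watch_collapsed(interval: float = 5.0, samples: int = 12) -> None:
    """Print how much the collapsed-packets counter grew in each interval."""
    prev = parse_netstat(NETSTAT.read_text())[FIELD]
    for _ in range(samples):
        time.sleep(interval)
        cur = parse_netstat(NETSTAT.read_text())[FIELD]
        print(f"{time.strftime('%H:%M:%S')} collapsed +{cur - prev}")
        prev = cur
```

<div>Run watch_collapsed() on the syslog-ng host next to 'iostat -xm 5'. Reading /proc directly avoids grepping the human-readable netstat -s text.</div>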
<div><br></div><div style>Does this mean I need to raise log_fifo_size even higher? Ideally, I think we want the disk to be written to steadily rather than in bursts of write activity. Right?</div><div style><br></div><div style>
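<div>One rough way to reason about the sizing question, assuming messages arrive at a steady rate and a destination stops draining completely while a write burst is in progress (the iostat pattern suggests this but does not prove it): log_fifo_size counts messages, so the queue would need to hold about rate × stall duration. The helper name, the 1.5 headroom factor, and the 2-second stall below are all hypothetical; only the 50k msg/s figure comes from Daniel's reply, and it describes his load, not ours:</div>

```python
import math


def min_fifo_size(msgs_per_sec: float, stall_seconds: float,
                  headroom: float = 1.5) -> int:
    """Rough lower bound (in messages) for a destination's log_fifo_size:
    everything that arrives while the writer is blocked, times a safety
    factor. Hypothetical steady-arrival model, not a syslog-ng formula."""
    if msgs_per_sec < 0 or stall_seconds < 0 or headroom < 1:
        raise ValueError("rate and stall must be non-negative, headroom >= 1")
    return math.ceil(msgs_per_sec * stall_seconds * headroom)


# Hypothetical inputs: 50k msg/s and a 2 s write stall.
print(min_fifo_size(50_000, 2))  # → 150000
```

<div>If the measured stall is much longer than a few seconds, the estimate grows linearly, which is one reason smoothing out the write bursts may matter more than the queue size alone.</div>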
<br></div><div style><br></div><div><br></div><div><br></div><div><br><div><br></div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, May 30, 2013 at 10:56 PM, Daniel Neubacher <span dir="ltr"><<a href="mailto:daniel.neubacher@xing.com" target="_blank">daniel.neubacher@xing.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<div style="direction:ltr;font-size:10pt;font-family:Tahoma">
<div><span style="font-family:'Times New Roman';font-size:16px">I don't know how many logs you are getting, but you should raise "</span><span style="font-family:'Times New Roman';font-size:16px">log_fifo_size (1000);" to a higher number. Your </span><span style="font-family:'Times New Roman';font-size:16px">flush_lines
is really high, too. I experimented with flush_lines but ended up setting it to 0 at 50k logs per second. And the greatest tweak of all would be a newer syslog-ng version, because of the threading.</span></div>
<div style="font-size:16px;font-family:Times New Roman">
<hr>
<div style="direction:ltr"><font face="Tahoma" color="#000000"><b>From:</b> <a href="mailto:syslog-ng-bounces@lists.balabit.hu" target="_blank">syslog-ng-bounces@lists.balabit.hu</a> [<a href="mailto:syslog-ng-bounces@lists.balabit.hu" target="_blank">syslog-ng-bounces@lists.balabit.hu</a>] on behalf of Xuri Nagarin [<a href="mailto:secsubs@gmail.com" target="_blank">secsubs@gmail.com</a>]<br>
<b>Sent:</b> Friday, May 31, 2013 07:46<br>
<b>To:</b> Syslog-ng users' and developers' mailing list<br>
<b>Subject:</b> [syslog-ng] TCP packet collapse errors<br>
</font><br>
</div><div><div class="h5">
<div></div>
<div>
<div dir="ltr">I have a pair of Syslog-NG servers running 3.2.5-3. The hardware specs are: quad Xeon E5-2680 (32 cores), 32 GB RAM, and two 1 TB 7200 RPM SAS disks in RAID-1.
<div> </div>
<div>The OS is RHEL 6.2 (kernel 2.6.32-279.5.2), and the filesystem is ext3.</div>
<div><br>
</div>
<div>Global options are set as:</div>
<div>
<div>options {</div>
<div><span style="white-space:pre-wrap"></span>flush_lines (1000);</div>
<div><span style="white-space:pre-wrap"></span>time_reopen (10);</div>
<div><span style="white-space:pre-wrap"></span>log_fifo_size (1000);</div>
<div><span style="white-space:pre-wrap"></span>long_hostnames (off);</div>
<div><span style="white-space:pre-wrap"></span>use_dns (no);</div>
<div><span style="white-space:pre-wrap"></span>use_fqdn (no);</div>
<div><span style="white-space:pre-wrap"></span>create_dirs (yes);</div>
<div><span style="white-space:pre-wrap"></span>keep_hostname (yes);</div>
<div><span style="white-space:pre-wrap"></span>keep_timestamp(yes);</div>
<div><span style="white-space:pre-wrap"></span>dir_group("syslog");</div>
<div><span style="white-space:pre-wrap"></span>perm(0640);</div>
<div><span style="white-space:pre-wrap"></span>dir_perm(0750);</div>
<div><span style="white-space:pre-wrap"></span>group("syslog");</div>
<div>};</div>
<div><br>
</div>
<div>I have already raised the maximum TCP kernel buffer size to 128 MB and set the disk scheduler to "deadline".</div>
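<div>Which sysctls were raised isn't stated. On Linux, net.ipv4.tcp_rmem (min, default, max, in bytes) bounds receive-buffer autotuning, while net.core.rmem_max caps what an application can request with SO_RCVBUF. A minimal sketch (Linux only) that checks both against 128 MB; parse_tcp_rmem and check_receive_buffers are hypothetical helper names:</div>

```python
from pathlib import Path

TCP_RMEM = Path("/proc/sys/net/ipv4/tcp_rmem")
RMEM_MAX = Path("/proc/sys/net/core/rmem_max")
TARGET = 128 * 1024 * 1024  # 128 MB, the ceiling mentioned above


def parse_tcp_rmem(text: str) -> tuple[int, int, int]:
    """tcp_rmem holds three integers: min, default and max bytes."""
    low, default, high = (int(x) for x in text.split())
    return low, default, high


def check_receive_buffers(target: int = TARGET) -> dict[str, bool]:
    """Report whether the autotuning ceiling (tcp_rmem max) and the
    SO_RCVBUF ceiling (rmem_max) each reach `target` bytes."""
    _, _, tcp_max = parse_tcp_rmem(TCP_RMEM.read_text())
    core_max = int(RMEM_MAX.read_text())
    return {"tcp_rmem max": tcp_max >= target, "rmem_max": core_max >= target}
```

<div>If only one of the two was raised, buffers on the syslog-ng sockets may still be capped lower than intended, depending on whether SO_RCVBUF is set explicitly.</div>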
<div><br>
</div>
<div>But even under a light disk I/O load (~8-25 MB/s), I see "1320811067 packets collapsed in receive queue due to low socket buffer". Some other processes on the host were also writing to disk; stopping them reduced the packet errors, but the counter still keeps incrementing.</div>
<div><br>
</div>
<div>To rule out other issues, I temporarily pointed my disk-based destinations to /dev/null, and the packet losses/errors stopped. So either Syslog-NG can't write to disk fast enough, or there is an underlying OS/hardware issue.</div>
<div><br>
</div>
<div>Both hosts have the same issue. Any pointers on troubleshooting it would be appreciated.</div>
<div><br>
</div>
<div>TIA.</div>
<div><br>
</div>
<div><br>
</div>
</div>
</div>
</div>
</div></div></div>
</div>
</div>
<br>______________________________________________________________________________<br>
Member info: <a href="https://lists.balabit.hu/mailman/listinfo/syslog-ng" target="_blank">https://lists.balabit.hu/mailman/listinfo/syslog-ng</a><br>
Documentation: <a href="http://www.balabit.com/support/documentation/?product=syslog-ng" target="_blank">http://www.balabit.com/support/documentation/?product=syslog-ng</a><br>
FAQ: <a href="http://www.balabit.com/wiki/syslog-ng-faq" target="_blank">http://www.balabit.com/wiki/syslog-ng-faq</a><br>
<br>
<br></blockquote></div><br></div>