Hi Mike,<div>I'm heading out of town on a trip, so not enough time to read the whole thread.</div><div>You may or may not have tried some of this, but I had similar issues a while back and noted it here:</div><div><a href="http://nms.gdd.net/index.php/Install_Guide_for_LogZilla_v3.1#UDP_Buffers">http://nms.gdd.net/index.php/Install_Guide_for_LogZilla_v3.1#UDP_Buffers</a></div>
<div>Hope it helps :-)</div><div><br></div><div><br clear="all">______________________________________________________________ <br><br>
Clayton Dukes<br>______________________________________________________________<br>
<br><br><div class="gmail_quote">On Fri, Apr 15, 2011 at 2:01 PM, Mishou Michael <span dir="ltr"><<a href="mailto:Michael.Mishou@csirc.irs.gov">Michael.Mishou@csirc.irs.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Matthew,<br>
<br>
Thanks for the suggestion. I'm not using so_sndbuf anywhere in this<br>
configuration, just receiving and writing directly to disk. As for<br>
so_rcvbuf, I've already tried that per the initial message, no dice.<br>
Even if I run a so_rcvbuf size that is 10 times the recommended value in<br>
the configuration note you linked to, it still fills up, and then<br>
udpInOverflows drops start to occur at a rate of about 5k/sec.<br>
<br>
Is there something else I'm missing in the config perhaps? The setting<br>
of so_rcvbuf to a 64 MB buffer only delays the problem for a few seconds<br>
until the buffer again fills. If I set it to 1 GB (tried this, have a<br>
ton of RAM to work with) it delays the problem for about 10 minutes,<br>
then the drops start. It seems as if the buffer is not being emptied<br>
fast enough, but the CPU is by no means pegged by syslog-ng.<br>
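The fill times above are roughly what simple arithmetic predicts if syslog-ng drains the socket buffer more slowly than packets arrive. Below is a minimal sketch in Python, assuming the buffer is charged only for payload bytes (the kernel's real accounting adds per-packet overhead) and that both rates are constant. The arrival rate is the tcpdump figure from the original message, the drain rate uses the ~10% estimate from the same message, and `fill_time_seconds` is a hypothetical helper name.

```python
import math


def fill_time_seconds(buffer_bytes, arrival_bytes_per_s, drain_bytes_per_s):
    """Seconds until a receive buffer overflows at constant arrival and drain rates."""
    # Net rate at which unread data piles up; clamped so it is never negative.
    net_rate = max(arrival_bytes_per_s - drain_bytes_per_s, 0.0)
    if net_rate == 0:
        return math.inf  # the reader keeps up, so the buffer never fills
    return buffer_bytes / net_rate


# From the tcpdump sample in the original message.
ARRIVAL = 1769537.884  # average bytes/sec of inbound udp/514
# Assumption: syslog-ng reads about 10% of what arrives (the ~10% estimate
# from the original message, not a measured drain rate).
DRAIN = 0.10 * ARRIVAL

for label, size in (("64 MB", 64 * 1024**2), ("1 GB", 1024**3)):
    seconds = fill_time_seconds(size, ARRIVAL, DRAIN)
    print(f"{label}: {seconds:.0f} s ({seconds / 60:.1f} min)")
```

Under those assumptions a 64 MB buffer lasts well under a minute and a 1 GB buffer on the order of ten minutes, which is in line with the behaviour described. A larger buffer only postpones the overflow unless the drain rate itself goes up.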
<br>
I left out the resources I have to work with on this system and how<br>
well or badly things go with syslog-ng running (and dropping), so I'll<br>
include those now. As you can see, it's an older server, but it has a ton<br>
of RAM, and I think the CPUs should have enough pop for this.<br>
<br>
# uname -a<br>
SunOS ms00310 5.10 Generic_127111-10 sun4u sparc SUNW,Sun-Fire-V490 Solaris<br>
# psrinfo -v | grep MHz<br>
The sparcv9 processor operates at 1350 MHz,<br>
The sparcv9 processor operates at 1350 MHz,<br>
The sparcv9 processor operates at 1350 MHz,<br>
The sparcv9 processor operates at 1350 MHz,<br>
The sparcv9 processor operates at 1350 MHz,<br>
The sparcv9 processor operates at 1350 MHz,<br>
The sparcv9 processor operates at 1350 MHz,<br>
The sparcv9 processor operates at 1350 MHz,<br>
# swap -s<br>
total: 4042128k bytes allocated + 967184k reserved = 5009312k used, 48662184k available<br>
# ps -e -o pcpu -o pid -o user -o args | grep syslog<br>
0.0 70 root vxconfigd -x syslog -m boot<br>
0.0 6110 root grep syslog<br>
7.7 22802 root /usr/local/sbin/syslog-ng -f /etc/crap_config.txt -p /var/run/syslog-ng<br>
0.0 22801 root /usr/local/sbin/syslog-ng -f /etc/crap_config.txt -p /var/run/syslog-ng<br>
# top -b -n 5<br>
last pid: 6355; load avg: 1.36, 1.34, 1.34; up 58+23:59:11 17:37:27<br>
94 processes: 91 sleeping, 3 on cpu<br>
CPU states: 82.1% idle, 7.0% user, 10.9% kernel, 0.0% iowait, 0.0% swap<br>
Memory: 32G phys mem, 16G free mem, 32G total swap, 32G free swap<br>
<br>
PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND<br>
22802 root 2 50 0 3067M 3063M cpu/2 79:50 7.81% syslog-ng<br>
29459 root 15 40 0 193M 166M cpu/1 150.5H 4.49% issCSF<br>
6352 root 1 55 0 3376K 2032K cpu/19 0:00 0.20% top<br>
4229 root 82 59 0 327M 324M sleep 661:05 0.17% java<br>
2695 root 6 59 0 8000K 2984K sleep 802:59 0.11% rmserver<br>
<br>
I'm just not sure what to do next to troubleshoot. I'm hoping someone<br>
here can point me in the right direction, or at least confirm that they<br>
are running syslog-ng in a similar configuration without drops so I know<br>
that it's at least possible?<br>
<br>
Regards,<br>
<br>
--Mike<br>
<div><div></div><div class="h5"><br>
-----Original Message-----<br>
From: <a href="mailto:syslog-ng-bounces@lists.balabit.hu">syslog-ng-bounces@lists.balabit.hu</a><br>
[mailto:<a href="mailto:syslog-ng-bounces@lists.balabit.hu">syslog-ng-bounces@lists.balabit.hu</a>] On Behalf Of Matthew Hall<br>
Sent: Friday, April 15, 2011 12:12 PM<br>
To: Syslog-ng users' and developers' mailing list<br>
Subject: Re: [syslog-ng] Solaris 10 UDP overflows, message drops<br>
<br>
Probably you need to adjust so_sndbuf and so_rcvbuf:<br>
<br>
<a href="http://www.balabit.com/sites/default/files/documents/syslog-ng-ose-v3.2-guide-admin-en.html/index.html-single.html#reference_source_tcpudp" target="_blank">http://www.balabit.com/sites/default/files/documents/syslog-ng-ose-v3.2-guide-admin-en.html/index.html-single.html#reference_source_tcpudp</a><br>
<br>
That should make it run better.<br>
<br>
Matthew.<br>
<br>
On Fri, Apr 15, 2011 at 10:52:59AM -0400, Mishou Michael wrote:<br>
> All,<br>
><br>
> I've done a lot of reading, and I can't figure out what I can change in<br>
> this config to fix the UDP drops reported as udpInOverflows by netstat<br>
> -s. Here are some statistics on the amount of traffic we receive via<br>
> syslog-ng. It's pretty busy, but in reading I'm finding that some folks<br>
> are doing much more. These stats are based on a ~30 second window of<br>
> traffic during peak times, though traffic doesn't vary much with time in<br>
> our environment. I used tcpdump with a BPF filter to capture only inbound<br>
> udp/514, so this is what the interface is seeing in the way of syslog.<br>
><br>
> Elapsed: 00:00:34<br>
> Packets: 200000<br>
> Avg. packets/sec: 5836.546<br>
> Avg. packet size: 303.182 bytes<br>
> Bytes: 60636477<br>
> Avg. bytes/sec: 1769537.884<br>
> Avg. MBit/sec: 14.156<br>
><br>
> So, about 6k messages per second. Here are the drop numbers over a time<br>
> sample (done right after a process restart; you can see the buffer takes<br>
> a moment to fill up [64 MB so_rcvbuf]):<br>
><br>
> # while true; do echo -en "$(date) :: "; netstat -s | grep udpInOverflows | head -n 1 | sed 's|.*=||'; sleep 10; done<br>
> Fri Apr 15 14:12:46 GMT 2011 :: 472517477<br>
> Fri Apr 15 14:12:56 GMT 2011 :: 472517477<br>
> Fri Apr 15 14:13:06 GMT 2011 :: 472517477<br>
> Fri Apr 15 14:13:16 GMT 2011 :: 472517477<br>
> Fri Apr 15 14:13:26 GMT 2011 :: 472543152<br>
> Fri Apr 15 14:13:36 GMT 2011 :: 472592800<br>
> Fri Apr 15 14:13:46 GMT 2011 :: 472638848<br>
> Fri Apr 15 14:13:56 GMT 2011 :: 472684407<br>
><br>
> So that's about 5k overflows a second, which jibes with our<br>
> calculations, suggesting we're getting only ~10% of our messages logged<br>
> to disk.<br>
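The per-second figure can be checked directly from the counter samples above. Below is a minimal sketch in Python, assuming the counter only increases between samples and that the tcpdump window and the netstat window are comparable. `overflow_rates` is a hypothetical helper name, and the samples are the ones from the netstat loop, starting at the last reading before the counter began to move.

```python
from datetime import datetime

# (time, udpInOverflows) pairs copied from the netstat loop output above,
# starting at the last sample before the counter began to move.
SAMPLES = [
    ("14:13:16", 472517477),
    ("14:13:26", 472543152),
    ("14:13:36", 472592800),
    ("14:13:46", 472638848),
    ("14:13:56", 472684407),
]


def overflow_rates(samples):
    """Return drops per second for each pair of consecutive samples."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        start = datetime.strptime(t0, "%H:%M:%S")
        end = datetime.strptime(t1, "%H:%M:%S")
        elapsed = (end - start).total_seconds()
        rates.append((c1 - c0) / elapsed)
    return rates


rates = overflow_rates(SAMPLES)
# Skip the first interval: the buffer was still filling for part of it.
steady = rates[1:]
average_drops = sum(steady) / len(steady)

ARRIVAL_PPS = 5836.546  # average packets/sec from the tcpdump sample
print(f"steady-state drops/sec: {average_drops:.1f}")
print(f"fraction dropped: {average_drops / ARRIVAL_PPS:.1%}")
```

On these samples the steady-state rate comes out a little under 5k/sec. Set against the ~5.8k packets/sec arriving, that is roughly 80% dropped, so the share reaching disk may be nearer 20% than 10%.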
><br>
> I inherited a config with _very_ many filter statements, but have<br>
> decided to cut all that out to see if my performance problems in the way<br>
> of UDP drops continue (they do). I've attached a sanitized config to<br>
> this message; everything here concerns that config running (even<br>
> though I thought eliminating the filters would really help, it didn't).<br>
><br>
> We're running Solaris 10 SPARC. The syslog-ng version is:<br>
><br>
> # /usr/local/sbin/syslog-ng -V<br>
> syslog-ng 3.1.2<br>
> Installer-Version: 3.1.2<br>
> Revision: ssh+git://bazsi@git.balabit//var/scm/git/syslog-ng/syslog-ng-ose--mainline--3.1#master#8bf13c304b6ab5fc1a372b49d55c78370efe14ca<br>
> Compile-Date: Oct 25 2010 23:56:18<br>
> Enable-Threads: off<br>
> Enable-Debug: off<br>
> Enable-GProf: off<br>
> Enable-Memtrace: off<br>
> Enable-Sun-STREAMS: on<br>
> Enable-Sun-Door: on<br>
> Enable-IPv6: on<br>
> Enable-Spoof-Source: on<br>
> Enable-TCP-Wrapper: off<br>
> Enable-SSL: on<br>
> Enable-SQL: off<br>
> Enable-Linux-Caps: off<br>
> Enable-Pcre: on<br>
><br>
> The following options are set for the OS:<br>
><br>
> # ndd /dev/udp udp_max_buf<br>
> 1073741824<br>
> # ndd /dev/udp udp_recv_hiwat<br>
> 65536<br>
><br>
> Some options lines from the config based on what I've seen:<br>
><br>
> * note the TCP stuff can be safely ignored, it's legacy from some<br>
> testing but isn't currently seeing traffic<br>
> * all 3 udp sources set with so_rcvbuf(67108864) (64 MB)<br>
><br>
> options { # things I've changed/tweaked<br>
> flush_lines(1000);<br>
> flush_timeout(20);<br>
> log_fifo_size (67108864);<br>
> log_msg_size(8192);<br>
> chain_hostnames(yes);<br>
> # end my changes<br>
> <snip><br>
> };<br>
><br>
> So I'm totally stumped. I can set the buffers with so_rcvbuf() to 1 GB,<br>
> it still doesn't matter, they eventually fill up and I start losing<br>
> packets. I'm hoping that someone can point me to some tweaks I can do<br>
> to get the number of drops down or eliminated. Is it unreasonable to<br>
> expect to be able to process this many messages per second via UDP?<br>
> Maybe that's the problem. I might experiment some with the default syslog<br>
> to see if it can write this many messages without drops...this doesn't<br>
> seem like an insane amount of traffic. But perhaps my expectations are<br>
> unrealistic; that's what I'm hoping someone can tell me.<br>
><br>
> Regards,<br>
><br>
> --Mike<br>
<br>
<br>
><br>
________________________________________________________________________<br>
______<br>
> Member info: <a href="https://lists.balabit.hu/mailman/listinfo/syslog-ng" target="_blank">https://lists.balabit.hu/mailman/listinfo/syslog-ng</a><br>
> Documentation:<br>
<a href="http://www.balabit.com/support/documentation/?product=syslog-ng" target="_blank">http://www.balabit.com/support/documentation/?product=syslog-ng</a><br>
> FAQ: <a href="http://www.campin.net/syslog-ng/faq.html" target="_blank">http://www.campin.net/syslog-ng/faq.html</a><br>
><br>
<br>
<br>
<br>
</div></div></blockquote></div><br></div>