Hi Mike,
I'm heading out of town on a trip, so I don't have time to read the whole thread. You may or may not have tried some of this, but I had similar issues a while back and noted them here:
http://nms.gdd.net/index.php/Install_Guide_for_LogZilla_v3.1#UDP_Buffers
Hope it helps :-)
______________________________________________________________
Clayton Dukes
______________________________________________________________
On Fri, Apr 15, 2011 at 2:01 PM, Mishou Michael <Michael.Mishou@csirc.irs.gov> wrote:
Matthew,
Thanks for the suggestion. I'm not using so_sndbuf anywhere in this configuration, just receiving and writing directly to disk. As for so_rcvbuf, I've already tried that per the initial message, no dice. Even if I run a so_rcvbuf size that is 10 times the recommended value in the configuration note you linked to, it still fills up, and then drops (udpInOverflows) start to occur at a rate of about 5k/sec.
Is there something else I'm missing in the config perhaps? The setting of so_rcvbuf to a 64 MB buffer only delays the problem for a few seconds until the buffer again fills. If I set it to 1 GB (tried this, have a ton of RAM to work with) it delays the problem for about 10 minutes, then the drops start. It seems as if the buffer is not being emptied fast enough, but the CPU is by no means pegged by syslog-ng.
I left out the resources I have to work with on this system, and how things look with syslog-ng running (and dropping), so I'll include those now. As you can see, it's an older server, but it has a ton of RAM and I think the CPUs should have enough pop for this.
# uname -a
SunOS ms00310 5.10 Generic_127111-10 sun4u sparc SUNW,Sun-Fire-V490 Solaris
# psrinfo -v | grep MHz
  The sparcv9 processor operates at 1350 MHz,
  The sparcv9 processor operates at 1350 MHz,
  The sparcv9 processor operates at 1350 MHz,
  The sparcv9 processor operates at 1350 MHz,
  The sparcv9 processor operates at 1350 MHz,
  The sparcv9 processor operates at 1350 MHz,
  The sparcv9 processor operates at 1350 MHz,
  The sparcv9 processor operates at 1350 MHz,
# swap -s
total: 4042128k bytes allocated + 967184k reserved = 5009312k used, 48662184k available
# ps -e -o pcpu -o pid -o user -o args | grep syslog
 0.0    70 root vxconfigd -x syslog -m boot
 0.0  6110 root grep syslog
 7.7 22802 root /usr/local/sbin/syslog-ng -f /etc/crap_config.txt -p /var/run/syslog-ng
 0.0 22801 root /usr/local/sbin/syslog-ng -f /etc/crap_config.txt -p /var/run/syslog-ng
# top -b -n 5
last pid: 6355; load avg: 1.36, 1.34, 1.34; up 58+23:59:11 17:37:27
94 processes: 91 sleeping, 3 on cpu
CPU states: 82.1% idle, 7.0% user, 10.9% kernel, 0.0% iowait, 0.0% swap
Memory: 32G phys mem, 16G free mem, 32G total swap, 32G free swap
   PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
 22802 root       2  50    0 3067M 3063M cpu/2   79:50  7.81% syslog-ng
 29459 root      15  40    0  193M  166M cpu/1  150.5H  4.49% issCSF
  6352 root       1  55    0 3376K 2032K cpu/19   0:00  0.20% top
  4229 root      82  59    0  327M  324M sleep  661:05  0.17% java
  2695 root       6  59    0 8000K 2984K sleep  802:59  0.11% rmserver
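One way to read the 7.81% figure is sketched below. It rests on two assumptions that the thread does not confirm: that top reports CPU as a share of all 8 processors listed by psrinfo, and that syslog-ng does its work on a single thread (the version output quoted further down shows Enable-Threads: off). Under those assumptions a fully busy thread could never show more than 12.5% system-wide. The helper name share_of_one_cpu is hypothetical.

```python
def share_of_one_cpu(system_wide_pct: float, n_cpus: int) -> float:
    """Convert a system-wide CPU percentage into a fraction of a single CPU."""
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    return system_wide_pct * n_cpus / 100.0


N_CPUS = 8            # processors listed by psrinfo in this message
SYSLOG_NG_PCT = 7.81  # CPU column for syslog-ng in the top output

# Assumption: top normalises CPU usage over all processors.
ceiling_pct = 100.0 / N_CPUS
busy = share_of_one_cpu(SYSLOG_NG_PCT, N_CPUS)

print(f"one fully busy thread would show as {ceiling_pct:.1f}% system-wide")
print(f"syslog-ng at {SYSLOG_NG_PCT}% is about {busy:.0%} of one CPU")
```

If the assumptions hold, syslog-ng is using well over half of one processor even though the machine as a whole looks mostly idle.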
I'm just not sure what to do next to troubleshoot. I'm hoping someone here can point me in the right direction, or at least confirm that they are running syslog-ng in a similar configuration without drops so I know that it's at least possible?
Regards,
--Mike
-----Original Message-----
From: syslog-ng-bounces@lists.balabit.hu [mailto:syslog-ng-bounces@lists.balabit.hu] On Behalf Of Matthew Hall
Sent: Friday, April 15, 2011 12:12 PM
To: Syslog-ng users' and developers' mailing list
Subject: Re: [syslog-ng] Solaris 10 UDP overflows, message drops
Probably you need to adjust so_sndbuf and so_rcvbuf:
http://www.balabit.com/sites/default/files/documents/syslog-ng-ose-v3.2-guide-admin-en.html/index.html-single.html#reference_source_tcpudp
That should make it run better.
Matthew.
On Fri, Apr 15, 2011 at 10:52:59AM -0400, Mishou Michael wrote:
All,
I've done a lot of reading, and I can't figure out what I can change in this config to fix the UDP drops reported as udpInOverflows by netstat -s. Here are some statistics on the amount of traffic we receive via syslog-ng. It's pretty busy, but from my reading some folks are handling much more. These stats are based on a ~30 second window of traffic during peak times, though there isn't much variance over time in our environment. I used tcpdump with a BPF filter to capture only inbound udp/514, so this is what the interface is seeing in the way of syslog.
Elapsed: 00:00:34
Packets: 200000
Avg. packets/sec: 5836.546
Avg. packet size: 303.182 bytes
Bytes: 60636477
Avg. bytes/sec: 1769537.884
Avg. MBit/sec: 14.156
So, about 6k messages per second. Here are the drop numbers over a time sample (done right after a process restart, you can see the buffer takes a moment to fill up [64 MB so_rcvbuf]):
# while true; do echo -en "$(date) :: "; netstat -s | grep udpInOverflows | head -n 1 | sed 's|.*=||'; sleep 10; done
Fri Apr 15 14:12:46 GMT 2011 :: 472517477
Fri Apr 15 14:12:56 GMT 2011 :: 472517477
Fri Apr 15 14:13:06 GMT 2011 :: 472517477
Fri Apr 15 14:13:16 GMT 2011 :: 472517477
Fri Apr 15 14:13:26 GMT 2011 :: 472543152
Fri Apr 15 14:13:36 GMT 2011 :: 472592800
Fri Apr 15 14:13:46 GMT 2011 :: 472638848
Fri Apr 15 14:13:56 GMT 2011 :: 472684407
So that's about 5k overflows a second, which jibes with our calculations, suggesting that only roughly 15-20% of our messages are being logged to disk.
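As a rough check on those figures, the sketch below derives the drop rate from the udpInOverflows samples and compares it with the inbound packet rate from the tcpdump capture. The numbers are copied from this message. Treating the last three intervals as the steady state is an assumption, and the function name overflow_rates is hypothetical.

```python
from datetime import datetime

# (time, udpInOverflows) pairs copied from the loop output in this message
samples = [
    ("14:12:46", 472517477),
    ("14:12:56", 472517477),
    ("14:13:06", 472517477),
    ("14:13:16", 472517477),
    ("14:13:26", 472543152),
    ("14:13:36", 472592800),
    ("14:13:46", 472638848),
    ("14:13:56", 472684407),
]


def overflow_rates(points):
    """Return drops per second between each pair of consecutive samples."""
    rates = []
    for (t0, c0), (t1, c1) in zip(points, points[1:]):
        start = datetime.strptime(t0, "%H:%M:%S")
        end = datetime.strptime(t1, "%H:%M:%S")
        rates.append((c1 - c0) / (end - start).total_seconds())
    return rates


rates = overflow_rates(samples)

# Assumption: the last three intervals represent the steady state,
# i.e. the receive buffer is already full.
steady = rates[-3:]
avg_drop = sum(steady) / len(steady)

INBOUND_PPS = 5836.546  # avg packets/sec from the tcpdump capture
logged_share = 1 - avg_drop / INBOUND_PPS

print(f"steady-state drops: {avg_drop:.0f}/sec")
print(f"share of messages not dropped: {logged_share:.0%}")
```

On these samples the steady-state drop rate comes out a little under 5k/sec, which leaves roughly a fifth of the inbound packets getting through.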
I inherited a config with _very_ many filter statements, but have decided to cut all that out to see if my performance problems in the way of udp drops continue (they do). I've attached a sanitized config to this message, all the stuff here concerns this config running (even though I thought eliminating the filters would really help, it didn't).
We're running Solaris 10 SPARC. The syslog-ng version is:
# /usr/local/sbin/syslog-ng -V
syslog-ng 3.1.2
Installer-Version: 3.1.2
Revision: ssh+git://bazsi@git.balabit//var/scm/git/syslog-ng/syslog-ng-ose--mainline--3.1#master#8bf13c304b6ab5fc1a372b49d55c78370efe14ca
Compile-Date: Oct 25 2010 23:56:18
Enable-Threads: off
Enable-Debug: off
Enable-GProf: off
Enable-Memtrace: off
Enable-Sun-STREAMS: on
Enable-Sun-Door: on
Enable-IPv6: on
Enable-Spoof-Source: on
Enable-TCP-Wrapper: off
Enable-SSL: on
Enable-SQL: off
Enable-Linux-Caps: off
Enable-Pcre: on
The following options are set for the OS:
# ndd /dev/udp udp_max_buf
1073741824
# ndd /dev/udp udp_recv_hiwat
65536
Some options lines from the config based on what I've seen:
* note the TCP stuff can be safely ignored, it's legacy from some testing but isn't currently seeing traffic
* all 3 udp sources set with so_rcvbuf(67108864) (64 MB)
options {
    # things I've changed/tweaked
    flush_lines(1000);
    flush_timeout(20);
    log_fifo_size (67108864);
    log_msg_size(8192);
    chain_hostnames(yes);
    # end my changes
    <snip>
};
So I'm totally stumped. I can set the buffers with so_rcvbuf() to 1 GB and it still doesn't matter: they eventually fill up and I start losing packets. I'm hoping that someone can point me to some tweaks that would reduce the number of drops or eliminate them. Is it unreasonable to expect to be able to process this many messages per second via UDP? Maybe that's the problem. I might experiment with the default syslog to see if it can write this many messages without drops. This doesn't seem like an insane amount of traffic, but perhaps my expectations are unrealistic; that's what I'm hoping someone can tell me.
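A back-of-the-envelope sketch of why a larger so_rcvbuf() only postpones the drops: a buffer fills at the inbound rate minus the drain rate, so while the reader stays slower than the sender, the buffer size only changes how long that takes. The inbound rate is the tcpdump figure from this message. The 20% drain fraction is a hypothetical value, the calculation ignores any per-packet kernel overhead, and seconds_to_fill is an illustrative name.

```python
def seconds_to_fill(buffer_bytes, in_bytes_per_sec, drain_bytes_per_sec):
    """Time until a receive buffer is full, given inbound and drain rates."""
    net = in_bytes_per_sec - drain_bytes_per_sec
    if net <= 0:
        # The reader keeps up, so the buffer never fills.
        return float("inf")
    return buffer_bytes / net


IN_RATE = 1769537.884  # avg bytes/sec from the tcpdump capture

for label, size in (("64 MB", 64 * 2**20), ("1 GB", 2**30)):
    # Hypothetical drain fractions: nothing drained, or 20% of inbound drained.
    for drained in (0.0, 0.2):
        t = seconds_to_fill(size, IN_RATE, IN_RATE * drained)
        print(f"{label}, {drained:.0%} drained: full after {t:.0f} s ({t / 60:.1f} min)")
```

With these inputs the 64 MB buffer lasts well under a minute and the 1 GB buffer somewhere around 10 to 13 minutes, in line with the delays described in this thread.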
Regards,
--Mike
______________________________________________________________________________
Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
FAQ: http://www.campin.net/syslog-ng/faq.html