[syslog-ng] Solaris 10 UDP overflows, message drops

Fri Apr 15 20:01:50 CEST 2011

Matthew,

Thanks for the suggestion.  I'm not using so_sndbuf anywhere in this
configuration; it's just receiving and writing directly to disk.  As for
so_rcvbuf, I'd already tried that, as mentioned in my initial message; no
dice.  Even with so_rcvbuf set to 10 times the value recommended in the
configuration note you linked to, the buffer still fills up, and then
drops (udpInOverflows) start occurring at a rate of about 5k/sec.

Is there something else I'm missing in the config, perhaps?  Setting
so_rcvbuf to a 64 MB buffer only delays the problem for a few seconds
until the buffer fills again.  If I set it to 1 GB (I tried this; I have
a ton of RAM to work with), it delays the problem for about 10 minutes,
and then the drops start.  It seems as if the buffer is not being
emptied fast enough, yet syslog-ng is by no means pegging the CPU.

I left out the resources I have to work with on this system, and how
bad (or good) things look with syslog-ng running (and dropping), so I'll
include those now.  As you can see, it's an older server, but it has a
ton of RAM, and I think the CPUs should have enough pop for this.

# uname -a
SunOS ms00310 5.10 Generic_127111-10 sun4u sparc SUNW,Sun-Fire-V490 Solaris
# psrinfo -v | grep MHz
  The sparcv9 processor operates at 1350 MHz,
  The sparcv9 processor operates at 1350 MHz,
  The sparcv9 processor operates at 1350 MHz,
  The sparcv9 processor operates at 1350 MHz,
  The sparcv9 processor operates at 1350 MHz,
  The sparcv9 processor operates at 1350 MHz,
  The sparcv9 processor operates at 1350 MHz,
  The sparcv9 processor operates at 1350 MHz,
# swap -s
total: 4042128k bytes allocated + 967184k reserved = 5009312k used, 48662184k available
# ps -e -o pcpu -o pid -o user -o args | grep syslog
 0.0    70     root vxconfigd -x syslog -m boot
 0.0  6110     root grep syslog
 7.7 22802     root /usr/local/sbin/syslog-ng -f /etc/crap_config.txt -p /var/run/syslog-ng
 0.0 22801     root /usr/local/sbin/syslog-ng -f /etc/crap_config.txt -p /var/run/syslog-ng
# top -b -n 5
last pid:  6355;  load avg:  1.36,  1.34,  1.34;       up 58+23:59:11  17:37:27
94 processes: 91 sleeping, 3 on cpu
CPU states: 82.1% idle,  7.0% user, 10.9% kernel,  0.0% iowait,  0.0% swap
Memory: 32G phys mem, 16G free mem, 32G total swap, 32G free swap

   PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
 22802 root       2  50    0 3067M 3063M cpu/2   79:50  7.81% syslog-ng
 29459 root      15  40    0  193M  166M cpu/1  150.5H  4.49% issCSF
  6352 root       1  55    0 3376K 2032K cpu/19   0:00  0.20% top
  4229 root      82  59    0  327M  324M sleep  661:05  0.17% java
  2695 root       6  59    0 8000K 2984K sleep  802:59  0.11% rmserver

I'm just not sure what to do next to troubleshoot.  I'm hoping someone
here can point me in the right direction, or at least confirm that they
are running syslog-ng in a similar configuration without drops, so I know
that it's at least possible.

Regards,

--Mike

-----Original Message-----
From: syslog-ng-bounces at lists.balabit.hu
[mailto:syslog-ng-bounces at lists.balabit.hu] On Behalf Of Matthew Hall
Sent: Friday, April 15, 2011 12:12 PM
To: Syslog-ng users' and developers' mailing list
Subject: Re: [syslog-ng] Solaris 10 UDP overflows, message drops

Probably you need to adjust so_sndbuf and so_rcvbuf:

http://www.balabit.com/sites/default/files/documents/syslog-ng-ose-v3.2-guide-admin-en.html/index.html-single.html#reference_source_tcpudp

That should make it run better.

Matthew.

On Fri, Apr 15, 2011 at 10:52:59AM -0400, Mishou Michael wrote:
> All,
> 
> I've done a lot of reading, and I can't figure out what I can change in
> this config to fix the UDP drops (udpInOverflows in netstat -s).  Here
> are some statistics on the amount of traffic we receive via syslog-ng.
> It's pretty busy, but from my reading, some folks are handling much
> more.  These stats are based on a ~30-second window of traffic at peak
> time, though traffic doesn't vary much over time in our environment.  I
> used tcpdump with a BPF filter to capture only inbound UDP/514, so this
> is the syslog traffic the interface is actually seeing.
> 
> Elapsed:		00:00:34
> Packets:		200000
> Avg. packets/sec:	5836.546
> Avg. packet size:	303.182 bytes
> Bytes:		60636477
> Avg. bytes/sec:	1769537.884
> Avg. MBit/sec:	14.156
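The per-second figures in the capture summary follow from the raw totals.
A minimal Python sketch of that arithmetic; the precise elapsed time is
our own back-calculation (the summary rounds it to 34 s), not a number
from the capture.

```python
def summarize(packets, total_bytes, elapsed_s):
    """Derive the per-second figures from raw capture totals."""
    bytes_per_sec = total_bytes / elapsed_s
    return {
        "avg_packets_per_sec": packets / elapsed_s,
        "avg_packet_size": total_bytes / packets,
        "avg_bytes_per_sec": bytes_per_sec,
        # Decimal megabits (10**6 bits), which matches the 14.156 figure.
        "avg_mbit_per_sec": bytes_per_sec * 8 / 10**6,
    }


# Assumed duration: back-computed as 200000 packets / 5836.546 packets/sec,
# because the summary's "Elapsed" is rounded to whole seconds.
ELAPSED_S = 200000 / 5836.546

stats = summarize(200000, 60636477, ELAPSED_S)
for name, value in stats.items():
    print(f"{name:>20}: {value:,.3f}")
```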
> 
> So, about 6k messages per second.  Here are the drop numbers over a
> time sample (taken right after a process restart; you can see the buffer
> takes a moment to fill up [64 MB so_rcvbuf]):
> 
> # while true; do echo -en "$(date) :: "; netstat -s | grep udpInOverflows | head -n 1 | sed 's|.*=||'; sleep 10; done
> Fri Apr 15 14:12:46 GMT 2011 :: 472517477
> Fri Apr 15 14:12:56 GMT 2011 :: 472517477
> Fri Apr 15 14:13:06 GMT 2011 :: 472517477
> Fri Apr 15 14:13:16 GMT 2011 :: 472517477
> Fri Apr 15 14:13:26 GMT 2011 :: 472543152
> Fri Apr 15 14:13:36 GMT 2011 :: 472592800
> Fri Apr 15 14:13:46 GMT 2011 :: 472638848
> Fri Apr 15 14:13:56 GMT 2011 :: 472684407
> 
> So that's about 5k overflows a second, which jibes with our
> calculations and suggests we're getting only ~10% of our messages
> logged to disk.
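The sampling loop prints only the raw counter, so the rate has to be
eyeballed.  A small Python sketch (not part of the original
troubleshooting) that turns those samples into per-interval overflow
rates.  It assumes every line has exactly the "<date> :: <count>" shape
shown and that all samples share one time zone.

```python
from datetime import datetime

# Output of the netstat sampling loop: "<date> :: <udpInOverflows>".
SAMPLES = """\
Fri Apr 15 14:12:46 GMT 2011 :: 472517477
Fri Apr 15 14:12:56 GMT 2011 :: 472517477
Fri Apr 15 14:13:06 GMT 2011 :: 472517477
Fri Apr 15 14:13:16 GMT 2011 :: 472517477
Fri Apr 15 14:13:26 GMT 2011 :: 472543152
Fri Apr 15 14:13:36 GMT 2011 :: 472592800
Fri Apr 15 14:13:46 GMT 2011 :: 472638848
Fri Apr 15 14:13:56 GMT 2011 :: 472684407
"""


def parse_samples(text):
    """Turn each '<date> :: <counter>' line into a (datetime, int) pair."""
    points = []
    for line in text.strip().splitlines():
        stamp, counter = line.split(" :: ")
        fields = stamp.split()
        del fields[4]  # drop the time-zone token; all samples share it
        when = datetime.strptime(" ".join(fields), "%a %b %d %H:%M:%S %Y")
        points.append((when, int(counter)))
    return points


def overflow_rates(points):
    """Per-second udpInOverflows increase between consecutive samples."""
    rates = []
    for (t0, c0), (t1, c1) in zip(points, points[1:]):
        rates.append((c1 - c0) / (t1 - t0).total_seconds())
    return rates


rates = overflow_rates(parse_samples(SAMPLES))
for rate in rates:
    print(f"{rate:8.1f} overflows/s")
# Average over the last three intervals, once the buffer was clearly full.
print(f"steady state: ~{sum(rates[-3:]) / 3:.0f} overflows/s")
```

On these samples the first non-zero interval is partial (the buffer was
still filling), and the last three intervals average roughly 4.7k
overflows/s.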
> 
> I inherited a config with _very_ many filter statements, but decided
> to cut all of that out to see whether the UDP drops continue (they do).
> I've attached a sanitized config to this message; everything here
> refers to that config running.  (I thought eliminating the filters
> would really help, but it didn't.)
> 
> We're running Solaris 10 SPARC.  The syslog-ng version is:
> 
> # /usr/local/sbin/syslog-ng -V
> syslog-ng 3.1.2
> Installer-Version: 3.1.2
> Revision: ssh+git://bazsi@git.balabit//var/scm/git/syslog-ng/syslog-ng-ose--mainline--3.1#master#8bf13c304b6ab5fc1a372b49d55c78370efe14ca
> Compile-Date: Oct 25 2010 23:56:18
> Enable-Threads: off
> Enable-Debug: off
> Enable-GProf: off
> Enable-Memtrace: off
> Enable-Sun-STREAMS: on
> Enable-Sun-Door: on
> Enable-IPv6: on
> Enable-Spoof-Source: on
> Enable-TCP-Wrapper: off
> Enable-SSL: on
> Enable-SQL: off
> Enable-Linux-Caps: off
> Enable-Pcre: on
> 
> The following options are set for the OS:
> 
> # ndd /dev/udp udp_max_buf
> 1073741824
> # ndd /dev/udp udp_recv_hiwat
> 65536
> 
> Some option lines from the config, set based on what I've read:
> 
> * Note: the TCP stuff can be safely ignored; it's left over from some
> testing and isn't currently seeing traffic.
> * All 3 UDP sources are set with so_rcvbuf(67108864) (64 MB).
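One way to confirm the kernel actually grants the 64 MB asked for: a
minimal Python sketch that requests a receive buffer on a throwaway UDP
socket and reads the size back.  It assumes syslog-ng's so_rcvbuf() ends
up as a plain setsockopt(SO_RCVBUF) on the listening socket.  Behavior
above the limit varies by OS: on Solaris udp_max_buf is the ceiling,
while Linux silently caps the value at net.core.rmem_max and reports
double what it stores.

```python
import socket

# 64 MB, the same value passed to so_rcvbuf(67108864) in the config.
REQUESTED_RCVBUF = 64 * 1024 * 1024


def effective_rcvbuf(requested):
    """Request a UDP receive buffer and return what the kernel reports back.

    Returns None if the kernel rejects the request outright.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        except OSError:
            # Some kernels refuse sizes above their configured maximum.
            return None
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


granted = effective_rcvbuf(REQUESTED_RCVBUF)
if granted is None:
    print("kernel rejected the requested SO_RCVBUF size")
else:
    print(f"requested {REQUESTED_RCVBUF} bytes, kernel reports {granted}")
```

Running it on the syslog-ng host shows whether the effective size
matches what the config asks for.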
> 
> options { # things I've changed/tweaked
>           flush_lines(1000);
>           flush_timeout(20);
>           log_fifo_size (67108864);
>           log_msg_size(8192);
>           chain_hostnames(yes);
>           # end my changes
>         <snip>
>         };
> 
> So I'm totally stumped.  I can set the buffers with so_rcvbuf() to 1 GB
> and it still doesn't matter; they eventually fill up and I start losing
> packets.  I'm hoping that someone can point me to some tweaks that will
> reduce or eliminate the drops.  Is it unreasonable to expect to process
> this many messages per second via UDP?  Maybe that's the problem.  I
> might experiment with the default syslog to see if it can write this
> many messages without drops... this doesn't seem like an insane amount
> of traffic.  But perhaps my expectations are unrealistic; that's what
> I'm hoping someone can tell me.
> 
> Regards,
> 
> --Mike

> ______________________________________________________________________________
> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
> Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
> FAQ: http://www.campin.net/syslog-ng/faq.html
> 
