[syslog-ng] Solaris 10 UDP overflows, message drops

Fri Apr 15 20:21:22 CEST 2011

Hi Mike,
I'm heading out of town on a trip, so I don't have time to read the whole
thread.
You may or may not have tried some of this, but I had similar issues a while
back and noted it here:
http://nms.gdd.net/index.php/Install_Guide_for_LogZilla_v3.1#UDP_Buffers
Hope it helps :-)

______________________________________________________________

Clayton Dukes
______________________________________________________________

On Fri, Apr 15, 2011 at 2:01 PM, Mishou Michael <Michael.Mishou at csirc.irs.gov> wrote:

> Matthew,
>
> Thanks for the suggestion.  I'm not using so_sndbuf anywhere in this
> configuration; I'm just receiving and writing directly to disk.  As for
> so_rcvbuf, I've already tried that per the initial message, no dice.
> Even if I set so_rcvbuf to 10 times the recommended value in the
> configuration note you linked to, it still fills up, and then drops
> (udpInOverflows) start to occur at a rate of about 5k/sec.
>
> Is there perhaps something else I'm missing in the config?  Setting
> so_rcvbuf to a 64 MB buffer only delays the problem for a few seconds
> until the buffer fills again.  If I set it to 1 GB (tried this, I have a
> ton of RAM to work with), it delays the problem for about 10 minutes,
> then the drops start.  It seems as if the buffer is not being emptied
> fast enough, but the CPU is by no means pegged by syslog-ng.
>
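A back-of-the-envelope way to relate buffer size to time-to-overflow: a receive buffer that gains (arrival rate - drain rate) bytes per second fills in buffer_size / (arrival - drain) seconds. This minimal sketch assumes the kernel charges one buffer byte per payload byte (real kernels also charge per-datagram overhead, so actual fill times are likely shorter) and takes the 1769537.884 bytes/sec arrival rate from the tcpdump capture later in this thread; the drain rates it loops over are hypothetical.

```python
# Back-of-the-envelope: how long until a UDP receive buffer overflows?
# Assumption: one byte of buffer per payload byte. Real kernels also charge
# per-datagram overhead, so actual fill times are likely shorter.

MIB = 1024 * 1024
ARRIVAL = 1769537.884  # "Avg. bytes/sec" from the tcpdump capture in this thread


def seconds_to_fill(buffer_bytes: int, arrival_bps: float, drain_bps: float) -> float:
    """Seconds until the buffer is full if bytes arrive faster than they are read."""
    deficit = arrival_bps - drain_bps
    if deficit <= 0:
        return float("inf")  # the reader keeps up, so the buffer never fills
    return buffer_bytes / deficit


# Hypothetical drain rates, expressed as a fraction of the arrival rate.
for label, size in (("64 MB", 64 * MIB), ("1 GB", 1024 * MIB)):
    for share in (0.0, 0.5, 0.9):
        t = seconds_to_fill(size, ARRIVAL, share * ARRIVAL)
        print(f"{label:>5} buffer, reader drains {share:4.0%}: {t:8.1f} s")
```

With no draining at all, the formula gives about 38 s for 64 MiB and about 607 s (roughly 10 minutes) for 1 GiB.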
> In my first message I left out the resources I have to work with on this
> system, and how things look with syslog-ng running (and dropping), so
> I'll include those now.  As you can see, it's an older server, but it
> has a ton of RAM, and I think the CPUs should have enough pop for this.
>
> # uname -a
> SunOS ms00310 5.10 Generic_127111-10 sun4u sparc SUNW,Sun-Fire-V490 Solaris
> # psrinfo -v | grep MHz
>  The sparcv9 processor operates at 1350 MHz,
>  The sparcv9 processor operates at 1350 MHz,
>  The sparcv9 processor operates at 1350 MHz,
>  The sparcv9 processor operates at 1350 MHz,
>  The sparcv9 processor operates at 1350 MHz,
>  The sparcv9 processor operates at 1350 MHz,
>  The sparcv9 processor operates at 1350 MHz,
>  The sparcv9 processor operates at 1350 MHz,
> # swap -s
> total: 4042128k bytes allocated + 967184k reserved = 5009312k used, 48662184k available
> # ps -e -o pcpu -o pid -o user -o args | grep syslog
>  0.0    70     root vxconfigd -x syslog -m boot
>  0.0  6110     root grep syslog
>  7.7 22802     root /usr/local/sbin/syslog-ng -f /etc/crap_config.txt -p /var/run/syslog-ng
>  0.0 22801     root /usr/local/sbin/syslog-ng -f /etc/crap_config.txt -p /var/run/syslog-ng
> # top -b -n 5
> last pid:  6355;  load avg:  1.36,  1.34,  1.34;       up 58+23:59:11  17:37:27
> 94 processes: 91 sleeping, 3 on cpu
> CPU states: 82.1% idle,  7.0% user, 10.9% kernel,  0.0% iowait,  0.0% swap
> Memory: 32G phys mem, 16G free mem, 32G total swap, 32G free swap
>
>   PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
>  22802 root       2  50    0 3067M 3063M cpu/2   79:50  7.81% syslog-ng
>  29459 root      15  40    0  193M  166M cpu/1  150.5H  4.49% issCSF
>  6352 root       1  55    0 3376K 2032K cpu/19   0:00  0.20% top
>  4229 root      82  59    0  327M  324M sleep  661:05  0.17% java
>  2695 root       6  59    0 8000K 2984K sleep  802:59  0.11% rmserver
>
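One way to read the 7.81% figure: assuming this top reports %CPU as a share of all CPUs combined (the usual presentation for Solaris tools), a process that can only keep one of the eight processors listed by psrinfo busy tops out near 100/8 = 12.5%. Whether syslog-ng 3.1.2 is effectively confined to one CPU here is itself an assumption (the build above shows Enable-Threads: off). A minimal sketch of the conversion:

```python
# Convert a machine-wide %CPU reading into "percent of one CPU".
# Assumption: top reports %CPU relative to all CPUs combined, so a process
# that saturates one of N CPUs shows up as roughly 100/N percent.

NCPUS = 8  # eight processor lines in the psrinfo -v output


def percent_of_one_cpu(machine_pct: float, ncpus: int) -> float:
    """Scale a machine-wide percentage up to a single-CPU percentage."""
    return machine_pct * ncpus


def single_cpu_ceiling(ncpus: int) -> float:
    """Highest machine-wide %CPU a process limited to one CPU can show."""
    return 100.0 / ncpus


print(f"ceiling for a one-CPU process: {single_cpu_ceiling(NCPUS):.1f}%")
print(f"top's 7.81% is about {percent_of_one_cpu(7.81, NCPUS):.0f}% of one CPU")
```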
> I'm just not sure what to do next to troubleshoot.  I'm hoping someone
> here can point me in the right direction, or at least confirm that they
> are running syslog-ng in a similar configuration without drops so I know
> that it's at least possible?
>
> Regards,
>
> --Mike
>
> -----Original Message-----
> From: syslog-ng-bounces at lists.balabit.hu
> [mailto:syslog-ng-bounces at lists.balabit.hu] On Behalf Of Matthew Hall
> Sent: Friday, April 15, 2011 12:12 PM
> To: Syslog-ng users' and developers' mailing list
> Subject: Re: [syslog-ng] Solaris 10 UDP overflows, message drops
>
> You probably need to adjust so_sndbuf and so_rcvbuf:
>
> http://www.balabit.com/sites/default/files/documents/syslog-ng-ose-v3.2-guide-admin-en.html/index.html-single.html#reference_source_tcpudp
>
> That should make it run better.
>
> Matthew.
>
> On Fri, Apr 15, 2011 at 10:52:59AM -0400, Mishou Michael wrote:
> > All,
> >
> > I've done a lot of reading, and I can't figure out what I can do to
> > this config to fix the UDP drops reported as udpInOverflows by
> > netstat -s.  Here are some statistics on the amount of traffic we
> > receive via syslog-ng.  It's pretty busy, but in my reading I'm
> > finding that some folks are doing much more.  These stats are based on
> > a ~30 second window of traffic during peak times, but traffic doesn't
> > vary much over time in our environment.  I used tcpdump with a BPF
> > filter to capture only inbound udp/514, so this is what the interface
> > is seeing in the way of syslog.
> >
> > Elapsed:              00:00:34
> > Packets:              200000
> > Avg. packets/sec:     5836.546
> > Avg. packet size:     303.182 bytes
> > Bytes:                60636477
> > Avg. bytes/sec:       1769537.884
> > Avg. MBit/sec:        14.156
> >
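The summary figures above are internally consistent. This sketch re-derives them from Packets, Bytes and Avg. packets/sec, assuming Elapsed is rounded to whole seconds (so the exact duration is backed out of the packet rate) and that Mbit means 10^6 bits.

```python
# Re-derive the capture summary from its raw totals.
# Assumption: "Elapsed" is rounded to whole seconds, so the exact duration
# is recovered as Packets / Avg. packets/sec.

packets = 200_000
total_bytes = 60_636_477
pkts_per_sec = 5836.546

elapsed = packets / pkts_per_sec               # ~34.27 s (shown as 00:00:34)
avg_size = total_bytes / packets               # average bytes per packet
bytes_per_sec = total_bytes / elapsed
mbit_per_sec = bytes_per_sec * 8 / 1_000_000   # decimal megabits

print(f"elapsed    {elapsed:.2f} s")
print(f"avg size   {avg_size:.3f} bytes")
print(f"bytes/sec  {bytes_per_sec:.1f}")
print(f"Mbit/sec   {mbit_per_sec:.3f}")
```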
> > So, about 6k messages per second.  Here are the drop numbers over a
> > time sample (taken right after a process restart; you can see the
> > buffer takes a moment to fill up [64 MB so_rcvbuf]):
> >
> > # while true; do echo -en "$(date) :: "; netstat -s | grep \
> >   udpInOverflows | head -n 1 | sed 's|.*=||'; sleep 10; done
> > Fri Apr 15 14:12:46 GMT 2011 :: 472517477
> > Fri Apr 15 14:12:56 GMT 2011 :: 472517477
> > Fri Apr 15 14:13:06 GMT 2011 :: 472517477
> > Fri Apr 15 14:13:16 GMT 2011 :: 472517477
> > Fri Apr 15 14:13:26 GMT 2011 :: 472543152
> > Fri Apr 15 14:13:36 GMT 2011 :: 472592800
> > Fri Apr 15 14:13:46 GMT 2011 :: 472638848
> > Fri Apr 15 14:13:56 GMT 2011 :: 472684407
> >
> > So that's about 5k overflows a second, which jibes with our
> > calculations, suggesting we're getting only ~10% of our messages
> > logged to disk.
> >
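The counter samples above are cumulative, so the drop rate is the difference between consecutive readings divided by the 10 s sleep. A minimal sketch, assuming the samples are exactly 10 s apart, that every counted overflow is a syslog datagram (udpInOverflows is actually system-wide), and one message per packet:

```python
# Turn the cumulative udpInOverflows samples into per-interval drop rates.
# Assumptions: samples are exactly 10 s apart, every overflow is a syslog
# datagram (the counter is really system-wide), one message per packet.

INTERVAL_S = 10
ARRIVAL_PPS = 5836.546  # "Avg. packets/sec" from the tcpdump capture

counts = [472517477, 472517477, 472517477, 472517477,
          472543152, 472592800, 472638848, 472684407]


def per_second_rates(cumulative, interval):
    """Difference consecutive counter readings and divide by the interval."""
    return [(b - a) / interval for a, b in zip(cumulative, cumulative[1:])]


rates = per_second_rates(counts, INTERVAL_S)
steady = rates[-3:]  # the three intervals after the buffer had clearly filled
mean_drop = sum(steady) / len(steady)

print("drops/s per interval:", [round(r) for r in rates])
print(f"steady-state drops/s: {mean_drop:.0f}")
print(f"share of arrivals dropped: {mean_drop / ARRIVAL_PPS:.0%}")
```

Over the last three intervals this works out to roughly 4,700 drops/s against about 5,840 packets/s arriving, i.e. around 80% of datagrams under the assumptions stated.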
> > I inherited a config with _very_ many filter statements, but have
> > decided to cut all that out to see whether my performance problems, in
> > the form of UDP drops, continue (they do).  I've attached a sanitized
> > config to this message; everything here was gathered with that config
> > running (even though I thought eliminating the filters would really
> > help, it didn't).
> >
> > We're running Solaris 10 SPARC.  The syslog-ng version is:
> >
> > # /usr/local/sbin/syslog-ng -V
> > syslog-ng 3.1.2
> > Installer-Version: 3.1.2
> > Revision: ssh+git://bazsi@git.balabit//var/scm/git/syslog-ng/syslog-ng-ose--mainline--3.1#master#8bf13c304b6ab5fc1a372b49d55c78370efe14ca
> > Compile-Date: Oct 25 2010 23:56:18
> > Enable-Threads: off
> > Enable-Debug: off
> > Enable-GProf: off
> > Enable-Memtrace: off
> > Enable-Sun-STREAMS: on
> > Enable-Sun-Door: on
> > Enable-IPv6: on
> > Enable-Spoof-Source: on
> > Enable-TCP-Wrapper: off
> > Enable-SSL: on
> > Enable-SQL: off
> > Enable-Linux-Caps: off
> > Enable-Pcre: on
> >
> > The following options are set for the OS:
> >
> > # ndd /dev/udp udp_max_buf
> > 1073741824
> > # ndd /dev/udp udp_recv_hiwat
> > 65536
> >
> > Some option lines from the config, set based on what I've read:
> >
> > * note the TCP stuff can be safely ignored, it's legacy from some
> > testing but isn't currently seeing traffic
> > * all 3 udp sources set with so_rcvbuf(67108864) (64 MB)
> >
> > options { # things I've changed/tweaked
> >           flush_lines(1000);
> >           flush_timeout(20);
> >           log_fifo_size (67108864);
> >           log_msg_size(8192);
> >           chain_hostnames(yes);
> >           # end my changes
> >         <snip>
> >         };
> >
> > So I'm totally stumped.  I can set the buffers with so_rcvbuf() to
> > 1 GB and it still doesn't matter; they eventually fill up and I start
> > losing packets.  I'm hoping that someone can point me to some tweaks I
> > can make to reduce or eliminate the drops.  Is it unreasonable to
> > expect to be able to process this many messages per second via UDP?
> > Maybe that's the problem.  I might experiment with the default syslog
> > to see if it can write this many messages without drops...this doesn't
> > seem like an insane amount of traffic.  But perhaps my expectations
> > are unrealistic; that's what I'm hoping someone can tell me.
> >
> > Regards,
> >
> > --Mike
>
>
> >
> > ______________________________________________________________________________
> > Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
> > Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
> > FAQ: http://www.campin.net/syslog-ng/faq.html
> >