[syslog-ng] Solaris 10 UDP overflows, message drops

Sat Apr 30 14:47:35 CEST 2011

Hi,

On Tue, 2011-04-26 at 12:05 -0400, Mishou Michael wrote:
> For those following this thread, I have applied the "thundering herd"
> UDP patch and seen no change in the drops experienced by
> syslog-ng 3.1.2.  Sorry I took so long to respond; the patching was a
> much more time-consuming process than I thought it would be.
> 
> At this point, based on Michael Hocke's response, I'm thinking that
> perhaps there is just too much UDP traffic for single-threaded syslog-ng
> to deal with in light of what filtering and parsing it does up front
> (for macro usage). 
> 
> I'm going to experiment with syslog-ng and the loggen tool to find a
> point at which a single syslog-ng instance starts dropping inbound UDP
> traffic with a simple configuration writing to disk.  Once I have that
> number, I have a few options:
> 
> 1.  Experiment with syslog-ng 3.3 and the new threaded code to see if I
> get performance gains.  I'm hesitant to push alpha code into production;
> if anyone has experience running 3.3 consistently in a semi-production
> environment, I'd love to hear it.

I think the most difficult part of compiling syslog-ng 3.3 for Solaris
is ivykis, the new I/O backend library that we've started using for
threading (it supports epoll, /dev/poll, kqueue, etc.).

The ivykis version that we use is available on git.balabit.hu, but you
need a complete toolchain (autoconf, automake, libtool, gcc, gmake) to
compile it.

> 2.  So I don't have to change the configuration on a lot of clients, use
> PF to rewrite incoming UDP messages from specific, busy clients to other
> syslog-ng listeners, configured exactly as my main instance (which will
> handle all the non-insanely-busy clients).  I could run multiple
> listeners in this manner, and not need threading to take advantage of
> multiple processors, though obviously each process would still be
> limited to the magic number determined above.  I have 10 or so really
> busy clients, so this is one solution I'm leaning towards if syslog-ng
> 3.1.2 can handle just one of them.

This could work.

> 
> 3.  Give up on syslog-ng until 3.3, or move to some other solution.  I'm
> not sure what I could do here; rsyslog is the other major contender, I
> guess, but I'm not sure what gains I would get.  I could also run a
> native syslog server and post-process to different buckets/relay, which
> is what we mainly use syslog-ng for.
> 
> 4.  Get a faster box (not likely to happen).
> 
> If anyone has any thoughts on any of the above I'd love to hear them.
> Also, if this is unique to Solaris SPARC systems (similarly spec'd x86
> Solaris systems having none of these limitations) I'd love to know that
> as well.  Is there any way anyone knows to figure out at what point the
> SPARC is hitting a ceiling?  The CPU is not pegged, so why would we be
> experiencing CPU-based drops?  Maybe the code is not efficient for how
> SPARC does things, or how some syscall is implemented on Solaris?

Yes, I think an inefficiency of that kind on Solaris is the root cause
of the problem.
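
As a rough illustration of the measurement described in the quoted text
(finding the rate at which inbound UDP starts to be dropped), here is a
minimal Python sketch. It is not loggen and does not involve syslog-ng:
it only sends numbered datagrams to a local receiver and compares the
counts. The function names (drop_rate, measure), the payload size, and
the idle timeout are all assumptions made up for this example.

```python
import socket
import threading


def drop_rate(sent, received):
    """Fraction of datagrams that never arrived (0.0 if nothing was sent)."""
    if sent == 0:
        return 0.0
    return (sent - received) / sent


def measure(count, payload_size=256, rcvbuf=None, host="127.0.0.1"):
    """Send `count` UDP datagrams to a local receiver; return (sent, received).

    Hypothetical helper: `rcvbuf`, if given, is requested as the receive
    buffer size (SO_RCVBUF); the OS may adjust or cap the value.
    """
    recv_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    if rcvbuf is not None:
        recv_sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    recv_sock.bind((host, 0))          # port 0: let the OS pick a free port
    recv_sock.settimeout(0.5)          # stop after 0.5 s without traffic
    port = recv_sock.getsockname()[1]

    received = [0]                     # mutable cell shared with the thread

    def receiver():
        while True:
            try:
                recv_sock.recvfrom(65535)
            except socket.timeout:
                return
            received[0] += 1

    thread = threading.Thread(target=receiver)
    thread.start()

    send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = b"x" * payload_size
    for _ in range(count):
        send_sock.sendto(payload, (host, port))
    send_sock.close()

    thread.join()
    recv_sock.close()
    return count, received[0]
```

Calling measure() with increasing counts (and different rcvbuf values)
and feeding the result to drop_rate() gives a crude picture of where
the receiver falls behind. The numbers depend heavily on the machine
and say nothing about syslog-ng's own parsing cost.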

-- 
Bazsi