[syslog-ng] Syslog-ng losing messages on solaris 10

AndyH at nominet.org.uk AndyH at nominet.org.uk
Fri Feb 29 12:24:44 CET 2008



syslog-ng-bounces at lists.balabit.hu wrote on 24/02/2008 20:00:05:

>
> On Thu, 2008-02-21 at 16:56 +0000, AndyH at nominet.org.uk wrote:
> >
> > syslog-ng-bounces at lists.balabit.hu wrote on 21/02/2008 16:16:52:
> >
> > > > At one point we saw the whole system lock up for 7 minutes.  No-one
> > could
> > > > ssh to the server but those on it could carry on.  It seemed that
> > > > everything was waiting on syslog-ng.  I ran truss on it and nothing
> > > > appeared in the output for ages, then all of a sudden it started
> > working
> > > > again.  Nothing was logged for these 7 minutes
> > >
> > > > Truss output:
> > > >
> > > > -bash-3.00$ sudo truss -failed -p 1418
> > > > Base time stamp:  1203503996.5724  [ Wed Feb 20 10:39:56 GMT 2008 ]
> > > > 1418/1:         psargs: /opt/syslog-ng/sbin/syslog-ng
> > > > 1418/1:         98890.2339      getmsg(3, 0xFFFFFFFF7FFFF630,
> > > > 0xFFFFFFFF7FFFF620, 0xFFFFFFFF7FFFF5FC) = 0
> > >
> > > I could successfully reproduce the message loss on my solaris 9
system,
> > > but I need some more time to investigate the issue.
> >
> > Good news - you can reproduce the problem.  I'll let you investigate.
>
> During my investigations I found the followings:
>
> 1) Solaris /dev/log is not flow controlled in any way, if the buffer in
> the STREAMS device-head overflows, the message is gone for good and a
> warning is printed on the console, the same that you mentioned in one of
> your earlier mails.
>
> 2) The door support of syslog-ng was broken, because the door device was
> opened before going into the background, and doors are not inherited
> through forks. You can work this around by running syslog-ng with the
> --foreground command line option, and sending it to the background
> explicitly with your shell. Alternatively please apply the patch below.
>
> 3) If the door support works, clients generating messages slow somewhat
> down, because they invoke a door-call. This slow down then causes less
> messages to be dropped, but still some of it are dropped.
>
> 4) As syslog-ng is not threaded, it polls its inputs and writes its
> outputs from the same thread. E.g. while the output is being written,
> inputs are starved somewhat. This causes latency on the input side and
> as the STREAMS buffer is small, this latency causes more messages to be
> dropped, than syslogd.

Can it be made threaded?  Although it might still lose some messages it
should lose less of them.

> 5) syslogd does not lose messages if there's a single client that
> generates messages. If I start two instances of my simple test program,
> it also starts losing messages. The reason that it does not drop
> messages that it has a dedicated thread for polling inputs, thus the
> latency is smaller. On the other hand it puts all incoming messages to a
> queue which is not limited in size. Thus if the file write thread is
> slow, syslogd may consume unlimited amount of memory.
>
> 6) You can decrease the latency between getmsg() calls in syslog-ng, by
>    * using a higher log-fetch-limit,
>    * using throttle on the output (requires syslog-ng 2.1.x),
>    * ensuring that your FIFO (log-fifo-size) is appropriately sized,
>
>   The effects are as follows:
>    * log-fetch-limit increases the number of getmsg() calls per poll
> iterations
>    * throttle limits the number of messages written out per second,
> effectively limiting the time spent writing the output,
>
>
> So that's about it. The message loss is inherently caused by the fact
> that /dev/log is not flow controlled, and even worse the buffer size
> cannot be changed.
>
> All options I have left is to force clients to wait somewhat when they
> invoke the syslog door, this'd be poor man's flow-control though.
>
> This patch fixes the door initialization problem:
>
> http://git.balabit.hu/?p=bazsi/syslog-ng-2.
> 1.git;a=commit;h=5b65710e9781203d005e6f4bcb3f33bad4f827b6

patch fails to compile:

afstreams.c:195: warning: implicit declaration of function
`register_application_hook'
afstreams.c:195: error: `AH_POST_DAEMONIZED' undeclared (first use in this
function)
afstreams.c:195: error: (Each undeclared identifier is reported only once
afstreams.c:195: error: for each function it appears in.)

Andy


More information about the syslog-ng mailing list