[syslog-ng] Syslog-ng losing messages on solaris 10

Balazs Scheidler bazsi at balabit.hu
Sun Feb 24 21:00:05 CET 2008


On Thu, 2008-02-21 at 16:56 +0000, AndyH at nominet.org.uk wrote:
> 
> syslog-ng-bounces at lists.balabit.hu wrote on 21/02/2008 16:16:52:
> 
> > > At one point we saw the whole system lock up for 7 minutes.  No-one
> could
> > > ssh to the server but those on it could carry on.  It seemed that
> > > everything was waiting on syslog-ng.  I ran truss on it and nothing
> > > appeared in the output for ages, then all of a sudden it started
> working
> > > again.  Nothing was logged for these 7 minutes
> >
> > > Truss output:
> > >
> > > -bash-3.00$ sudo truss -failed -p 1418
> > > Base time stamp:  1203503996.5724  [ Wed Feb 20 10:39:56 GMT 2008 ]
> > > 1418/1:         psargs: /opt/syslog-ng/sbin/syslog-ng
> > > 1418/1:         98890.2339      getmsg(3, 0xFFFFFFFF7FFFF630,
> > > 0xFFFFFFFF7FFFF620, 0xFFFFFFFF7FFFF5FC) = 0
> >
> > I could successfully reproduce the message loss on my solaris 9 system,
> > but I need some more time to investigate the issue.
> 
> Good news - you can reproduce the problem.  I'll let you investigate.

During my investigations I found the followings:

1) Solaris /dev/log is not flow controlled in any way, if the buffer in
the STREAMS device-head overflows, the message is gone for good and a
warning is printed on the console, the same that you mentioned in one of
your earlier mails.

2) The door support of syslog-ng was broken, because the door device was
opened before going into the background, and doors are not inherited
through forks. You can work this around by running syslog-ng with the
--foreground command line option, and sending it to the background
explicitly with your shell. Alternatively please apply the patch below.

3) If the door support works, clients generating messages slow somewhat
down, because they invoke a door-call. This slow down then causes less
messages to be dropped, but still some of it are dropped.

4) As syslog-ng is not threaded, it polls its inputs and writes its
outputs from the same thread. E.g. while the output is being written,
inputs are starved somewhat. This causes latency on the input side and
as the STREAMS buffer is small, this latency causes more messages to be
dropped, than syslogd.

5) syslogd does not lose messages if there's a single client that
generates messages. If I start two instances of my simple test program,
it also starts losing messages. The reason that it does not drop
messages that it has a dedicated thread for polling inputs, thus the
latency is smaller. On the other hand it puts all incoming messages to a
queue which is not limited in size. Thus if the file write thread is
slow, syslogd may consume unlimited amount of memory.

6) You can decrease the latency between getmsg() calls in syslog-ng, by
   * using a higher log-fetch-limit,
   * using throttle on the output (requires syslog-ng 2.1.x),
   * ensuring that your FIFO (log-fifo-size) is appropriately sized,

  The effects are as follows:
   * log-fetch-limit increases the number of getmsg() calls per poll
iterations
   * throttle limits the number of messages written out per second,
effectively limiting the time spent writing the output,


So that's about it. The message loss is inherently caused by the fact
that /dev/log is not flow controlled, and even worse the buffer size
cannot be changed.

All options I have left is to force clients to wait somewhat when they
invoke the syslog door, this'd be poor man's flow-control though.

This patch fixes the door initialization problem:

http://git.balabit.hu/?p=bazsi/syslog-ng-2.1.git;a=commit;h=5b65710e9781203d005e6f4bcb3f33bad4f827b6

Once you apply this, please make sure that you use the proper path to
the syslog door, which is /etc/.syslog_door for Solaris 8 and below,
and /var/run/syslog_door for Solaris 9 and later.


> 
> One other thing worth mentioning are these 2 warnings during compilation
> which may be of relevance:
> afstreams.c: In function `afstreams_sd_init':
> afstreams.c:160: warning: passing arg 1 of `door_create' from incompatible
> pointer type

This is a real warning, however does not cause a problem. I'll fix that,
thanks for reporting this.

-- 
Bazsi



More information about the syslog-ng mailing list