[syslog-ng]losing messages using syslog-ng-1.4.7 & libol-0.2.20

Jeffrey W. Baker jwbaker@acm.org
Thu, 26 Oct 2000 07:46:48 -0700 (PDT)


On Thu, 26 Oct 2000 matthew.copeland@honeywell.com wrote:

> > > 
> > > Sure.  I didn't reset log_fifo_size, since I was under the understanding
> > > that would only be useful on the client side.  
> > 
> > try to set it to a larger value, the buffering in syslog-ng doesn't really
> > differentiate between files and destination connections. so set
> > log_fifo_size to 310*20 = 6200.
> 
> Okay.  I gave that a try, and it seems to help some.  Meaning that if I
> increase it enough, I don't lose more than say about 25 - 50 messages at
> most.  It's odd because I can lose anywhere between 1 - 50 messages or so,
> and never get all of them under this test.  (300 thread clients, 4
> messages a second, for 30 seconds == 36000 messages)  I would think that
> at least once I would get all of the messages for all the times that I am
> getting only 2 or 3 lost.  I can get all of the messages with a less
> number of clients and more messages per second.  Any ideas?

You're almost certainly overflowing the internal queue in either the
server or the clients or both.  Try logging the messages on the clients to
a file as well as the network, and see if the local syslog-ngs are
dropping messages before the server has any chance.

You can put some printf statements into the libol code to trace what is
happening with each log entry and the output queue.  I did this to mine
and I can see in the output that syslog-ng sometimes (often) simply throws
messages away.  The interesting parts are in pkt_buffer.c and queue.c in
libol:

        if (self->queue_size == self->queue_max) {
                /* fifo full: oops */
                ol_string_free(string);   /* this tosses your message */
                return ST_FAIL | ST_OK;   /* return code is ignored */
        }

I've raised this issue on this list before, but have been ignored.  
Regardless of how high your fifo size is, syslog-ng will lose messages if
the sources generate messages faster than the destination can consume
them.  Raising the fifo size only masks transients, but does not help in
the steady state.
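
To make the steady-state point concrete, here is a rough model of a bounded
queue that drops on overflow. It is only a sketch and not libol code: the
struct, the function name simulate_drops, and the idea of discrete "ticks" are
all assumptions made for illustration. The only thing borrowed from libol is
the full-queue test quoted above.

```c
#include <stddef.h>

/* Hypothetical model, not libol code: a bounded FIFO that drops on overflow. */
struct fifo_model {
    size_t queue_size;   /* messages currently queued */
    size_t queue_max;    /* capacity, playing the role of log_fifo_size */
    size_t dropped;      /* messages discarded because the queue was full */
};

/* Run `ticks` time steps. In each step the source offers `in_per_tick`
 * messages, then the destination consumes up to `out_per_tick`.
 * Returns the total number of dropped messages. */
size_t simulate_drops(size_t queue_max, size_t in_per_tick,
                      size_t out_per_tick, size_t ticks)
{
    struct fifo_model q = { 0, queue_max, 0 };

    for (size_t t = 0; t < ticks; t++) {
        for (size_t i = 0; i < in_per_tick; i++) {
            if (q.queue_size == q.queue_max)
                q.dropped++;          /* same test as the libol excerpt */
            else
                q.queue_size++;
        }
        size_t drained = q.queue_size < out_per_tick ? q.queue_size
                                                     : out_per_tick;
        q.queue_size -= drained;
    }
    return q.dropped;
}
```

In this model, with 12 messages in and 10 out per tick over 1000 ticks, a
queue of 100 drops 1910 messages and a queue of 1000 still drops 1010. The
larger queue only delays the first drop; once it fills, both lose 2 messages
per tick. With 10 in and 10 out, neither drops anything.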

The symptom is easily seen if you send very small messages, such as a
three-digit sequence and a newline.  This puts as much stress as possible
on syslog-ng.  If you are suffering from this problem, you will notice in
your logs that large blocks of messages are missing.  You can generate
these messages very quickly with a perl script writing to a named pipe.
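
A sketch of such a generator, written in C rather than perl, might look like
the following. The function names write_sequence and count_missing are made up
for this example. count_missing also assumes the resulting log has been reduced
to one bare sequence number per line, which a real syslog line with its
timestamp and host prefix is not.

```c
#include <stdio.h>

/* Write `count` messages to `out`, each a zero-padded three-digit sequence
 * number (wrapping at 1000) followed by a newline. Returns the number of
 * messages written, or -1 on a write error. */
long write_sequence(FILE *out, long count)
{
    for (long i = 0; i < count; i++) {
        if (fprintf(out, "%03ld\n", i % 1000) < 0)
            return -1;
    }
    if (fflush(out) == EOF)
        return -1;
    return count;
}

/* Read sequence numbers back (one per line, assumed already stripped of any
 * syslog header) and count how many are missing between neighbours.
 * A gap that is an exact multiple of 1000 cannot be detected. */
long count_missing(FILE *in)
{
    long missing = 0, prev = -1, cur;

    while (fscanf(in, "%ld", &cur) == 1) {
        if (prev >= 0)
            missing += (cur - (prev + 1) % 1000 + 1000) % 1000;
        prev = cur;
    }
    return missing;
}
```

To drive syslog-ng with it, open the named pipe that your pipe source reads
with fopen(path, "w") and pass the stream to write_sequence. Afterwards, run
count_missing over the stripped log; any non-zero result means messages were
thrown away somewhere along the path.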

-jwb