[syslog-ng] Syslog-ng losing messages on solaris 10

AndyH at nominet.org.uk
Thu Feb 21 16:32:00 CET 2008



syslog-ng-bounces at lists.balabit.hu wrote on 21/02/2008 15:19:15:

>
> On Wed, 2008-02-20 at 18:30 +0100, Balazs Scheidler wrote:
> > On Wed, 2008-02-20 at 14:32 +0000, AndyH at nominet.org.uk wrote:
> > > When I run the syslogd as supplied with Solaris 10 then all messages get
> > > logged, but when I use syslog-ng then it loses messages.  On a Sun V210 I
> > > see these messages
> > >
> > > message overflow on /dev/log minor #6 -- is syslogd(1M) running?
> > > message overflow on /dev/log minor #6 -- is syslogd(1M) running?
> > > message overflow on /dev/log minor #6 -- is syslogd(1M) running?
> ...
>
> > diff --git a/src/afstreams.c b/src/afstreams.c
> > index 009b074..d0a76f3 100644
> > --- a/src/afstreams.c
> > +++ b/src/afstreams.c
> > @@ -134,7 +134,7 @@ afstreams_sd_init(LogPipe *s, GlobalConfig *cfg, PersistentConfig *persist)
> >            close(fd);
> >            return FALSE;
> >          }
> > -      self->reader = log_reader_new(streams_read_new(fd), LR_LOCAL | LR_NOMREAD | LR_PKTTERM, s, &self->reader_options);
> > +      self->reader = log_reader_new(streams_read_new(fd), LR_LOCAL | LR_PKTTERM, s, &self->reader_options);
> >        log_pipe_append(self->reader, s);
> >
> >        if (self->door_filename)
> >
> > This will cause the log-fetch-limit() option to become effective, thus several messages
> > are going to be fetched for every iteration, this can easily multiply performance.
> >
> > Please also check if the local messages get mangled in any way, I seriously doubt
> > that would happen, but messing with message transports always carries some risk.
> >
>
> Can you please send feedback on this patch? Thanks.

Sorry for the delay - I've been doing some more testing.  With the patch we
are still losing messages and getting the overflow messages on the console.

Solaris syslogd logged 250k messages without missing any, but syslog-ng
loses a large share of them: 40-50% when we are hitting it with 7000
messages/sec.

At one point we saw the whole system lock up for 7 minutes.  No one could
ssh to the server, although those already logged in could carry on.  It
seemed that everything was waiting on syslog-ng.  I ran truss on it and
nothing appeared in the output for ages, then all of a sudden it started
working again.  Nothing was logged during these 7 minutes.

Truss output:

-bash-3.00$ sudo truss -failed -p 1418
Base time stamp:  1203503996.5724  [ Wed Feb 20 10:39:56 GMT 2008 ]
1418/1:         psargs: /opt/syslog-ng/sbin/syslog-ng
1418/1:         98890.2339      getmsg(3, 0xFFFFFFFF7FFFF630, 0xFFFFFFFF7FFFF620, 0xFFFFFFFF7FFFF5FC) = 0
1418/1:         98890.2349      time() = 1203602886
1418/1:         98890.2369      time() = 1203602886
1418/1:         98890.2394      time() = 1203602886
1418/1:         98890.2478      time() = 1203602886
1418/1:         98890.2480      time() = 1203602886
1418/1:         98890.2481      time() = 1203602886
1418/1:         98890.2482      pollsys(0x10012C0A0, 3, 0xFFFFFFFF7FFFF820, 0x00000000) = 3
1418/1:         98890.2484      write(6, " F e b   2 1   1 4 : 0 0".., 147) = 147
1418/1:         98890.2487      write(8, " F e b   2 1   1 4 : 0 0".., 147) = 147
1418/1:         98890.2489      pollsys(0x10012C0A0, 1, 0xFFFFFFFF7FFFF820, 0x00000000) = 1
1418/1:         98890.2490      getmsg(3, 0xFFFFFFFF7FFFF630, 0xFFFFFFFF7FFFF620, 0xFFFFFFFF7FFFF5FC) = 0
1418/1:         98890.2491      time() = 1203602886
1418/1:         98890.2492      time() = 1203602886
1418/1:         98890.2493      time() = 1203602886
1418/1:         98890.2494      time() = 1203602886
1418/1:         98890.2494      time() = 1203602886
1418/1:         98890.2495      time() = 1203602886
1418/1:         98890.2496      pollsys(0x10012C0A0, 3, 0xFFFFFFFF7FFFF820, 0x00000000) = 3
1418/1:         98890.2497      write(6, " F e b   2 1   1 4 : 0 0".., 163) = 163
1418/1:         98890.2499      write(8, " F e b   2 1   1 4 : 0 0".., 163) = 163
1418/1:         98890.2501      pollsys(0x10012C0A0, 1, 0xFFFFFFFF7FFFF820, 0x00000000) = 1

Regards
Andy Holdaway
System Administrator
Nominet UK

