[syslog-ng] Syslog-ng losing messages on solaris 10
AndyH at nominet.org.uk
Thu Feb 21 16:32:00 CET 2008
syslog-ng-bounces at lists.balabit.hu wrote on 21/02/2008 15:19:15:
>
> On Wed, 2008-02-20 at 18:30 +0100, Balazs Scheidler wrote:
> > On Wed, 2008-02-20 at 14:32 +0000, AndyH at nominet.org.uk wrote:
> > > When I run the syslogd as supplied with Solaris 10 then all messages get
> > > logged, but when I use syslog-ng then it loses messages. On a Sun V210 I
> > > see these messages
> > >
> > > message overflow on /dev/log minor #6 -- is syslogd(1M) running?
> > > message overflow on /dev/log minor #6 -- is syslogd(1M) running?
> > > message overflow on /dev/log minor #6 -- is syslogd(1M) running?
> ...
>
> > diff --git a/src/afstreams.c b/src/afstreams.c
> > index 009b074..d0a76f3 100644
> > --- a/src/afstreams.c
> > +++ b/src/afstreams.c
> > @@ -134,7 +134,7 @@ afstreams_sd_init(LogPipe *s, GlobalConfig *cfg, PersistentConfig *persist)
> > close(fd);
> > return FALSE;
> > }
> > - self->reader = log_reader_new(streams_read_new(fd), LR_LOCAL | LR_NOMREAD | LR_PKTTERM, s, &self->reader_options);
> > + self->reader = log_reader_new(streams_read_new(fd), LR_LOCAL | LR_PKTTERM, s, &self->reader_options);
> > log_pipe_append(self->reader, s);
> >
> > if (self->door_filename)
> >
> > This will cause the log-fetch-limit() option to become effective, thus several messages
> > are going to be fetched for every iteration, this can easily multiply performance.
> >
> > Please also check if the local messages get mangled in any way, I seriously doubt
> > that would happen, but messing with message transports always carries some risk.
> >
>
> Can you please send feedback on this patch? Thanks.

Sorry for the delay - I've been doing some more testing. With the patch we
are still losing messages and getting the overflow messages on the console.
Solaris syslogd logged 250k messages without missing any, but syslog-ng
loses lots of messages - 40-50% when we are hitting it with 7000
messages/sec.

At one point we saw the whole system lock up for 7 minutes. No one could
ssh to the server, but those already on it could carry on. It seemed that
everything was waiting on syslog-ng. I ran truss on it and nothing
appeared in the output for ages, then all of a sudden it started working
again. Nothing was logged during these 7 minutes.

Truss output:

-bash-3.00$ sudo truss -failed -p 1418
Base time stamp: 1203503996.5724 [ Wed Feb 20 10:39:56 GMT 2008 ]
1418/1: psargs: /opt/syslog-ng/sbin/syslog-ng
1418/1: 98890.2339 getmsg(3, 0xFFFFFFFF7FFFF630, 0xFFFFFFFF7FFFF620, 0xFFFFFFFF7FFFF5FC) = 0
1418/1: 98890.2349 time() = 1203602886
1418/1: 98890.2369 time() = 1203602886
1418/1: 98890.2394 time() = 1203602886
1418/1: 98890.2478 time() = 1203602886
1418/1: 98890.2480 time() = 1203602886
1418/1: 98890.2481 time() = 1203602886
1418/1: 98890.2482 pollsys(0x10012C0A0, 3, 0xFFFFFFFF7FFFF820, 0x00000000) = 3
1418/1: 98890.2484 write(6, " F e b 2 1 1 4 : 0 0".., 147) = 147
1418/1: 98890.2487 write(8, " F e b 2 1 1 4 : 0 0".., 147) = 147
1418/1: 98890.2489 pollsys(0x10012C0A0, 1, 0xFFFFFFFF7FFFF820, 0x00000000) = 1
1418/1: 98890.2490 getmsg(3, 0xFFFFFFFF7FFFF630, 0xFFFFFFFF7FFFF620, 0xFFFFFFFF7FFFF5FC) = 0
1418/1: 98890.2491 time() = 1203602886
1418/1: 98890.2492 time() = 1203602886
1418/1: 98890.2493 time() = 1203602886
1418/1: 98890.2494 time() = 1203602886
1418/1: 98890.2494 time() = 1203602886
1418/1: 98890.2495 time() = 1203602886
1418/1: 98890.2496 pollsys(0x10012C0A0, 3, 0xFFFFFFFF7FFFF820, 0x00000000) = 3
1418/1: 98890.2497 write(6, " F e b 2 1 1 4 : 0 0".., 163) = 163
1418/1: 98890.2499 write(8, " F e b 2 1 1 4 : 0 0".., 163) = 163
1418/1: 98890.2501 pollsys(0x10012C0A0, 1, 0xFFFFFFFF7FFFF820, 0x00000000) = 1
Regards
Andy Holdaway
System Administrator
Nominet UK