syslog-ng-bounces@lists.balabit.hu wrote on 21/02/2008 15:19:15:
On Wed, 2008-02-20 at 18:30 +0100, Balazs Scheidler wrote:
On Wed, 2008-02-20 at 14:32 +0000, AndyH@nominet.org.uk wrote:
When I run the syslogd as supplied with Solaris 10 then all messages get
logged, but when I use syslog-ng then it loses messages. On a Sun V210 I
see these messages:
message overflow on /dev/log minor #6 -- is syslogd(1M) running?
message overflow on /dev/log minor #6 -- is syslogd(1M) running?
message overflow on /dev/log minor #6 -- is syslogd(1M) running?
...
diff --git a/src/afstreams.c b/src/afstreams.c
index 009b074..d0a76f3 100644
--- a/src/afstreams.c
+++ b/src/afstreams.c
@@ -134,7 +134,7 @@ afstreams_sd_init(LogPipe *s, GlobalConfig *cfg, PersistentConfig *persist)
       close(fd);
       return FALSE;
     }
-  self->reader = log_reader_new(streams_read_new(fd), LR_LOCAL | LR_NOMREAD | LR_PKTTERM, s, &self->reader_options);
+  self->reader = log_reader_new(streams_read_new(fd), LR_LOCAL | LR_PKTTERM, s, &self->reader_options);
   log_pipe_append(self->reader, s);
 
   if (self->door_filename)
With this change the log-fetch-limit() option takes effect, so several messages are fetched in every iteration, which can easily multiply performance.
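For anyone who wants to experiment with it, log-fetch-limit() is set on the source in syslog-ng.conf. The fragment below is only a sketch: the source name s_local, the door path and the value 100 are assumptions for illustration, not settings taken from this thread, and the door path in particular differs between Solaris releases.

```
# Sketch only: s_local, the door path and the value 100 are assumptions.
# log_fetch_limit() caps how many messages are read per poll iteration.
source s_local {
    sun-streams("/dev/log" door("/var/run/syslog_door") log_fetch_limit(100));
};
```

A larger value should reduce the number of poll iterations needed under load, at the cost of spending longer on a single source before the others are serviced.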
Please also check whether the local messages get mangled in any way. I seriously doubt that would happen, but messing with message transports always carries some risk.
Can you please send feedback on this patch? Thanks.
Sorry for the delay - I've been doing some more testing.

With the patch we are still losing messages and getting the overflow messages on the console. Solaris syslogd logged 250k messages without missing any, but syslog-ng loses lots of messages - 40-50% when we are hitting it with 7000 messages/sec.

At one point we saw the whole system lock up for 7 minutes. No-one could ssh to the server but those on it could carry on. It seemed that everything was waiting on syslog-ng. I ran truss on it and nothing appeared in the output for ages, then all of a sudden it started working again. Nothing was logged for these 7 minutes.

Truss output:

-bash-3.00$ sudo truss -failed -p 1418
Base time stamp: 1203503996.5724 [ Wed Feb 20 10:39:56 GMT 2008 ]
1418/1: psargs: /opt/syslog-ng/sbin/syslog-ng
1418/1: 98890.2339 getmsg(3, 0xFFFFFFFF7FFFF630, 0xFFFFFFFF7FFFF620, 0xFFFFFFFF7FFFF5FC) = 0
1418/1: 98890.2349 time() = 1203602886
1418/1: 98890.2369 time() = 1203602886
1418/1: 98890.2394 time() = 1203602886
1418/1: 98890.2478 time() = 1203602886
1418/1: 98890.2480 time() = 1203602886
1418/1: 98890.2481 time() = 1203602886
1418/1: 98890.2482 pollsys(0x10012C0A0, 3, 0xFFFFFFFF7FFFF820, 0x00000000) = 3
1418/1: 98890.2484 write(6, " F e b 2 1 1 4 : 0 0".., 147) = 147
1418/1: 98890.2487 write(8, " F e b 2 1 1 4 : 0 0".., 147) = 147
1418/1: 98890.2489 pollsys(0x10012C0A0, 1, 0xFFFFFFFF7FFFF820, 0x00000000) = 1
1418/1: 98890.2490 getmsg(3, 0xFFFFFFFF7FFFF630, 0xFFFFFFFF7FFFF620, 0xFFFFFFFF7FFFF5FC) = 0
1418/1: 98890.2491 time() = 1203602886
1418/1: 98890.2492 time() = 1203602886
1418/1: 98890.2493 time() = 1203602886
1418/1: 98890.2494 time() = 1203602886
1418/1: 98890.2494 time() = 1203602886
1418/1: 98890.2495 time() = 1203602886
1418/1: 98890.2496 pollsys(0x10012C0A0, 3, 0xFFFFFFFF7FFFF820, 0x00000000) = 3
1418/1: 98890.2497 write(6, " F e b 2 1 1 4 : 0 0".., 163) = 163
1418/1: 98890.2499 write(8, " F e b 2 1 1 4 : 0 0".., 163) = 163
1418/1: 98890.2501 pollsys(0x10012C0A0, 1, 0xFFFFFFFF7FFFF820, 0x00000000) = 1

Regards
Andy Holdaway
System Administrator
Nominet UK