syslog-ng-bounces@lists.balabit.hu wrote on 21/02/2008 16:16:52:
At one point we saw the whole system lock up for 7 minutes. No one could ssh to the server, but those already logged in could carry on. It seemed that everything was waiting on syslog-ng. I ran truss on it and nothing appeared in the output for ages, then all of a sudden it started working again. Nothing was logged during these 7 minutes.
This is probably fixed by this patch: I forgot to set the streams file descriptor to nonblocking mode, so syslog-ng blocked while there were no log messages.
diff --git a/src/afstreams.c b/src/afstreams.c
index d0a76f3..fdf18c4 100644
--- a/src/afstreams.c
+++ b/src/afstreams.c
@@ -134,6 +134,7 @@ afstreams_sd_init(LogPipe *s, GlobalConfig *cfg, PersistentConfig *persist)
       close(fd);
       return FALSE;
     }
+  g_fd_set_nonblock(fd, TRUE);
   self->reader = log_reader_new(streams_read_new(fd), LR_LOCAL | LR_PKTTERM, s, &self->reader_options);
   log_pipe_append(self->reader, s);
I'll try this patch next week - I'm away until at least Wednesday.
Truss output:
-bash-3.00$ sudo truss -failed -p 1418
Base time stamp: 1203503996.5724 [ Wed Feb 20 10:39:56 GMT 2008 ]
1418/1: psargs: /opt/syslog-ng/sbin/syslog-ng
1418/1: 98890.2339 getmsg(3, 0xFFFFFFFF7FFFF630, 0xFFFFFFFF7FFFF620, 0xFFFFFFFF7FFFF5FC) = 0
I could successfully reproduce the message loss on my Solaris 9 system, but I need some more time to investigate the issue.
Good news - you can reproduce the problem. I'll let you investigate.

Also worth mentioning are these 2 warnings during compilation, which may be of relevance:

afstreams.c: In function `afstreams_sd_init':
afstreams.c:160: warning: passing arg 1 of `door_create' from incompatible pointer type

Regards
Andy