syslog-ng 1.5.26 dies after remote logging host goes down
Hi,

looks like a bug, can't be a feature...

I have defined a remote destination like:

destination d_remotelog { udp( "192.168.1.1" port(514) ) ;};

And use it like

log { source(s_local); filter(f_emerg); destination(d_all); destination(d_remotelog); };

Today, I rebooted the "remotelog" host and noticed that the syslog-ng daemon on the other box died without any "die now" notice. Last log entries:

Feb 28 09:12:38 host syslog-ng[28338]: STATS: dropped 0
Feb 28 09:19:47 host syslog-ng[28338]: Connection broken to AF_INET(192.168.1.1.50:514), reopening in 60 seconds

It looks like it died shortly after this time, because a running mailman-2.0.x cron job stops logging, too:

cron-200302:Feb 28 09:18:01 host CROND[368]: (mailman) CMD (/usr/bin/python -S /var/mailman/cron/qrunner)
cron-200302:Feb 28 09:19:00 host CROND[388]: (root) CMD (/usr/local/sbin/AVP_update_bases.pl)
cron-200302:Feb 28 09:19:00 host CROND[389]: (mailman) CMD (/usr/bin/python -S /var/mailman/cron/qrunner)
cron-200302:Feb 28 09:20:01 host CROND[407]: (root) CMD (/usr/local/sbin/ipacc-get tmplog)
cron-200302:Feb 28 09:20:01 host CROND[408]: (root) CMD (/usr/lib/sa/sa1 1 1)
cron-200302:Feb 28 09:20:01 host CROND[409]: (mailman) CMD (/usr/bin/python -S /var/mailman/cron/qrunner)
cron-200302:Feb 28 09:20:01 host CROND[410]: (mailman) CMD (/usr/bin/python -S /var/mailman/cron/gate_news)
cron-200302:Feb 28 09:21:00 host CROND[450]: (mailman) CMD (/usr/bin/python -S /var/mailman/cron/qrunner)
[ no further lines :-( ]

Nothing else was logged except these 5 cron lines, checked by

# grep "Feb 28 09:2" * |more

BTW (hopefully off-topic): "host" is IPv6 enabled in general.
Running kernel: RHL 2.4.18-18.7.x with OpenWall patch

Hopefully someone can help me.

Peter
--
Dr. Peter Bieringer http://www.bieringer.de/pb/
GPG/PGP Key 0x958F422D mailto: pb at bieringer dot de
Deep Space 6 Co-Founder and Core Member http://www.deepspace6.net/
On Fri, Feb 28, 2003 at 09:48:28AM +0100, Peter Bieringer wrote:
Hi,
looks like a bug, can't be a feature...
I have defined a remote destination like:
destination d_remotelog { udp( "192.168.1.1" port(514) ) ;};
And use it like log { source(s_local); filter(f_emerg); destination(d_all); destination(d_remotelog); };
Today, I rebooted the "remotelog" host and noticed that the syslog-ng daemon on the other box died without any "die now" notice. Last log entries:
Feb 28 09:12:38 host syslog-ng[28338]: STATS: dropped 0
Feb 28 09:19:47 host syslog-ng[28338]: Connection broken to AF_INET(192.168.1.1.50:514), reopening in 60 seconds
can you show an strace of this? I've tried to reproduce the problem without success:

io_iter(): POLLHUP on inactive fd! Marking fd 4 for closing.
Connection broken to AF_INET(192.168.131.2:2001), reopening in 60 seconds
Closing fd 4.
Read EOF on fd 5. Marking fd 5 for closing.
AF_INET client dropped connection from 127.0.0.1, port 56879
Closing fd 5.
....
io.c: connecting using fd 4
connecting fd 4 to inetaddr 192.168.131.2, port 2001
io.c: Preparing fd 4 for writing
io.c: do_write: write() failed (errno 111), Connection refused
pkt_buffer::do_flush(): Error flushing data
Marking fd 4 for closing.
Connection broken to AF_INET(192.168.131.2:2001), reopening in 60 seconds
Closing fd 4.

so it definitely reattempts connecting to the remote host. this was my configuration file:

source src { tcp(port(2000)); internal(); };
destination dst { udp("192.168.131.2" port(2001)); };
log { source(src); destination(dst); };

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
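
For readers following the debug output above: errno 111 is ECONNREFUSED on Linux, and a connected UDP socket can report it on a later write once an ICMP port-unreachable reply has come back from a host with no listener on that port. The following is a minimal Python sketch of that behaviour, not syslog-ng code. The function name probe_udp_refusal, the loopback address, and the retry parameters are assumptions made for illustration only.

```python
import errno
import socket
import time


def probe_udp_refusal(host="127.0.0.1", attempts=5, delay=0.05):
    """Send datagrams to a UDP port with no listener.

    Returns the errno of the first failed send, or None if every send
    appeared to succeed (the error is delivered asynchronously, so this
    can happen depending on the platform and timing).
    """
    # Hypothetical setup: bind a throwaway socket to get a free port,
    # then close it so that nothing is listening there.
    placeholder = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    placeholder.bind((host, 0))
    port = placeholder.getsockname()[1]
    placeholder.close()

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket only fixes the peer address; it is
        # what allows the kernel to report ICMP errors on later sends.
        sender.connect((host, port))
        for _ in range(attempts):
            try:
                sender.send(b"<22>test message\n")
            except OSError as exc:
                return exc.errno
            time.sleep(delay)
        return None
    finally:
        sender.close()


if __name__ == "__main__":
    result = probe_udp_refusal()
    if result == errno.ECONNREFUSED:
        print("send failed with ECONNREFUSED")
    else:
        print("no refusal observed, result:", result)
```

The first send usually succeeds and a later one fails, which matches the pattern of two successful write() calls followed by a refused one. A daemon is expected to treat this as a recoverable error, as the "reopening in 60 seconds" message suggests.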
--On Friday, February 28, 2003 09:54:23 AM +0100 Balazs Scheidler <bazsi@balabit.hu> wrote:
I have defined a remote destination like:
destination d_remotelog { udp( "192.168.1.1" port(514) ) ;};
And use it like log { source(s_local); filter(f_emerg); destination(d_all); destination(d_remotelog); };
Today, I rebooted the "remotelog" host and noticed that the syslog-ng daemon on the other box died without any "die now" notice. Last log entries:
Feb 28 09:12:38 host syslog-ng[28338]: STATS: dropped 0
Feb 28 09:19:47 host syslog-ng[28338]: Connection broken to AF_INET(192.168.1.1.50:514), reopening in 60 seconds
can you show an strace of this?
It happened again, looks like function "abort" was called. I'll restart the straced syslogd now with core dumps enabled... "hopefully" it crashed again during further playing around:

write(11, "<22>Feb 28 11:30:12 loghost postf"..., 176) = 176
write(11, "<22>Feb 28 11:30:12 loghost postf"..., 108) = 108
write(11, "<22>Feb 28 11:30:13 loghost postf"..., 127) = -1 ECONNREFUSED (Connection refused)
getpid() = 6360
time(NULL) = 1046428243
open("/var/log/messages-200302", O_WRONLY|O_NONBLOCK|O_APPEND|O_CREAT|O_NOCTTY|O_LARGEFILE, 0600) = 12
chown32(0x8069ef0, 0, 0) = 0
chmod("/var/log/messages-200302", 0600) = 0
fcntl64(12, F_GETFL) = 0x8c01 (flags O_WRONLY|O_NONBLOCK|O_APPEND|O_LARGEFILE)
fcntl64(12, F_SETFL, O_WRONLY|O_NONBLOCK|O_APPEND|O_LARGEFILE) = 0
fcntl64(12, F_SETFD, FD_CLOEXEC) = 0
time(NULL) = 1046428243
time(NULL) = 1046428243
time(NULL) = 1046428243
getpid() = 6360
time(NULL) = 1046428243
time(NULL) = 1046428243
time(NULL) = 1046428243
close(11) = 0
poll([{fd=12, events=POLLOUT, revents=POLLOUT}, {fd=10, events=0}, {fd=4, events=0}, {fd=7, events=0}, {fd=8, events=0}, {fd=6, events=POLLIN}, {fd=5, events=POLLIN}, {fd=3, events=POLLIN}], 8, 100) = 1
write(12, "Feb 28 11:30:43 loghost syslog-ng"..., 217) = 217
time(NULL) = 1046428243
poll([{fd=12, events=0}, {fd=10, events=0}, {fd=4, events=0}, {fd=7, events=0}, {fd=8, events=0}, {fd=6, events=POLLIN}, {fd=5, events=POLLIN}, {fd=3, events=POLLIN}], 8, 100) = 0
getpid() = 6360
time(NULL) = 1046428243
time(NULL) = 1046428243
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
getpid() = 6360
kill(6360, SIGABRT) = 0
--- SIGABRT (Aborted) ---

Peter
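
The time(NULL) return value in the trace above can be converted to wall-clock time to check that it lines up with the message written on fd 12. This is a small sketch, assuming the box runs on CET (UTC+1), which is taken from the +0100 offset in the mail headers; the helper name strace_time_to_local is made up for this example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: local time on the logging host is CET (UTC+1), as
# suggested by the +0100 offset in the mail headers of this thread.
CET = timezone(timedelta(hours=1), "CET")


def strace_time_to_local(epoch_seconds, tz=CET):
    """Convert a time(NULL) return value from strace to a local datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz)


if __name__ == "__main__":
    stamp = strace_time_to_local(1046428243)
    print(stamp.strftime("%Y-%m-%d %H:%M:%S"))  # 2003-02-28 11:30:43
```

Under that assumption the value corresponds to 11:30:43 local time, the same second as the "Feb 28 11:30:43 loghost syslog-ng" line written just before the SIGABRT.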
On Fri, Feb 28, 2003 at 11:58:44AM +0100, Dr. Peter Bieringer wrote:
--On Friday, February 28, 2003 09:54:23 AM +0100 Balazs Scheidler <bazsi@balabit.hu> wrote:
I have defined a remote destination like:
destination d_remotelog { udp( "192.168.1.1" port(514) ) ;};
And use it like log { source(s_local); filter(f_emerg); destination(d_all); destination(d_remotelog); };
Today, I rebooted the "remotelog" host and noticed that the syslog-ng daemon on the other box died without any "die now" notice. Last log entries:
Feb 28 09:12:38 host syslog-ng[28338]: STATS: dropped 0
Feb 28 09:19:47 host syslog-ng[28338]: Connection broken to AF_INET(192.168.1.1.50:514), reopening in 60 seconds
can you show an strace of this?
It happened again, looks like function "abort" was called. I'll restart the straced syslogd now with core dumps enabled... "hopefully" it crashed again during further playing around:
write(11, "<22>Feb 28 11:30:12 loghost postf"..., 176) = 176
write(11, "<22>Feb 28 11:30:12 loghost postf"..., 108) = 108
write(11, "<22>Feb 28 11:30:13 loghost postf"..., 127) = -1 ECONNREFUSED (Connection refused)
getpid() = 6360
time(NULL) = 1046428243
open("/var/log/messages-200302", O_WRONLY|O_NONBLOCK|O_APPEND|O_CREAT|O_NOCTTY|O_LARGEFILE, 0600) = 12
chown32(0x8069ef0, 0, 0) = 0
chmod("/var/log/messages-200302", 0600) = 0
fcntl64(12, F_GETFL) = 0x8c01 (flags O_WRONLY|O_NONBLOCK|O_APPEND|O_LARGEFILE)
fcntl64(12, F_SETFL, O_WRONLY|O_NONBLOCK|O_APPEND|O_LARGEFILE) = 0
fcntl64(12, F_SETFD, FD_CLOEXEC) = 0
time(NULL) = 1046428243
time(NULL) = 1046428243
time(NULL) = 1046428243
getpid() = 6360
time(NULL) = 1046428243
time(NULL) = 1046428243
time(NULL) = 1046428243
close(11) = 0
poll([{fd=12, events=POLLOUT, revents=POLLOUT}, {fd=10, events=0}, {fd=4, events=0}, {fd=7, events=0}, {fd=8, events=0}, {fd=6, events=POLLIN}, {fd=5, events=POLLIN}, {fd=3, events=POLLIN}], 8, 100) = 1
write(12, "Feb 28 11:30:43 loghost syslog-ng"..., 217) = 217
what is this message ?
time(NULL) = 1046428243
poll([{fd=12, events=0}, {fd=10, events=0}, {fd=4, events=0}, {fd=7, events=0}, {fd=8, events=0}, {fd=6, events=POLLIN}, {fd=5, events=POLLIN}, {fd=3, events=POLLIN}], 8, 100) = 0
getpid() = 6360
time(NULL) = 1046428243
time(NULL) = 1046428243
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
getpid() = 6360
kill(6360, SIGABRT) = 0
--- SIGABRT (Aborted) ---
a backtrace would certainly help a bit more.

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
--On Friday, February 28, 2003 14:56 +0100 Balazs Scheidler <bazsi@balabit.hu> wrote:
On Fri, Feb 28, 2003 at 11:58:44AM +0100, Dr. Peter Bieringer wrote:
poll([{fd=12, events=POLLOUT, revents=POLLOUT}, {fd=10, events=0}, {fd=4, events=0}, {fd=7, events=0}, {fd=8, events=0}, {fd=6, events=POLLIN}, {fd=5, events=POLLIN}, {fd=3, events=POLLIN}], 8, 100) = 1
write(12, "Feb 28 11:30:43 loghost syslog-ng"..., 217) = 217
what is this message ?
Feb 28 11:30:43 gromit syslog-ng[6360]: io.c: do_write: write() failed (errno 111), Connection refused
Feb 28 11:30:43 gromit syslog-ng[6360]: Connection broken to AF_INET(1.2.3.4:514), reopening in 60 seconds
kill(6360, SIGABRT) = 0
--- SIGABRT (Aborted) ---
It died again, but unfortunately the startup function "daemon" had set the core ulimit back to 0, so no core file was written. Fixed now.
a backtrace would certainly help a bit more.
I'll try to provide a core file asap.

Peter
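
Since the missing core file came down to the core size limit being 0 when the daemon was started, it can help to check that limit in whatever process launches the daemon. The following is a minimal sketch using Python's standard resource module on a Unix-like system. The helper name raise_core_limit is hypothetical, and this is not how the init script or syslog-ng handles it.

```python
import resource


def raise_core_limit():
    """Raise the soft core-file size limit to the hard limit.

    Returns the (soft, hard) pair in effect afterwards. An unprivileged
    process may raise its soft limit up to, but not beyond, the hard limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    if soft != hard:
        resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    soft, hard = raise_core_limit()
    if soft == 0:
        print("core dumps are still disabled (hard limit is 0)")
    else:
        print("core limit now:", soft, "(hard:", hard, ")")
```

Resource limits are inherited by child processes, so a wrapper would have to make this call before starting the daemon. If the hard limit itself is 0, only a privileged process can raise it.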
participants (3)
- Balazs Scheidler
- Dr. Peter Bieringer
- Peter Bieringer