[syslog-ng] syslog-ng daemon meltdown

syslog-ng@lists.balabit.hu syslog-ng@lists.balabit.hu
Thu, 29 Jan 2004 00:39:09 -0500


Ran into this meltdown, thought it would be interesting to the list.
Running syslog-ng 1.6RC3 on Solaris 2.8, on a dual-processor 400 MHz Sun box.

This is a somewhat sanitized version of the actual messages.

From /var/adm/messages:

Jan 27 15:23:44 nms1 syslog-ng[5651]: STATS: dropped 0
Jan 27 16:06:09 nms1 syslog-ng[5651]: io.c: do_write: write() failed (errno 28), No space left on device
Jan 27 16:06:09 nms1 syslog-ng[5651]: pkt_buffer::do_flush(): Error flushing data
Jan 27 16:06:09 nms1 ufs: [ID 845546 kern.notice] NOTICE: alloc: /data: file system full
Jan 27 16:06:09 nms1 syslog-ng[5651]: io.c: do_write: write() failed (errno 28), No space left on device
Jan 27 16:06:09 nms1 syslog-ng[5651]: pkt_buffer::do_flush(): Error flushing data
...
[These messages repeat for a few minutes, and then the drop count suddenly
becomes massive.]
[This was strange to me, because the /data filesystem did not appear to have
actually filled up.  I never did anything to clear it, and there were a few
hundred MB free when I looked at the server the day after the failure.]
Jan 27 16:23:44 nms1 syslog-ng[5651]: STATS: dropped 525887
Jan 27 17:23:44 nms1 syslog-ng[5651]: STATS: dropped 1897106
Jan 27 18:23:44 nms1 syslog-ng[5651]: STATS: dropped 1915287
... [about 2M drops every hour until midnight]
Jan 28 00:23:44 nms1 syslog-ng[5651]: STATS: dropped 46122
[After this, syslog-ng continues dropping messages at our "normal" rate,
i.e. the flood is over at around midnight.]
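
The hourly STATS lines above are the quickest way to see how the drop count moved over time. Below is a minimal Python sketch for pulling them out of a log such as /var/adm/messages. The function name dropped_counts and the regex are assumptions based only on the line format shown in this excerpt; they are not part of syslog-ng.

```python
import re

# Assumed line format, taken from the excerpt above:
#   Jan 27 16:23:44 nms1 syslog-ng[5651]: STATS: dropped 525887
STATS_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ syslog-ng\[\d+\]: "
    r"STATS: dropped (?P<count>\d+)\s*$"
)


def dropped_counts(lines):
    """Return (timestamp, dropped) pairs for every STATS line found.

    Hypothetical helper: lines that do not match the assumed format
    are silently skipped.
    """
    out = []
    for line in lines:
        m = STATS_RE.match(line)
        if m:
            out.append((m.group("ts"), int(m.group("count"))))
    return out


sample = [
    "Jan 27 15:23:44 nms1 syslog-ng[5651]: STATS: dropped 0",
    "Jan 27 16:06:09 nms1 ufs: [ID 845546 kern.notice] NOTICE: alloc: /data: file system full",
    "Jan 27 16:23:44 nms1 syslog-ng[5651]: STATS: dropped 525887",
    "Jan 27 17:23:44 nms1 syslog-ng[5651]: STATS: dropped 1897106",
]

for ts, count in dropped_counts(sample):
    print(ts, count)
```

To run it against a real file, pass an open file object instead of the sample list, e.g. dropped_counts(open("/var/adm/messages")).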

From the network device syslogs:
[this is the critter that was flooding us]
Jan 27 14:19:09 content1 US/Eastern:%SYS-2-CE:  SCSI I/O error: POSSIBLE BAD DISK -- device 0x807, sector 4672
Jan 27 14:19:09 content1 US/Eastern:%SYS-2-CE:  SCSI I/O error: POSSIBLE BAD DISK -- device 0x808, sector 768
...
[this continues for almost 3 more hours, hundreds of various sector errors per second]

Finally, syslog-ng "freezes up" with these last two or three messages:
(After 17:06:09, syslog-ng dropped all messages.  It logged nothing more
to the files until I stopped and restarted the syslog-ng daemon.)
Jan 27 17:06:09 content1 US/Eastern:%SYS-2-CE:  SCSI I/O error: POSSIBLE BAD DISK -- device 0x807, sector 537872
Jan 27 17:06:09 conJan 28 10:09:17 content2 US/Eastern:%OVERLOAD-5-CE:  %CE-BYPS-5-130007: OVERLOAD: Preload did not receive overload message. 
Preload may not be running

The last message from the content1 device is interrupted in mid-stream.
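
A cut-off record like this can be spotted mechanically, because a second syslog timestamp starts in the middle of the line. Below is a minimal Python sketch of that check. The function name split_at_embedded_timestamp is hypothetical, and the timestamp pattern is assumed from the lines quoted above. It is only a heuristic: a message that legitimately carries a timestamp in its body would also be flagged.

```python
import re

# Assumed syslog timestamp shape, e.g. "Jan 27 17:06:09".
TS_RE = re.compile(r"[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}")


def split_at_embedded_timestamp(line):
    """Return (head, tail) if a timestamp starts after column 0, else None.

    head is the interrupted record; tail is the record that overwrote it.
    """
    for m in TS_RE.finditer(line):
        if m.start() > 0:
            return line[:m.start()], line[m.start():]
    return None


broken = "Jan 27 17:06:09 conJan 28 10:09:17 content2 US/Eastern:%OVERLOAD-5-CE:"
result = split_at_embedded_timestamp(broken)
if result is not None:
    head, tail = result
    print("interrupted record:", repr(head))
    print("following record:  ", repr(tail))
```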

