We had this problem at AMD. The problem turned out to be that /dev/console was attached to a device (an iLO in our case) that went offline occasionally and would block on writes. We fixed it by updating our syslog-ng.conf to write to the console using a pipe() directive instead of file(). You may have something similar, especially if there are occasional messages that get routed to /dev/console (or any other pipe/device that may block).

Paul Krizak                         7171 Southwest Pkwy MS B200.3A
MTS Systems Engineer                Austin, TX 78735
Advanced Micro Devices              Desk: (512) 602-8775
Linux/Unix Systems Engineering      Cell: (512) 791-0686
Global IT Infrastructure            Fax: (512) 602-0468

On 08/30/2011 01:40 PM, Gergely Nagy wrote:
Hi!
While this mail might sound a bit vague, it will - if nothing else - serve as a reminder for me to investigate the issue further.
On one of my servers (PowerPC, running Debian Squeeze), I have a syslog-ng 3.3 running, a reasonably recent (2-3 day old) git snapshot. It works quite well, except that I was able to trace back my server's recent hangs to syslog-ng:
The server had a ~120 day uptime when I upgraded from 3.1 to 3.3, and since then it has had to be rebooted twice already, in just two weeks' time. Last time, I didn't have any open connections to it, so I couldn't investigate, but tonight I had an ssh session open with a screen session inside.
So I tried to look around: first, I wanted to check the logs, but knew I wouldn't find anything, as it stopped sending the logs to my other server about two hours before I noticed the problem. Even worse, when I tried to sudo, that hung, indefinitely. Weird.
There was nothing in dmesg, and nothing interesting in the logs it did send before becoming unresponsive. HTTP still worked too, as did a few other services. I could do nearly anything as a user.
So I tried stracing crontab, and it hung when it tried to send logs to /dev/log. Interesting! I tried logger, and the same thing happened.
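For anyone wanting to run the same check without wedging their shell, here is a minimal sketch of a probe that writes one message to the syslog socket with a timeout instead of blocking forever. The function name probe_syslog_socket, the 2-second default timeout, and the test message are all made up for illustration; the sketch also assumes /dev/log is a Unix domain socket (datagram or stream), as it usually is on Linux.

```python
import socket


def probe_syslog_socket(path="/dev/log", timeout=2.0):
    """Try to send one message to the syslog socket without hanging.

    Returns "ok" if the message was accepted, "blocked" if connecting or
    sending timed out, and "unavailable" if the socket could not be used
    at all (missing path, wrong type, permission denied).
    """
    # /dev/log may be a datagram or a stream socket depending on the
    # daemon's configuration, so try both types in turn.
    for sock_type in (socket.SOCK_DGRAM, socket.SOCK_STREAM):
        sock = socket.socket(socket.AF_UNIX, sock_type)
        sock.settimeout(timeout)
        try:
            sock.connect(path)
            # <14> is the syslog priority for facility "user", level "info".
            sock.send(b"<14>probe: syslog socket test")
            return "ok"
        except socket.timeout:
            # The receiver's queue is full and nobody is draining it.
            return "blocked"
        except OSError:
            # Wrong socket type or the path is not usable; try the next type.
            continue
        finally:
            sock.close()
    return "unavailable"
```

Calling probe_syslog_socket() from a Python prompt and getting "blocked" back would be consistent with what strace showed for crontab and logger: the socket is there, but nothing is reading from it.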
I suspect that for one reason or another, /dev/log got overwhelmed, and even worse, syslog-ng ended up trying to log something as well, which made it hang too. And thus the queue remained full, and everything that tried to log got stuck.
HTTP continued to work, since my httpd isn't using syslog for its logs. I could poke around in my shell, since that wasn't logging, either.
This never happened with 3.1, and the only thing I changed in the config is the @version, pretty much. Thus, I suspect, there's some very nasty bug in 3.3beta2 that I haven't found yet.
I'm leaving a root shell open this time, so that I can poke around further next time (along with a syslog-ng compiled with debug symbols).
In the meantime, I thought I'd drop a note, hoping that perhaps Bazsi or someone from the syslog-ng devel team would have an idea where to look, and what to check next time this happens.