On Wed, 2007-08-01 at 15:08 +1200, Russell Fulton wrote:
Hi Folks
We operate a central syslog server for several thousand systems on a Linux box running syslog-ng 2.0rc3. For some time we have noticed that the machine hosting the service slowly locks up, and we then have to reset the power to restart it. Recently we significantly increased the load on the box and now this is happening about once a week.
Here are the symptoms we observe:
1. Service to the systems sending logs to the host is not affected. Logs continue to get written to disk.
2. Cron jobs hang.
3. Login attempts hang,
4. as do sudo attempts.
5. ssh sessions that were established before things turned to custard are not affected.
6. top, sar etc. don't show anything unusual.
7. ps shows hung cron jobs but nothing else unusual.
We are guessing that syslog-ng is causing any local process that tries to log to hang. In particular, login, sudo and cron all generate logging activity. If the system is left alone, it eventually runs out of swap and processes.
What we will do is to establish an ssh session to root so next time things come unstuck we can restart syslog-ng to see if that gets things going again.
Any idea what is wrong or what we can do to diagnose the problem?
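One way to test the guess that local logging is what blocks, without risking another hung shell, is to send a single test message to the local log socket with a timeout. The following is only a sketch under assumptions: the function name `probe_log_socket`, the `/dev/log` path and the 5 second timeout are illustrative and not taken from the setup described above. It tries a datagram socket first and then a stream socket, since either type may be configured.

```python
import socket


def probe_log_socket(path="/dev/log", timeout=5.0):
    """Send one test message to a local syslog socket without hanging.

    Returns "ok", "blocked", or "error: <reason>".
    The path and timeout defaults are assumptions for illustration.
    """
    # <13> is the syslog priority for facility "user" (1*8) + severity "notice" (5).
    message = b"<13>logprobe: test message\n"
    last_error = None
    # The socket may be a datagram or a stream socket, so try both types.
    for sock_type in (socket.SOCK_DGRAM, socket.SOCK_STREAM):
        sock = socket.socket(socket.AF_UNIX, sock_type)
        sock.settimeout(timeout)
        try:
            sock.connect(path)
            sock.sendall(message)
            return "ok"
        except (socket.timeout, BlockingIOError):
            # The send timed out, or the connect could not complete
            # immediately: the daemon is not draining its socket.
            return "blocked"
        except OSError as exc:
            # Wrong socket type, missing path, permissions, and so on.
            last_error = exc
        finally:
            sock.close()
    return "error: %s" % last_error
```

Calling `print(probe_log_socket())` from the already-open root session the next time the box degrades should return "blocked" if the log socket is the bottleneck, and "ok" if the cause lies elsewhere.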
Do you have program() destinations? Here are two NEWS entries from 2.0.1 that might be related:

2.0.1
Thu, 21 Dec 2006 09:23:44 +0100

Bugfixes:
* Fixed a possible syslog-ng hang when a program destination stalled.
* Fixed source priorities to avoid starving log listeners. If a continuous stream of messages were processed, this could cause new connections not to be accepted, causing a system deadlock.

Reading /proc/kmsg from multiple processes could also cause things like this.

I would recommend upgrading to 2.0.5 first, and then seeing whether you are still affected.

--
Bazsi