Replying to myself! A bit more data: I had forgotten that this is a dual-processor box, so in fact syslog-ng was hogging one whole processor:

11:10:01   all   45.94    0.00     2.77     4.16   47.12
11:20:01   all   44.65    0.00     2.77     5.89   46.69
11:30:01   all   44.10    0.00     2.80     7.15   45.95
11:40:01   all   44.55    0.00     2.93     6.14   46.38
11:50:01   all   45.05    0.00     2.80     5.30   46.85
12:00:02   all   45.16    0.00     2.63     5.25   46.96
12:10:01   all   45.81    0.00     2.96     4.48   46.75
12:20:02   all   46.41    0.00     2.59     2.46   48.54
12:30:01   all   44.54    0.00     3.09     6.82   45.55
12:40:01   all   45.05    0.00     2.85     5.68   46.42
Average:   all   19.97    0.00     1.85     7.92   70.25

13:55:06   LINUX RESTART

14:00:01   CPU   %user   %nice  %system  %iowait   %idle
14:10:01   all    1.81    0.00     0.99    11.53   85.67
14:20:01   all    1.53    0.00     1.05     1.73   95.69
14:30:01   all    1.30    0.00     0.96     1.34   96.41
14:40:01   all    1.35    0.00     0.91     0.81   96.93
14:50:01   all    1.31    0.00     0.86     0.66   97.16
15:00:01   all    1.44    0.00     0.90     0.53   97.13
15:10:01   all    3.28    0.00     0.91     0.62   95.19
15:20:01   all    1.34    0.00     0.99     0.61   97.06
15:30:01   all    1.24    0.00     0.90     0.62   97.25
15:40:01   all    1.32    0.00     0.82     0.60   97.26
15:50:01   all    6.24    0.00     1.35     2.93   89.47

On this occasion, contrary to what I originally stated, syslog-ng had stopped processing records from anywhere, i.e. it seems to have got itself into a processor loop.

Russell Fulton wrote:
Hi Folks
We operate a central syslog server for several thousand systems on a Linux box running syslog-ng 2.0rc3. For some time we have noticed that the machine hosting the service slowly locks up, and we then have to reset the power to restart it. Recently we significantly increased the load on the box, and now this is happening about once a week.
Here are the symptoms we observe:
1. Service to systems sending logs to the host is not affected. Logs continue to get written to disk
Incorrect!
2. Cron jobs hang
3. login attempts hang
4. as do sudo attempts.
5. ssh sessions that were established before things turned to custard are not affected
6. top, sar etc. don't show anything unusual
7. ps shows hung cron jobs but nothing else unusual
I thought the fairly high CPU load was just a result of the increased load, whereas in fact one CPU was flat out. Comparing the state now, after it has been running for a few hours, with what it was like before the restart, we see that memory usage went up dramatically once the CPU started spinning, and the machine quickly started swapping.

Russell
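For anyone reading sar figures like the ones above: an "all" row is an average over every processor, so on a dual-processor box roughly 45% user time can mean one CPU is completely busy. The following is a minimal Python sketch of that arithmetic, not a monitoring tool. The function names and the 0.9 threshold are made up for illustration, and the sample rows are copied from the sar output in this thread.

```python
N_CPUS = 2  # dual-processor box, as stated in the post

# Three "all" rows copied from the sar output above:
# time, CPU, %user, %nice, %system, %iowait, %idle
SAR_ROWS = """\
11:10:01 all 45.94 0.00 2.77 4.16 47.12
12:20:02 all 46.41 0.00 2.59 2.46 48.54
14:20:01 all 1.53 0.00 1.05 1.73 95.69
"""


def cpus_busy(line, n_cpus=N_CPUS):
    """Return (timestamp, CPUs' worth of %user + %system) for one 'all' row."""
    fields = line.split()
    timestamp = fields[0]
    user = float(fields[2])
    system = float(fields[4])
    # The 'all' row is an average, so scale it back up by the CPU count.
    return timestamp, (user + system) / 100.0 * n_cpus


def possibly_spinning(line, n_cpus=N_CPUS, threshold=0.9):
    """True if the row is consistent with about one CPU being flat out.

    The threshold is an arbitrary illustrative value, not a sysstat setting.
    """
    _, busy = cpus_busy(line, n_cpus)
    return busy >= threshold


for row in SAR_ROWS.splitlines():
    ts, busy = cpus_busy(row)
    flag = "possible spin" if possibly_spinning(row) else "ok"
    print(f"{ts}  {busy:.2f} CPUs busy  {flag}")
```

The two rows from before the restart come out at close to one full CPU, the row after it at about a twentieth of one. Per-processor figures (for instance from `sar -P ALL`) would show the same thing directly, assuming per-CPU data was being collected.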