Hello,
I have a big problem on one of my machines, and it looks like it is caused by syslog-ng.
The problems are similar to those already described here: http://lists.balabit.hu/pipermail/syslog-ng/2003-January/004432.html
I'm not completely sure what problem Greg Hartung was describing there. However, from just reading that message and from looking at how syslog-ng is compiled, one could assume that the 64-bit file system access functions are not linked into the binary (I've neither checked the code nor verified this, so take it with a grain of salt).
Red Hat Linux 7.3 with all updates
Running kernel: currently 2.4.18-18.7.x extended with Openwall patch
Is it reproducible without OWL? Only test this if you can do so easily; if it's a production machine, I suspect the downtime would be too costly for heuristic tests.
Feb 11 19:10:44 gromit syslog-ng[17700]: STATS: dropped 0
Feb 11 19:20:44 gromit syslog-ng[17700]: STATS: dropped 0
Feb 11 19:30:45 gromit syslog-ng[17700]: STATS: dropped 0 <--
Feb 11 20:00:47 gromit syslog-ng[6771]: syslog-ng version 1.5.26 starting
Feb 11 20:00:48 gromit syslog-ng: syslog-ng startup succeeded
Feb 11 20:00:48 gromit syslog-ng: klogd startup succeeded
You do not need to run klogd if you've configured syslog-ng accordingly unless you need address decoding.
Feb 11 20:00:53 gromit ldap: slapd startup succeeded
I detected this about 20 minutes later, with the following reproducible symptoms:
System load increases over 1 (normally, the machine has no load)
"ps -ax" hangs after displaying some processes, "top" will sometimes start, sometimes hang
Could you provide a snapshot of 'vmstat 1' output accompanying the 'hangs'?
The last few times I also saw some CROND entries in the "ps -ax" output, one of them with stat "D".
crond can be in D state sometimes, nothing to worry about ;).
[snip configuration part]
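A short-lived D state is harmless, but if the same process stays in D across several samples it may be worth noting which one it is. The following is only a sketch: the function names are made up for illustration, and it assumes a procps "ps" that accepts "-eo pid,stat,wchan,comm". It asks for "comm" rather than the full command line in the hope that "ps" is less likely to block, which is an assumption and not something verified on this machine.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of any tool).

# Keep only lines whose second field (STAT) starts with "D",
# i.e. processes in uninterruptible sleep. The header line is dropped
# because its second field is the literal word "STAT".
filter_d_state() {
    awk '$2 ~ /^D/ { print }'
}

# List PID, state, kernel wait channel and command name of D-state processes.
list_d_state() {
    ps -eo pid,stat,wchan,comm | filter_d_state
}
```

Running "list_d_state" a few times, some seconds apart, during a hang would show whether the same PIDs remain stuck.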
Last week I disabled postfix's LDAP usage completely to check whether it's an LDAP problem. In earlier cases (postfix with LDAP lookups) postfix would hang completely: a TCP connection was established, but no HELO string was displayed.
I don't yet see the connection between postfix + LDAP and syslog-ng.
So the big question:
1) is this a syslog-ng related problem?
Maybe ;)
2) is this an LDAP problem? I've already increased threads.
We need more information about the machine's health while the hang occurs: netstat, sockstat, /proc/net/* information, ...
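One way to gather this in one go while the machine is hanging might look like the sketch below. The function name, the output directory layout and the choice of files are assumptions made for illustration; adjust them to whatever is available on the box. Errors from missing tools are written into the output files instead of aborting the run.

```shell
#!/bin/sh
# Hypothetical helper: write a health snapshot into a timestamped directory.
#   $1 = base directory (default: /tmp)
#   $2 = number of one-second vmstat samples (default: 10)
snapshot() {
    outdir="${1:-/tmp}/hang-$(date +%Y%m%d-%H%M%S)"
    count="${2:-10}"
    mkdir -p "$outdir" || return 1

    vmstat 1 "$count"      > "$outdir/vmstat.txt"   2>&1   # CPU, memory, I/O wait
    netstat -an            > "$outdir/netstat.txt"  2>&1   # socket states
    cat /proc/loadavg      > "$outdir/loadavg.txt"  2>&1
    cat /proc/net/sockstat > "$outdir/sockstat.txt" 2>&1   # socket counters
    cat /proc/net/unix     > "$outdir/net_unix.txt" 2>&1   # unix sockets such as /dev/log

    echo "$outdir"   # print the directory so the caller can find the files
}
```

Calling "snapshot /var/tmp" during a hang leaves a directory such as /var/tmp/hang-<timestamp> that can be archived and sent to the list.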
I hope someone can point me to some solutions or proper debugging methods. The machine has been in semi-production since the end of September (with syslog-ng), but such troubles have occurred since the beginning.
Please provide us with a strace -f -v -i -t -p $PID_OF_HANGING_SYSLOG-NG when it happens the next time. Maybe we can see where it hangs exactly. It's always better than shooting holes into the dark by making questionable assumptions.
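To keep the trace for later, the same flags can be combined with strace's -o option so the output goes to a file. The helper below is a hypothetical convenience that only prints the command line; the default output path is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: print the strace command line for a given PID.
#   $1 = PID of the hanging syslog-ng
#   $2 = output file (default: /tmp/strace.<PID>.out)
# The flags -f -v -i -t -p are the ones suggested above; -o saves the trace to a file.
strace_cmd() {
    echo "strace -f -v -i -t -o ${2:-/tmp/strace.$1.out} -p $1"
}
```

For example, "strace_cmd 6771" prints the command to run as root against PID 6771; interrupt strace with Ctrl-C once enough has been captured.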
BTW: is it OK that when syslog-ng restarts, crond doesn't log anymore until it is restarted as well?
I would say no, but I'm not sure here. I would also suspect it depends on the version of cron deployed on your machine.

Best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc