[syslog-ng]syslog-ng hanging bringing machine in trouble
Roberto Nibali
ratz@drugphish.ch
Tue, 11 Feb 2003 23:10:02 +0100
Hello,
> I have a big problem on one of my machines, and it looks like
> it is caused by syslog-ng.
>
> The problems are similar to those already described here:
> http://lists.balabit.hu/pipermail/syslog-ng/2003-January/004432.html
I'm not completely sure what problem Greg Hartung was describing there,
however from just reading that message and from checking the syslog-ng
compilation one could assume that 64bit fs access functions are not
linked into the binary (I've neither checked the code nor verified this,
so take it with a grain of salt).
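One rough way to check that assumption is to look at the dynamic symbols of the binary. This is only a sketch: has_lfs_symbols is a made-up helper name, the symbol list is my guess at the usual glibc 64-bit entry points, and a missing match is a hint, not proof (a statically linked binary shows nothing under nm -D).

```shell
#!/bin/sh
# Hypothetical helper (not part of syslog-ng): succeeds if the symbol listing
# on stdin references one of glibc's explicit 64-bit file access functions.
has_lfs_symbols() {
    grep -Eq '(^|[[:space:]])(open64|fopen64|lseek64|stat64|fstat64|__xstat64|__fxstat64)(@|$)'
}

# Example (assumes binutils' nm is installed; the binary path is an assumption):
#   nm -D /sbin/syslog-ng | has_lfs_symbols && echo "64-bit file functions referenced"
```

The helper reads from stdin on purpose, so the same check can be run against saved nm or objdump output from another machine.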
> Red Hat Linux 7.3 with all updates
> Running kernel: currently 2.4.18-18.7.x extended with Openwall patch
Is it reproducible without the Openwall (OWL) patch? Only test this if
you can do so easily; if it's a production machine, I suspect the
downtime would be too costly for trial-and-error tests.
> Feb 11 19:10:44 gromit syslog-ng[17700]: STATS: dropped 0
> Feb 11 19:20:44 gromit syslog-ng[17700]: STATS: dropped 0
> Feb 11 19:30:45 gromit syslog-ng[17700]: STATS: dropped 0 <--
> Feb 11 20:00:47 gromit syslog-ng[6771]: syslog-ng version 1.5.26 starting
> Feb 11 20:00:48 gromit syslog-ng: syslog-ng startup succeeded
> Feb 11 20:00:48 gromit syslog-ng: klogd startup succeeded
You do not need to run klogd if you've configured syslog-ng to read the
kernel messages itself, unless you need klogd's address decoding.
> Feb 11 20:00:53 gromit ldap: slapd startup succeeded
>
>
> I detected this about 20 minutes later, with the following reproducible symptoms:
>
> System load increases over 1 (normally, machine has no load)
> "ps -ax" hangs after displaying some processes, "top" will sometimes
> start, sometimes hang
Could you provide a snapshot of 'vmstat 1' output accompanying the 'hangs'?
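Something along these lines could keep a timestamped vmstat log running, so the samples can be matched against the syslog entries afterwards. It is a sketch only: stamp_lines is a made-up helper name and the log path is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: prefix every line read from stdin with a wall-clock
# timestamp, so the samples can be lined up with the syslog entries.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%b %d %H:%M:%S')" "$line"
    done
}

# Example (the output path is an assumption; pick a disk that stays writable):
#   vmstat 1 | stamp_lines >> /var/tmp/vmstat.log
```

Starting it before the hang and leaving it running is probably easier than trying to launch anything once the machine is already in trouble.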
> Last times I saw also some CROND entries by "ps -ax", one with stat
> "D".
crond can be in D state sometimes, nothing to worry about ;).
[snip configuration part]
> Last week I disabled postfix's LDAP usage completely to check
> whether it's an LDAP problem here. In earlier cases (postfix with LDAP
> lookups) postfix would hang completely: a TCP connection was
> established, but no HELO string was displayed.
I don't yet see the connection between postfix + LDAP and syslog-ng.
> So the big question:
>
> 1) is this a syslog-ng related problem?
Maybe ;)
> 2) is this a LDAP problem? I've already increased threads.
We need more information about the machine's health while the hang is
occurring: netstat, sockstat, /proc/net/* information, ...
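A small collector like the following could grab that information in one go. Treat it as a sketch under assumptions: snapshot_health is a made-up name, the list of /proc files is just my pick, and every step is best-effort.

```shell
#!/bin/sh
# Hypothetical helper: dump a few health indicators into the directory given
# as $1. Every step is best-effort, so a missing tool or /proc file is skipped.
snapshot_health() {
    dir=$1
    mkdir -p "$dir" || return 1
    date > "$dir/date.txt"
    # /proc/loadavg and /proc/net/* are Linux-specific.
    for f in /proc/loadavg /proc/net/sockstat /proc/net/unix /proc/net/tcp /proc/net/udp; do
        [ -r "$f" ] && cat "$f" > "$dir/$(echo "$f" | tr '/' '_')" 2>/dev/null
    done
    # netstat is optional; skip it quietly if it is not installed.
    if command -v netstat >/dev/null 2>&1; then
        netstat -an > "$dir/netstat-an.txt" 2>&1
    fi
    return 0
}

# Example (the target directory is an assumption):
#   snapshot_health "/var/tmp/hang-$(date +%Y%m%d-%H%M%S)"
```

Running it once while the machine is healthy gives a baseline to compare the hang snapshot against.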
> I hope someone could point me to some solutions or proper debugging
> methods. Machine is semiproductive since end of September (with
> syslog-ng), but since the beginning such troubles occur.
Please provide us with a strace -f -v -i -t -p $PID_OF_HANGING_SYSLOG_NG
when it happens the next time. Maybe we can see where exactly it hangs.
That's always better than shooting in the dark based on questionable
assumptions.
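To have the strace call ready when it happens, it could be wrapped roughly like this. The wrapper name trace_hung, the -o option and the output path are my additions, not something from your setup.

```shell
#!/bin/sh
# Hypothetical wrapper around the strace call above. The -o option and the
# output path are additions so that the trace ends up in a file.
trace_hung() {
    pid=$1
    case $pid in
        ''|*[!0-9]*) echo "usage: trace_hung PID" >&2; return 2 ;;
    esac
    # -f follow children, -v unabbreviated output, -i instruction pointer,
    # -t time of day, -p attach to the already running process.
    strace -f -v -i -t -o "/var/tmp/syslog-ng.$pid.strace" -p "$pid"
}

# Example (pass exactly one PID; pidof may print several):
#   trace_hung "$(pidof syslog-ng)"
```

The PID check is only there so that an empty or mistyped argument does not end up as a confusing strace error in the middle of an incident.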
> BTW: is it OK that if syslog-ng restarts, crond doesn't log anymore
> until restarted?
I would say no, but I'm not sure here; I also suspect it depends on
the version of cron deployed on your machine.
Best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc