[syslog-ng]syslog-ng hanging bringing machine in trouble

Roberto Nibali ratz@drugphish.ch
Tue, 11 Feb 2003 23:10:02 +0100


Hello,

> I have here a big problem on one of my machine and it looks like that
> it is caused by syslog-ng.
> 
> It has similar problems like already written here:
> http://lists.balabit.hu/pipermail/syslog-ng/2003-January/004432.html

I'm not completely sure what problem Greg Hartung was describing there. 
However, from just reading that message and from looking at how syslog-ng 
is compiled, one could assume that the 64-bit file access functions are 
not linked into the binary (I've neither checked the code nor verified 
this, so take it with a grain of salt).
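
One rough way to test that assumption is to look at the dynamic symbols 
the binary imports. This is only a sketch: the function name 
has_lfs_symbols is made up for illustration, the check is a heuristic 
that is mainly meaningful on 32-bit glibc systems, and it tells you 
nothing about a statically linked binary.

```shell
#!/bin/sh
# Heuristic sketch (hypothetical helper, not part of syslog-ng):
# succeed if the binary imports any of the *64 large-file functions.
has_lfs_symbols() {
    bin=$1
    # Refuse early if the file cannot be read at all.
    [ -r "$bin" ] || { echo "cannot read $bin" >&2; return 2; }
    # nm -D lists the dynamic symbol table; grep -q only sets the status.
    nm -D "$bin" 2>/dev/null | grep -Eq '(open|lseek|stat|fopen)64'
}
```

For example, has_lfs_symbols /path/to/syslog-ng && echo "64-bit file 
functions referenced", with the path adjusted to wherever the binary 
lives on your system.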

> Red Hat Linux 7.3 with all updates
> Running kernel: currently 2.4.18-18.7.x extended with Openwall patch

Is it reproducible without the Openwall (OWL) patch? Only test this if 
you can do it easily; if it's a production machine, I suspect the 
downtime would be too costly for trial-and-error tests.

> Feb 11 19:10:44 gromit syslog-ng[17700]: STATS: dropped 0
> Feb 11 19:20:44 gromit syslog-ng[17700]: STATS: dropped 0
> Feb 11 19:30:45 gromit syslog-ng[17700]: STATS: dropped 0 <--
> Feb 11 20:00:47 gromit syslog-ng[6771]: syslog-ng version 1.5.26 starting
> Feb 11 20:00:48 gromit syslog-ng: syslog-ng startup succeeded
> Feb 11 20:00:48 gromit syslog-ng: klogd startup succeeded

You do not need to run klogd if syslog-ng is configured to read the 
kernel messages itself, unless you need klogd's address decoding.

> Feb 11 20:00:53 gromit ldap: slapd startup succeeded
> 
> 
> I've detected this about 20 min later with following reproducable:
> 
> System load increases over 1 (normally, machine has no load)
> "ps -ax" hangs after displaying some processes, "top" will sometimes
> start, sometimes hang

Could you provide a snapshot of 'vmstat 1' output accompanying the 'hangs'?
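
Something along these lines could capture such a snapshot. It is a 
minimal sketch: the function name snapshot_hang and the default output 
path are assumptions of mine. Since ps itself was reported to hang, the 
vmstat sample is taken first and the ps listing comes last.

```shell
#!/bin/sh
# Sketch (hypothetical helper): write a timestamped 'vmstat 1' sample and
# a list of processes in D state to a file, then print the file name.
snapshot_hang() {
    out=${1:-/tmp/hang-snapshot.txt}   # output file (assumed default)
    secs=${2:-30}                      # number of one-second samples
    {
        echo "=== snapshot started: $(date) ==="
        if command -v vmstat >/dev/null 2>&1; then
            vmstat 1 "$secs"
        else
            echo "vmstat not found"
        fi
        echo "=== processes in uninterruptible sleep (D) ==="
        # Keep the header line plus every process whose STAT starts with D.
        ps -eo pid,stat,wchan,args 2>/dev/null | awk 'NR == 1 || $2 ~ /^D/'
    } >"$out" 2>&1
    echo "$out"
}
```

Calling snapshot_hang /tmp/hang.txt 60 while the load is climbing would 
record one minute of samples.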

> Last times I saw also some CROND entries by "ps -ax", one with stat
> "D".

crond can be in D state sometimes, nothing to worry about ;).

[snip configuration part]

> Last week I've disabled postfix's LDAP usage completly to check
> whether it's a LDAP problem here. In former cases (postfix with LDAP
> lookups) postfix will hang completly, a TCP connects, but no HELO
> string was displayed.

I don't yet see the connection between postfix + LDAP and syslog-ng.

> So the big question:
> 
> 1) is this a syslog-ng related problem?

Maybe ;)

> 2) is this a LDAP problem? I've already increased threads.

We need more information about the machine's state while the hang is 
occurring: netstat, sockstat, /proc/net/* information, ...
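
A small collector like the following could gather that in one go. It is 
a sketch under assumptions: the function name collect_net_state, the 
output directory and the choice of /proc/net files are mine. netstat is 
run with -n so that it does not block on name lookups.

```shell
#!/bin/sh
# Sketch (hypothetical helper): copy a few /proc/net files and the
# netstat output into a directory, then print the directory name.
collect_net_state() {
    dir=${1:-/tmp/net-state.$$}        # output directory (assumed default)
    mkdir -p "$dir" || return 1
    for f in sockstat tcp udp unix dev; do
        # Not every kernel exposes every file, so skip the missing ones.
        if [ -r "/proc/net/$f" ]; then
            cat "/proc/net/$f" >"$dir/$f" 2>/dev/null || true
        fi
    done
    if command -v netstat >/dev/null 2>&1; then
        # -a: all sockets, -n: numeric addresses and ports
        netstat -an >"$dir/netstat-an.txt" 2>&1 || true
    fi
    date >"$dir/collected-at.txt"
    echo "$dir"
}
```

Running it once during a hang and once when the machine is healthy gives 
two directories that can be compared with diff -r.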

> I hope someone could point me to some solutions or proper debugging
> methods. Machine is semiproductive since end of September (with
> syslog-ng), but since the beginning such troubles occur.

Please provide us with the output of 
strace -f -v -i -t -p $PID_OF_HANGING_SYSLOG_NG the next time it happens. 
Maybe we can see exactly where it hangs. That is always better than 
shooting in the dark with questionable assumptions.
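
To make the trace easy to mail, the strace call above can be wrapped so 
that it writes to a file. This is a sketch: the function name 
trace_syslog_ng and the default output path are hypothetical, and -o is 
the only flag added to the ones already mentioned.

```shell
#!/bin/sh
# Sketch (hypothetical helper): attach strace to a PID and log to a file.
trace_syslog_ng() {
    pid=$1
    out=${2:-/tmp/syslog-ng.strace.$pid}   # trace file (assumed default)
    case $pid in
        # Reject an empty or non-numeric PID before attaching to anything.
        ''|*[!0-9]*)
            echo "usage: trace_syslog_ng PID [OUTFILE]" >&2
            return 2
            ;;
    esac
    # -f follow children, -v unabbreviated structures, -i instruction
    # pointer, -t timestamps, -o write the trace to a file, -p attach
    strace -f -v -i -t -o "$out" -p "$pid"
}
```

If only one instance is running, trace_syslog_ng "$(pidof syslog-ng)" 
attaches to it; interrupt with Ctrl-C once the hang has been captured.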

> BTW: is this ok, that if syslog-ng restarts, crond don't log anymore
> until restarted?

I would say no, but I'm not sure here; I would also suspect it depends 
on the version of cron deployed on your machine.

Best regards,
Roberto Nibali, ratz
-- 
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc