syslog-ng hanging bringing machine in trouble
Hi, I have here a big problem on one of my machine and it looks like that it is caused by syslog-ng. It has similar problems like already written here: http://lists.balabit.hu/pipermail/syslog-ng/2003-January/004432.html Machine: Red Hat Linux 7.3 with all updates Running kernel: currently 2.4.18-18.7.x extended with Openwall patch 128 MByte memory syslog-ng: 1.5.26 (but also happen with latest 1.4.x) pam is using a local running LDAP server.
From time to time (let say distance 2-3 weeks or even 8 hours) syslog suddenly stops logging like this evening:
Feb 11 19:10:44 gromit syslog-ng[17700]: STATS: dropped 0 Feb 11 19:20:44 gromit syslog-ng[17700]: STATS: dropped 0 Feb 11 19:30:45 gromit syslog-ng[17700]: STATS: dropped 0 <-- Feb 11 20:00:47 gromit syslog-ng[6771]: syslog-ng version 1.5.26 starting Feb 11 20:00:48 gromit syslog-ng: syslog-ng startup succeeded Feb 11 20:00:48 gromit syslog-ng: klogd startup succeeded Feb 11 20:00:53 gromit ldap: slapd startup succeeded I've detected this about 20 min later with following reproducable: System load increases over 1 (normally, machine has no load) "ps -ax" hangs after displaying some processes, "top" will sometimes start, sometimes hang Last times I saw also some CROND entries by "ps -ax", one with stat "D". syslog-ng configuration is nothing special, destinations are files (divided by yyyymm), sources are source s_local { internal(); unix-stream("/dev/log" keep-alive(yes) max-connections(100)); unix-stream("/var/spool/postfix/dev/log" keep-alive(yes) max-connections(100)); file("/proc/kmsg"); }; options { use_dns(no); use_fqdn(no); use_time_recvd(no); chain_hostnames(no); mark(0); sync(0); }; Last week I've disabled postfix's LDAP usage completly to check whether it's a LDAP problem here. In former cases (postfix with LDAP lookups) postfix will hang completly, a TCP connects, but no HELO string was displayed. So the big question: 1) is this a syslog-ng related problem? 2) is this a LDAP problem? I've already increased threads. I don't believe it's a DNS problem because on machine itself a caching DNS server is running. One note: machine is IPv6 enabled. I hope someone could point me to some solutions or proper debugging methods. Machine is semiproductive since end of September (with syslog-ng), but since the beginning such troubles occur. BTW: is this ok, that if syslog-ng restarts, crond don't log anymore until restarted? Thank you very very much, Peter -- Dr. Peter Bieringer http://www.bieringer.de/pb/ GPG/PGP Key 0x958F422D mailto: pb at bieringer dot de Deep Space 6 Co-Founder and Core Member http://www.deepspace6.net/
Hello,
I have here a big problem on one of my machine and it looks like that it is caused by syslog-ng.
It has similar problems like already written here: http://lists.balabit.hu/pipermail/syslog-ng/2003-January/004432.html
I'm not completely sure what problem Greg Hartung was describing there, however from just reading that message and from checking the syslog-ng compilation one could assume that 64bit fs access functions are not linked into the binary (I've neither checked the code nor verified this, so take it with a grain of salt).
Red Hat Linux 7.3 with all updates Running kernel: currently 2.4.18-18.7.x extended with Openwall patch
Is it reproducable without OWL? Only test it if you can easily do it, if it's a productive machine, I suspect the downtime is too big to do heuristic tests.
Feb 11 19:10:44 gromit syslog-ng[17700]: STATS: dropped 0 Feb 11 19:20:44 gromit syslog-ng[17700]: STATS: dropped 0 Feb 11 19:30:45 gromit syslog-ng[17700]: STATS: dropped 0 <-- Feb 11 20:00:47 gromit syslog-ng[6771]: syslog-ng version 1.5.26 starting Feb 11 20:00:48 gromit syslog-ng: syslog-ng startup succeeded Feb 11 20:00:48 gromit syslog-ng: klogd startup succeeded
You do not need to run klogd if you've configured syslog-ng accordingly unless you need address decoding.
Feb 11 20:00:53 gromit ldap: slapd startup succeeded
I've detected this about 20 min later with following reproducable:
System load increases over 1 (normally, machine has no load) "ps -ax" hangs after displaying some processes, "top" will sometimes start, sometimes hang
Could you provide a snapshot of 'vmstat 1' output accompagning the 'hangs'?
Last times I saw also some CROND entries by "ps -ax", one with stat "D".
crond can be in D state sometimes, nothing to worry about ;). [snip configuration part]
Last week I've disabled postfix's LDAP usage completly to check whether it's a LDAP problem here. In former cases (postfix with LDAP lookups) postfix will hang completly, a TCP connects, but no HELO string was displayed.
I don't yet see the connection between postfix + LDAP and syslog-ng.
So the big question:
1) is this a syslog-ng related problem?
Maybe ;)
2) is this a LDAP problem? I've already increased threads.
We need more information from the machine's health during the hang occurance. netstat, sockstat, /proc/net/* information, ...
I hope someone could point me to some solutions or proper debugging methods. Machine is semiproductive since end of September (with syslog-ng), but since the beginning such troubles occur.
Please provide us with a strace -f -v -i -t -p $PID_OF_HANGING_SYSLOG-NG when it happens the next time. Maybe we can see where it hangs exactly. It's always better than shooting holes into the dark by making questionable assumptions.
BTW: is this ok, that if syslog-ng restarts, crond don't log anymore until restarted?
I would say no but I'm not sure here, I would also suspect it depends on the version of cron deployed on your machine. Best regards, Roberto Nibali, ratz -- echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc
Hi, thanks for fast answer and some information-retrieving-hints-on-next-hang. --On Tuesday, February 11, 2003 11:10:02 PM +0100 Roberto Nibali <ratz@drugphish.ch> wrote:
Red Hat Linux 7.3 with all updates Running kernel: currently 2.4.18-18.7.x extended with Openwall patch
Is it reproducable without OWL? Only test it if you can easily do it, if it's a productive machine, I suspect the downtime is too big to do heuristic tests.
Sorry, can't do that.
Feb 11 19:10:44 gromit syslog-ng[17700]: STATS: dropped 0 Feb 11 19:20:44 gromit syslog-ng[17700]: STATS: dropped 0 Feb 11 19:30:45 gromit syslog-ng[17700]: STATS: dropped 0 <-- Feb 11 20:00:47 gromit syslog-ng[6771]: syslog-ng version 1.5.26 starting Feb 11 20:00:48 gromit syslog-ng: syslog-ng startup succeeded Feb 11 20:00:48 gromit syslog-ng: klogd startup succeeded
You do not need to run klogd if you've configured syslog-ng accordingly unless you need address decoding.
Hmm, I'm using a given initscript to start syslog-ng: start() { echo -n $"Starting system logger: " daemon syslog-ng $SYSLOGD_OPTIONS -f /etc/syslog-ng.conf RETVAL=$? echo echo -n $"Starting kernel logger: " daemon klogd $KLOGD_OPTIONS echo [ $RETVAL -eq 0 ] && touch /var/lock/subsys/syslog-ng return $RETVAL } Does this mean that starting klogd isn't required?
Could you provide a snapshot of 'vmstat 1' output accompagning the 'hangs'?
Added to "emergency-case-todo-list" ;-)
Last times I saw also some CROND entries by "ps -ax", one with stat "D".
crond can be in D state sometimes, nothing to worry about ;).
But they gone after stopping ldap. And everytime seeing in hanging case. Also the number of such cron processes increases, looks like they cannot do something in case of hanging.
2) is this a LDAP problem? I've already increased threads.
We need more information from the machine's health during the hang occurance. netstat, sockstat, /proc/net/* information, ...
Also added to the list.
Please provide us with a strace -f -v -i -t -p $PID_OF_HANGING_SYSLOG-NG when it happens the next time. Maybe we can see where it hangs exactly. It's always better than shooting holes into the dark by making questionable assumptions.
Sure, next add.
BTW: is this ok, that if syslog-ng restarts, crond don't log anymore until restarted?
I would say no but I'm not sure here, I would also suspect it depends on the version of cron deployed on your machine.
vixie-cron-3.0.1-64 Does a lsof | grep crond help? I see only some libs, pipes and sockets. BTW: after the first hangs, I also enable UDP logging to a remote host, but logging will stop, here, too. Peter -- Dr. Peter Bieringer http://www.bieringer.de/pb/ GPG/PGP Key 0x958F422D mailto: pb at bieringer dot de Deep Space 6 Co-Founder and Core Member http://www.deepspace6.net/
Hello Peter,
thanks for fast answer and some information-retrieving-hints-on-next-hang.
You're welcome, let's hope we get something useful out of it.
Is it reproducable without OWL? Only test it if you can easily do it, if it's a productive machine, I suspect the downtime is too big to do heuristic tests.
Sorry, can't do that.
Thought so, well, proc-fs output, netstat, lsof and strace will reveil the problem if it is connected with OWL. BTW could you list (if it's not too big) all the OWL features you've enabled in your running kernel? Not that I suspect it to really have an influence on syslog-ng but safe is safe ;).
You do not need to run klogd if you've configured syslog-ng accordingly unless you need address decoding.
Hmm, I'm using a given initscript to start syslog-ng:
start() { echo -n $"Starting system logger: " daemon syslog-ng $SYSLOGD_OPTIONS -f /etc/syslog-ng.conf RETVAL=$? echo echo -n $"Starting kernel logger: " daemon klogd $KLOGD_OPTIONS echo [ $RETVAL -eq 0 ] && touch /var/lock/subsys/syslog-ng return $RETVAL }
Fine, just uncomment the three lines concerning klogd, you should still get kernel messages. <OT> Another thing: Whoever wrote that script part for start() should seriously reconsider reading a good shell book or the advance bash programming guide. </OT>
Does this mean that starting klogd isn't required?
Not really. In the config snipped you posted before you had a file("/proc/kmsg") defined as a source in s_local. I just hope you've got a d_local where you write those messages into.
Last times I saw also some CROND entries by "ps -ax", one with stat "D".
crond can be in D state sometimes, nothing to worry about ;).
But they gone after stopping ldap. And everytime seeing in hanging case. Also the number of such cron processes increases, looks like they cannot do something in case of hanging.
Ok, then I wait with further assumptions until I see some debug output.
I would say no but I'm not sure here, I would also suspect it depends on the version of cron deployed on your machine.
vixie-cron-3.0.1-64
The I suppose it should stop logging. How about if you send a SIGHUP to the cron? pkill -HUP cron. I would also be interested in the process status (signals, state, ...) of syslog-ng, cron and whatever seems to be hanging: cat /proc/${PID_OF_PROCESS}/status
Does a lsof | grep crond help? I see only some libs, pipes and sockets.
Yes, maybe you should also send along the output of: lsof -c cron -c syslog-ng
BTW: after the first hangs, I also enable UDP logging to a remote host, but logging will stop, here, too.
lsof -n -i udp:${PORT_WHERE_YOU_SEND_THE_LOGS_TO} Best regards, Roberto Nibali, ratz -- echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc
Hi, --On Wednesday, February 12, 2003 12:03:17 AM +0100 Roberto Nibali <ratz@drugphish.ch> wrote:
Is it reproducable without OWL? Only test it if you can easily do it, if it's a productive machine, I suspect the downtime is too big to do heuristic tests.
Sorry, can't do that.
Thought so, well, proc-fs output, netstat, lsof and strace will reveil the problem if it is connected with OWL. BTW could you list (if it's not too big) all the OWL features you've enabled in your running kernel? Not that I suspect it to really have an influence on syslog-ng but safe is safe ;).
CONFIG_HARDEN_STACK=y CONFIG_HARDEN_STACK_SMART=y CONFIG_HARDEN_LINK=y CONFIG_HARDEN_FIFO=y CONFIG_HARDEN_PROC=y CONFIG_HARDEN_FD_0_1_2=y CONFIG_HARDEN_RLIMIT_NPROC=y CONFIG_HARDEN_SHM=y
start() { echo -n $"Starting system logger: " daemon syslog-ng $SYSLOGD_OPTIONS -f /etc/syslog-ng.conf RETVAL=$? echo echo -n $"Starting kernel logger: " daemon klogd $KLOGD_OPTIONS echo [ $RETVAL -eq 0 ] && touch /var/lock/subsys/syslog-ng return $RETVAL }
Fine, just uncomment the three lines concerning klogd, you should still get kernel messages.
Done, indeed.
<OT> Another thing: Whoever wrote that script part for start() should seriously reconsider reading a good shell book or the advance bash programming guide. </OT>
Sure? Didn't look very strange to me. Other initscripts look very similar.
Does this mean that starting klogd isn't required?
Not really. In the config snipped you posted before you had a file("/proc/kmsg") defined as a source in s_local. I just hope you've got a d_local where you write those messages into.
Had defined destination d_kern { file("/var/log/kernel-$YEAR$MONTH"); }; which catches still kernel messages - ok.
I would say no but I'm not sure here, I would also suspect it depends on the version of cron deployed on your machine.
vixie-cron-3.0.1-64
The I suppose it should stop logging. How about if you send a SIGHUP to the cron? pkill -HUP cron.
Won't help. Is this a bug in syslog-ng or crond? Not nice that on every syslog-ng restart crond has to be restarted, too (in case of "not knowing about this issue").
Does a lsof | grep crond help? I see only some libs, pipes and sockets.
Yes, maybe you should also send along the output of:
lsof -c cron -c syslog-ng
Here the crond-after-syslog-restart-no-longer-logging case: # lsof -c crond -c syslog-ng COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME crond 19875 root cwd DIR 3,3 4096 2 /var/spool crond 19875 root rtd DIR 3,1 4096 2 / crond 19875 root txt REG 3,1 23048 82976 /usr/sbin/crond crond 19875 root mem REG 3,1 89547 64281 /lib/ld-2.2.5.so crond 19875 root mem REG 3,1 25572 65003 /lib/libsafe.so.2.0.16 crond 19875 root mem REG 3,1 12102 65975 /lib/libdl-2.2.5.so crond 19875 root mem REG 3,1 45415 64493 /lib/libnss_files-2.2.5.so crond 19875 root mem REG 3,1 1533837 64419 /lib/libnss_ldap-2.2.5.so crond 19875 root mem REG 3,1 68925 64356 /lib/libresolv-2.2.5.so crond 19875 root mem REG 3,1 1402035 64275 /lib/i686/libc-2.2.5.so crond 19875 root 0u CHR 136,0 2 /dev/pts/0 crond 19875 root 1w FIFO 0,4 20072062 pipe crond 19875 root 2w FIFO 0,4 20072063 pipe crond 19875 root 3u REG 3,1 6 177220 /var/run/crond.pid crond 19875 root 4u unix 0xc6341a40 20072069 socket syslog-ng 20308 root cwd DIR 3,1 4096 2 / syslog-ng 20308 root rtd DIR 3,1 4096 2 / syslog-ng 20308 root txt REG 3,1 81576 64714 /sbin/syslog-ng syslog-ng 20308 root mem REG 3,1 89547 64281 /lib/ld-2.2.5.so syslog-ng 20308 root mem REG 3,1 25572 65003 /lib/libsafe.so.2.0.16 syslog-ng 20308 root mem REG 3,1 68925 64356 /lib/libresolv-2.2.5.so syslog-ng 20308 root mem REG 3,1 89424 64328 /lib/libnsl-2.2.5.so syslog-ng 20308 root mem REG 3,1 12102 65975 /lib/libdl-2.2.5.so syslog-ng 20308 root mem REG 3,1 1402035 64275 /lib/i686/libc-2.2.5.so syslog-ng 20308 root 0u CHR 1,3 33972 /dev/null syslog-ng 20308 root 1u CHR 1,3 33972 /dev/null syslog-ng 20308 root 2w FIFO 0,4 20077147 pipe syslog-ng 20308 root 3u unix 0xc072c0a0 20077154 /dev/log syslog-ng 20308 root 5u unix 0xc09daa80 20077156 /var/spool/postfix/dev/log syslog-ng 20308 root 6r REG 0,6 0 4114 /proc/kmsg syslog-ng 20308 root 7u IPv4 20077158 UDP ******:39269->************:syslog syslog-ng 20308 root 8u unix 0xc2372540 20077174 /dev/log syslog-ng 20308 root 9u unix 0xc09da580 20077188 /dev/log syslog-ng 20308 root 10w REG 3,8 255123 42 /var/log/bootlog syslog-ng 20308 root 11w REG 3,8 8134413 72 /var/log/maillog-200302 syslog-ng 20308 root 12u unix 0xc13975a0 20077203 /dev/log syslog-ng 20308 root 13u unix 0xc1988a40 20077223 /dev/log syslog-ng 20308 root 14u unix 0xc1f165c0 20077249 /dev/log syslog-ng 20308 root 15u unix 0xc53c6580 20077269 /dev/log syslog-ng 20308 root 16u unix 0xc525cac0 20077296 /var/spool/postfix/dev/log syslog-ng 20308 root 17u unix 0xc53a1a60 20077301 /dev/log syslog-ng 20308 root 18u unix 0xc525c0c0 20077645 /dev/log Hope this helps! Thank you very much, Peter -- Dr. Peter Bieringer http://www.bieringer.de/pb/ GPG/PGP Key 0x958F422D mailto: pb at bieringer dot de Deep Space 6 Co-Founder and Core Member http://www.deepspace6.net/
Hi, now it suddenly occurs again (and is gone now, but leaving strange issues): 1) syslog-ng process is stopped - who stops it? # cat /proc/5317/status Name: syslog-ng State: T (stopped) Tgid: 5317 Pid: 5317 PPid: 1 TracerPid: 0 Uid: 0 0 0 0 Gid: 0 0 0 0 FDSize: 32 Groups: 0 1 2 3 4 6 10 VmSize: 1784 kB VmLck: 0 kB VmRSS: 796 kB VmData: 216 kB VmStk: 12 kB VmExe: 76 kB VmLib: 1432 kB SigPnd: 0000000000000000 SigBlk: 0000000000000000 SigIgn: 8000000000001000 SigCgt: 0000000000014001 CapInh: 0000000000000000 CapPrm: 00000000fffffeff CapEff: 00000000fffffeff 2) Logtime is going backwards (note: machine has a well-configured NTP) Feb 12 22:23:02 * crond: crond startup succeeded Feb 12 22:32:54 * syslog-ng[5317]: STATS: dropped 0 Feb 12 23:02:26 * syslog-ng[5317]: STATS: dropped 0 Feb 12 22:57:53 * sshd(pam_unix)[6106]: session opened for user root by (uid=0) Feb 12 22:58:32 * login(pam_unix)[10920]: session opened for user root by LOGIN(uid=0) Feb 12 22:58:32 * root[10920]: DIALUP AT ttyS0 BY root Feb 12 22:58:32 * root[10920]: ROOT LOGIN ON ttyS0 Feb 12 23:01:41 * su: pam_ldap: ldap_result Timed out <- !!! Feb 12 23:01:41 * su(pam_unix)[7874]: session opened for user bin by (uid=0) Feb 12 23:02:11 * su(pam_unix)[7874]: session closed for user bin All very strange. More debug data coming in private e-mail. Peter -- Dr. Peter Bieringer http://www.bieringer.de/pb/ GPG/PGP Key 0x958F422D mailto: pb at bieringer dot de Deep Space 6 Co-Founder and Core Member http://www.deepspace6.net/
On Tue, Feb 11, 2003 at 11:39:48PM +0100, Peter Bieringer wrote:
Hmm, I'm using a given initscript to start syslog-ng:
start() { echo -n $"Starting system logger: " daemon syslog-ng $SYSLOGD_OPTIONS -f /etc/syslog-ng.conf RETVAL=$? echo echo -n $"Starting kernel logger: " daemon klogd $KLOGD_OPTIONS echo [ $RETVAL -eq 0 ] && touch /var/lock/subsys/syslog-ng return $RETVAL }
Does this mean that starting klogd isn't required?
This is a snippet from the FAQ: * I want to replace syslogd *and* klogd on my Linux box with syslog-ng. Use a source line like this in your conf file to read kernel messages, too. source src { pipe("/proc/kmsg"); unix-stream("/dev/log"); internal(); }; Note: do not run klogd and syslog-ng fetching local kernel messages at the same time. It may cause syslog-ng to block which makes all logging local daemons unusable. * NOTE the note sentence above. The problem was probably caused by klogd & syslog-ng running at the same time. -- Bazsi PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
NOTE the note sentence above. The problem was probably caused by klogd & syslog-ng running at the same time.
I thought the same that's why I asked him to remove it and to send along an strace output in case it would hit him again. It's also nicely documented on Nate Campi's syslog-ng.conf example. Regards, Roberto Nibali, ratz -- echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc
Hi, --On Wednesday, February 12, 2003 11:14:46 AM +0100 Balazs Scheidler <bazsi@balabit.hu> wrote:
Note: do not run klogd and syslog-ng fetching local kernel messages at the same time. It may cause syslog-ng to block which makes all logging local daemons unusable. *
NOTE the note sentence above. The problem was probably caused by klogd & syslog-ng running at the same time.
Let's see forward. Thank you very much. Perhaps someone should run through all contributed initscripts in the official tarball and disable klogd starting in all the ones where it is still enabled. Peter -- Dr. Peter Bieringer http://www.bieringer.de/pb/ GPG/PGP Key 0x958F422D mailto: pb at bieringer dot de Deep Space 6 Co-Founder and Core Member http://www.deepspace6.net/
Hi again, sorry for next question.
Use a source line like this in your conf file to read kernel messages, too.
source src { pipe("/proc/kmsg"); unix-stream("/dev/log"); internal(); };
Note: do not run klogd and syslog-ng fetching local kernel messages at the same time. It may cause syslog-ng to block which makes all logging local daemons unusable. *
I've done this, but now kernel messages will appear on console now also (thanks to serial console support...) at same time like in defined file. Why? Log message was generated by iptables log with level 6 (=KERN_INFO). Current related config: source s_local { internal(); pipe("/proc/kmsg"); unix-stream("/dev/log" keep-alive(yes) max-connections(100)); unix-stream("/var/spool/postfix/dev/log" keep-alive(yes) max-connections(100)); }; filter f_kern { facility(kern); }; destination d_kern { file("/var/log/kernel-$YEAR$MONTH"); }; log { source(s_local); filter(f_kern); destination(d_kern); }; destination console isn't in use. Do I miss some configuration option? Or is this a feature? Or a bug? Thank you very much for any help, Peter -- Dr. Peter Bieringer http://www.bieringer.de/pb/ GPG/PGP Key 0x958F422D mailto: pb at bieringer dot de Deep Space 6 Co-Founder and Core Member http://www.deepspace6.net/
On 2003-02-12 22:37:14 +0100, Peter Bieringer wrote:
I've done this, but now kernel messages will appear on console now also (thanks to serial console support...) at same time like in defined file.
Why?
man klogd: By default the klogd daemon executes a system call to inhibit all kernel messages (except for panics) from being displayed on the console. (actually, I think newer versions of klogd don't do that anymore by default, but the startup scripts probably called it with a sensible value for the -c flag) syslog-ng doesn't do that, so you have to write the right value(s) to /proc/sys/kernel/printk yourself. hp -- _ | Peter J. Holzer | To a database person, |_|_) | Sysadmin WSR | every nail looks like a thumb. | | | hjp@hjp.at | __/ | http://www.hjp.at/ | -- Jamie Zawinski
On Wednesday 12 February 2003 22:45, Peter J. Holzer wrote:
On 2003-02-12 22:37:14 +0100, Peter Bieringer wrote:
I've done this, but now kernel messages will appear on console now also (thanks to serial console support...) at same time like in defined file.
Why?
man klogd:
By default the klogd daemon executes a system call to inhibit all kernel messages (except for panics) from being displayed on the console.
(actually, I think newer versions of klogd don't do that anymore by default, but the startup scripts probably called it with a sensible value for the -c flag)
syslog-ng doesn't do that, so you have to write the right value(s) to /proc/sys/kernel/printk yourself.
Add the line dmesg -n1 to the boot-script before starting syslog-ng Achim
On Tue, Feb 11, 2003 at 11:10:02PM +0100, Roberto Nibali wrote:
BTW: is this ok, that if syslog-ng restarts, crond don't log anymore until restarted?
I would say no but I'm not sure here, I would also suspect it depends on the version of cron deployed on your machine.
it is. the syslog() function in libc clearly sucks. If the log connection is broken (e.g. you restarted syslog-ng) it does not immediately reopen the connection only _after_ the first error occurs. Thus a message is dropped after a log connection breaks. It becomes worse when a parent process opens the log connection, and each child inherits the fd (this is what cron probably does), as the first log message is dropped in every child. This becomes the worst when the child sends only a single log message, when the program becomes absolutely silent. -- Bazsi PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
I would say no but I'm not sure here, I would also suspect it depends on the version of cron deployed on your machine.
it is. the syslog() function in libc clearly sucks. If the log connection is broken (e.g. you restarted syslog-ng) it does not immediately reopen the connection only _after_ the first error occurs.
I've not observed this behavior with glibc-2.1.3 yet (that's what I'm using for my distribution). Might it be that newer glibc's have this problem? I've made a diff of ../misc/syslog.c between glibc-2.1.3 and glibc-2.2.5 and among beautifying code there was this chunk: @@ -216,17 +239,29 @@ if (!connected || __send(LogFile, buf, bufsize, 0) < 0) { - closelog_internal (); /* attempt re-open next time */ - /* - * Output the message to the console; don't worry about blocking, - * if console blocks everything will. Make sure the error reported - * is the one from the syslogd failure. - */ - if (LogStat & LOG_CONS && - (fd = __open(_PATH_CONSOLE, O_WRONLY|O_NOCTTY, 0)) >= 0) + if (connected) { - dprintf (fd, "%s\r\n", buf + msgoff); - (void)__close(fd); + /* Try to reopen the syslog connection. Maybe it went + down. */ + closelog_internal (); + openlog_internal(LogTag, LogStat | LOG_NDELAY, 0); + } + + if (!connected || __send(LogFile, buf, bufsize, 0) < 0) + { + closelog_internal (); /* attempt re-open next time */ + /* + * Output the message to the console; don't worry + * about blocking, if console blocks everything will. + * Make sure the error reported is the one from the + * syslogd failure. + */ + if (LogStat & LOG_CONS && + (fd = __open(_PATH_CONSOLE, O_WRONLY|O_NOCTTY, 0)) >= 0) + { + dprintf (fd, "%s\r\n", buf + msgoff); + (void)__close(fd); + } } } So one would assume that it used to work for a re-open, wouldn't he?
Thus a message is dropped after a log connection breaks.
Sounds plausible, thanks for the pointer.
It becomes worse when a parent process opens the log connection, and each child inherits the fd (this is what cron probably does), as the first log message is dropped in every child. This becomes the worst when the child sends only a single log message, when the program becomes absolutely silent.
I see. Thanks for the information. Have you heard or seen any different implementation of syslog() which would not have this deficiency? Best regards, Roberto Nibali, ratz -- echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc
On Wednesday 12 February 2003 13:29, Roberto Nibali wrote:
I've not observed this behavior with glibc-2.1.3 yet (that's what I'm using for my distribution). Might it be that newer glibc's have this problem?
glibc-2.1.3 has this problem but glibc-2.2.5 and glibc-2.3.1 not: glibc-2.1.3/misc/syslog.c in function vsyslog(): if (!connected || __send(LogFile, buf, bufsize, 0) < 0) { closelog_internal (); /* attempt re-open next time */ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ /* * Output the message to the console; don't worry about blocking, * if console blocks everything will. Make sure the error reported * is the one from the syslogd failure. */ if (LogStat & LOG_CONS && (fd = __open(_PATH_CONSOLE, O_WRONLY|O_NOCTTY, 0)) >= 0) { dprintf (fd, "%s\r\n", buf + msgoff); (void)__close(fd); } } If __send() fails and the LOG_CONS flag is set in LogStat, the message will only be written to the system console. Best regards, Achim Gsell
--On Wednesday, February 12, 2003 02:34:28 PM +0100 Achim Gsell <achim@cybercity.ch> wrote:
On Wednesday 12 February 2003 13:29, Roberto Nibali wrote:
I've not observed this behavior with glibc-2.1.3 yet (that's what I'm using for my distribution). Might it be that newer glibc's have this problem?
glibc-2.1.3 has this problem but glibc-2.2.5 and glibc-2.3.1 not: ...
My system is running glibc-2.2.5-42 (RHL 7.3). What now? Are there any other daemons known which do not log further on after syslog-ng restarts? Peter -- Dr. Peter Bieringer http://www.bieringer.de/pb/ GPG/PGP Key 0x958F422D mailto: pb at bieringer dot de Deep Space 6 Co-Founder and Core Member http://www.deepspace6.net/
Hi again, --On Wednesday, February 12, 2003 07:10:42 PM +0100 Peter Bieringer <pb@bieringer.de> wrote:
--On Wednesday, February 12, 2003 02:34:28 PM +0100 Achim Gsell <achim@cybercity.ch> wrote:
On Wednesday 12 February 2003 13:29, Roberto Nibali wrote:
I've not observed this behavior with glibc-2.1.3 yet (that's what I'm using for my distribution). Might it be that newer glibc's have this problem?
glibc-2.1.3 has this problem but glibc-2.2.5 and glibc-2.3.1 not: ...
My system is running glibc-2.2.5-42 (RHL 7.3). What now?
Are there any other daemons known which do not log further on after syslog-ng restarts?
The broken crond logging with syslog-ng is syslog-ng related, because with syslogd, it still works: Feb 12 06:37:27 dhcp-1-66 kernel: Kernel logging (proc) stopped. Feb 12 06:37:27 dhcp-1-66 kernel: Kernel log daemon terminating. Feb 12 06:37:28 dhcp-1-66 syslog: klogd shutdown succeeded Feb 12 06:37:29 dhcp-1-66 exiting on signal 15 Feb 12 06:37:29 dhcp-1-66 syslogd 1.4.1: restart. Feb 12 06:37:29 dhcp-1-66 syslog: syslogd startup succeeded Feb 12 06:37:29 dhcp-1-66 syslog: klogd startup succeeded Feb 12 06:37:29 dhcp-1-66 kernel: klogd 1.4.1, log source = /proc/kmsg started. Feb 12 06:37:29 dhcp-1-66 syslog: syslogd shutdown succeeded Feb 12 06:33:00 dhcp-1-66 crond[852]: (*system*) RELOAD (/etc/crontab) Feb 12 06:33:00 dhcp-1-66 CROND[1192]: (root) CMD (/bin/true) Feb 12 06:33:01 dhcp-1-66 crond[1215]: (CRON) STARTUP (fork ok) Feb 12 06:34:00 dhcp-1-66 CROND[1219]: (root) CMD (/bin/true) Feb 12 06:35:00 dhcp-1-66 CROND[1221]: (root) CMD (/bin/true) Feb 12 06:36:01 dhcp-1-66 CROND[1261]: (root) CMD (/bin/true) Feb 12 06:37:00 dhcp-1-66 CROND[1309]: (root) CMD (/bin/true) Feb 12 06:38:00 dhcp-1-66 CROND[1349]: (root) CMD (/bin/true) vixie-cron-3.01-64 And here some more information. Starting from above I've compiled syslog-ng on this clean system and installed it. Steps 1) sysklogd is running, crond is logging 2) shutdown of sysklogd, start of syslog-ng, crond is logging further on 3) change syslog-ng settings source local { + unix-stream("/dev/log"); + unix-stream("/dev/log" keep-alive(yes) max-connections(512)); - udp(ip(0.0.0.0) port(514)); + #udp(ip(0.0.0.0) port(514)); internal(); }; and restart syslog-ng, crond is logging further on 4) restart crond, crond is logging again 5) restart syslog-ng -> crond is no longer logging Log: Feb 12 06:56:37 dhcp-1-66 kernel: Kernel logging (proc) stopped. Feb 12 06:56:37 dhcp-1-66 kernel: Kernel log daemon terminating. Feb 12 06:56:39 dhcp-1-66 syslog: klogd shutdown succeeded Feb 12 06:56:39 dhcp-1-66 exiting on signal 15 Feb 12 06:58:47 dhcp-1-66 syslog-ng[10279]: syslog-ng version 1.5.26 starting Feb 12 06:58:47 dhcp-1-66 syslog-ng: ...... Feb 12 06:58:47 dhcp-1-66 syslog-ng: syslog-ng startup succeeded Feb 12 07:00:02 dhcp-1-66 syslog-ng[10279]: syslog-ng version 1.5.26 going down Feb 12 07:00:02 dhcp-1-66 syslog-ng[10312]: syslog-ng version 1.5.26 starting Feb 12 07:00:02 dhcp-1-66 syslog-ng: ...... Feb 12 07:00:02 dhcp-1-66 syslog-ng: syslog-ng startup succeeded Feb 12 07:01:39 dhcp-1-66 syslog-ng[10312]: syslog-ng version 1.5.26 going down Feb 12 07:01:40 dhcp-1-66 syslog-ng[10349]: syslog-ng version 1.5.26 starting Feb 12 07:01:40 dhcp-1-66 syslog-ng: ...... Feb 12 07:01:40 dhcp-1-66 syslog-ng: syslog-ng startup succeeded Feb 12 07:02:13 dhcp-1-66 crond: crond shutdown succeeded Feb 12 07:02:14 dhcp-1-66 crond: crond startup succeeded Feb 12 07:03:03 dhcp-1-66 syslog-ng[10349]: syslog-ng version 1.5.26 going down Feb 12 07:03:04 dhcp-1-66 syslog-ng[10407]: syslog-ng version 1.5.26 starting Feb 12 07:03:04 dhcp-1-66 syslog-ng: ...... Feb 12 07:03:04 dhcp-1-66 syslog-ng: syslog-ng startup succeeded # tail -f /var/log/cron Feb 12 06:53:00 dhcp-1-66 CROND[9013]: (root) CMD (/bin/true) Feb 12 06:54:00 dhcp-1-66 CROND[9016]: (root) CMD (/bin/true) Feb 12 06:55:00 dhcp-1-66 CROND[10040]: (root) CMD (/bin/true) Feb 12 06:56:00 dhcp-1-66 CROND[10118]: (root) CMD (/bin/true) Feb 12 06:59:00 dhcp-1-66 CROND[10283]: (root) CMD (/bin/true) Feb 12 07:00:00 dhcp-1-66 CROND[10286]: (root) CMD (/bin/true) Feb 12 07:01:00 dhcp-1-66 CROND[10319]: (root) CMD (/bin/true) Feb 12 07:01:00 dhcp-1-66 CROND[10320]: (root) CMD (run-parts /etc/cron.hourly) Feb 12 07:02:00 dhcp-1-66 CROND[10353]: (root) CMD (/bin/true) Feb 12 07:02:14 dhcp-1-66 crond[10376]: (CRON) STARTUP (fork ok) Feb 12 07:03:00 dhcp-1-66 CROND[10381]: (root) CMD (/bin/true) Strange? Peter -- Dr. Peter Bieringer http://www.bieringer.de/pb/ GPG/PGP Key 0x958F422D mailto: pb at bieringer dot de Deep Space 6 Co-Founder and Core Member http://www.deepspace6.net/
On Thursday 13 February 2003 15:06, Peter Bieringer wrote:
source local { + unix-stream("/dev/log"); ^???
+ unix-stream("/dev/log" keep-alive(yes) max-connections(512)); - udp(ip(0.0.0.0) port(514)); + #udp(ip(0.0.0.0) port(514)); internal(); };
On Thursday 13 February 2003 14:38, Balazs Scheidler wrote:
sysklogd uses a unix-dgram source by default,
Try: unix-dgram("/dev/log") Achim
--On Thursday, February 13, 2003 03:25:44 PM +0100 Achim Gsell <achim@cybercity.ch> wrote:
sysklogd uses a unix-dgram source by default,
Try: unix-dgram("/dev/log")
Done on productive host also, same GOOD result. BTW: can this issue (use of unix-stream instead of unix-dgram) cause such hangups in logging like told in initial postings. Thanks for all the time of reading and answering my e-mails. Peter, very happy for now! -- Dr. Peter Bieringer http://www.bieringer.de/pb/ GPG/PGP Key 0x958F422D mailto: pb at bieringer dot de Deep Space 6 Co-Founder and Core Member http://www.deepspace6.net/
glibc-2.1.3 has this problem but glibc-2.2.5 and glibc-2.3.1 not:
glibc-2.1.3/misc/syslog.c in function vsyslog():
if (!connected || __send(LogFile, buf, bufsize, 0) < 0) { closelog_internal (); /* attempt re-open next time */ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ if (LogStat & LOG_CONS && (fd = __open(_PATH_CONSOLE, O_WRONLY|O_NOCTTY, 0)) >= 0)
D'oh! Next time I should probably read the actual source code rather than a diff ... Thanks and cheers, Roberto Nibali, ratz -- echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc
participants (6)
-
Achim Gsell
-
Balazs Scheidler
-
Peter Bieringer
-
Peter J. Holzer
-
Roberto Nibali
-
Roberto Nibali