Hi,
For at least a couple of months now, we have been experiencing strange problems on some of our routers, which we now attribute to a syslog-ng (or a syslog-ng internal facility) hang. We use syslog-ng 1.6.8 on Linux kernels 2.4.31 and 2.6.8.
I read some posts on this list (https://lists.balabit.hu/pipermail/syslog-ng/2006-May/008784.html) describing similar problems.
The conditions under which the problem occurs are not clear to us. There are evidently several different circumstances that lead syslog-ng to hang.
We first noticed the problem on a server where some users were not able to log in, most likely because it was impossible to write to the /dev/log socket. Reloading syslog-ng from a shell that was already open on the server corrected the problem. For this server we "solved" the problem by adding a cron job that checks syslog activity and reloads syslog-ng if needed. This is not a nice solution, but it avoids a hard reboot of the server.
More recently we realized that some routers were not logging some (iptables firewall) events sent to syslog while, at the same time, logs from other daemons were handled correctly. Again, reloading syslog-ng fixes the problem until it randomly occurs again. I am currently studying how these log messages are sent to syslog, to understand the problem better.
We have been tracking the causes of this annoying behaviour without success so far. First of all, we would like to understand what is happening in syslog-ng itself: at what level does this hang occur? The kernel? syslog? Is it related to the /dev/log socket? Maybe some experts or syslog developers can send us some hints? Is it related to the kernel environment? /proc? udev? Or is it possible that another daemon is responsible for this syslog hang?
Apparently the problem is also present in newer releases of the 1.6.X branch, according to the posts on the list; I checked the branch changelogs without seeing anything about it. Has any work been devoted to fixing this kind of problem in the more recent branches (1.9 and 2.0)?
We are planning to develop a daemon to monitor syslog-ng and reload the service if it hangs. If some of you have already done some work in that direction, we would be glad to share the effort or learn the best and most efficient way to proceed.
Any hints, comments or suggestions are welcome.
Thanks in advance.
-- 
Vincent Régnard
vregnard@tbs-internet.com
TBS-internet.com
027 630 5902
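The check that such a cron job or monitoring daemon performs could look roughly like the Python sketch below. It is only an illustration under stated assumptions, not the script used on the servers described above: the function name `probe_log_socket`, the probe text and the two-second timeout are made up for the example, and the socket type of /dev/log (datagram or stream) depends on the syslog-ng configuration, so both are tried. A successful write only shows that the socket still accepts data, not that the message reached a log file.

```python
import socket


def probe_log_socket(path="/dev/log", timeout=2.0):
    """Return True if a test message could be written to the log socket in time.

    Hypothetical helper: the name, message and timeout are illustrative only.
    """
    # <13> is the syslog priority for facility "user" (1 * 8) + severity "notice" (5).
    message = b"<13>syslog-watchdog: probe"
    for sock_type in (socket.SOCK_DGRAM, socket.SOCK_STREAM):
        sock = socket.socket(socket.AF_UNIX, sock_type)
        sock.settimeout(timeout)
        try:
            sock.connect(path)
            if sock_type == socket.SOCK_STREAM:
                sock.sendall(message + b"\n")
            else:
                sock.send(message)
            return True
        except socket.timeout:
            # The write (or connect) blocked: the reader is probably stuck.
            return False
        except OSError:
            # Wrong socket type, missing path or refused connection:
            # try the next socket type, if any.
            continue
        finally:
            sock.close()
    return False
```

A watchdog would call `probe_log_socket()` periodically and, when it returns False, run whatever reload command the system normally uses for syslog-ng.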
On Fri, 2006-07-28 at 12:38 +0200, Vincent Régnard wrote:
Hi,
For at least a couple of months now, we have been experiencing strange problems on some of our routers, which we now attribute to a syslog-ng (or a syslog-ng internal facility) hang. We use syslog-ng 1.6.8 on Linux kernels 2.4.31 and 2.6.8.
I read some posts on this list (https://lists.balabit.hu/pipermail/syslog-ng/2006-May/008784.html) describing similar problems.
The conditions under which the problem occurs are not clear to us. There are evidently several different circumstances that lead syslog-ng to hang.
We first noticed the problem on a server where some users were not able to log in, most likely because it was impossible to write to the /dev/log socket. Reloading syslog-ng from a shell that was already open on the server corrected the problem. For this server we "solved" the problem by adding a cron job that checks syslog activity and reloads syslog-ng if needed. This is not a nice solution, but it avoids a hard reboot of the server.
More recently we realized that some routers were not logging some (iptables firewall) events sent to syslog while, at the same time, logs from other daemons were handled correctly. Again, reloading syslog-ng fixes the problem until it randomly occurs again. I am currently studying how these log messages are sent to syslog, to understand the problem better.
We have been tracking the causes of this annoying behaviour without success so far. First of all, we would like to understand what is happening in syslog-ng itself: at what level does this hang occur? The kernel? syslog? Is it related to the /dev/log socket? Maybe some experts or syslog developers can send us some hints? Is it related to the kernel environment? /proc? udev? Or is it possible that another daemon is responsible for this syslog hang?
Apparently the problem is also present in newer releases of the 1.6.X branch, according to the posts on the list; I checked the branch changelogs without seeing anything about it. Has any work been devoted to fixing this kind of problem in the more recent branches (1.9 and 2.0)?
We are planning to develop a daemon to monitor syslog-ng and reload the service if it hangs. If some of you have already done some work in that direction, we would be glad to share the effort or learn the best and most efficient way to proceed.
The only cause of hangs I know of is not really a syslog-ng issue, or at least not one that can be fixed in syslog-ng alone (although I have already tried to work around it). The problem is related to reading the /proc/kmsg special file: if multiple processes poll /proc/kmsg, one of them might block, as the kernel does not support non-blocking I/O on /proc/kmsg. This is usually caused by:
1) klogd and syslog-ng running on the same host, with syslog-ng referencing /proc/kmsg
2) two syslog-ng instances running for some reason (started twice because of lost pidfiles)
3) one syslog-ng having more than a single /proc/kmsg source
-- 
Bazsi
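One way to check a host for the three situations listed above is to look at which processes currently hold /proc/kmsg open. The Python sketch below is an illustration and not part of syslog-ng: the function names are made up, it assumes the Linux /proc/<pid>/fd layout in which each open descriptor appears as a symlink to the opened path, and it normally has to run as root to see other processes' descriptors.

```python
import os


def _command_name(pid_dir):
    """Best-effort argv[0] of a process, read from its cmdline file."""
    try:
        with open(os.path.join(pid_dir, "cmdline"), "rb") as f:
            argv0 = f.read().split(b"\0", 1)[0]
    except OSError:
        return "?"
    return argv0.decode(errors="replace") or "?"


def find_kmsg_readers(target="/proc/kmsg", proc_root="/proc"):
    """Return a sorted list of (pid, command, open_count) for processes holding target open.

    Hypothetical helper: names and return shape are illustrative only.
    """
    readers = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        pid_dir = os.path.join(proc_root, entry)
        try:
            fds = os.listdir(os.path.join(pid_dir, "fd"))
        except OSError:
            continue  # process exited, or permission denied
        count = 0
        for fd in fds:
            try:
                if os.readlink(os.path.join(pid_dir, "fd", fd)) == target:
                    count += 1
            except OSError:
                pass  # descriptor closed while we were looking
        if count:
            readers.append((int(entry), _command_name(pid_dir), count))
    return sorted(readers)
```

More than one entry in the result would correspond to cases 1 or 2 (klogd plus syslog-ng, or two syslog-ng instances), while a single entry with an open count above one would correspond to case 3.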
participants (2): Balazs Scheidler, Vincent Régnard