[Bug 113] New: hang on disk access seem to cause network logging to stop
https://bugzilla.balabit.com/show_bug.cgi?id=113

Summary: hang on disk access seem to cause network logging to stop
Product: syslog-ng
Version: 3.0.x
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: unspecified
Component: syslog-ng
AssignedTo: bazsi@balabit.hu
ReportedBy: arekm@maven.pl
Type of the Report: bug
Estimated Hours: 0.0

When syslog-ng deadlocks in some disk operation (faulty hardware, a controller that somehow locks up, etc.), network logging to another host also stops. This is very unfortunate, because network logging could give me hints about why the disks deadlock.

Also, if we have a few hard disks mounted at /mnt/test1 and /mnt/test2, log to /mnt/test1/logfile and /mnt/test2/logfile, and the test1 disk deadlocks, then syslog-ng also stops logging to /mnt/test2/logfile.

Basically, one deadlocked device locks everything else. Major pain here.

--
Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching all bug changes.
https://bugzilla.balabit.com/show_bug.cgi?id=113

--- Comment #1 from Arkadiusz Miśkiewicz <arekm@maven.pl> 2011-02-23 08:26:10 ---

Forgot to tell you how to reproduce:

1) mount NFS read-write from a remote host at /mnt/something
2) tell syslog-ng to log into /mnt/something/file
3) tell syslog-ng to log to some remote host (optional, since even logging to another local disk will stop)
4) block NFS on the firewall; /mnt/something will deadlock
5) try to log 20 messages to syslog-ng

A few messages seem to get logged over the network, but then syslog-ng stops logging anything, even to local disk.
https://bugzilla.balabit.com/show_bug.cgi?id=113

Balazs Scheidler <bazsi@balabit.hu> changed:

What       |Removed |Added
----------------------------------------------------------------------------
Resolution |        |INVALID
Status     |NEW     |RESOLVED

--- Comment #2 from Balazs Scheidler <bazsi@balabit.hu> 2011-02-23 10:38:17 ---

Well, this is more of an OS limitation. Two facts about the POSIX API (and the Linux kernel):

1) it won't tell you in advance whether a write() would block in this fashion
2) once we call write(), it doesn't return

These facts, combined with syslog-ng <= 3.2 being single threaded, cause all processing in syslog-ng to stop if even a single write() blocks.

3.3 will be somewhat better, since it will use separate threads to write to output files. Even then, the I/O threads may be fully consumed by different file writes, because there is only a limited number of them (e.g. if you have 1000 destination files, each going to the stalled NFS server, only 1000 I/O threads would be able to cope with this situation). The default number of I/O threads is the number of CPU cores in your system multiplied by two. 3.3 is only available as an alpha release though.

One way to solve this problem is to mount your NFS in "soft" mode. Quoting the manual page:

    soft / hard
        Determines the recovery behavior of the NFS client after an NFS request times out. If neither option is specified (or if the hard option is specified), NFS requests are retried indefinitely. If the soft option is specified, then the NFS client fails an NFS request after retrans retransmissions have been sent, causing the NFS client to return an error to the calling application.

        NB: A so-called "soft" timeout can cause silent data corruption in certain cases. As such, use the soft option only when client responsiveness is more important than data integrity. Using NFS over TCP or increasing the value of the retrans option may mitigate some of the risks of using the soft option.
But it's better to read through all the NFS mount options and set the values accordingly.

In summary, this is less a syslog-ng problem than an NFS side effect. Even in "soft" mode, a certain amount of time will be spent trying to recover, causing syslog-ng to stop in the same way, but not indefinitely.

The only way to work around this limitation is to continuously monitor the NFS mounts from a script and, if a mount doesn't react within a given timeframe (let's say 1 second), tell syslog-ng not to write to those files (by changing the configuration file and SIGHUPing the process).

Hope this helps.
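The monitoring script suggested above could start from a probe like the following. This is only a minimal sketch under stated assumptions: the function names `path_responds` and `stalled_paths` are hypothetical, the 1 second default comes from the comment above, and the step that rewrites the syslog-ng configuration and sends SIGHUP is left out. The stat() call runs in a child process so that a hung mount blocks the child and not the monitor itself.

```python
import subprocess
import sys

# One-liner run in a child process; it blocks there if the mount is hung.
_PROBE = "import os, sys; os.stat(sys.argv[1])"


def path_responds(path, timeout=1.0):
    """Return True if a stat() of `path` succeeds within `timeout` seconds."""
    proc = subprocess.Popen(
        [sys.executable, "-c", _PROBE, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Best effort only: a child stuck in uninterruptible I/O may not
        # exit until the I/O completes, so we do not wait for it here.
        proc.kill()
        return False
    # A non-zero exit means stat() failed (e.g. the path does not exist).
    return proc.returncode == 0


def stalled_paths(paths, timeout=1.0):
    """Return the subset of `paths` that did not respond in time."""
    return [p for p in paths if not path_responds(p, timeout)]
```

A cron job or loop could call `stalled_paths(["/mnt/something"])` and, for any path returned, switch syslog-ng to a configuration without that destination and send SIGHUP as described above.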
https://bugzilla.balabit.com/show_bug.cgi?id=113

--- Comment #3 from Arkadiusz Miśkiewicz <arekm@maven.pl> 2011-02-23 10:46:34 ---

The NFS case was invented just to make the reproduction procedure easier for syslog-ng developers.

The real problem is hardware that locks up once every few months, and I cannot figure out why, because syslog-ng doesn't log anything for me over the network when that happens.

Threaded 3.3 would likely solve the problem for me, since I only have about 20 log files (and I hope it will be possible to specify the number of threads, or even to set a "separate-thread-for-each-target" kind of option).
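To illustrate why a "separate thread for each target" arrangement isolates a hung device, here is a minimal sketch in Python. It is not how syslog-ng 3.3 is implemented; the `Destination` class, `fan_out` function, and the bounded-queue drop policy are assumptions made for the illustration. Each destination gets its own writer thread and queue, so a write that never returns only stalls its own thread.

```python
import queue
import threading


class Destination:
    """One writer thread and one bounded queue per destination."""

    def __init__(self, name, write, maxsize=1000):
        self.name = name
        self.dropped = 0          # messages discarded because the queue was full
        self._write = write       # callable that may block indefinitely
        self._queue = queue.Queue(maxsize=maxsize)
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, msg):
        """Queue a message without ever blocking the caller."""
        try:
            self._queue.put_nowait(msg)
        except queue.Full:
            self.dropped += 1

    def join(self):
        """Wait until every queued message has been written."""
        self._queue.join()

    def _run(self):
        while True:
            msg = self._queue.get()
            self._write(msg)      # a hang here affects this destination only
            self._queue.task_done()


def fan_out(destinations, msg):
    """Hand one message to every destination."""
    for dest in destinations:
        dest.submit(msg)
```

With one destination whose `write` never returns and one healthy destination, `fan_out` keeps delivering to the healthy one; the stalled one fills its queue and then counts drops. The cost is one thread per destination, which is reasonable for the roughly 20 log files mentioned above.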