[syslog-ng] [Bug 113] hang on disk access seem to cause network logging to stop

Wed Feb 23 10:38:17 CET 2011

https://bugzilla.balabit.com/show_bug.cgi?id=113

Balazs Scheidler <bazsi at balabit.hu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|                            |INVALID
             Status|NEW                         |RESOLVED

--- Comment #2 from Balazs Scheidler <bazsi at balabit.hu>  2011-02-23 10:38:17 ---
Well, this is more of an OS limitation. The POSIX API (and the Linux kernel) won't:

  1) tell you in advance whether a write() would block in this fashion, or
  2) give control back once write() has been called: it simply doesn't return.

The above facts, combined with syslog-ng <= 3.2 being single-threaded, cause all syslog-ng processing to stop if even a single write() blocks.
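Point 1 can be observed directly. POSIX specifies that regular files always select (and poll) as ready for writing, so a program cannot ask beforehand whether a write() to a file will stall. The minimal Python sketch below uses a local temporary file as a stand-in for a file on an NFS mount. That substitution is an assumption, but a file on NFS is still a regular file, so readiness checks report it the same way.

```python
import select
import tempfile

# select() is asked whether the file is writable, without waiting (timeout 0).
with tempfile.TemporaryFile() as f:
    _, writable, _ = select.select([], [f], [], 0)
    # Regular files are always reported ready, even if a later write()
    # would hang on a stalled NFS server, so this check gives no warning.
    reported_ready = f in writable

print(reported_ready)  # True
```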

3.3 will be somewhat better, since it will use separate threads to write to output files.
Even then, though, the I/O threads may all be consumed by stalled file writes, because there is only a limited
number of them. For example, if you have 1000 destination files, each on the stalled NFS server, you would need
on the order of 1000 I/O threads to cope with this situation. The default number of I/O threads is the number
of CPU cores in your system multiplied by two.
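The thread-exhaustion effect can be modeled with a generic thread pool. The sketch below sizes the pool like the default described above (CPU cores times two) and makes every worker block on a simulated stalled NFS write. A write to a healthy local file then cannot start until the "server" recovers. This is only an illustrative model using Python's `concurrent.futures`, not syslog-ng's actual threading code. The helper names and file paths are hypothetical.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor, wait

# Assumption: pool sized like the default described above (CPU cores * 2).
POOL_SIZE = (os.cpu_count() or 1) * 2

nfs_back = threading.Event()  # stands in for "the NFS server recovered"

def stalled_nfs_write(path):
    # Hypothetical stand-in for write() on a hung NFS mount: blocks until released.
    nfs_back.wait()
    return path

def local_write(path):
    # A write to a healthy local disk, which would normally finish at once.
    return path

with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
    # One stalled destination file per worker thread occupies the whole pool.
    for i in range(POOL_SIZE):
        pool.submit(stalled_nfs_write, f"/mnt/nfs/host{i}.log")
    local = pool.submit(local_write, "/var/log/local.log")
    # The local write is queued behind the stalled ones and cannot start.
    _, not_done = wait([local], timeout=0.5)
    local_starved = local in not_done
    nfs_back.set()  # the "server" recovers and the workers are freed
    local_result = local.result()

print(local_starved, local_result)  # True /var/log/local.log
```

Raising the pool size only helps while there are more threads than simultaneously stalled destinations.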

3.3 is only available as an alpha release though.

One way to solve this problem is to mount your NFS share in "soft" mode. Quoting the manual page:

       soft / hard    Determines the recovery behavior of the NFS client after an NFS request
                      times out. If neither option is specified (or if the hard option is
                      specified), NFS requests are retried indefinitely. If the soft option is
                      specified, then the NFS client fails an NFS request after retrans
                      retransmissions have been sent, causing the NFS client to return an error
                      to the calling application.

                      NB: A so-called "soft" timeout can cause silent data corruption in certain
                      cases. As such, use the soft option only when client responsiveness is more
                      important than data integrity. Using NFS over TCP or increasing the value
                      of the retrans option may mitigate some of the risks of using the soft
                      option.

It is better, though, to read through all the NFS mount options and set their values accordingly.

In summary, this is less a syslog-ng problem than an NFS side effect. Even in "soft" mode, some time
will be spent trying to recover, which stalls syslog-ng in the same way, though not indefinitely.
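To get a feel for how long "some time" can be, here is a rough estimate under stated assumptions. It assumes the linear backoff that nfs(5) describes for NFS over TCP: each retransmission waits `timeo` longer than the previous one, capped at 600 seconds, with `timeo` given in tenths of a second. It also assumes the TCP defaults `timeo=600` and `retrans=2`. The kernel's real retry logic has more moving parts, so treat the result as an order-of-magnitude figure. The function name is hypothetical.

```python
def soft_mount_stall_seconds(timeo_deciseconds=600, retrans=2, cap_seconds=600):
    """Rough worst-case wait before a soft-mounted NFS request fails.

    Model (assumption): the initial transmission plus `retrans`
    retransmissions each wait for a timeout that grows linearly
    (timeo, 2*timeo, 3*timeo, ...) up to `cap_seconds`.
    """
    timeo = timeo_deciseconds / 10.0  # timeo is specified in tenths of a second
    total = 0.0
    for attempt in range(1, retrans + 2):
        total += min(attempt * timeo, cap_seconds)
    return total

print(soft_mount_stall_seconds())        # 360.0  (60 + 120 + 180 seconds)
print(soft_mount_stall_seconds(100, 1))  # 30.0   (10 + 20 seconds)
```

Under this model, a single stuck write on a default soft TCP mount could hold a writer for minutes, which is why tuning `timeo` and `retrans` matters.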

The only way to work around this limitation is to continuously monitor the NFS mounts from a script and, if a mount
doesn't respond within a given timeframe (let's say 1 second), tell syslog-ng not to write to those files (by changing
the configuration file and sending the process a SIGHUP).
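The monitoring approach could be sketched in Python as follows. The script probes each mount with `os.stat()` in a daemon thread, which may hang forever on a stalled mount. If the probe does not finish within the timeout, the mount counts as unresponsive. The script then calls a hook that rewrites the configuration and sends SIGHUP to the PID read from a pidfile. Several parts are assumptions: `write_config` is a hypothetical hook you would implement for your own syslog-ng.conf layout, the pidfile location depends on your installation, and using `stat()` as the probe is one possible choice.

```python
import os
import signal
import threading

def mount_responds(path, timeout=1.0):
    """Return True if os.stat(path) finishes (successfully or not) within timeout seconds."""
    finished = threading.Event()

    def probe():
        try:
            os.stat(path)
        except OSError:
            pass  # a quick error is still an answer; only a hang counts as unresponsive
        finally:
            finished.set()

    # daemon=True: a probe stuck on a hung mount won't keep the script from exiting
    threading.Thread(target=probe, daemon=True).start()
    return finished.wait(timeout)

def reload_syslog_ng(pidfile):
    """Send SIGHUP to the syslog-ng process whose PID is stored in pidfile."""
    with open(pidfile) as f:
        pid = int(f.read().strip())
    os.kill(pid, signal.SIGHUP)

def check_and_reload(mounts, write_config, pidfile, timeout=1.0):
    """Probe each mount; if any hang, rewrite the config without them and reload syslog-ng."""
    unresponsive = [m for m in mounts if not mount_responds(m, timeout)]
    if unresponsive:
        # write_config is a hypothetical hook that regenerates syslog-ng.conf
        # so that no destination points into the listed mounts.
        write_config(excluded=unresponsive)
        reload_syslog_ng(pidfile)
    return unresponsive
```

A caller would run `check_and_reload(["/mnt/nfs/logs"], write_config=..., pidfile=...)` roughly once per timeframe, with all three values specific to the installation. A production script would also remember mounts it has already excluded, since each probe of a still-hung mount leaves another blocked thread behind. It would also restore the destinations once the mount recovers.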

Hope this helps.
