losing messages during a HUP

9 Jun 2004

Folks,

I'd like to report a couple of bugs in syslog-ng that open a window 
(albeit a small one) during which messages are simply lost. The problem 
occurs when a HUP signal is sent to the daemon so that logrotate can 
rotate the log files. On a HUP, the daemon re-reads its configuration 
file. Before doing that, it closes its existing input connections, and 
after the re-read it reopens the inputs (usually on the same endpoints, 
since these rarely, if ever, change).

For /dev/log, it not only closes the socket but apparently unlinks it as 
well. In theory this is fine, because clients detect the closed socket 
and retry on their own. However, that's not the whole story. On Linux 
(where we're seeing the problem; I don't know whether it happens 
elsewhere), if /dev/log still exists but nobody is reading from it, the 
client gets ECONNREFUSED, and the libc syslog implementation recognises 
the situation and simply retries. But once /dev/log has been unlinked, 
clients that try to connect get a different error back, EPROTOCOL. This 
makes them conclude that the local socket can't be used, so they fall 
back to 'lesser' transports such as UDP and never work their way back up 
the fallback list to a better one.
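
To make the two cases concrete, here is a small C sketch that models what a client sees at each stage of the reload. It assumes Linux and uses a datagram Unix socket at a throwaway path rather than the real /dev/log, and the function names are made up for illustration. On Linux, connect() normally fails with ECONNREFUSED while the socket file is still present but no socket is bound to it, and with ENOENT once the file has been unlinked. How a particular syslog client maps those errors onto its own retry or fallback logic depends on that client's implementation.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill a sockaddr_un for a filesystem path (truncated if too long). */
static void make_addr(struct sockaddr_un *addr, const char *path)
{
    memset(addr, 0, sizeof *addr);
    addr->sun_family = AF_UNIX;
    strncpy(addr->sun_path, path, sizeof addr->sun_path - 1);
}

/* Connect a datagram client to path; return 0 on success or the errno. */
static int try_connect(const char *path)
{
    struct sockaddr_un addr;
    int fd, err = 0;

    fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0)
        return errno;
    make_addr(&addr, path);
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0)
        err = errno;
    close(fd);
    return err;
}

/* Hypothetical helper modelling the two states a client can see during
   a reload:
     1. the daemon has closed its input, but the socket file still exists;
     2. the daemon has also unlinked the socket file.
   Stores the connect() errno for each state; returns -1 if setup fails. */
int probe_reload_states(const char *path, int *closed_err, int *unlinked_err)
{
    struct sockaddr_un addr;
    int srv;

    assert(path != NULL && closed_err != NULL && unlinked_err != NULL);
    unlink(path);                        /* remove any stale socket file */
    srv = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (srv < 0)
        return -1;
    make_addr(&addr, path);
    if (bind(srv, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(srv);
        return -1;
    }

    close(srv);                          /* daemon closes its input */
    *closed_err = try_connect(path);     /* typically ECONNREFUSED */

    unlink(path);                        /* daemon unlinks the socket file */
    *unlinked_err = try_connect(path);   /* typically ENOENT */
    return 0;
}
```

On a typical Linux system, calling probe_reload_states("/tmp/probe.sock", &c, &u) should leave c == ECONNREFUSED and u == ENOENT.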

That's an efficiency concern, but at least no messages are lost. More 
importantly, messages can be lost in the interval between the daemon's 
last read from /dev/log and its close() of the socket.
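
The loss window can be reproduced the same way. This sketch again assumes Linux and a throwaway path, and its helper names are hypothetical. A client queues a few datagrams that the server never reads. The server socket is then either drained directly, or closed and re-created at the same path, as happens on a HUP. Datagrams still queued on the old socket are discarded when it is closed, and the new socket starts out empty.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill a sockaddr_un for a filesystem path (truncated if too long). */
static void make_addr(struct sockaddr_un *addr, const char *path)
{
    memset(addr, 0, sizeof *addr);
    addr->sun_family = AF_UNIX;
    strncpy(addr->sun_path, path, sizeof addr->sun_path - 1);
}

/* Create a datagram socket bound to path, replacing any existing file. */
static int bind_dgram(const char *path)
{
    struct sockaddr_un addr;
    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    make_addr(&addr, path);
    unlink(path);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: queue n small messages on the server socket without
   reading them. If reload is nonzero, close and re-create the server socket
   (close, unlink, bind again) before draining. Returns how many messages
   could be read back, or -1 on error. */
int messages_read_back(const char *path, int n, int reload)
{
    struct sockaddr_un addr;
    char buf[64];
    int srv, cli, i, got = 0;

    /* keep n small so the non-blocking sends never hit the queue limit */
    assert(n > 0 && n <= 5);
    srv = bind_dgram(path);
    if (srv < 0)
        return -1;
    cli = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (cli < 0) {
        close(srv);
        return -1;
    }
    make_addr(&addr, path);
    for (i = 0; i < n; i++) {
        if (sendto(cli, "msg", 3, MSG_DONTWAIT,
                   (struct sockaddr *)&addr, sizeof addr) != 3) {
            close(cli);
            close(srv);
            return -1;
        }
    }

    if (reload) {
        close(srv);               /* unread datagrams are discarded here */
        srv = bind_dgram(path);   /* fresh input at the same path */
        if (srv < 0) {
            close(cli);
            return -1;
        }
    }

    /* drain whatever is queued without blocking */
    while (recv(srv, buf, sizeof buf, MSG_DONTWAIT) > 0)
        got++;

    close(cli);
    close(srv);
    unlink(path);
    return got;
}
```

messages_read_back(path, 5, 0) returns 5, while messages_read_back(path, 5, 1) returns 0: every message that was queued but not yet read at the moment of the close is gone.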

This may sound unlikely, but we are definitely seeing lost messages 
during this window (we log a very high volume of messages).

I can see two possible solutions, though there may be more:

   1. provide an additional signal for logrotate to use, which only
      closes and reopens the log files and doesn't touch the config
      file.
   2. make the config file re-read smarter, so that it only closes
      inputs when really necessary.
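
As a rough illustration of option 1, the sketch below keeps the existing HUP behaviour (full config reload) and adds a second signal that only requests reopening the log files. SIGUSR1 is an arbitrary choice for the example, not something syslog-ng currently implements. The flag and function names are also hypothetical. The handlers only set flags. The daemon's main loop would call take_pending_actions() and, for ACT_REOPEN_LOGS, close and reopen the destination files without touching the input sockets.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Work the main loop should do; values can be OR-ed together. */
enum action { ACT_NONE = 0, ACT_REOPEN_LOGS = 1, ACT_RELOAD_CONFIG = 2 };

static volatile sig_atomic_t reopen_requested; /* set by SIGUSR1 (assumed) */
static volatile sig_atomic_t reload_requested; /* set by SIGHUP */

/* Async-signal-safe handler: only records which request arrived. */
static void on_signal(int sig)
{
    if (sig == SIGUSR1)
        reopen_requested = 1;
    else if (sig == SIGHUP)
        reload_requested = 1;
}

/* Install the same handler for both signals; returns -1 on failure. */
int install_handlers(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGHUP, &sa, NULL) < 0)
        return -1;
    if (sigaction(SIGUSR1, &sa, NULL) < 0)
        return -1;
    return 0;
}

/* Called from the main loop: returns the pending actions and clears them.
   ACT_REOPEN_LOGS: reopen output files only, inputs stay open.
   ACT_RELOAD_CONFIG: the existing full reload. */
int take_pending_actions(void)
{
    int acts = ACT_NONE;

    if (reopen_requested) {
        reopen_requested = 0;
        acts |= ACT_REOPEN_LOGS;
    }
    if (reload_requested) {
        reload_requested = 0;
        acts |= ACT_RELOAD_CONFIG;
    }
    return acts;
}
```

Under this design, /dev/log is never closed or unlinked on the logrotate path. The loss window and the client fallback described above would therefore not arise during a rotation.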

Nick.

Nick Williams