Does anyone have any ideas about these problems, or indications that
they might be fixed at some point? Did anyone see this email ;-)
Nick.
Williams, Nick (IT) wrote:
Folks,
I'd like to report a couple of bugs with syslog-ng whereby there is a
window (albeit small) during which messages will just be lost. The
problem occurs when a HUP signal is sent to the daemon in order to
perform a logrotate. During this time, the daemon re-checks its config
file. Before doing that, it closes the old incoming connections. After
completing the re-read, it reconnects the inputs (typically to
the same endpoints, since they hardly ever change, if at all).
For the FIFO /dev/log, it not only closes it but it apparently unlinks
it as well. Theoretically, life is fine because the clients detect the
closed socket and the clients themselves issue retries. All to the
good. However, that's not all. On Linux (where we're seeing the
problem; I don't know if it happens elsewhere), if there is no-one
reading from the FIFO then the client gets the error ECONNREFUSED and
the libc syslog implementation happily realises what's going on and
just retries. But when /dev/log is unlinked and the clients try to
connect, they get a different error back - EPROTOCOL. This causes the
clients to assume that a FIFO can't be used and fall back to 'lesser'
protocols like UDP, etc, never to go back up the fallback list to a
better protocol.
That's an efficiency concern, but at least messages don't get lost.
More importantly, messages can get lost between the last read from the
FIFO and the close() of the FIFO.
This may sound unlikely, but we are definitely seeing lost messages at
this time (we're logging a very high volume of messages).
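For anyone who wants to check which connect() error a client sees in
each of the cases above on their own system, here is a minimal probe.
It is only a sketch: it uses a datagram Unix socket in a temporary
directory as a stand-in for /dev/log, the names probe_connect and
run_probe are made up for illustration, and it has nothing to do with
the syslog-ng or libc sources. It just reports whatever errno the
kernel hands back.

```python
import errno
import os
import socket
import tempfile


def probe_connect(path, sock_type=socket.SOCK_DGRAM):
    """Try to connect to a Unix socket at `path`.

    Returns None on success, or the symbolic errno name on failure.
    """
    client = socket.socket(socket.AF_UNIX, sock_type)
    try:
        client.connect(path)
        return None
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        client.close()


def run_probe():
    """Report the connect() result for a live, a closed, and an unlinked socket."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "log")  # hypothetical stand-in for /dev/log
        server = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        server.bind(path)
        results["listening"] = probe_connect(path)

        server.close()  # reader gone, socket file still on disk
        results["closed"] = probe_connect(path)

        os.unlink(path)  # reader gone and socket file removed
        results["unlinked"] = probe_connect(path)
    return results
```

Calling run_probe() returns a dict with one entry per case, so the
"closed" and "unlinked" results can be compared directly.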
I can see two possible solutions, maybe there are more:
- provide an additional signal for logrotate use which only closes
and re-opens the logfiles, but doesn't do anything with the configfile.
- make the re-read of the configfile more clever and only close
inputs if really necessary.
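To make the first option concrete, here is a rough sketch of the idea
in Python. It is not syslog-ng code, and the choice of SIGUSR1 plus the
names LogFile and install_reopen_signal are assumptions made for
illustration only. The point is that the signal handler touches nothing
but the output file, so the input side stays open and connected the
whole time.

```python
import os
import signal


class LogFile:
    """Append-only log file that can be re-opened after rotation."""

    def __init__(self, path):
        self.path = path
        self._fh = open(path, "a")
        self.reopen_requested = False

    def request_reopen(self, signum=None, frame=None):
        # Signal handler: only set a flag; the real work is deferred
        # to the next write so nothing happens mid-write.
        self.reopen_requested = True

    def reopen(self):
        # Close the (possibly renamed) file and open the path afresh.
        self._fh.close()
        self._fh = open(self.path, "a")
        self.reopen_requested = False

    def write(self, message):
        if self.reopen_requested:
            self.reopen()
        self._fh.write(message + "\n")
        self._fh.flush()

    def close(self):
        self._fh.close()


def install_reopen_signal(log, signum=signal.SIGUSR1):
    """Arrange for `signum` to request a re-open of `log` only.

    SIGUSR1 is an arbitrary choice here, not an existing syslog-ng signal.
    """
    signal.signal(signum, log.request_reopen)
```

With something like this, logrotate would rename the file and then send
the chosen signal; the next write lands in a fresh file at the original
path, and no input is ever closed.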
Nick.