Folks,
I'd like to report a couple of bugs with syslog-ng whereby there is a window (albeit small) during which messages will just be lost. The problem occurs when a HUP signal is sent to the daemon in order to perform a logrotate. During this time, the daemon re-checks its config file. Before doing that, it closes the old incoming connections. After completing the re-read, it reconnects the inputs (typically to the same endpoints, since they hardly ever change, if at all).
For the FIFO /dev/log, it not only closes it but apparently unlinks it as well. Theoretically, life is fine because the clients detect the closed socket and issue retries themselves. All to the good. However, that's not all. On Linux (where we're seeing the problem; I don't know if it happens elsewhere), if no one is reading from the FIFO, the client gets the error ECONNREFUSED, and the libc syslog implementation happily realises what's going on and just retries. But when /dev/log is unlinked and the clients try to connect, they get a different error back: EPROTOCOL. This causes the clients to assume that a FIFO can't be used and fall back to 'lesser' protocols like UDP, never going back up the fallback list to a better protocol.
That's an efficiency concern, but at least messages don't get lost. More importantly, messages can get lost between the last read from the FIFO and the close() of the FIFO.
This may sound unlikely, but we are definitely seeing lost messages at this time (we're logging a very high volume of messages).
I can see two possible solutions, maybe there are more:
1. Provide an additional signal for logrotate use which only closes and re-opens the log files, but doesn't do anything with the config file.
2. Make the re-read of the config file more clever, so that it only closes inputs if really necessary.
Nick.
Does anyone have any ideas about these problems, or indications that they might be fixed at some point? Did anyone see this email? ;-)
Nick.
Williams, Nick (IT) wrote:
Folks,
I'd like to report a couple of bugs with syslog-ng whereby there is a window (albeit small) during which messages will just be lost. The problem occurs when a HUP signal is sent to the daemon in order to perform a logrotate. During this time, the daemon re-checks its config file. Before doing that, it closes the old incoming connections. After completing the re-read, it reconnects the inputs (typically to the same endpoints, since they hardly ever change, if at all).
For the FIFO /dev/log, it not only closes it but apparently unlinks it as well. Theoretically, life is fine because the clients detect the closed socket and issue retries themselves. All to the good. However, that's not all. On Linux (where we're seeing the problem; I don't know if it happens elsewhere), if no one is reading from the FIFO, the client gets the error ECONNREFUSED, and the libc syslog implementation happily realises what's going on and just retries. But when /dev/log is unlinked and the clients try to connect, they get a different error back: EPROTOCOL. This causes the clients to assume that a FIFO can't be used and fall back to 'lesser' protocols like UDP, never going back up the fallback list to a better protocol.
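To check which error a client really sees in each of these states, a small probe can be run on the affected host. The sketch below is an illustration only: it assumes /dev/log is a Unix domain datagram socket (what the report calls the FIFO), it uses a temporary path rather than /dev/log itself, and `connect_errno` and `probe_states` are hypothetical helper names that are not part of syslog-ng or libc.

```python
import errno
import os
import socket
import tempfile


def connect_errno(path, sock_type=socket.SOCK_DGRAM):
    """Try to connect to a Unix socket path.

    Returns None on success, otherwise the symbolic errno name.
    """
    client = socket.socket(socket.AF_UNIX, sock_type)
    try:
        client.connect(path)
        return None
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        client.close()


def probe_states():
    """Record what connect() reports in three states of the listener."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "log.sock")  # stand-in for /dev/log
    results = {}

    # State 1: a daemon is bound to the path.
    server = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    server.bind(path)
    results["listening"] = connect_errno(path)

    # State 2: the daemon closed its socket but left the file in place.
    server.close()
    results["closed_not_unlinked"] = connect_errno(path)

    # State 3: the daemon also unlinked the path.
    os.unlink(path)
    results["unlinked"] = connect_errno(path)

    os.rmdir(workdir)
    return results


if __name__ == "__main__":
    for state, error in probe_states().items():
        print(f"{state}: {error}")
```

Calling `probe_states()` returns a dict mapping each state to an errno name (or `None` when the connect succeeds), which can then be compared with the errors described above.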
That's an efficiency concern, but at least messages don't get lost. More importantly, messages can get lost between the last read from the FIFO and the close() of the FIFO.
This may sound unlikely, but we are definitely seeing lost messages at this time (we're logging a very high volume of messages).
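The loss window can be reproduced in miniature. The following sketch assumes a Unix domain datagram socket at a temporary path standing in for /dev/log, and `count_lost_on_close` is a hypothetical name chosen for illustration. Datagrams that were sent successfully but not yet read are discarded by the kernel when the receiver closes its socket, which is the gap between the last read and the close() described above.

```python
import os
import socket
import tempfile


def count_lost_on_close(sent=5, read_before_close=2):
    """Send datagrams, read only some of them, then close the receiver.

    Returns how many successfully sent datagrams were never read.
    """
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "log.sock")  # stand-in for /dev/log

    receiver = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    receiver.bind(path)
    sender = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sender.connect(path)

    # From the client's point of view every send succeeds.
    for i in range(sent):
        sender.send(f"message {i}".encode())

    # The daemon gets around to reading only part of the queue...
    delivered = [receiver.recv(65536) for _ in range(read_before_close)]

    # ...and then closes. Whatever is still queued is dropped here.
    receiver.close()
    sender.close()
    os.unlink(path)
    os.rmdir(workdir)
    return sent - len(delivered)
```

Keep `sent` small when trying this: the kernel limits how many datagrams may be queued on a Unix socket, and a sender that exceeds the limit blocks instead of losing data.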
I can see two possible solutions, maybe there are more:
1. Provide an additional signal for logrotate use which only closes and re-opens the log files, but doesn't do anything with the config file.
2. Make the re-read of the config file more clever, so that it only closes inputs if really necessary.
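As a rough illustration of the first option, here is a toy daemon that separates "reopen the log files" from "reload everything". It is a sketch under stated assumptions, not a description of how syslog-ng is implemented: `MiniLogDaemon` and its methods are hypothetical, SIGUSR1 is an arbitrary choice of signal, and the input is assumed to be a Unix domain datagram socket.

```python
import os
import signal
import socket


class MiniLogDaemon:
    """Toy daemon with one datagram input socket and one output log file."""

    def __init__(self, socket_path, logfile_path):
        self.socket_path = socket_path
        self.logfile_path = logfile_path
        self.input = self._open_input()
        self.output = open(self.logfile_path, "a")

    def _open_input(self):
        # Recreating the input means unlinking and re-binding the path.
        if os.path.exists(self.socket_path):
            os.unlink(self.socket_path)
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        sock.bind(self.socket_path)
        return sock

    def reopen_logfiles(self):
        """Option 1: reopen outputs only; the input socket is left alone."""
        self.output.close()
        self.output = open(self.logfile_path, "a")

    def reload_config(self):
        """Full reload: inputs are closed and recreated as well."""
        self.input.close()
        self.input = self._open_input()
        self.reopen_logfiles()

    def pump_once(self):
        """Copy one pending datagram from the input to the log file."""
        data = self.input.recv(65536)
        self.output.write(data.decode("utf-8", "replace") + "\n")
        self.output.flush()

    def install_signal_handlers(self):
        # Must be called from the main thread.
        signal.signal(signal.SIGUSR1,
                      lambda signum, frame: self.reopen_logfiles())
        signal.signal(signal.SIGHUP,
                      lambda signum, frame: self.reload_config())
```

With this split, a logrotate postrotate step would send SIGUSR1 rather than SIGHUP, so that messages queued on the input during rotation are still there to be read afterwards.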
Nick.