[syslog-ng]losing messages during a HUP
Nick Williams
syslog-ng@lists.balabit.hu
Wed, 09 Jun 2004 11:18:17 +0100
Folks,
I'd like to report a couple of bugs with syslog-ng whereby there is a
window (albeit small) during which messages will just be lost. The
problem occurs when a HUP signal is sent to the daemon in order to
perform a logrotate. On receiving the HUP, the daemon re-reads its
config file. Before doing that, it closes the old incoming connections.
After completing the re-read, it reconnects the inputs (typically to
the same endpoints, since they hardly ever change, if at all).
For the FIFO /dev/log, it not only closes it but it apparently unlinks
it as well. Theoretically, life is fine because the clients detect the
closed socket and the clients themselves issue retries. All to the good.
However, that's not all. On Linux (where we're seeing the problem; I
don't know if it happens elsewhere), if no one is reading from the
FIFO then the client gets the error ECONNREFUSED, and the libc syslog
implementation happily realises what's going on and just retries. But
when /dev/log is unlinked and the clients try to connect, they get a
different error back - EPROTOCOL. This causes the clients to assume that
a FIFO can't be used and fall back to 'lesser' protocols like UDP,
never to go back up the fallback list to a better protocol.
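To check which error a client really sees in each of the two states, a
small probe along these lines might help. It is only a sketch: it uses a
temporary datagram socket as a hypothetical stand-in for /dev/log, and
the function names are made up for illustration. The result may differ
between platforms and socket types, so it is best run on the affected
host.

```python
import errno
import os
import socket
import tempfile


def connect_errno(path):
    """Try to connect a datagram client to `path`; return the errno name or None."""
    client = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        client.connect(path)
        return None
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        client.close()


def probe():
    """Report the client-side error for a stale socket file and for an unlinked one."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "log")  # hypothetical stand-in for /dev/log

    server = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    server.bind(path)
    server.close()            # the socket file remains, but nobody is reading
    stale = connect_errno(path)

    os.unlink(path)           # what the daemon apparently does on HUP
    missing = connect_errno(path)

    os.rmdir(workdir)
    return stale, missing
```

Calling probe() returns the pair (error with the reader gone, error with
the path unlinked), which can then be compared against what the libc
syslog client is prepared to retry on.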
That's an efficiency concern, but at least messages don't get lost. More
importantly, messages can get lost between the last read from the FIFO
and the close() of the FIFO.
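One way to narrow that window would be to drain whatever is already
queued immediately before closing. The following is a minimal sketch of
the idea, not how syslog-ng handles it; drain_pending and
close_after_drain are hypothetical names, and a datagram socket is
assumed.

```python
import socket


def drain_pending(sock, bufsize=8192):
    """Read every datagram already queued on `sock` without blocking."""
    messages = []
    sock.setblocking(False)
    while True:
        try:
            messages.append(sock.recv(bufsize))
        except BlockingIOError:
            # Nothing left in the receive queue right now.
            return messages


def close_after_drain(sock):
    """Drain, then close. Narrows the loss window but does not eliminate it."""
    pending = drain_pending(sock)
    sock.close()
    return pending
```

A message that arrives between the final recv() and the close() is still
lost, so this only reduces the problem. Not closing the input at all
avoids it entirely.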
This may sound unlikely, but we are definitely seeing lost messages at
this time (we're logging a very high volume of messages).
I can see two possible solutions, maybe there are more:
1. provide an additional signal for logrotate use which only closes
and re-opens the logfiles, but doesn't do anything with the
configfile.
2. make the re-read of the configfile more clever and only close
inputs if really necessary.
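To illustrate both options, here is a rough sketch rather than a patch
against syslog-ng. The choice of SIGUSR1, the LogFile class and all
function names are assumptions made for the example. Option 1 reopens
only the output files when the extra signal arrives; option 2 compares
the old and new input lists so that unchanged inputs stay open across a
reload.

```python
import os
import signal
import tempfile


class LogFile:
    """An output file that can be reopened after logrotate renames it."""

    def __init__(self, path):
        self.path = path
        self.handle = open(path, "a")

    def write(self, line):
        self.handle.write(line + "\n")
        self.handle.flush()

    def reopen(self):
        # Close the (possibly renamed) file and open a fresh one at the same path.
        self.handle.close()
        self.handle = open(self.path, "a")


reopen_requested = False


def request_reopen(signum, frame):
    """Signal handler: only set a flag; the main loop does the real work."""
    global reopen_requested
    reopen_requested = True


def install_reopen_signal(signum=signal.SIGUSR1):
    """Register the handler. SIGUSR1 is an assumed choice, not a syslog-ng convention."""
    signal.signal(signum, request_reopen)


def service_reopen(outputs):
    """Option 1: reopen outputs if requested; inputs are never touched."""
    global reopen_requested
    if reopen_requested:
        reopen_requested = False
        for out in outputs:
            out.reopen()


def plan_input_changes(old_inputs, new_inputs):
    """Option 2: work out which inputs a reload really has to close or open."""
    old, new = set(old_inputs), set(new_inputs)
    return {"close": old - new, "open": new - old, "keep": old & new}
```

With either approach /dev/log stays open across a logrotate, so clients
never see it disappear and nothing queued on it is thrown away.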
Nick.