[syslog-ng] losing messages during a HUP

Nick Williams syslog-ng@lists.balabit.hu
Fri, 25 Jun 2004 10:44:13 +0100


Does anyone have any ideas about these problems, or any indication that
they might be fixed at some point? Did anyone see this email? ;-)

Nick.

Williams, Nick (IT) wrote:

> Folks,
>
> I'd like to report a couple of bugs in syslog-ng whereby there is a
> window (albeit small) during which messages are simply lost. The
> problem occurs when a HUP signal is sent to the daemon in order to
> perform a logrotate. On HUP, the daemon re-reads its config file.
> Before doing that, it closes the old incoming connections, and only
> after completing the re-read does it reopen the inputs (typically to
> the same endpoints, since they hardly ever change, if at all).
>
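> To make the ordering concrete, here is a minimal sketch of the
> sequence as I understand it (an illustration only, not syslog-ng
> source; the socket path and the sleep standing in for config
> parsing are made up):
>
> /* Sketch of the HUP reload ordering described above; the window
>  * this report is about is the gap between step 1 and step 3. */
> #include <stdio.h>
> #include <string.h>
> #include <unistd.h>
> #include <sys/socket.h>
> #include <sys/un.h>
>
> #define LOG_PATH "/tmp/example-log-socket"   /* hypothetical path */
>
> static int open_log_input(void)
> {
>     struct sockaddr_un sa;
>     int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
>
>     if (fd < 0)
>         return -1;
>     memset(&sa, 0, sizeof(sa));
>     sa.sun_family = AF_UNIX;
>     strncpy(sa.sun_path, LOG_PATH, sizeof(sa.sun_path) - 1);
>     unlink(LOG_PATH);                    /* remove any stale socket */
>     if (bind(fd, (struct sockaddr *) &sa, sizeof(sa)) < 0) {
>         close(fd);
>         return -1;
>     }
>     return fd;
> }
>
> static void handle_hup(int *fd)
> {
>     /* 1. the old input is closed and its path unlinked ... */
>     close(*fd);
>     unlink(LOG_PATH);
>
>     /* 2. ... the config file is re-read here; anything a client
>      *    sends during this step has nowhere to go ... */
>     sleep(1);                            /* stand-in for config parsing */
>
>     /* 3. ... and only then is the input reopened. */
>     *fd = open_log_input();
> }
>
> int main(void)
> {
>     int fd = open_log_input();
>
>     if (fd < 0) {
>         perror("open_log_input");
>         return 1;
>     }
>     handle_hup(&fd);                     /* simulate one HUP cycle */
>     if (fd >= 0)
>         close(fd);
>     unlink(LOG_PATH);
>     return 0;
> }
>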
> For the FIFO /dev/log, the daemon not only closes it but apparently
> unlinks it as well. Theoretically, life is fine because the clients
> detect the closed socket and issue their own retries. All to the
> good. However, that's not all. On Linux (where we're seeing the
> problem; I don't know if it happens elsewhere), if there is no one
> reading from the FIFO then the client gets ECONNREFUSED, and the
> libc syslog implementation happily realises what's going on and just
> retries. But when /dev/log is unlinked and the clients try to
> connect, they get a different error back, EPROTOCOL. This causes the
> clients to assume that a FIFO can't be used at all and to fall back
> to 'lesser' protocols like UDP, etc., never to go back up the
> fallback list to a better protocol.
>
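> The client-side decision I mean is roughly the following (a sketch
> of the behaviour we observe, not glibc's actual syslog() code; the
> retry limit and the sleep are made up):
>
> #include <errno.h>
> #include <stdio.h>
> #include <string.h>
> #include <unistd.h>
> #include <sys/socket.h>
> #include <sys/un.h>
>
> #ifndef _PATH_LOG
> #define _PATH_LOG "/dev/log"
> #endif
>
> /* Returns a connected fd, or -1 when the caller should fall back
>  * to a 'lesser' transport such as UDP. */
> static int connect_devlog(void)
> {
>     struct sockaddr_un sa;
>     int tries;
>
>     memset(&sa, 0, sizeof(sa));
>     sa.sun_family = AF_UNIX;
>     strncpy(sa.sun_path, _PATH_LOG, sizeof(sa.sun_path) - 1);
>
>     for (tries = 0; tries < 5; tries++) {    /* retry limit is made up */
>         int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
>         int saved;
>
>         if (fd < 0)
>             return -1;
>         if (connect(fd, (struct sockaddr *) &sa, sizeof(sa)) == 0)
>             return fd;                       /* syslogd is listening */
>         saved = errno;
>         close(fd);
>         if (saved == ECONNREFUSED) {
>             /* /dev/log still exists but nobody is reading yet:
>              * syslogd is mid-restart, so wait and retry. */
>             sleep(1);
>             continue;
>         }
>         /* Any other error (e.g. because the path was unlinked)
>          * makes the client give up on /dev/log for good; this is
>          * the point where it drops down the fallback list. */
>         return -1;
>     }
>     return -1;
> }
>
> int main(void)
> {
>     int fd = connect_devlog();
>
>     if (fd < 0) {
>         fprintf(stderr, "falling back from %s\n", _PATH_LOG);
>         return 1;
>     }
>     close(fd);
>     return 0;
> }
>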
> That's an efficiency concern, but at least messages don't get lost.
> More importantly, messages can get lost between the last read from
> the FIFO and its close().
>
> This may sound unlikely, but we are definitely seeing lost messages
> in this window (we're logging a very high volume of messages).
>
> I can see two possible solutions; maybe there are more:
>
>    1. provide an additional signal for logrotate use which only closes
>       and re-opens the log files, but doesn't do anything with the
>       config file (see the sketch after this list).
>    2. make the re-read of the config file cleverer, so that it only
>       closes inputs if really necessary.
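>
> For option 1, a handler along these lines is what I have in mind
> (purely hypothetical; the signal choice, the names and the
> fopen-based destinations are made up, and this is not a patch
> against syslog-ng):
>
> #include <signal.h>
> #include <stdio.h>
>
> #define MAX_DESTS 16
>
> static FILE *dests[MAX_DESTS];              /* open destination files */
> static const char *dest_paths[MAX_DESTS];   /* their configured paths */
> static int ndests;
>
> static volatile sig_atomic_t reopen_requested;
>
> static void on_sigusr1(int signo)
> {
>     (void) signo;
>     reopen_requested = 1;       /* only set a flag; do the work later */
> }
>
> /* Called from the main loop when the flag is set.  logrotate has
>  * already moved the files aside, so closing and reopening by path
>  * picks up the fresh files.  The inputs (/dev/log, network sockets)
>  * are never touched, so no messages are lost. */
> static void reopen_destinations(void)
> {
>     int i;
>
>     for (i = 0; i < ndests; i++) {
>         if (dests[i])
>             fclose(dests[i]);
>         dests[i] = fopen(dest_paths[i], "a");
>         if (!dests[i])
>             perror(dest_paths[i]);
>     }
>     reopen_requested = 0;
> }
>
> int main(void)
> {
>     signal(SIGUSR1, on_sigusr1);
>     /* the real event loop goes here; on each pass it would check: */
>     if (reopen_requested)
>         reopen_destinations();
>     return 0;
> }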
>
>
> Nick.


