--On Wednesday, June 15, 2005 5:06 PM +0200 Balazs Scheidler <bazsi@balabit.hu> wrote:
Sorry, but no. It is not the first message that gets lost, it's the last one on the old connection. The kernel happily acks the sent message to syslog-ng, but in the end it does not actually send it. So there is no way to determine whether that message was actually sent or not.
The only solution I see now is what is implemented in the 1.9.x series: in addition to writing to the TCP socket, also read from it to catch a possible EOF. This way syslog-ng can realize that a connection has been closed. There is still a small window in which the same thing can happen (e.g. the connection breaks, syslog-ng sends a message, and only then discovers that the connection was broken), but it is still way better than the current solution.
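As a rough illustration of the "also read for EOF" idea, here is a minimal Python sketch. It is not the syslog-ng 1.9.x code; the function name peer_has_closed is made up for this example, and it assumes the log server normally sends nothing back, so any bytes that do arrive are simply discarded.

```python
import select
import socket


def peer_has_closed(sock: socket.socket) -> bool:
    """Return True if the peer has closed (EOF) or reset the connection.

    Hypothetical helper: poll the socket for readability without blocking,
    then read. An empty read means the peer sent EOF.
    """
    readable, _, _ = select.select([sock], [], [], 0)  # timeout 0: just poll
    if not readable:
        return False  # nothing pending, so no EOF has been seen so far
    try:
        data = sock.recv(4096)  # unexpected data from the peer is discarded
    except (ConnectionResetError, BrokenPipeError):
        return True  # the connection was reset
    return data == b""  # empty read means EOF
```

A sender would call this right before each write and reconnect first if it returns True. The window mentioned above remains: the connection can still break between the check and the write.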
An interesting question is whether the same happens on other IP stacks, or whether only Linux shows this behaviour (because if so, the Linux kernel could be fixed as well).
There is a rather simple option: keep the last n messages in your FIFO, and resend them if a connection is broken and later re-established.

As for it being a kernel problem, I don't think so. TCP writes aren't synchronous. You'd see the same problem writing to disk: your write() is fine, but the next write(), or close(), may return an error if an I/O error has occurred (which is why you always have to check the return code from close(), boys and girls).

A simple example:

- remote process dies with a SIGSEGV (thus not sending FIN)
- syslog-ng calls write()
- kernel returns OK
- kernel transmits TCP packet
- kernel gets RST
- syslog-ng calls write()
- kernel returns EPIPE

There is nothing the kernel can do, unless your IP stack supports synchronous TCP writes (where write() doesn't return until the kernel receives an ACK). I'm not sure if there's a standard for this; it's been too long since I was that deep in the API.

-- Carson
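The "keep the last n messages and resend them" suggestion could look roughly like the Python sketch below. Everything here is an assumption made for illustration: the class name ReplayingSender, the keep parameter, and the connect callable (which is expected to return a fresh object with a sendall() method, such as a connected socket). It is not how syslog-ng implements its FIFO.

```python
from collections import deque
from typing import Any, Callable, Deque, Optional


class ReplayingSender:
    """Hypothetical sender that remembers the last `keep` messages and
    replays them on a new connection after a write fails."""

    def __init__(self, connect: Callable[[], Any], keep: int = 3) -> None:
        self._connect = connect  # returns an object with sendall()
        self._recent: Deque[bytes] = deque(maxlen=keep)  # oldest dropped first
        self._conn: Optional[Any] = None

    def send(self, message: bytes) -> None:
        self._recent.append(message)
        if self._conn is None:
            self._conn = self._connect()
        try:
            self._conn.sendall(message)
        except OSError:
            # The failed write means earlier writes that "succeeded" may
            # never have arrived, so replay everything still remembered
            # (this includes the message that just failed).
            self._conn = self._connect()
            for old in self._recent:
                self._conn.sendall(old)
```

Note the trade-off: the receiver may see duplicates of messages that did arrive before the break, and anything older than the last `keep` messages can still be lost.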