On Wed, 2005-06-15 at 19:00 -0400, Carson Gaspar wrote:
--On Wednesday, June 15, 2005 5:06 PM +0200 Balazs Scheidler <bazsi@balabit.hu> wrote:
There is a rather simple option: keep the last n messages in your FIFO, and resend them if a connection is broken and later re-established.
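
A minimal sketch of the "keep the last n messages" idea, in Python rather than syslog-ng's own C. The class name `ReplayBuffer`, its methods, and the `send` callback are made up for illustration and are not part of syslog-ng.

```python
from collections import deque


class ReplayBuffer:
    """Remember the last `capacity` messages handed to the transport."""

    def __init__(self, capacity):
        # A bounded deque silently drops the oldest entry once it is full.
        self._recent = deque(maxlen=capacity)

    def record(self, message):
        """Call this for every message written to the connection."""
        self._recent.append(message)

    def replay(self, send):
        """After a reconnect, resend the remembered messages, oldest first.

        `send` is any callable that takes one message (hypothetical hook).
        Returns the number of messages resent.
        """
        pending = list(self._recent)
        for message in pending:
            send(message)
        return len(pending)


def demo_replay():
    buf = ReplayBuffer(capacity=3)
    for line in ["m1", "m2", "m3", "m4", "m5"]:
        buf.record(line)
    resent = []
    buf.replay(resent.append)  # stand-in for writing to the new connection
    return resent
```

With a capacity of 3, only the three most recent messages survive to be replayed. Anything older that was lost in the broken connection stays lost, and anything the receiver already got arrives twice.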
As for it being a kernel problem, I don't think so. TCP writes aren't synchronous. You'd see the same problems writing to disk - your write() is fine, but the next write(), or close(), may return an error if an I/O error has occurred in the meantime (which is why you always have to check the return code from close(), boys and girls). A simple example:
- remote process dies with a SIGSEGV (thus not sending FIN)
- syslog-ng calls write()
- kernel returns OK
- kernel transmits TCP packet
- kernel gets RST
- syslog-ng calls write()
- kernel returns EPIPE
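
The write-side half of this sequence can be reproduced on loopback with a short Python sketch. One difference from the list above: the peer here simply closes its socket (so a FIN is sent) instead of crashing, but the sender still sees a successful write followed by a failed one. The function name, the message text, and the retry limits are illustrative assumptions, and the exact errno reported may vary by platform.

```python
import errno
import socket
import time


def write_until_error(max_attempts=100, pause=0.05):
    """Send to a peer that has already gone away; count writes that 'succeed'.

    Returns (ok_writes, errno_name). errno_name is None if no error was
    seen within max_attempts.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()
    conn.close()       # the receiving side disappears
    listener.close()

    ok_writes = 0
    error_name = None
    try:
        for _ in range(max_attempts):
            try:
                client.send(b"log line\n")  # kernel accepts it into the buffer
                ok_writes += 1
            except OSError as exc:          # reported only on a later write
                error_name = errno.errorcode.get(exc.errno, str(exc.errno))
                break
            time.sleep(pause)  # give the RST time to come back
    finally:
        client.close()
    return ok_writes, error_name


if __name__ == "__main__":
    print(write_until_error())
```

At least one send is reported as successful even though nobody will ever read it. The error only shows up on a later call, once the kernel has seen the reset.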
There is nothing the kernel can do, unless your IP stack supports synchronous TCP writes (where write doesn't return until the kernel receives an ACK). I'm not sure if there's a standard for this - it's been too long since I was that deep in the API.
The problem is that the same thing happens when the remote side does send something that terminates the connection. The kernel does not react to it, though I agree there might be other information pending in the kernel socket buffer. The real solution is to have some kind of application-layer ack or, as you say, to retransmit the last couple of messages when the connection is broken (but how many, and what happens to duplicate packets?).
-- Bazsi
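
One possible answer to the "how many, and what about duplicates" question is to number the messages: the sender keeps everything that has not been acknowledged, and the receiver ignores sequence numbers it has already seen. The Python sketch below is only an illustration of that idea. `AckedSender`, `DedupReceiver`, and the cumulative-ack scheme are assumptions made here, not an existing syslog-ng protocol.

```python
class AckedSender:
    """Keep every message until the receiver acknowledges it."""

    def __init__(self):
        self._next_seq = 1
        self._unacked = {}  # seq -> message, kept in send order

    def send(self, message, transmit):
        seq = self._next_seq
        self._next_seq += 1
        self._unacked[seq] = message
        transmit(seq, message)
        return seq

    def on_ack(self, acked_seq):
        """Receiver confirmed everything up to and including acked_seq."""
        for seq in [s for s in self._unacked if s <= acked_seq]:
            del self._unacked[seq]

    def resend_unacked(self, transmit):
        """After a reconnect, resend exactly what was never acknowledged."""
        for seq, message in list(self._unacked.items()):
            transmit(seq, message)

    def pending(self):
        return list(self._unacked)


class DedupReceiver:
    """Deliver each sequence number once, in order."""

    def __init__(self):
        self.last_seq = 0
        self.delivered = []

    def receive(self, seq, message):
        # Duplicates (seq <= last_seq) and gaps are dropped, not delivered.
        if seq == self.last_seq + 1:
            self.delivered.append(message)
            self.last_seq = seq
        return self.last_seq  # cumulative ack to send back


def demo_ack():
    rx, tx, acks = DedupReceiver(), AckedSender(), []

    def wire(seq, msg):
        acks.append(rx.receive(seq, msg))

    for msg in ["a", "b", "c"]:
        tx.send(msg, wire)
    tx.on_ack(1)             # pretend only the first ack arrived before the break
    tx.resend_unacked(wire)  # "b" and "c" go out a second time
    return rx.delivered, tx.pending(), acks[-1]
```

In `demo_ack` the receiver gets "b" and "c" twice but delivers each once, so the sender can resend freely. The cost is that the sender's buffer grows until acks arrive, so a real implementation would still need a bound.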