[syslog-ng] preparing syslog-ng 1.6.6

Balazs Scheidler syslog-ng@lists.balabit.hu
Thu, 03 Feb 2005 12:02:36 +0100


On Thu, 2005-02-03 at 11:31 +0100, Roberto Nibali wrote:

> > The problem is that syslog-ng polls the destination TCP sockets for
> > writing only,
> 
> That's the issue with poll(). Dumb question: Why not using select()?

It's the same: internally the kernel uses the same interfaces for both
poll() and select(), i.e. the same rules apply to how they indicate
readability.
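
To illustrate, here is a minimal sketch that asks both poll() and
select() about the same descriptor after the peer has closed. It assumes
a local AF_UNIX socketpair as a stand-in for the TCP connection, and the
helper names are hypothetical (they are not taken from syslog-ng or
libol):

```c
#include <assert.h>
#include <poll.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: 1 if poll() reports fd readable right now. */
static int readable_via_poll(int fd)
{
    struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n = poll(&p, 1, 0);              /* timeout 0: do not block */
    return n > 0 && (p.revents & POLLIN) != 0;
}

/* Hypothetical helper: 1 if select() reports fd readable right now. */
static int readable_via_select(int fd)
{
    fd_set rfds;
    struct timeval tv = { 0, 0 };        /* zero timeout: do not block */
    int n;

    assert(fd >= 0 && fd < FD_SETSIZE);  /* select() cannot watch larger fds */
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    n = select(fd + 1, &rfds, NULL, NULL, &tv);
    return n > 0 && FD_ISSET(fd, &rfds);
}

/* Returns 1 if the socket is not readable while the peer is open, both
 * calls report it readable once the peer has closed, and read() then
 * returns 0 (EOF). Returns -1 if the socketpair cannot be created. */
static int peer_close_is_readable(void)
{
    int sv[2];
    char c;
    int before, after;
    ssize_t r;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    before = readable_via_poll(sv[0]) || readable_via_select(sv[0]);
    close(sv[1]);                        /* the "remote endpoint" goes away */
    after = readable_via_poll(sv[0]) && readable_via_select(sv[0]);
    r = read(sv[0], &c, 1);
    close(sv[0]);
    return !before && after && r == 0;
}
```

Calling peer_close_is_readable() from a small main() is enough to try
it; neither call reports anything about the close if only writability
is requested.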

> 
> > and whenever the remote endpoint closes the TCP
> > connection, it is not indicated in any way (as closing a socket triggers
> > readability and not writability). Whenever a message is to be written to
> > this socket, the first write() syscall succeeds, and only the next
> > write() will return EPIPE, so syslog-ng is able to detect the broken
> > connection.
> 
> Also a signal SIGPIPE is invoked. Again, stupid question: Couldn't you 
> use SIGPIPE to inject the write request off the remaining buffer? 

I think SIGPIPE is issued at the same time write() returns EPIPE, so I
think it also happens only at the second write, by which time the kernel
has already accepted the previous message.

I was curious whether this was true, but even when I stopped SIG_IGNing
SIGPIPE, it did not occur for some reason. tcp(7) states that SIGPIPEs
are only triggered for SO_KEEPALIVE-d sockets; I enabled SO_KEEPALIVE
and still got no SIGPIPEs.

Anyway, I don't think SIGPIPE would help us here. The real problem is
that the kernel returns success for the write() system call even though
the connection was already broken. A hackish solution would be to buffer
the last line written and, in case of failure, push it back to the FIFO
queue. This is ugly but could work.
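
A rough sketch of that push-back idea, not the libol implementation: the
queue type, the function names and the scripted writer are all invented
for illustration, messages are sent one at a time (no coalescing), and
any non-negative write result is treated as a complete write:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define QCAP 16

struct msg_fifo {
    const char *slot[QCAP];
    size_t head;            /* index of the oldest queued message */
    size_t len;             /* number of queued messages */
    const char *last_sent;  /* last message the kernel accepted */
};

static int fifo_push_tail(struct msg_fifo *q, const char *m)
{
    assert(q != NULL && m != NULL);
    if (q->len == QCAP)
        return -1;
    q->slot[(q->head + q->len) % QCAP] = m;
    q->len++;
    return 0;
}

static int fifo_push_head(struct msg_fifo *q, const char *m)
{
    assert(q != NULL && m != NULL);
    if (q->len == QCAP)
        return -1;
    q->head = (q->head + QCAP - 1) % QCAP;
    q->slot[q->head] = m;
    q->len++;
    return 0;
}

typedef ssize_t (*write_fn)(void *ctx, const char *buf, size_t n);

/* Sends the oldest message. Returns 1 if sent, 0 if the queue is empty,
 * -1 on failure. On EPIPE the previously "sent" message is requeued in
 * front of the message that just failed, which stays queued. */
static int fifo_send_one(struct msg_fifo *q, write_fn wr, void *ctx)
{
    const char *m;

    assert(q != NULL && wr != NULL);
    if (q->len == 0)
        return 0;
    m = q->slot[q->head];
    if (wr(ctx, m, strlen(m)) < 0) {
        if (errno == EPIPE && q->last_sent != NULL &&
            fifo_push_head(q, q->last_sent) == 0)
            q->last_sent = NULL;
        return -1;
    }
    q->head = (q->head + 1) % QCAP;   /* dequeue, but remember it */
    q->len--;
    q->last_sent = m;
    return 1;
}

/* Test double: succeeds *ctx more times, then fails with EPIPE. */
static ssize_t scripted_write(void *ctx, const char *buf, size_t n)
{
    int *ok_left = ctx;

    (void)buf;
    if (*ok_left > 0) {
        (*ok_left)--;
        return (ssize_t)n;
    }
    errno = EPIPE;
    return -1;
}
```

The price is a possible duplicate: if the peer did receive the
remembered line before closing, it gets the same line again after the
reconnect.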


> The 
> way I see syslog-ng functionally working (extremely simplified, please 
> correct), is:
> 
> o syslog-ng polls on read_fds for incoming syslog messages
> o syslog-ng maintains a queue or linked list of messages where the newly 
> arrived messages get queued up for delivery. This queue is also used in 
> case a destination is down and needs to be reprobed (reopened) for a 
> connection.
> o syslog-ng polls on write_fds for outgoing possibilities and if 
> success, sends out in FIFO the queued messages.

yes, it is more or less correct, though the same loop is used for
read/write polling.

> o TCP close -> eof reaches the socket and gets passed up to syslog-ng 
> which has already sent (write()) one line _but_ not yet lost the buffer 
> it has written.
> o A new write will return EPIPE and a SIGPIPE signal.
> 
> The idea is to either pass the EPIPE back to the caller function sending 
> the syslog message chunk or to invoke a signal handler that signals the 
> caller to resend that message again. Or use select() to poll for 
> readability? Or create a thread within the calling stack (same process 
> with access to the write buffer) of the poll function and have it wait 
> on a condition variable which is set upon EPIPE. The thread waits in 
> pthread_cond_wait() and will write the last successfully written buffer 
> again.

Again, the solution I outlined above might be ok, though it is not very
simple, because syslog-ng might coalesce outgoing TCP writes.

[snip]
> 
> Could you point me to the code in question in 1.6.x so I could check it 
> out for myself, please?

It is implemented in libol/src/pkt_buffer.c. Packet buffers can operate
in two modes: packet and stream. Packet mode makes the output routines
write a single message at a time; stream mode enables write coalescing.
These two modes have two independent flush functions:

static int do_flush_stream(struct abstract_buffer *c, struct abstract_write *w)
static int do_flush_pkt(struct abstract_buffer *c, struct abstract_write *w)

Fixing the first (do_flush_stream) would mean saving the last coalesced
buffer and pushing it back to the buffer in case of EPIPE, but this could
also introduce platform dependence; I'm not sure all IP stacks behave
identically.

The other option is to add reading the socket to the poll loop, as is
done in 1.9.x. This can be done by using io_read_write instead of
io_write and dropping the connection from the read callback when the
socket is readable and read() returns 0 bytes.
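
A plain-POSIX sketch of that second option, independent of libol: the
enum and function names are hypothetical, and it does not use
io_read_write or any other libol call. It polls a destination socket for
both reading and writing and treats a 0-byte read() as the peer closing:

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

enum conn_state {
    CONN_IDLE,            /* nothing to do right now */
    CONN_WRITABLE,        /* safe to flush queued messages */
    CONN_CLOSED_BY_PEER,  /* read() returned 0: drop and reconnect */
    CONN_ERROR
};

/* One pass of a combined read/write poll on a destination socket. */
static enum conn_state check_destination(int fd, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLIN | POLLOUT, .revents = 0 };
    char scratch[256];
    int n;

    assert(fd >= 0);
    n = poll(&p, 1, timeout_ms);
    if (n < 0)
        return CONN_ERROR;
    if (n == 0)
        return CONN_IDLE;

    /* Check readability first: a closed peer shows up here, before any
     * message is handed to write(). */
    if (p.revents & (POLLIN | POLLHUP)) {
        ssize_t r = read(fd, scratch, sizeof scratch);
        if (r == 0)
            return CONN_CLOSED_BY_PEER;
        if (r < 0)
            return CONN_ERROR;
        /* Unexpected data from the log server is simply discarded. */
    }
    if (p.revents & (POLLERR | POLLNVAL))
        return CONN_ERROR;
    return (p.revents & POLLOUT) ? CONN_WRITABLE : CONN_IDLE;
}
```

The caller would flush its queue only on CONN_WRITABLE and reopen the
destination on CONN_CLOSED_BY_PEER, so no message is written into a
connection that is already known to be gone.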

-- 
Bazsi