Re: [syslog-ng]preparing syslog-ng 1.6.6

3 Feb 2005

      On Thu, 2005-02-03 at 11:31 +0100, Roberto Nibali wrote:
...
...
The problem is that syslog-ng polls the destination TCP sockets for
writing only,
That's the issue with poll(). Dumb question: Why not use select()?
It is the same: inside the kernel, poll() and select() use the same
interfaces, i.e. the same rules apply to how they indicate
readability.
...
...
and whenever the remote endpoint closes the TCP
connection, it is not indicated in any way (as closing a socket triggers
readability and not writability). Whenever a message is to be written to
this socket, the first write() syscall succeeds, and only the next
write() will return EPIPE, so syslog-ng is able to detect the broken
connection.
A SIGPIPE signal is also raised. Again, stupid question: Couldn't you
use SIGPIPE to inject the write request off the remaining buffer?
I think SIGPIPE is issued at the same time write() returns EPIPE, so I
think it also happens at the second write, and the kernel already
acknowledged the previous message.

I was curious whether this was true, but even when I stopped setting
SIGPIPE to SIG_IGN, the signal did not occur for some reason. tcp(7)
states that SIGPIPE is only raised for sockets with SO_KEEPALIVE set;
I enabled SO_KEEPALIVE and still got no SIGPIPE.

Anyway I don't think SIGPIPE would help us here. The real problem is
that the kernel returns success for the write() system call, while the
connection was already broken. A hackish solution would be to buffer the
last line written, and in case of failure push it back to the FIFO
queue. This is ugly but could work.
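
The push-back idea could be sketched roughly as follows. This is only
an illustration under assumptions: struct msg_fifo, struct sender and
all function names are hypothetical and do not match libol's real
buffer structures, messages are plain C strings, and the queue is a
tiny fixed-size ring. Note that the previous line is re-sent even if it
did reach the peer, so a duplicate is possible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define QCAP 8  /* tiny capacity, for illustration only */

struct msg_fifo {
    const char *slot[QCAP];
    size_t head;  /* index of the oldest message */
    size_t len;   /* number of queued messages */
};

static int fifo_push_back(struct msg_fifo *q, const char *m)
{
    if (q->len == QCAP)
        return -1;
    q->slot[(q->head + q->len) % QCAP] = m;
    q->len++;
    return 0;
}

static int fifo_push_front(struct msg_fifo *q, const char *m)
{
    if (q->len == QCAP)
        return -1;
    q->head = (q->head + QCAP - 1) % QCAP;
    q->slot[q->head] = m;
    q->len++;
    return 0;
}

static const char *fifo_pop_front(struct msg_fifo *q)
{
    const char *m;

    if (q->len == 0)
        return NULL;
    m = q->slot[q->head];
    q->head = (q->head + 1) % QCAP;
    q->len--;
    return m;
}

struct sender {
    struct msg_fifo queue;
    const char *last_sent;  /* last line write() accepted; may be lost */
};

/* Call after write() succeeded for message m. */
static void sender_sent_ok(struct sender *s, const char *m)
{
    s->last_sent = m;
}

/*
 * Call when write() of message m failed with EPIPE. The failed message
 * and the line written just before it both go back to the head of the
 * queue, in their original order.
 */
static void sender_on_epipe(struct sender *s, const char *m)
{
    fifo_push_front(&s->queue, m);
    if (s->last_sent != NULL)
        fifo_push_front(&s->queue, s->last_sent);
    s->last_sent = NULL;
}
```

With "a", "b", "c" queued, sending "a" successfully and then failing on
"b" leaves the queue as "a", "b", "c" again, ready to be replayed once
the connection is reopened.
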
...
The 
way I see syslog-ng functionally working (extremely simplified, please 
correct), is:
o syslog-ng polls on read_fds for incoming syslog messages
o syslog-ng maintains a queue or linked list of messages where the newly 
arrived messages get queued up for delivery. This queue is also used in 
case a destination is down and needs to be reprobed (reopened) for a 
connection.
o syslog-ng polls on write_fds for outgoing possibilities and, on
success, sends out the queued messages in FIFO order.
yes, it is more or less correct, though the same loop is used for
read/write polling.
...
o TCP close -> eof reaches the socket and gets passed up to syslog-ng 
which has already sent (write()) one line _but_ not yet lost the buffer 
it has written.
o A new write will return EPIPE and raise a SIGPIPE signal.
The idea is to either pass the EPIPE back to the caller function sending 
the syslog message chunk or to invoke a signal handler that signals the 
caller to resend that message again. Or use select() to poll for 
readability? Or create a thread within the calling stack (same process 
with access to the write buffer) of the poll function and have it wait 
on a condition variable which is set upon EPIPE. The thread waits in 
pthread_cond_wait() and will write the last successfully written buffer 
again.
Again, the solution I outlined above might be ok, though since
syslog-ng might coalesce outgoing TCP writes, it is not very simple.

[snip]
...
Could you point me to the code in question in 1.6.x so I could check it 
out for myself, please?
It is implemented in libol/src/pkt_buffer.c. Packet buffers can operate
in two modes: packet mode and stream mode. Packet mode makes the output
routines write a single message at a time; stream mode enables write
coalescing. These two modes have two independent flush functions:

static int do_flush_stream(struct abstract_buffer *c, struct abstract_write *w)
static int do_flush_pkt(struct abstract_buffer *c, struct abstract_write *w)

Fixing the first would mean saving the last coalesced buffer and pushing it
back to the buffer in case of EPIPE, but this could also introduce platform
dependence: I'm not sure all IP stacks behave identically.

The other option is to add reading the socket to the poll loop, as is done
in 1.9.x. This can be done using io_read_write instead of io_write, and
dropping the connection from the read callback when the socket is readable
and read() returns 0 bytes.
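
A rough sketch of that check, written against plain poll() and recv()
rather than libol's io_read_write (whose exact interface is not shown
here). The helper name peer_has_closed is hypothetical, and
MSG_DONTWAIT is assumed to be available, as it is on Linux and the
BSDs.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: poll a destination socket for reading as well as
 * writing. Returns 1 if the peer has closed the connection, 0 if the
 * connection still looks alive, -1 on error.
 */
static int peer_has_closed(int fd, int timeout_ms)
{
    struct pollfd pfd;
    char scratch[256];
    ssize_t n;

    assert(fd >= 0);
    pfd.fd = fd;
    pfd.events = POLLIN | POLLOUT;  /* previously only POLLOUT */
    pfd.revents = 0;

    if (poll(&pfd, 1, timeout_ms) < 0)
        return -1;
    if (!(pfd.revents & (POLLIN | POLLHUP)))
        return 0;                   /* nothing to read: no EOF seen */

    /* Readable: either stray data from the peer or end-of-file. */
    n = recv(fd, scratch, sizeof(scratch), MSG_DONTWAIT);
    if (n == 0)
        return 1;                   /* EOF: the peer closed */
    if (n < 0)
        return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;
    return 0;                       /* data arrived and was discarded */
}
```

For a quick check without a network, a socketpair(AF_UNIX, SOCK_STREAM,
0, sv) pair can stand in for the TCP connection: peer_has_closed(sv[0],
0) returns 0 while both ends are open and 1 once sv[1] has been closed.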

-- 
Bazsi

Balazs Scheidler