Hi,
I know about this problem but I'm afraid it is very difficult to solve in the 1.6.x branch, and should be solved by the new development branch.
Maybe there is a stupid workaround which is not difficult. Just spitting out some ideas, as I clearly haven't read the code in question (to be honest, I couldn't find it).
The problem is that syslog-ng polls the destination TCP sockets for writing only,
That's the issue with poll(). Dumb question: Why not using select()?
and whenever the remote endpoint closes the TCP connection, it is not indicated in any way (as closing a socket triggers readability and not writability). Whenever a message is to be written to this socket, the first write() syscall succeeds, and only the next write() will return EPIPE, so syslog-ng is able to detect the broken connection.
Also a signal SIGPIPE is invoked. Again, stupid question: Couldn't you use SIGPIPE to inject the write request off the remaining buffer? The way I see syslog-ng functionally working (extremely simplified, please correct), is: o syslog-ng polls on read_fds for incoming syslog messages o syslog-ng maintains a queue or linked list of messages where the newly arrived messages get queued up for delivery. This queue is also used in case a destination is down and needs to be reprobed (reopened) for a connection. o syslog-ng polls on write_fds for outgoing possibilities and if success, sends out in FIFO the queued messages. o TCP close -> eof reaches the socket and gets passed up to syslog-ng which has already sent (write()) one line _but_ not yet lost the buffer it has written. o A new write will return EPIPE and a SIGPIPE signal. The idea is to either pass the EPIPE back to the caller function sending the syslog message chunk or to invoke a signal handler that signals the caller to resend that message again. Or use select() to poll for readability? Or create a thread within the calling stack (same process with access to the write buffer) of the poll function and have it wait on a condition variable which is set upon EPIPE. The thread waits in pthread_cond_wait() and will write the last successfully written buffer again.
The message aa4 is sent still on the old connection: read(5, "<5>Feb 3 09:42:01 bazsi: aa4\0", 2048) = 30 time(NULL) = 1107420121 time(NULL) = 1107420121 time(NULL) = 1107420121 poll([{fd=5, events=POLLIN, revents=POLLIN|POLLHUP}, {fd=4, events=POLLOUT, revents=POLLOUT}, {fd=3, events=POLLIN}], 3, 100) = 2 read(5, "", 2048) = 0 write(2, "Read EOF on fd 5.\n", 18) = 18 write(2, "Marking fd 5 for closing.\n", 26) = 26 write(4, "<5>Feb 3 09:42:01 src@bzorp bazsi: aa4\n", 40) = 40 time(NULL) = 1107420121 poll([{fd=5, events=POLLIN, revents=POLLIN|POLLHUP}, {fd=4, events=0}, {fd=3, events=POLLIN}], 3, 100) = 1 time(NULL) = 1107420121
Then I terminated netcat, and sent another message: read(5, "<5>Feb 3 09:42:05 bazsi: aa5\0", 2048) = 30 time(NULL) = 1107420125 time(NULL) = 1107420125 time(NULL) = 1107420125 poll([{fd=5, events=POLLIN, revents=POLLIN|POLLHUP}, {fd=4, events=POLLOUT, revents=POLLOUT}, {fd=3, events=POLLIN}], 3, 100) = 2 read(5, "", 2048) = 0 write(2, "Read EOF on fd 5.\n", 18) = 18 write(2, "Marking fd 5 for closing.\n", 26) = 26 write(4, "<5>Feb 3 09:42:05 src@bzorp bazsi: aa5\n", 40) = 40
I see.
time(NULL) = 1107420125 poll([{fd=5, events=POLLIN, revents=POLLIN|POLLHUP}, {fd=4, events=0, revents=POLLERR|POLLHUP}, {fd=3, events=POLLIN}], 3, 100) = 2
Even though the connection was broken the write() syscall returned success, and only the next poll() call indicates some problems.
Yes, because the revents in the pollfd struct indicates events that occured. Kinda looking into the past ;).
time(NULL) = 1107420125 poll([{fd=5, events=POLLIN, revents=POLLIN|POLLHUP}, {fd=4, events=0, revents=POLLERR|POLLHUP}, {fd=3, events=POLLIN}], 3, 100) = 2 write(2, "io_iter(): POLLHUP on inactive fd!\n", 35) = 35 write(2, "Marking fd 4 for closing.\n", 26) = 26 time(NULL) = 1107420125 write(2, "Closing fd 5.\n", 14) = 14 close(5) = 0 write(2, "Connection broken to AF_INET(127.0.0.1:2000), reopening in 1 seconds\n", 69) = 69 time(NULL) = 1107420125 write(2, "Closing fd 4.\n", 14) = 14 close(4) = 0
The current solution in version 1.9 is to poll the fd for reading as well as writing, so we get immediate notification when the connection is broken. The same problem could still occur, though only within a very small timeframe.
Could you point me to the code in question in 1.6.x so I could check it out for myself, please? Thanks for your patience and excellent work, Roberto Nibali, ratz -- echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc