On Sun, 2009-07-05 at 01:20 +0200, Sandor Geller wrote:
Hi,
When lsof shows something as (deleted) that doesn't mean that the app which keeps the fd open knows that the file/socket/whatever has been deleted, quite the contrary. If the app which listens on the socket is still running then nothing happens, syslog-ng forwards the logs and the app receives them. If the listening app exits then syslog-ng detects it and gives messages like this:
EOF occurred while idle; fd='17' Connection broken; time_reopen='120'
So my guess is that you simply deleted/recreated the socket, but didn't terminate the listener.
Ah, I've just realised that your app isn't listening on a socket at all. You're using a named pipe instead. This makes things somewhat more complicated because pipes are backed by a buffer in the kernel. Syslog-ng opens pipes in RW mode, so on Linux it can write to a pipe even when nothing is reading on the other end. Because syslog-ng's own descriptor counts as a reader, the kernel never reports EPIPE or raises SIGPIPE for such a write (and syslog-ng ignores SIGPIPE anyway). This means that syslog-ng doesn't detect problems with pipes.
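To illustrate the point about RW mode, here is a minimal sketch in C. It is not syslog-ng code: the function name and the FIFO path passed to it are made up for the example, and it assumes Linux, since POSIX leaves opening a FIFO with O_RDWR undefined. It writes one message into a FIFO that has no reader attached and the write succeeds.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a FIFO at `path` if it is missing, open it
 * O_RDWR (the mode the text says syslog-ng uses) and write one message
 * while no other process has the FIFO open.
 * Returns the number of bytes written, or -1 on error. */
ssize_t write_without_reader(const char *path, const char *msg, size_t len)
{
    assert(path != NULL && msg != NULL);

    /* An already existing FIFO is fine; any other error is fatal. */
    if (mkfifo(path, 0600) == -1 && errno != EEXIST)
        return -1;

    /* On Linux this open does not block, because our own descriptor
     * serves as both the reading and the writing end. */
    int fd = open(path, O_RDWR | O_NONBLOCK);
    if (fd == -1)
        return -1;

    /* No reader process exists, yet the kernel accepts the data. */
    ssize_t n = write(fd, msg, len);

    close(fd);
    return n;
}
```

Calling write_without_reader("/tmp/example.pipe", "hello\n", 6) on Linux should return 6 even though nobody is reading, which is exactly why the writer cannot notice a missing reader.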
The assumptions that syslog-ng makes regarding named pipes are as follows:

1) the named pipe can be created either by syslog-ng, or by the application reading it
2) the named pipe exists continuously, i.e. it does not get removed and recreated
3) it does not matter if the reader gets stopped, the contents of the named pipe remain in kernel memory (until the reader is restarted)
4) if the pipe buffer becomes full in the kernel, syslog-ng will either enable flow-control or start dropping messages.

So the basic model is that you create the named pipe once, and your application never removes it, it just opens it in O_RDONLY mode. Recent versions of syslog-ng (3.0) can even create named pipes on their own, so you don't have to do that before starting syslog-ng.
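On the reader side, that model could look roughly like the sketch below. The function name is hypothetical, and O_NONBLOCK is my addition so that open() returns even when syslog-ng has not opened the pipe yet; a reader that prefers to wait for the writer would leave it out.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical reader-side setup: create the FIFO only if it does not
 * exist yet, never unlink it, and open it read-only.
 * Returns a file descriptor, or -1 on error. */
int open_log_fifo(const char *path)
{
    assert(path != NULL);

    /* EEXIST means the pipe is already there (created earlier by us or
     * by syslog-ng); keep using it instead of recreating it. */
    if (mkfifo(path, 0600) == -1 && errno != EEXIST)
        return -1;

    /* O_NONBLOCK (an assumption of this sketch) keeps open() from
     * blocking until a writer shows up. */
    return open(path, O_RDONLY | O_NONBLOCK);
}
```

The important part is what the function does not do: it never removes the pipe, so a restarted reader attaches to the same kernel object that syslog-ng already has open.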
If the pipe gets removed and the reading side vanishes, then the pipe destination has just become a sinkhole.
Of course, only after the kernel buffer fills up. Basically, that is all the time you have to restart your reader: if the traffic is high and the reader is not present, then syslog-ng may start flow-controlling/dropping messages.
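How much room that kernel buffer gives you can be measured with a small sketch like this one (again not syslog-ng code, the function name is made up, and it assumes Linux semantics for O_RDWR on a FIFO). It keeps writing 1 KiB chunks into a readerless FIFO until the kernel answers EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fill a FIFO that nobody reads and report how many
 * bytes the kernel buffered before refusing more.
 * Returns the byte count, or -1 on an unexpected error. */
long fill_fifo(const char *path)
{
    char chunk[1024];
    long total = 0;

    assert(path != NULL);
    memset(chunk, 'x', sizeof chunk);

    if (mkfifo(path, 0600) == -1 && errno != EEXIST)
        return -1;

    int fd = open(path, O_RDWR | O_NONBLOCK);
    if (fd == -1)
        return -1;

    for (;;) {
        ssize_t n = write(fd, chunk, sizeof chunk);
        if (n > 0) {
            total += n;          /* kernel still had room */
            continue;
        }
        if (n == -1 && errno == EAGAIN)
            break;               /* buffer full: a writer must now wait or drop */
        close(fd);
        return -1;               /* anything else is unexpected here */
    }

    close(fd);
    return total;
}
```

The returned value is the pipe capacity on the machine at hand. Divided by your message rate, it tells you roughly how long the reader may be absent before the writer has to start flow-controlling or dropping.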
time_reap could help when there is no activity; however, in syslog-ng 2.0 and 2.1 time_reap works only for real files, not for named pipes. The cause of this is that cfg-grammar.y sets the AFFILE_NO_EXPAND flag for pipes:
    dest_afpipe_params
        : string
          {
            last_driver = affile_dd_new($1, AFFILE_NO_EXPAND | AFFILE_PIPE);
            free($1);
            last_writer_options = &((AFFileDestDriver *) last_driver)->writer_options;
            last_writer_options->flush_lines = 0;
          }
          dest_afpipe_options { $$ = last_driver; }
        ;
and affile.c contains a conditional in the reap setup code:
    if ((self->flags & AFFILE_NO_EXPAND) == 0)
      {
        self->reap_timer = g_timeout_add_full(G_PRIORITY_DEFAULT, self->time_reap * 1000 / 2, affile_dd_reap, self, NULL);
        self->writer_hash = persist_config_fetch(persist, affile_dd_format_persist_name(self));
        if (self->writer_hash)
          g_hash_table_foreach(self->writer_hash, affile_dd_reuse_writer, self);
      }
    else
      {
        self->writer = persist_config_fetch(persist, affile_dd_format_persist_name(self));
        if (self->writer)
          {
            affile_dw_set_owner(self->writer, self);
            log_pipe_init(&self->writer->super, NULL, NULL);
          }
      }
So although I think pipes should get reaped as well, this won't happen. BTW this has been fixed in syslog-ng 3: the AFFILE_NO_EXPAND flag has been removed from the pipe setup code to allow using pipe names containing macros. A side-effect of this change is that pipes could get reaped as well.
In syslog-ng 2.x, pipes didn't need reaping, as their names couldn't contain macros.
So, for now I'd recommend using a socket instead of a named pipe, or sending a HUP signal to syslog-ng after the pipe has been recreated.
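The reason a socket behaves better here is that the sender gets an error as soon as the receiving end is gone. A minimal sketch of that, using socketpair() as a stand-in for a real listener (the function name is made up, and MSG_NOSIGNAL is used so the example gets an error code back instead of a SIGPIPE):

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: send on a unix stream socket whose peer has already
 * closed its end. Returns the errno reported by send(), 0 if the send
 * unexpectedly succeeded, or -1 if the socket pair could not be created. */
int send_after_peer_close(void)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    close(sv[1]);                 /* the "listener" goes away */

    /* MSG_NOSIGNAL suppresses SIGPIPE so the failure shows up in errno. */
    ssize_t n = send(sv[0], "msg\n", 4, MSG_NOSIGNAL);
    int err = (n == -1) ? errno : 0;

    close(sv[0]);
    return err;
}
```

On Linux this should return EPIPE: the writer learns immediately that the other side is gone, which is the signal a sender needs in order to mark the connection broken and retry later.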
Unix domain sockets work similarly to tcp() destinations, so if those are the semantics that you want, please use those instead of pipe().

-- 
Bazsi