Re: [syslog-ng] runaway programs started by syslog-ng
A followup... On a test server, I have noticed that when the system reboots there is one instance of syslog-ng but two instances of the program being executed. There should only be one. If I do a /etc/init.d/syslog-ng restart, then I end up with just one instance of the program as expected, and the system appears to be fine. Any ideas?
On Fri, 2006-10-27 at 17:40 -0700, Dev Man wrote:
> A followup... On a test server, I have noticed that when the system reboots there is one instance of syslog-ng but two instances of the program being executed. There should only be one.
> If I do a /etc/init.d/syslog-ng restart, then I end up with just one instance of the program as expected, and the system appears to be fine.
syslog-ng should restart the program when it receives a SIGCHLD signal, which indicates that the child exited.

Hmm... I have reread that part of the code, and in reality it restarts child processes in two cases:

- when SIGCHLD is received
- when the pipe to the process returns EPIPE

The two exclude each other by means of a variable. That could be inherently racy (a signal handler racing with conventional code); however, this is not the case here, because the SIGCHLD handler is not called from inside the signal handler: its execution is deferred until the next iteration of the main loop (see the main_loop_run() function and the processing of the sig_child_received variable). Therefore the EPIPE handler and the SIGCHLD handler run sequentially, so if an EPIPE is received first, the program will be restarted and the SIGCHLD handler will see this...

Hmmm, wait a minute: the SIGCHLD handler does not check whether the terminated process is the same as the currently running one, so here's a possible bug scenario:

- EPIPE is received, the program is restarted, and its pid is recorded in self->pid
- SIGCHLD is received for the old process; the handler sees self->pid != -1, so it restarts the program without killing the previous instance (it assumes that one is already dead), even though self->pid now refers to the instance started in the previous step

This could explain why you have two instances running on startup: the order of EPIPE and SIGCHLD determines whether a process is leaked. It may also explain the large number of processes you have: if the program exits frequently, syslog-ng might leak a process on each exit, which eventually results in a lot of instances running.
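To make the race concrete, here is a minimal POSIX C sketch. ProgDriver, prog_start(), prog_epipe(), prog_exit(), main_loop_iteration() and wait_for_sigchld() are hypothetical names, not syslog-ng's actual code; the "program" is just a forked child that sleeps until killed. It follows the pattern described above: the SIGCHLD handler only sets a flag, the main loop later reaps children with waitpid(), and a restart happens only when the reaped pid is the one the driver is currently tracking.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for the program destination driver (not the
 * real AFProgramDestDriver). pid == -1 means deinit was called. */
typedef struct
{
  pid_t pid;
  int starts;   /* how many times the program has been (re)started */
} ProgDriver;

/* The signal handler only raises a flag; reaping is deferred to the
 * main loop, so it runs sequentially with the EPIPE path. */
static volatile sig_atomic_t sig_child_received = 0;

static void
on_sigchld(int signo)
{
  (void) signo;
  sig_child_received = 1;
}

static void
install_sigchld_handler(void)
{
  struct sigaction sa;

  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = on_sigchld;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART;
  sigaction(SIGCHLD, &sa, NULL);
}

/* Start the "program": here just a child that sleeps until killed. */
static void
prog_start(ProgDriver *self)
{
  pid_t pid = fork();

  assert(pid >= 0);
  if (pid == 0)
    {
      pause();
      _exit(0);
    }
  self->pid = pid;
  self->starts++;
}

/* EPIPE path: writing to the program failed, restart it right away. */
static void
prog_epipe(ProgDriver *self)
{
  prog_start(self);
}

/* Exit callback for a reaped child. The pid comparison mirrors the
 * patch: an exit notification for a child that has already been
 * replaced by the EPIPE path must not trigger another restart. */
static void
prog_exit(ProgDriver *self, pid_t pid)
{
  if (self->pid != -1 && self->pid == pid)
    prog_start(self);
}

/* Deferred SIGCHLD processing, run once per main-loop iteration. */
static void
main_loop_iteration(ProgDriver *self)
{
  pid_t pid;
  int status;

  if (!sig_child_received)
    return;
  sig_child_received = 0;
  while ((pid = waitpid(-1, &status, WNOHANG)) > 0)
    prog_exit(self, pid);
}

/* Test helper: wait until the SIGCHLD handler has raised the flag. */
static void
wait_for_sigchld(void)
{
  struct timespec ts = { 0, 1000000 };  /* 1 ms */

  while (!sig_child_received)
    nanosleep(&ts, NULL);
}
```

In the bug scenario, the first child dies, prog_epipe() restarts the program before the deferred SIGCHLD processing runs, and then main_loop_iteration() reaps the old pid. With the `self->pid == pid` test that stale notification is ignored and only two starts happen in total; without it, prog_exit() would start a third instance while the second is still running, which is exactly the leak described above.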
I have committed the patch below (with some other minor fixes), which should fix the problem:

--- orig/src/afprog.c
+++ mod/src/afprog.c
@@ -50,8 +50,9 @@ afprogram_dd_exit(pid_t pid, int status,
   AFProgramDestDriver *self = (AFProgramDestDriver *) s;
 
   /* Note: self->pid being -1 means that deinit was called, thus we don't
-   * need to restart the command */
-  if (self->pid != -1)
+   * need to restart the command. self->pid might change due to EPIPE
+   * handling restarting the command before this handler is run. */
+  if (self->pid != -1 && self->pid == pid)
     {
       msg_verbose("Child program exited, restarting",
                   evt_tag_str("cmdline", self->cmdline->str),
@@ -105,6 +106,7 @@ afprogram_dd_init(LogPipe *s, GlobalConf
       dup2(devnull, 1);
       dup2(devnull, 2);
       close(devnull);
+      close(msg_pipe[0]);
       close(msg_pipe[1]);
       execl("/bin/sh", "/bin/sh", "-c", self->cmdline->str, NULL);
       _exit(127);
@@ -115,7 +117,6 @@ afprogram_dd_init(LogPipe *s, GlobalConf
   child_manager_register(self->pid, afprogram_dd_exit, log_pipe_ref(&self->super.super),
                          (GDestroyNotify) log_pipe_unref);
 
-  g_fd_set_cloexec(msg_pipe[1], TRUE);
   close(msg_pipe[0]);
   if (!self->writer)
     self->writer = log_writer_new(LW_FORMAT_FILE, s, &self->writer_options);
--- orig/src/gsockaddr.c
+++ mod/src/gsockaddr.c
@@ -698,7 +698,7 @@ g_sockaddr_unix_format(GSockAddr *addr,
   GSockAddrUnix *unix_addr = (GSockAddrUnix *) addr;
 
   g_snprintf(text, n, "AF_UNIX(%s)",
-             unix_addr->saun.sun_path[0] ? unix_addr->saun.sun_path
+             unix_addr->salen > sizeof(unix_addr->saun.sun_family) && unix_addr->saun.sun_path[0] ? unix_addr->saun.sun_path
                                          : "anonymous");
   return text;
 }
--- orig/src/logwriter.c
+++ mod/src/logwriter.c
@@ -266,9 +266,12 @@ log_writer_format_log(LogWriter *self, L
 
 static void
 log_writer_broken(LogWriter *self, gint notify_code)
 {
-  /* the connection seems to be broken */
-  log_pipe_notify(self->control, &self->super, notify_code, self);
+  /* the order of these calls is important, as log_pipe_notify() will handle
+   * reinitialization, and if deinit is called last, the writer might be
+   * left in an unpolled state */
+  log_pipe_deinit(&self->super, NULL, NULL);
+  log_pipe_notify(self->control, &self->super, notify_code, self);
 }
 
 static gboolean

-- 
Bazsi
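The gsockaddr.c hunk only trusts sun_path when the recorded address length says it is actually there. A minimal sketch of that check, assuming a plain struct sockaddr_un plus the length returned by getsockname(); format_unix_addr() and describe_local_socket() are hypothetical helpers, not syslog-ng's GSockAddr API. For an unnamed AF_UNIX socket, such as one end of a socketpair(), Linux for example reports a length that covers only the address family. The sketch compares against offsetof(struct sockaddr_un, sun_path) rather than sizeof(sun_family); the two agree where sun_family is the first member, and the offsetof form also accounts for the extra sun_len byte on BSD-derived systems.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: format an AF_UNIX address as "AF_UNIX(path)" or
 * "AF_UNIX(anonymous)", trusting sun_path only if salen covers it. */
static const char *
format_unix_addr(const struct sockaddr_un *sa, socklen_t salen,
                 char *text, size_t n)
{
  size_t path_off = offsetof(struct sockaddr_un, sun_path);

  assert(text != NULL && n > 0);
  if (salen > path_off && sa->sun_path[0] != '\0')
    {
      size_t len = salen - path_off;

      if (len > sizeof(sa->sun_path))
        len = sizeof(sa->sun_path);
      /* sun_path is not guaranteed to be NUL-terminated: bound the print */
      snprintf(text, n, "AF_UNIX(%.*s)", (int) len, sa->sun_path);
    }
  else
    snprintf(text, n, "AF_UNIX(anonymous)");
  return text;
}

/* Hypothetical helper: describe the local address of an AF_UNIX socket. */
static const char *
describe_local_socket(int fd, char *text, size_t n)
{
  struct sockaddr_un sa;
  socklen_t salen = sizeof(sa);

  /* Pre-fill with junk: only the first salen bytes belong to the
   * returned address, so a formatter that ignored salen could print
   * whatever is left over. */
  memset(&sa, 'X', sizeof(sa));
  if (getsockname(fd, (struct sockaddr *) &sa, &salen) < 0)
    return NULL;
  return format_unix_addr(&sa, salen, text, n);
}
```

For an unnamed socket this prints AF_UNIX(anonymous) instead of the leftover junk in sun_path, while a bound address such as /dev/log is still printed as AF_UNIX(/dev/log).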
I've got 3 other machines that exhibit this issue as well. It seems that the order in which services are started affects it... I haven't pinned down the right combination to make it happen every time.

Wyatt

-----Original Message-----
From: syslog-ng-bounces@lists.balabit.hu [mailto:syslog-ng-bounces@lists.balabit.hu] On Behalf Of Dev Man
Sent: Friday, October 27, 2006 8:40 PM
To: syslog-ng@lists.balabit.hu
Subject: Re: [syslog-ng] runaway programs started by syslog-ng

A followup... On a test server, I have noticed that when the system reboots there is one instance of syslog-ng but two instances of the program being executed. There should only be one. If I do a /etc/init.d/syslog-ng restart, then I end up with just one instance of the program as expected, and the system appears to be fine. Any ideas?

_______________________________________________
syslog-ng maillist - syslog-ng@lists.balabit.hu
https://lists.balabit.hu/mailman/listinfo/syslog-ng
Frequently asked questions at http://www.campin.net/syslog-ng/faq.html
participants (3)
- Balazs Scheidler
- Dev Man
- Wyatt Neal