[syslog-ng] syslog-ng deadlock if /dev/console locks?

Balazs Scheidler bazsi at balabit.hu
Sat Feb 5 16:19:29 CET 2011


On Wed, 2011-01-26 at 17:03 +0100, Sandor Geller wrote:
> Hello,
> 
> On Wed, Jan 26, 2011 at 4:12 PM, Paul Krizak <paul.krizak at amd.com> wrote:
> > Hi, we're using syslog-ng 3.1.2 and have run into what appears to be a
> > bug, but I'd like to get the community's opinion before we dig further
> > into it.
> >
> > We have a bunch of HP servers with iLO2 and iLO3 devices, configured
> > with their virtual serial ports on COM1 (ttyS0).  We subsequently have
> > the OS (RHEL4, RHEL5) configured to use COM1 as its console (e.g.
> > /dev/console).  This is a very standard configuration that allows us to
> > get remote access to the machines without having to purchase the iLO
> > Advanced KVM feature.  It also lets us use the Magic SysRq keys to probe
> > dead systems and stuff, so in general it's not something we're keen to
> > change.
> >
> > What we have found, however, is that there are some cases where the iLO
> > will freeze and requires a reboot.  When the iLO reboots, however, the
> > kernel's connection to /dev/console (through the virtual serial port)
> > hangs and blocks.  Any traffic to /dev/console just sits in the kernel's
> > buffer and is never delivered.  Once the buffer is full, the kernel
> > simply blocks on any write to /dev/console.
> >
> > Now this is a Bad Thing in general, and we're working with HP to try and
> > remedy this bug.  However, what concerns me is that syslog-ng, when
> > faced with this behavior, also blocks, even for log messages not bound
> > for /dev/console.
> 
> syslog-ng uses a single thread (with the exception of database
> destinations) running the event loop, so when a read() or a write()
> blocks, it stalls all log processing.

> 
> > What we have observed is that a system with syslog-ng will keep
> > delivering the occasional console message to /dev/console (ex. *.emerg
> > messages) and meanwhile the file-based log paths keep working.  But once
> > /dev/console blocks, the next time a console message is delivered, *all*
> > of syslog-ng blocks waiting for that message to be delivered, and all of
> > the file-based paths block as well.  The result is that pretty much
> > everything on the system stops working.  For example, you can't log in,
> > even as root, because the login process blocks on the syslog command
> > that writes to /var/log/secure.  Anything that uses syslog suddenly blocks.
> >
> > Is this expected behavior?  I would think that syslog-ng would be able
> > to continue accepting and delivering messages, even if one of the log
> > paths is stalled on a blocked write.
> 
> syslog-ng uses non-blocking I/O for all sources and destinations, but
> the kernel can still block it despite this. That is why syslog-ng
> protects reads/writes in logtransport.c with alarm(), so it should
> recover when a timeout is set and a read/write blocks. To me it looks
> like the timeout is not set in all cases: only file and program
> sources initialise transport->timeout to 10 secs, so I'd say this isn't
> expected behaviour - it is a bug.

That alarm() protection was implemented because of /proc/kmsg, which,
because of a kernel bug, doesn't support non-blocking I/O properly.

The file source driver (usually used for /proc/kmsg) sets that timeout,
even though the kernel should never block in that case.

So I wouldn't call this a bug: the alarm is a workaround for a specific
case, and /dev/console is a different one.
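
For readers unfamiliar with the alarm() workaround, here is a minimal
sketch of the general technique. It is not the code from logtransport.c;
the names read_with_alarm() and on_alarm() are hypothetical and exist
only for illustration. It assumes a SIGALRM handler installed without
SA_RESTART, so that a read() stuck in the kernel returns with EINTR
instead of being restarted.

```c
#define _POSIX_C_SOURCE 200809L  /* for sigaction() under strict -std= modes */
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Empty handler: its only job is to interrupt the blocked system call. */
static void on_alarm(int signo)
{
  (void) signo;
}

/* Hypothetical helper (not the syslog-ng implementation): read() guarded
 * by alarm(). Returns the read() result; on timeout it returns -1 with
 * errno set to EINTR. */
ssize_t read_with_alarm(int fd, void *buf, size_t len, unsigned timeout_secs)
{
  struct sigaction sa, old_sa;
  ssize_t rc;
  int saved_errno;

  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = on_alarm;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = 0;               /* no SA_RESTART: read() must fail with EINTR */
  if (sigaction(SIGALRM, &sa, &old_sa) < 0)
    return -1;

  alarm(timeout_secs);           /* arm the timer */
  rc = read(fd, buf, len);
  saved_errno = errno;
  alarm(0);                      /* disarm the timer */
  sigaction(SIGALRM, &old_sa, NULL);
  errno = saved_errno;
  return rc;
}
```

A caller would treat -1 with errno == EINTR as "timed out, try again
later". Note that this only bounds the time spent in one call; it does
not make the descriptor pollable.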

The culprit seems to be that file() destinations assume that files are
always writable, which is true only for regular files, not for devices.
So what needs to be done is to apply regular polling if the file is
non-regular.
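
The idea can be sketched outside of syslog-ng as follows. This is only
an illustration under stated assumptions, not the affile.c code: the
helpers fd_is_regular(), fd_wait_writable() and write_when_ready() are
hypothetical names, and the sketch uses plain poll() rather than the
syslog-ng event loop. Regular files are treated as always writable;
anything else (a tty such as /dev/console, a pipe, a device) is polled
for POLLOUT before writing.

```c
#include <errno.h>
#include <poll.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if fd refers to a regular file, 0 if not, -1 if fstat() fails. */
int fd_is_regular(int fd)
{
  struct stat st;

  if (fstat(fd, &st) < 0)
    return -1;
  return S_ISREG(st.st_mode) ? 1 : 0;
}

/* Returns 1 if fd may be written now, 0 on timeout, -1 on error. */
int fd_wait_writable(int fd, int timeout_ms)
{
  struct pollfd pfd;
  int rc;

  if (fd_is_regular(fd) == 1)
    return 1;                    /* regular files: assume always writable */

  pfd.fd = fd;
  pfd.events = POLLOUT;
  pfd.revents = 0;
  rc = poll(&pfd, 1, timeout_ms);
  if (rc < 0)
    return -1;
  if (rc == 0)
    return 0;                    /* not writable within the timeout */
  return (pfd.revents & POLLOUT) ? 1 : -1;
}

/* Writes only if the descriptor is ready; on timeout returns -1 with
 * errno set to EAGAIN so the caller can keep the message queued. */
ssize_t write_when_ready(int fd, const void *buf, size_t len, int timeout_ms)
{
  int ready = fd_wait_writable(fd, timeout_ms);

  if (ready == 0)
    errno = EAGAIN;
  if (ready <= 0)
    return -1;
  return write(fd, buf, len);
}
```

With something like this, a stalled console only causes its own
messages to queue up, while writes to regular files go ahead without
waiting on it.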

What about this patch (untested):

diff --git a/src/affile.c b/src/affile.c
index b5e1bef..24e5986 100644
--- a/src/affile.c
+++ b/src/affile.c
@@ -42,7 +42,7 @@ static gboolean
 affile_open_file(gchar *name, gint flags,
                  gint uid, gint gid, gint mode,
                  gint dir_uid, gint dir_gid, gint dir_mode,
-                 gboolean create_dirs, gboolean privileged, gboolean is_pipe, gint *fd)
+                 gboolean create_dirs, gboolean privileged, gboolean is_pipe, gboolean *regular, gint *fd)
 {
   cap_t saved_caps;
   struct stat st;
@@ -79,7 +79,11 @@ affile_open_file(gchar *name, gint flags,
                       evt_tag_str("filename", name),
                       NULL);
         }
+      if (regular)
+        *regular = !!S_ISREG(st.st_mode);
     }
+  else if (regular)
+    *regular = TRUE;
   *fd = open(name, flags, mode < 0 ? 0600 : mode);
   if (is_pipe && *fd < 0 && errno == ENOENT)
     {
@@ -119,7 +123,7 @@ affile_sd_open_file(AFFileSourceDriver *self, gchar *name, gint *fd)
   else
     flags = O_RDONLY | O_NOCTTY | O_NONBLOCK | O_LARGEFILE;
 
-  if (affile_open_file(name, flags, -1, -1, -1, 0, 0, 0, 0, !!(self->flags & AFFILE_PRIVILEGED), !!(self->flags & AFFILE_PIPE), fd))
+  if (affile_open_file(name, flags, -1, -1, -1, 0, 0, 0, 0, !!(self->flags & AFFILE_PRIVILEGED), !!(self->flags & AFFILE_PIPE), NULL, fd))
     return TRUE;
   return FALSE;
 }
@@ -442,6 +446,7 @@ affile_dw_init(LogPipe *s)
   int fd, flags;
   struct stat st;
   GlobalConfig *cfg = log_pipe_get_config(s);
+  gboolean regular;
 
   if (cfg)
     self->time_reopen = cfg->time_reopen;
@@ -452,7 +457,7 @@ affile_dw_init(LogPipe *s)
               NULL);
               
   if (self->owner->overwrite_if_older > 0 && 
-      stat(self->filename->str, &st) == 0 && 
+      stat(self->filename->str, &st) == 0 &&
       st.st_mtime < time(NULL) - self->owner->overwrite_if_older)
     {
       msg_info("Destination file is older than overwrite_if_older(), overwriting",
@@ -471,13 +476,13 @@ affile_dw_init(LogPipe *s)
   if (affile_open_file(self->filename->str, flags, 
                        self->owner->file_uid, self->owner->file_gid, self->owner->file_perm, 
                        self->owner->dir_uid, self->owner->dir_gid, self->owner->dir_perm, 
-                       !!(self->owner->flags & AFFILE_CREATE_DIRS), FALSE, !!(self->owner->flags & AFFILE_PIPE), &fd))
+                       !!(self->owner->flags & AFFILE_CREATE_DIRS), FALSE, !!(self->owner->flags & AFFILE_PIPE), &regular, &fd))
     {
       guint write_flags;
       
       if (!self->writer)
         {
-          self->writer = log_writer_new(LW_FORMAT_FILE | ((self->owner->flags & AFFILE_PIPE) ? 0 : LW_ALWAYS_WRITABLE));
+          self->writer = log_writer_new(LW_FORMAT_FILE | ((self->owner->flags & AFFILE_PIPE || !regular) ? 0 : LW_ALWAYS_WRITABLE));
           log_writer_set_options((LogWriter *) self->writer, s, &self->owner->writer_options, 1, self->owner->flags & AFFILE_PIPE ? SCS_PIPE : SCS_FILE, self->owner->super.id, self->filename->str);
           log_pipe_append(&self->super, self->writer);
         }


-- 
Bazsi



