[syslog-ng]syslog-ng hanging bringing machine in trouble

Roberto Nibali ratz@tac.ch
Wed, 12 Feb 2003 13:29:26 +0100


>>I would say no, but I'm not sure here. I would also suspect it depends on 
>>the version of cron deployed on your machine.
> 
> It is. The syslog() function in libc clearly sucks. If the log connection is
> broken (e.g. you restarted syslog-ng), it does not reopen the connection
> immediately, but only _after_ the first error occurs.

I have not observed this behavior with glibc-2.1.3 yet (that's what I'm using for 
my distribution). Might it be that newer glibc versions have this problem? I made a 
diff of ../misc/syslog.c between glibc-2.1.3 and glibc-2.2.5 and, apart from 
cosmetic code changes, there was this hunk:

@@ -216,17 +239,29 @@

         if (!connected || __send(LogFile, buf, bufsize, 0) < 0)
           {
-           closelog_internal ();       /* attempt re-open next time */
-           /*
-            * Output the message to the console; don't worry about blocking,
-            * if console blocks everything will.  Make sure the error reported
-            * is the one from the syslogd failure.
-            */
-           if (LogStat & LOG_CONS &&
-               (fd = __open(_PATH_CONSOLE, O_WRONLY|O_NOCTTY, 0)) >= 0)
+           if (connected)
               {
-               dprintf (fd, "%s\r\n", buf + msgoff);
-               (void)__close(fd);
+               /* Try to reopen the syslog connection.  Maybe it went
+                  down.  */
+               closelog_internal ();
+               openlog_internal(LogTag, LogStat | LOG_NDELAY, 0);
+             }
+
+           if (!connected || __send(LogFile, buf, bufsize, 0) < 0)
+             {
+               closelog_internal ();   /* attempt re-open next time */
+               /*
+                * Output the message to the console; don't worry
+                * about blocking, if console blocks everything will.
+                * Make sure the error reported is the one from the
+                * syslogd failure.
+                */
+               if (LogStat & LOG_CONS &&
+                   (fd = __open(_PATH_CONSOLE, O_WRONLY|O_NOCTTY, 0)) >= 0)
+                 {
+                   dprintf (fd, "%s\r\n", buf + msgoff);
+                   (void)__close(fd);
+                 }
               }
           }

So one would assume that the re-open used to work, wouldn't one?
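
For illustration only, the retry added by the glibc-2.2.5 hunk can be sketched
outside libc roughly as follows. This is a minimal sketch, not glibc's code:
log_connect() and log_send_retry() are hypothetical helpers, the socket path is
supplied by the caller, and a Unix datagram socket is assumed. Unlike the hunk,
the sketch also attempts a connect when no connection existed before.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: connect a datagram socket to the log socket at
   `path`.  Returns the descriptor, or -1 on failure. */
static int log_connect(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    assert(path != NULL);
    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;

    fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: send one message.  If the first send fails,
   reopen the connection once and retry, in the spirit of the
   glibc-2.2.5 hunk.  *fdp holds the current descriptor (or -1).
   Returns 0 if the message was sent, -1 if it had to be dropped. */
static int log_send_retry(int *fdp, const char *path, const char *msg)
{
    size_t len = strlen(msg);

    /* First attempt on the existing connection, if there is one. */
    if (*fdp >= 0 && send(*fdp, msg, len, 0) >= 0)
        return 0;

    /* The connection may have gone down: close it and reopen. */
    if (*fdp >= 0)
        close(*fdp);
    *fdp = log_connect(path);

    /* Second and last attempt on the fresh connection. */
    if (*fdp >= 0 && send(*fdp, msg, len, 0) >= 0)
        return 0;

    /* Still failing: give up, a later call will try to reopen again. */
    if (*fdp >= 0) {
        close(*fdp);
        *fdp = -1;
    }
    return -1;
}
```

With this pattern the message that first hits the broken connection is retried
on the fresh one; it is lost only if the second attempt fails as well.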

> Thus a message is dropped after a log connection breaks.

Sounds plausible, thanks for the pointer.

> It becomes worse when a parent process opens the log connection, and each
> child inherits the fd (this is what cron probably does), as the first log
> message is dropped in every child. This becomes the worst when the child
> sends only a single log message, when the program becomes absolutely silent.

I see. Thanks for the information. Have you heard of or seen any other 
implementation of syslog() that does not have this deficiency?
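
On the application side, one possible workaround for the inherited-fd case
might look like the following. It is only a sketch under assumptions:
spawn_logging_child() is a hypothetical helper, LOG_CRON is picked just to
match the cron example, and it assumes that closelog() followed by openlog()
with LOG_NDELAY gives the child its own connection, so that its single message
does not depend on the parent's possibly stale descriptor.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <syslog.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that logs exactly one message.
   Returns the child's exit status, or -1 on error. */
static int spawn_logging_child(const char *ident, const char *msg)
{
    int status;
    pid_t pid;

    assert(ident != NULL && msg != NULL);

    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: drop whatever log connection was inherited from the
           parent and open a new one right away (LOG_NDELAY). */
        closelog();
        openlog(ident, LOG_PID | LOG_NDELAY, LOG_CRON); /* facility is an assumption */
        syslog(LOG_INFO, "%s", msg);
        closelog();
        _exit(0);
    }

    /* Parent: wait for the child and report how it exited. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

This does not fix syslog() itself; it only avoids relying on a descriptor that
was opened before the log daemon was restarted.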

Best regards,
Roberto Nibali, ratz
-- 
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc