[syslog-ng] missing logs when SIGHUP

Balazs Scheidler bazsi at balabit.hu
Sat Dec 1 17:15:04 CET 2007


On Fri, 2007-11-30 at 14:34 +0100, Martin Ervius wrote:
> Hi,
> 
> It seems that sometimes when syslog-ng receives a SIGHUP (like for
> example after logrotating or config change) it loses track of the logs
> that has been received, but are not yet written to disk. Except for
> this problem, syslog-ng can receive ~35000 logs/sec on the same
> machine without any loss at all. 
> 
> I have been able to reproduce the situation in a test environment on
> both FreeBSD 6.1 (Opteron) with syslog-ng 2.0.3 and on Ubuntu Linux
> 7.10 (i386) with syslog-ng 2.0.5. On linux, it appears that the number
> of logs lost equals the size of the log_fetch_limit parameter, because
> the number of lost logs are in multiples of this value. 
> 
> To see what happens I used strace to follow syslog-ng and dump read
> and write calls while sending numbered logs at a rate of 1000/sec.
> Only one machine was sending logs and they were transferred from a
> different machine over tcp. During this time I sent a SIGHUP to the
> receiving syslog with two seconds apart (since it doesn't occur with
> every SIGHUP). log_fetch_limit is set to 100, log_fifo_size to 51200
> and log_msg_size is 2048. Flow-control is not activated since it
> didn't appear to make any difference in this particular case. 
> 
> 
> This is the output of 'strace -s 65536 -e write,read' with the
> irrelevant parts omitted (I captured a similar output on FreeBSD with
> ktrace, but is not included in this mail)
> 
> ## read from tcp socket (only beginning and end to save space) 
> read(5, "<135>Nov 27 18:36:34 tiger logger: hejsan 9388\n<135>Nov 27
> 18:36:34 tiger logger: hejsan 9389\n<135>Nov 27 18:36:34 tiger logger:
> hejsan 9390\n<135>Nov 27 18:36:34 tiger logger: hejsan 9391\n<135>Nov
> 27 18:36:34 tiger logger: hejsan 9392\n<135>Nov 27 18:36:34 tiger
> logger: hejsan 9393\n<135>Nov 27 18:36:34 tiger logger: hejsan 9394
> \n<135>Nov 27 18:36:34 tiger logger: hejsan 9395\n<135>Nov 27 18:36:34
> tiger logger: hejsan 9396\n<135>Nov 27 18:36:34 tiger logger: hejsan
> 9397\n<135>Nov 27 18:36:34 tiger logger: hejsan 9398\n<135>Nov 27
> 18:36:34 tiger logger: hejsan 9399\n<135>Nov 27 18:36:34 tiger logger:
> hejsan 9400\n<135>Nov 27 18:36:34 tiger logger: hejsan 9401\n<135>Nov
> 27 18:36:34 tiger logger: hejsan 9402\n<135>Nov 27 18:36:34 tiger
> logger: hejsan 9403\n<135>Nov 27 18:36:34 tiger logger: hejsan 9404
> \n<135>Nov 27 18:36:34 tiger logger: hejsan 9405\n<135>Nov 27 18:36:34
> tiger logger: hejsan 9406\n<135>Nov 27 18:36:34 tiger logger: hejsan
> 9407\n<135>Nov 27 18:36:34 tiger logger: hejsan 9408\n<135>Nov 27
> 18:36:34 tiger logger: hejsan 9409\n<135>Nov 27 18:36:34 tiger logger:
> hejsan 9410\n 
> . . .
> <135>Nov 27 18:36:34 tiger logger: hejsan 9504\n<135>Nov 27 18:36:34
> tiger logger: hejsan 9505\n<135>Nov 27 18:36:3
> 4 tiger logger: hejsan 9506\n<135>Nov 27 18:36:34 tiger logger: hejsan
> 9507\n<135>Nov 27 18:36:34 tiger logger: hej 
> ", 8192) = 5678
> 
> ## then follows all the writes until a SIGHUP is received
> ...
> write(6, "Nov 27 18:36:34 tiger logger: hejsan 9401\n", 42) = 42
> write(6, "Nov 27 18:36:34 tiger logger: hejsan 9402\n", 42) = 42 
> write(6, "Nov 27 18:36:34 tiger logger: hejsan 9403\n", 42) = 42
> write(6, "Nov 27 18:36:34 tiger logger: hejsan 9404\n", 42) = 42
> write(6, "Nov 27 18:36:34 tiger logger: hejsan 9405\n", 42) = 42 
> write(6, "Nov 27 18:36:34 tiger logger: hejsan 9406\n", 42) = 42
> write(6, "Nov 27 18:36:34 tiger logger: hejsan 9407\n", 42) = 42
> write(6, "Nov 27 18:36:34 tiger logger: hejsan 9408\n", 42) = 42 
> --- SIGHUP (Hangup) @ 0 (0) ---
> 
> 
> ## After the config file is re-read, syslog-ng continues to read new
> logs from the socket, with unwritten logs from 9409 to 9507 in the
> buffer (9508 is partly read before the SIGHUP) 
> read(5, "san 9508\n<135>Nov 27 18:36:34 tiger logger: hejsan 9509
> \n<135>Nov 27 18:36:34 tiger logger: hejsan 9510\n<135>Nov 27 18:36:34
> tiger logger: hejsan 9511\n<135>Nov 27 18:36:34 tiger logger: hejsan
> 9512\n<135>Nov 27 18:36:34 tiger logger: hejsan 9513\n<135>Nov 27
> 18:36:34 tiger logger: hejsan 9514\n<135>Nov 27 18:36:34 tiger logger:
> hejsan 9515\n.... 
> 
> 
> ## And finally new writes are done, but starting at 9508. The logs
> ranging from 9409 to 9507 are now lost and are consequently missing in
> the destination file.
> write(4, "Nov 27 18:36:34 tiger logger: hejsan 9508\n", 42) = 42 
> write(4, "Nov 27 18:36:34 tiger logger: hejsan 9509\n", 42) = 42
> write(4, "Nov 27 18:36:34 tiger logger: hejsan 9510\n", 42) = 42
> write(4, "Nov 27 18:36:34 tiger logger: hejsan 9511\n", 42) = 42 
> 
> -----------------------------------------------------------------------------------------
> 
> 
> I have experimented with different parameter settings, but am still
> losing logs at SIGHUP. I hope this information is enough if someone is
> able to diagnose the issue. If not, I can send my particular config
> etc on request. Any input on this is appreciated, has anyone else
> noticed this behaviour or is it maybe a configuration issue with this
> setup? 

Thanks for the detailed explanation. I indeed forgot to save and restore
some data buffers accross reloads. The patch is risky somewhat, so I did
not apply it to the mainline yet.

I performed some limited testing on my development laptop, I've made
syslog-ng delay output on destination files, sent a couple of messages
to it, sent a SIGHUP, some more messages, and it seems to write all
buffered messages to the configured destination nicely. I've also
checked for two cases of possible memory leaks, and syslog-ng freed what
it had to free.

I'd appreciate some additional testing. If there's an ACK from you, I
apply the patch to mainline, where it can sit till the next release.

The patch is attached to this mail, but I've also pushed it to
git.balabit.hu, "remember-dw-file" branch.


diff --git a/src/affile.c b/src/affile.c
index b79074e..d1a9ab7 100644
--- a/src/affile.c
+++ b/src/affile.c
@@ -163,8 +163,6 @@ affile_sd_notify(LogPipe *s, LogPipe *sender, gint notify_code, gpointer user_da
     }
 }
 
-
-
 static gboolean
 affile_sd_init(LogPipe *s, GlobalConfig *cfg, PersistentConfig *persist)
 {
@@ -268,7 +266,7 @@ affile_sd_new(gchar *filename, guint32 flags)
 }
 
 
-typedef struct _AFFileDestWriter
+struct _AFFileDestWriter
 {
   LogPipe super;
   AFFileDestDriver *owner;
@@ -277,7 +275,13 @@ typedef struct _AFFileDestWriter
   time_t last_msg_stamp;
   time_t last_open_stamp;
   time_t time_reopen;
-} AFFileDestWriter;
+};
+
+static gboolean
+affile_dw_reapable(AFFileDestWriter *self)
+{
+  return ((LogWriter *) self->writer)->queue->length == 0;
+}
 
 static gboolean
 affile_dw_init(LogPipe *s, GlobalConfig *cfg, PersistentConfig *persist)
@@ -372,6 +376,18 @@ affile_dw_queue(LogPipe *s, LogMessage *lm, gint path_flags)
 }
 
 static void
+affile_dw_set_owner(AFFileDestWriter *self, AFFileDestDriver *owner)
+{
+  if (self->owner)
+    log_pipe_unref(&self->owner->super.super);
+  log_pipe_ref(&owner->super.super);
+  self->owner = owner;
+  if (self->writer)
+    log_writer_set_options((LogWriter *) self->writer, &owner->writer_options);
+  
+}
+
+static void
 affile_dw_free(LogPipe *s)
 {
   AFFileDestWriter *self = (AFFileDestWriter *) s;
@@ -382,7 +398,7 @@ affile_dw_free(LogPipe *s)
   g_free(s);
 }
 
-static LogPipe *
+static AFFileDestWriter *
 affile_dw_new(AFFileDestDriver *owner, GString *filename)
 {
   AFFileDestWriter *self = g_new0(AFFileDestWriter, 1);
@@ -400,7 +416,7 @@ affile_dw_new(AFFileDestDriver *owner, GString *filename)
   /* we have to take care about freeing filename later. 
      This avoids a move of the filename. */
   self->filename = filename;
-  return &self->super;
+  return self;
 }
 
 void 
@@ -494,6 +510,16 @@ affile_dd_set_overwrite_if_older(LogDriver *s, gint overwrite_if_older)
   self->overwrite_if_older = overwrite_if_older;
 }
 
+static inline gchar *
+affile_dd_format_persist_name(AFFileDestDriver *self)
+{
+  static gchar persist_name[1024];
+  
+  g_snprintf(persist_name, sizeof(persist_name), "affile_dd_writers(%s)", self->filename_template->template->str);
+  return persist_name;
+}
+
+
 static const gchar *
 affile_dd_format_stats_name(AFFileDestDriver *self, gboolean is_pipe)
 {
@@ -513,7 +539,7 @@ affile_dd_reap_writers(gpointer key, gpointer value, gpointer user_data)
   AFFileDestDriver *self = (AFFileDestDriver *) user_data;
   AFFileDestWriter *dw = (AFFileDestWriter *) value;
   
-  if ((reap_now - dw->last_msg_stamp) >= self->time_reap)
+  if ((reap_now - dw->last_msg_stamp) >= self->time_reap && affile_dw_reapable(dw))
     {
       msg_verbose("Destination timed out, reaping", 
                   evt_tag_str("template", self->filename_template->template->str),
@@ -542,6 +568,25 @@ affile_dd_reap(gpointer s)
   return TRUE;
 }
 
+/**
+ * affile_dd_reuse_writer:
+ *
+ * This function is called as a g_hash_table_foreach() callback to set the
+ * owner of each writer, previously connected to an AFileDestDriver instance
+ * in an earlier configuration. This way AFFileDestWriter instances are
+ * remembered accross reloads.
+ * 
+ **/
+static void
+affile_dd_reuse_writer(gpointer key, gpointer value, gpointer user_data)
+{
+  AFFileDestDriver *self = (AFFileDestDriver *) user_data;
+  AFFileDestWriter *writer = (AFFileDestWriter *) value;
+  
+  affile_dw_set_owner(writer, self);
+}
+
+
 static gboolean
 affile_dd_init(LogPipe *s, GlobalConfig *cfg, PersistentConfig *persist)
 {
@@ -576,30 +621,79 @@ affile_dd_init(LogPipe *s, GlobalConfig *cfg, PersistentConfig *persist)
   if ((self->flags & AFFILE_NO_EXPAND) == 0)
     {
       self->reap_timer = g_timeout_add_full(G_PRIORITY_LOW, self->time_reap * 1000 / 2, affile_dd_reap, self, NULL);
+      self->writer_hash = persist_config_fetch(persist, affile_dd_format_persist_name(self));
+      if (self->writer_hash)
+        g_hash_table_foreach(self->writer_hash, affile_dd_reuse_writer, self);
+    }
+  else
+    {
+      self->writer = persist_config_fetch(persist, affile_dd_format_persist_name(self));
+      affile_dw_set_owner(self->writer, self);
     }
+  
+  
   return TRUE;
 }
 
+
+/**
+ * This is registered as a destroy-notify callback for an AFFileDestWriter
+ * instance. It destructs and frees the writer instance.
+ **/
+static void
+affile_dd_destroy_writer(gpointer value)
+{
+  AFFileDestWriter *writer = (AFFileDestWriter *) value;
+  log_pipe_deinit(&writer->super, NULL, NULL);
+  log_pipe_unref(&writer->super);
+}
+
+/*
+ * This function is called as a g_hash_table_foreach_remove() callback to
+ * free the specific AFFileDestWriter instance in the hashtable.
+ */
 static gboolean
-affile_dd_remove_writers(gpointer key, gpointer value, gpointer user_data)
+affile_dd_destroy_writer_hr(gpointer key, gpointer value, gpointer user_data)
 {
-  AFFileDestDriver *self = (AFFileDestDriver *) user_data;
-  LogPipe *writer = (LogPipe *) value;
-  
-  log_pipe_deinit(writer, self->cfg, NULL);
-  log_pipe_unref(writer);
+  affile_dd_destroy_writer(value);
   return TRUE;
 }
 
+/**
+ * affile_dd_destroy_writer_hash:
+ * @value: GHashTable instance passed as a generic pointer
+ *
+ * Destroy notify callback for the GHashTable storing AFFileDestWriter instances.
+ **/
+static void
+affile_dd_destroy_writer_hash(gpointer value)
+{
+  GHashTable *writer_hash = (GHashTable *) value;
+  
+  g_hash_table_foreach_remove(writer_hash, affile_dd_destroy_writer_hr, NULL);
+  g_hash_table_destroy(writer_hash);
+}
+
 static gboolean
 affile_dd_deinit(LogPipe *s, GlobalConfig *cfg, PersistentConfig *persist)
 {
   AFFileDestDriver *self = (AFFileDestDriver *) s;
 
+  /* writer & writer_hash cannot be non-NULL at the same time */
   if (self->writer)
-    log_pipe_deinit(self->writer, cfg, NULL);
-  if (self->writer_hash)
-    g_hash_table_foreach_remove(self->writer_hash, affile_dd_remove_writers, self);
+    {
+      g_assert(self->writer_hash == NULL);
+
+      persist_config_add(persist, affile_dd_format_persist_name(self), self->writer, affile_dd_destroy_writer);
+      self->writer = NULL;
+    }
+  else if (self->writer_hash)
+    {
+      g_assert(self->writer == NULL);
+      
+      persist_config_add(persist, affile_dd_format_persist_name(self), self->writer_hash, affile_dd_destroy_writer_hash);
+      self->writer_hash = NULL;
+    }
   if (self->reap_timer)
     g_source_remove(self->reap_timer);
   self->cfg = NULL;
@@ -610,20 +704,20 @@ static void
 affile_dd_queue(LogPipe *s, LogMessage *msg, gint path_flags)
 {
   AFFileDestDriver *self = (AFFileDestDriver *) s;
-  LogPipe *next;
+  AFFileDestWriter *next;
 
   if (self->flags & AFFILE_NO_EXPAND)
     {
       if (!self->writer)
 	{
 	  next = affile_dw_new(self, g_string_new(self->filename_template->template->str));
-	  if (next && log_pipe_init(next, self->cfg, NULL))
+	  if (next && log_pipe_init(&next->super, self->cfg, NULL))
 	    {
 	      self->writer = next;
 	    }
 	  else
 	    {
-	      log_pipe_unref(next);
+	      log_pipe_unref(&next->super);
 	      next = NULL;
 	    }
 	}
@@ -651,9 +745,9 @@ affile_dd_queue(LogPipe *s, LogMessage *msg, gint path_flags)
       if (!next)
 	{
 	  next = affile_dw_new(self, filename);
-	  if (!log_pipe_init(next, self->cfg, NULL))
+	  if (!log_pipe_init(&next->super, self->cfg, NULL))
 	    {
-	      log_pipe_unref(next);
+	      log_pipe_unref(&next->super);
 	      next = NULL;
 	    }
 	  else
@@ -663,7 +757,7 @@ affile_dd_queue(LogPipe *s, LogMessage *msg, gint path_flags)
         g_string_free(filename, TRUE);
     }
   if (next)
-    log_pipe_queue(next, msg, path_flags);
+    log_pipe_queue(&next->super, msg, path_flags);
   else
     log_msg_drop(msg, path_flags);
 }
@@ -673,10 +767,10 @@ affile_dd_free(LogPipe *s)
 {
   AFFileDestDriver *self = (AFFileDestDriver *) s;
   
+  /* NOTE: this must be NULL as deinit has freed it, otherwise we'd have circular references */
+  g_assert(self->writer == NULL && self->writer_hash == NULL);
+  
   log_template_unref(self->filename_template);
-  log_pipe_unref(self->writer);
-  if (self->writer_hash)
-    g_hash_table_destroy(self->writer_hash);
   log_writer_options_destroy(&self->writer_options);
   log_drv_free_instance(&self->super);
   g_free(self);
diff --git a/src/affile.h b/src/affile.h
index 9268c6e..0f08e35 100644
--- a/src/affile.h
+++ b/src/affile.h
@@ -44,11 +44,13 @@ typedef struct _AFFileSourceDriver
 
 LogDriver *affile_sd_new(gchar *filename, guint32 flags);
 
+typedef struct _AFFileDestWriter AFFileDestWriter;
+
 typedef struct _AFFileDestDriver
 {
   LogDriver super;
   LogTemplate *filename_template;
-  LogPipe *writer;
+  AFFileDestWriter *writer;
   guint32 flags;
   uid_t file_uid;
   gid_t file_gid;
diff --git a/src/logwriter.c b/src/logwriter.c
index 955c333..3286da7 100644
--- a/src/logwriter.c
+++ b/src/logwriter.c
@@ -439,6 +439,13 @@ log_writer_reopen(LogPipe *s, FDWrite *newfd)
   return TRUE;
 }
 
+void
+log_writer_set_options(LogWriter *self, LogWriterOptions *options)
+{
+  self->options = options;
+}
+
+
 LogPipe *
 log_writer_new(guint32 flags, LogPipe *control, LogWriterOptions *options)
 {
diff --git a/src/logwriter.h b/src/logwriter.h
index 3bcb58a..d959dbf 100644
--- a/src/logwriter.h
+++ b/src/logwriter.h
@@ -80,6 +80,7 @@ typedef struct _LogWriter
   LogWriterOptions *options;
 } LogWriter;
 
+void log_writer_set_options(LogWriter *self, LogWriterOptions *options);
 void log_writer_format_log(LogWriter *self, LogMessage *lm, GString *result);
 gboolean log_writer_reopen(LogPipe *s, FDWrite *fd);
 LogPipe *log_writer_new(guint32 flags, LogPipe *control, LogWriterOptions *options);


-- 
Bazsi



More information about the syslog-ng mailing list