[syslog-ng] missing logs when SIGHUP
Balazs Scheidler
bazsi at balabit.hu
Sat Dec 1 17:15:04 CET 2007
On Fri, 2007-11-30 at 14:34 +0100, Martin Ervius wrote:
> Hi,
>
> It seems that sometimes when syslog-ng receives a SIGHUP (like for
> example after logrotating or config change) it loses track of the logs
> that has been received, but are not yet written to disk. Except for
> this problem, syslog-ng can receive ~35000 logs/sec on the same
> machine without any loss at all.
>
> I have been able to reproduce the situation in a test environment on
> both FreeBSD 6.1 (Opteron) with syslog-ng 2.0.3 and on Ubuntu Linux
> 7.10 (i386) with syslog-ng 2.0.5. On linux, it appears that the number
> of logs lost equals the size of the log_fetch_limit parameter, because
> the number of lost logs are in multiples of this value.
>
> To see what happens I used strace to follow syslog-ng and dump read
> and write calls while sending numbered logs at a rate of 1000/sec.
> Only one machine was sending logs and they were transferred from a
> different machine over tcp. During this time I sent a SIGHUP to the
> receiving syslog with two seconds apart (since it doesn't occur with
> every SIGHUP). log_fetch_limit is set to 100, log_fifo_size to 51200
> and log_msg_size is 2048. Flow-control is not activated since it
> didn't appear to make any difference in this particular case.
>
>
> This is the output of 'strace -s 65536 -e write,read' with the
> irrelevant parts omitted (I captured a similar output on FreeBSD with
> ktrace, but is not included in this mail)
>
> ## read from tcp socket (only beginning and end to save space)
> read(5, "<135>Nov 27 18:36:34 tiger logger: hejsan 9388\n<135>Nov 27
> 18:36:34 tiger logger: hejsan 9389\n<135>Nov 27 18:36:34 tiger logger:
> hejsan 9390\n<135>Nov 27 18:36:34 tiger logger: hejsan 9391\n<135>Nov
> 27 18:36:34 tiger logger: hejsan 9392\n<135>Nov 27 18:36:34 tiger
> logger: hejsan 9393\n<135>Nov 27 18:36:34 tiger logger: hejsan 9394
> \n<135>Nov 27 18:36:34 tiger logger: hejsan 9395\n<135>Nov 27 18:36:34
> tiger logger: hejsan 9396\n<135>Nov 27 18:36:34 tiger logger: hejsan
> 9397\n<135>Nov 27 18:36:34 tiger logger: hejsan 9398\n<135>Nov 27
> 18:36:34 tiger logger: hejsan 9399\n<135>Nov 27 18:36:34 tiger logger:
> hejsan 9400\n<135>Nov 27 18:36:34 tiger logger: hejsan 9401\n<135>Nov
> 27 18:36:34 tiger logger: hejsan 9402\n<135>Nov 27 18:36:34 tiger
> logger: hejsan 9403\n<135>Nov 27 18:36:34 tiger logger: hejsan 9404
> \n<135>Nov 27 18:36:34 tiger logger: hejsan 9405\n<135>Nov 27 18:36:34
> tiger logger: hejsan 9406\n<135>Nov 27 18:36:34 tiger logger: hejsan
> 9407\n<135>Nov 27 18:36:34 tiger logger: hejsan 9408\n<135>Nov 27
> 18:36:34 tiger logger: hejsan 9409\n<135>Nov 27 18:36:34 tiger logger:
> hejsan 9410\n
> . . .
> <135>Nov 27 18:36:34 tiger logger: hejsan 9504\n<135>Nov 27 18:36:34
> tiger logger: hejsan 9505\n<135>Nov 27 18:36:3
> 4 tiger logger: hejsan 9506\n<135>Nov 27 18:36:34 tiger logger: hejsan
> 9507\n<135>Nov 27 18:36:34 tiger logger: hej
> ", 8192) = 5678
>
> ## then follows all the writes until a SIGHUP is received
> ...
> write(6, "Nov 27 18:36:34 tiger logger: hejsan 9401\n", 42) = 42
> write(6, "Nov 27 18:36:34 tiger logger: hejsan 9402\n", 42) = 42
> write(6, "Nov 27 18:36:34 tiger logger: hejsan 9403\n", 42) = 42
> write(6, "Nov 27 18:36:34 tiger logger: hejsan 9404\n", 42) = 42
> write(6, "Nov 27 18:36:34 tiger logger: hejsan 9405\n", 42) = 42
> write(6, "Nov 27 18:36:34 tiger logger: hejsan 9406\n", 42) = 42
> write(6, "Nov 27 18:36:34 tiger logger: hejsan 9407\n", 42) = 42
> write(6, "Nov 27 18:36:34 tiger logger: hejsan 9408\n", 42) = 42
> --- SIGHUP (Hangup) @ 0 (0) ---
>
>
> ## After the config file is re-read, syslog-ng continues to read new
> logs from the socket, with unwritten logs from 9409 to 9507 in the
> buffer (9508 is partly read before the SIGHUP)
> read(5, "san 9508\n<135>Nov 27 18:36:34 tiger logger: hejsan 9509
> \n<135>Nov 27 18:36:34 tiger logger: hejsan 9510\n<135>Nov 27 18:36:34
> tiger logger: hejsan 9511\n<135>Nov 27 18:36:34 tiger logger: hejsan
> 9512\n<135>Nov 27 18:36:34 tiger logger: hejsan 9513\n<135>Nov 27
> 18:36:34 tiger logger: hejsan 9514\n<135>Nov 27 18:36:34 tiger logger:
> hejsan 9515\n....
>
>
> ## And finally new writes are done, but starting at 9508. The logs
> ranging from 9409 to 9507 are now lost and are consequently missing in
> the destination file.
> write(4, "Nov 27 18:36:34 tiger logger: hejsan 9508\n", 42) = 42
> write(4, "Nov 27 18:36:34 tiger logger: hejsan 9509\n", 42) = 42
> write(4, "Nov 27 18:36:34 tiger logger: hejsan 9510\n", 42) = 42
> write(4, "Nov 27 18:36:34 tiger logger: hejsan 9511\n", 42) = 42
>
> -----------------------------------------------------------------------------------------
>
>
> I have experimented with different parameter settings, but am still
> losing logs at SIGHUP. I hope this information is enough if someone is
> able to diagnose the issue. If not, I can send my particular config
> etc on request. Any input on this is appreciated, has anyone else
> noticed this behaviour or is it maybe a configuration issue with this
> setup?
Thanks for the detailed explanation. I indeed forgot to save and restore
some data buffers accross reloads. The patch is risky somewhat, so I did
not apply it to the mainline yet.
I performed some limited testing on my development laptop, I've made
syslog-ng delay output on destination files, sent a couple of messages
to it, sent a SIGHUP, some more messages, and it seems to write all
buffered messages to the configured destination nicely. I've also
checked for two cases of possible memory leaks, and syslog-ng freed what
it had to free.
I'd appreciate some additional testing. If there's an ACK from you, I
apply the patch to mainline, where it can sit till the next release.
The patch is attached to this mail, but I've also pushed it to
git.balabit.hu, "remember-dw-file" branch.
diff --git a/src/affile.c b/src/affile.c
index b79074e..d1a9ab7 100644
--- a/src/affile.c
+++ b/src/affile.c
@@ -163,8 +163,6 @@ affile_sd_notify(LogPipe *s, LogPipe *sender, gint notify_code, gpointer user_da
}
}
-
-
static gboolean
affile_sd_init(LogPipe *s, GlobalConfig *cfg, PersistentConfig *persist)
{
@@ -268,7 +266,7 @@ affile_sd_new(gchar *filename, guint32 flags)
}
-typedef struct _AFFileDestWriter
+struct _AFFileDestWriter
{
LogPipe super;
AFFileDestDriver *owner;
@@ -277,7 +275,13 @@ typedef struct _AFFileDestWriter
time_t last_msg_stamp;
time_t last_open_stamp;
time_t time_reopen;
-} AFFileDestWriter;
+};
+
+static gboolean
+affile_dw_reapable(AFFileDestWriter *self)
+{
+ return ((LogWriter *) self->writer)->queue->length == 0;
+}
static gboolean
affile_dw_init(LogPipe *s, GlobalConfig *cfg, PersistentConfig *persist)
@@ -372,6 +376,18 @@ affile_dw_queue(LogPipe *s, LogMessage *lm, gint path_flags)
}
static void
+affile_dw_set_owner(AFFileDestWriter *self, AFFileDestDriver *owner)
+{
+ if (self->owner)
+ log_pipe_unref(&self->owner->super.super);
+ log_pipe_ref(&owner->super.super);
+ self->owner = owner;
+ if (self->writer)
+ log_writer_set_options((LogWriter *) self->writer, &owner->writer_options);
+
+}
+
+static void
affile_dw_free(LogPipe *s)
{
AFFileDestWriter *self = (AFFileDestWriter *) s;
@@ -382,7 +398,7 @@ affile_dw_free(LogPipe *s)
g_free(s);
}
-static LogPipe *
+static AFFileDestWriter *
affile_dw_new(AFFileDestDriver *owner, GString *filename)
{
AFFileDestWriter *self = g_new0(AFFileDestWriter, 1);
@@ -400,7 +416,7 @@ affile_dw_new(AFFileDestDriver *owner, GString *filename)
/* we have to take care about freeing filename later.
This avoids a move of the filename. */
self->filename = filename;
- return &self->super;
+ return self;
}
void
@@ -494,6 +510,16 @@ affile_dd_set_overwrite_if_older(LogDriver *s, gint overwrite_if_older)
self->overwrite_if_older = overwrite_if_older;
}
+static inline gchar *
+affile_dd_format_persist_name(AFFileDestDriver *self)
+{
+ static gchar persist_name[1024];
+
+ g_snprintf(persist_name, sizeof(persist_name), "affile_dd_writers(%s)", self->filename_template->template->str);
+ return persist_name;
+}
+
+
static const gchar *
affile_dd_format_stats_name(AFFileDestDriver *self, gboolean is_pipe)
{
@@ -513,7 +539,7 @@ affile_dd_reap_writers(gpointer key, gpointer value, gpointer user_data)
AFFileDestDriver *self = (AFFileDestDriver *) user_data;
AFFileDestWriter *dw = (AFFileDestWriter *) value;
- if ((reap_now - dw->last_msg_stamp) >= self->time_reap)
+ if ((reap_now - dw->last_msg_stamp) >= self->time_reap && affile_dw_reapable(dw))
{
msg_verbose("Destination timed out, reaping",
evt_tag_str("template", self->filename_template->template->str),
@@ -542,6 +568,25 @@ affile_dd_reap(gpointer s)
return TRUE;
}
+/**
+ * affile_dd_reuse_writer:
+ *
+ * This function is called as a g_hash_table_foreach() callback to set the
+ * owner of each writer, previously connected to an AFileDestDriver instance
+ * in an earlier configuration. This way AFFileDestWriter instances are
+ * remembered accross reloads.
+ *
+ **/
+static void
+affile_dd_reuse_writer(gpointer key, gpointer value, gpointer user_data)
+{
+ AFFileDestDriver *self = (AFFileDestDriver *) user_data;
+ AFFileDestWriter *writer = (AFFileDestWriter *) value;
+
+ affile_dw_set_owner(writer, self);
+}
+
+
static gboolean
affile_dd_init(LogPipe *s, GlobalConfig *cfg, PersistentConfig *persist)
{
@@ -576,30 +621,79 @@ affile_dd_init(LogPipe *s, GlobalConfig *cfg, PersistentConfig *persist)
if ((self->flags & AFFILE_NO_EXPAND) == 0)
{
self->reap_timer = g_timeout_add_full(G_PRIORITY_LOW, self->time_reap * 1000 / 2, affile_dd_reap, self, NULL);
+ self->writer_hash = persist_config_fetch(persist, affile_dd_format_persist_name(self));
+ if (self->writer_hash)
+ g_hash_table_foreach(self->writer_hash, affile_dd_reuse_writer, self);
+ }
+ else
+ {
+ self->writer = persist_config_fetch(persist, affile_dd_format_persist_name(self));
+ affile_dw_set_owner(self->writer, self);
}
+
+
return TRUE;
}
+
+/**
+ * This is registered as a destroy-notify callback for an AFFileDestWriter
+ * instance. It destructs and frees the writer instance.
+ **/
+static void
+affile_dd_destroy_writer(gpointer value)
+{
+ AFFileDestWriter *writer = (AFFileDestWriter *) value;
+ log_pipe_deinit(&writer->super, NULL, NULL);
+ log_pipe_unref(&writer->super);
+}
+
+/*
+ * This function is called as a g_hash_table_foreach_remove() callback to
+ * free the specific AFFileDestWriter instance in the hashtable.
+ */
static gboolean
-affile_dd_remove_writers(gpointer key, gpointer value, gpointer user_data)
+affile_dd_destroy_writer_hr(gpointer key, gpointer value, gpointer user_data)
{
- AFFileDestDriver *self = (AFFileDestDriver *) user_data;
- LogPipe *writer = (LogPipe *) value;
-
- log_pipe_deinit(writer, self->cfg, NULL);
- log_pipe_unref(writer);
+ affile_dd_destroy_writer(value);
return TRUE;
}
+/**
+ * affile_dd_destroy_writer_hash:
+ * @value: GHashTable instance passed as a generic pointer
+ *
+ * Destroy notify callback for the GHashTable storing AFFileDestWriter instances.
+ **/
+static void
+affile_dd_destroy_writer_hash(gpointer value)
+{
+ GHashTable *writer_hash = (GHashTable *) value;
+
+ g_hash_table_foreach_remove(writer_hash, affile_dd_destroy_writer_hr, NULL);
+ g_hash_table_destroy(writer_hash);
+}
+
static gboolean
affile_dd_deinit(LogPipe *s, GlobalConfig *cfg, PersistentConfig *persist)
{
AFFileDestDriver *self = (AFFileDestDriver *) s;
+ /* writer & writer_hash cannot be non-NULL at the same time */
if (self->writer)
- log_pipe_deinit(self->writer, cfg, NULL);
- if (self->writer_hash)
- g_hash_table_foreach_remove(self->writer_hash, affile_dd_remove_writers, self);
+ {
+ g_assert(self->writer_hash == NULL);
+
+ persist_config_add(persist, affile_dd_format_persist_name(self), self->writer, affile_dd_destroy_writer);
+ self->writer = NULL;
+ }
+ else if (self->writer_hash)
+ {
+ g_assert(self->writer == NULL);
+
+ persist_config_add(persist, affile_dd_format_persist_name(self), self->writer_hash, affile_dd_destroy_writer_hash);
+ self->writer_hash = NULL;
+ }
if (self->reap_timer)
g_source_remove(self->reap_timer);
self->cfg = NULL;
@@ -610,20 +704,20 @@ static void
affile_dd_queue(LogPipe *s, LogMessage *msg, gint path_flags)
{
AFFileDestDriver *self = (AFFileDestDriver *) s;
- LogPipe *next;
+ AFFileDestWriter *next;
if (self->flags & AFFILE_NO_EXPAND)
{
if (!self->writer)
{
next = affile_dw_new(self, g_string_new(self->filename_template->template->str));
- if (next && log_pipe_init(next, self->cfg, NULL))
+ if (next && log_pipe_init(&next->super, self->cfg, NULL))
{
self->writer = next;
}
else
{
- log_pipe_unref(next);
+ log_pipe_unref(&next->super);
next = NULL;
}
}
@@ -651,9 +745,9 @@ affile_dd_queue(LogPipe *s, LogMessage *msg, gint path_flags)
if (!next)
{
next = affile_dw_new(self, filename);
- if (!log_pipe_init(next, self->cfg, NULL))
+ if (!log_pipe_init(&next->super, self->cfg, NULL))
{
- log_pipe_unref(next);
+ log_pipe_unref(&next->super);
next = NULL;
}
else
@@ -663,7 +757,7 @@ affile_dd_queue(LogPipe *s, LogMessage *msg, gint path_flags)
g_string_free(filename, TRUE);
}
if (next)
- log_pipe_queue(next, msg, path_flags);
+ log_pipe_queue(&next->super, msg, path_flags);
else
log_msg_drop(msg, path_flags);
}
@@ -673,10 +767,10 @@ affile_dd_free(LogPipe *s)
{
AFFileDestDriver *self = (AFFileDestDriver *) s;
+ /* NOTE: this must be NULL as deinit has freed it, otherwise we'd have circular references */
+ g_assert(self->writer == NULL && self->writer_hash == NULL);
+
log_template_unref(self->filename_template);
- log_pipe_unref(self->writer);
- if (self->writer_hash)
- g_hash_table_destroy(self->writer_hash);
log_writer_options_destroy(&self->writer_options);
log_drv_free_instance(&self->super);
g_free(self);
diff --git a/src/affile.h b/src/affile.h
index 9268c6e..0f08e35 100644
--- a/src/affile.h
+++ b/src/affile.h
@@ -44,11 +44,13 @@ typedef struct _AFFileSourceDriver
LogDriver *affile_sd_new(gchar *filename, guint32 flags);
+typedef struct _AFFileDestWriter AFFileDestWriter;
+
typedef struct _AFFileDestDriver
{
LogDriver super;
LogTemplate *filename_template;
- LogPipe *writer;
+ AFFileDestWriter *writer;
guint32 flags;
uid_t file_uid;
gid_t file_gid;
diff --git a/src/logwriter.c b/src/logwriter.c
index 955c333..3286da7 100644
--- a/src/logwriter.c
+++ b/src/logwriter.c
@@ -439,6 +439,13 @@ log_writer_reopen(LogPipe *s, FDWrite *newfd)
return TRUE;
}
+void
+log_writer_set_options(LogWriter *self, LogWriterOptions *options)
+{
+ self->options = options;
+}
+
+
LogPipe *
log_writer_new(guint32 flags, LogPipe *control, LogWriterOptions *options)
{
diff --git a/src/logwriter.h b/src/logwriter.h
index 3bcb58a..d959dbf 100644
--- a/src/logwriter.h
+++ b/src/logwriter.h
@@ -80,6 +80,7 @@ typedef struct _LogWriter
LogWriterOptions *options;
} LogWriter;
+void log_writer_set_options(LogWriter *self, LogWriterOptions *options);
void log_writer_format_log(LogWriter *self, LogMessage *lm, GString *result);
gboolean log_writer_reopen(LogPipe *s, FDWrite *fd);
LogPipe *log_writer_new(guint32 flags, LogPipe *control, LogWriterOptions *options);
--
Bazsi
More information about the syslog-ng
mailing list