Hi, It seems that sometimes when syslog-ng receives a SIGHUP (like for example after logrotating or config change) it loses track of the logs that has been received, but are not yet written to disk. Except for this problem, syslog-ng can receive ~35000 logs/sec on the same machine without any loss at all. I have been able to reproduce the situation in a test environment on both FreeBSD 6.1 (Opteron) with syslog-ng 2.0.3 and on Ubuntu Linux 7.10 (i386) with syslog-ng 2.0.5. On linux, it appears that the number of logs lost equals the size of the log_fetch_limit parameter, because the number of lost logs are in multiples of this value. To see what happens I used strace to follow syslog-ng and dump read and write calls while sending numbered logs at a rate of 1000/sec. Only one machine was sending logs and they were transferred from a different machine over tcp. During this time I sent a SIGHUP to the receiving syslog with two seconds apart (since it doesn't occur with every SIGHUP). log_fetch_limit is set to 100, log_fifo_size to 51200 and log_msg_size is 2048. Flow-control is not activated since it didn't appear to make any difference in this particular case. This is the output of 'strace -s 65536 -e write,read' with the irrelevant parts omitted (I captured a similar output on FreeBSD with ktrace, but is not included in this mail) ## read from tcp socket (only beginning and end to save space) read(5, "<135>Nov 27 18:36:34 tiger logger: hejsan 9388\n<135>Nov 27 18:36:34 tiger logger: hejsan 9389\n<135>Nov 27 18:36:34 tiger logger: hejsan 9390\n<135>Nov 27 18:36:34 tiger logger: hejsan 9391\n<135>Nov 27 18:36:34 tiger logger: hejsan 9392\n<135>Nov 27 18:36:34 tiger logger: hejsan 9393\n<135>Nov 27 18:36:34 tiger logger: hejsan 9394\n<135>Nov 27 18:36:34 tiger logger: hejsan 9395\n<135>Nov 27 18:36:34 tiger logger: hejsan 9396\n<135>Nov 27 18:36:34 tiger logger: hejsan 9397\n<135>Nov 27 18:36:34 tiger logger: hejsan 9398\n<135>Nov 27 18:36:34 tiger logger: hejsan 9399\n<135>Nov 27 18:36:34 tiger logger: hejsan 9400\n<135>Nov 27 18:36:34 tiger logger: hejsan 9401\n<135>Nov 27 18:36:34 tiger logger: hejsan 9402\n<135>Nov 27 18:36:34 tiger logger: hejsan 9403\n<135>Nov 27 18:36:34 tiger logger: hejsan 9404\n<135>Nov 27 18:36:34 tiger logger: hejsan 9405\n<135>Nov 27 18:36:34 tiger logger: hejsan 9406\n<135>Nov 27 18:36:34 tiger logger: hejsan 9407\n<135>Nov 27 18:36:34 tiger logger: hejsan 9408\n<135>Nov 27 18:36:34 tiger logger: hejsan 9409\n<135>Nov 27 18:36:34 tiger logger: hejsan 9410\n . . . <135>Nov 27 18:36:34 tiger logger: hejsan 9504\n<135>Nov 27 18:36:34 tiger logger: hejsan 9505\n<135>Nov 27 18:36:3 4 tiger logger: hejsan 9506\n<135>Nov 27 18:36:34 tiger logger: hejsan 9507\n<135>Nov 27 18:36:34 tiger logger: hej ", 8192) = 5678 ## then follows all the writes until a SIGHUP is received ... write(6, "Nov 27 18:36:34 tiger logger: hejsan 9401\n", 42) = 42 write(6, "Nov 27 18:36:34 tiger logger: hejsan 9402\n", 42) = 42 write(6, "Nov 27 18:36:34 tiger logger: hejsan 9403\n", 42) = 42 write(6, "Nov 27 18:36:34 tiger logger: hejsan 9404\n", 42) = 42 write(6, "Nov 27 18:36:34 tiger logger: hejsan 9405\n", 42) = 42 write(6, "Nov 27 18:36:34 tiger logger: hejsan 9406\n", 42) = 42 write(6, "Nov 27 18:36:34 tiger logger: hejsan 9407\n", 42) = 42 write(6, "Nov 27 18:36:34 tiger logger: hejsan 9408\n", 42) = 42 --- SIGHUP (Hangup) @ 0 (0) --- ## After the config file is re-read, syslog-ng continues to read new logs from the socket, with unwritten logs from 9409 to 9507 in the buffer (9508 is partly read before the SIGHUP) read(5, "san 9508\n<135>Nov 27 18:36:34 tiger logger: hejsan 9509\n<135>Nov 27 18:36:34 tiger logger: hejsan 9510\n<135>Nov 27 18:36:34 tiger logger: hejsan 9511\n<135>Nov 27 18:36:34 tiger logger: hejsan 9512\n<135>Nov 27 18:36:34 tiger logger: hejsan 9513\n<135>Nov 27 18:36:34 tiger logger: hejsan 9514\n<135>Nov 27 18:36:34 tiger logger: hejsan 9515\n.... ## And finally new writes are done, but starting at 9508. The logs ranging from 9409 to 9507 are now lost and are consequently missing in the destination file. write(4, "Nov 27 18:36:34 tiger logger: hejsan 9508\n", 42) = 42 write(4, "Nov 27 18:36:34 tiger logger: hejsan 9509\n", 42) = 42 write(4, "Nov 27 18:36:34 tiger logger: hejsan 9510\n", 42) = 42 write(4, "Nov 27 18:36:34 tiger logger: hejsan 9511\n", 42) = 42 ----------------------------------------------------------------------------------------- I have experimented with different parameter settings, but am still losing logs at SIGHUP. I hope this information is enough if someone is able to diagnose the issue. If not, I can send my particular config etc on request. Any input on this is appreciated, has anyone else noticed this behaviour or is it maybe a configuration issue with this setup? // Martin
On Fri, 2007-11-30 at 14:34 +0100, Martin Ervius wrote:
Hi,
It seems that sometimes when syslog-ng receives a SIGHUP (like for example after logrotating or config change) it loses track of the logs that has been received, but are not yet written to disk. Except for this problem, syslog-ng can receive ~35000 logs/sec on the same machine without any loss at all.
I have been able to reproduce the situation in a test environment on both FreeBSD 6.1 (Opteron) with syslog-ng 2.0.3 and on Ubuntu Linux 7.10 (i386) with syslog-ng 2.0.5. On linux, it appears that the number of logs lost equals the size of the log_fetch_limit parameter, because the number of lost logs are in multiples of this value.
To see what happens I used strace to follow syslog-ng and dump read and write calls while sending numbered logs at a rate of 1000/sec. Only one machine was sending logs and they were transferred from a different machine over tcp. During this time I sent a SIGHUP to the receiving syslog with two seconds apart (since it doesn't occur with every SIGHUP). log_fetch_limit is set to 100, log_fifo_size to 51200 and log_msg_size is 2048. Flow-control is not activated since it didn't appear to make any difference in this particular case.
This is the output of 'strace -s 65536 -e write,read' with the irrelevant parts omitted (I captured a similar output on FreeBSD with ktrace, but is not included in this mail)
## read from tcp socket (only beginning and end to save space) read(5, "<135>Nov 27 18:36:34 tiger logger: hejsan 9388\n<135>Nov 27 18:36:34 tiger logger: hejsan 9389\n<135>Nov 27 18:36:34 tiger logger: hejsan 9390\n<135>Nov 27 18:36:34 tiger logger: hejsan 9391\n<135>Nov 27 18:36:34 tiger logger: hejsan 9392\n<135>Nov 27 18:36:34 tiger logger: hejsan 9393\n<135>Nov 27 18:36:34 tiger logger: hejsan 9394 \n<135>Nov 27 18:36:34 tiger logger: hejsan 9395\n<135>Nov 27 18:36:34 tiger logger: hejsan 9396\n<135>Nov 27 18:36:34 tiger logger: hejsan 9397\n<135>Nov 27 18:36:34 tiger logger: hejsan 9398\n<135>Nov 27 18:36:34 tiger logger: hejsan 9399\n<135>Nov 27 18:36:34 tiger logger: hejsan 9400\n<135>Nov 27 18:36:34 tiger logger: hejsan 9401\n<135>Nov 27 18:36:34 tiger logger: hejsan 9402\n<135>Nov 27 18:36:34 tiger logger: hejsan 9403\n<135>Nov 27 18:36:34 tiger logger: hejsan 9404 \n<135>Nov 27 18:36:34 tiger logger: hejsan 9405\n<135>Nov 27 18:36:34 tiger logger: hejsan 9406\n<135>Nov 27 18:36:34 tiger logger: hejsan 9407\n<135>Nov 27 18:36:34 tiger logger: hejsan 9408\n<135>Nov 27 18:36:34 tiger logger: hejsan 9409\n<135>Nov 27 18:36:34 tiger logger: hejsan 9410\n . . . <135>Nov 27 18:36:34 tiger logger: hejsan 9504\n<135>Nov 27 18:36:34 tiger logger: hejsan 9505\n<135>Nov 27 18:36:3 4 tiger logger: hejsan 9506\n<135>Nov 27 18:36:34 tiger logger: hejsan 9507\n<135>Nov 27 18:36:34 tiger logger: hej ", 8192) = 5678
## then follows all the writes until a SIGHUP is received ... write(6, "Nov 27 18:36:34 tiger logger: hejsan 9401\n", 42) = 42 write(6, "Nov 27 18:36:34 tiger logger: hejsan 9402\n", 42) = 42 write(6, "Nov 27 18:36:34 tiger logger: hejsan 9403\n", 42) = 42 write(6, "Nov 27 18:36:34 tiger logger: hejsan 9404\n", 42) = 42 write(6, "Nov 27 18:36:34 tiger logger: hejsan 9405\n", 42) = 42 write(6, "Nov 27 18:36:34 tiger logger: hejsan 9406\n", 42) = 42 write(6, "Nov 27 18:36:34 tiger logger: hejsan 9407\n", 42) = 42 write(6, "Nov 27 18:36:34 tiger logger: hejsan 9408\n", 42) = 42 --- SIGHUP (Hangup) @ 0 (0) ---
## After the config file is re-read, syslog-ng continues to read new logs from the socket, with unwritten logs from 9409 to 9507 in the buffer (9508 is partly read before the SIGHUP) read(5, "san 9508\n<135>Nov 27 18:36:34 tiger logger: hejsan 9509 \n<135>Nov 27 18:36:34 tiger logger: hejsan 9510\n<135>Nov 27 18:36:34 tiger logger: hejsan 9511\n<135>Nov 27 18:36:34 tiger logger: hejsan 9512\n<135>Nov 27 18:36:34 tiger logger: hejsan 9513\n<135>Nov 27 18:36:34 tiger logger: hejsan 9514\n<135>Nov 27 18:36:34 tiger logger: hejsan 9515\n....
## And finally new writes are done, but starting at 9508. The logs ranging from 9409 to 9507 are now lost and are consequently missing in the destination file. write(4, "Nov 27 18:36:34 tiger logger: hejsan 9508\n", 42) = 42 write(4, "Nov 27 18:36:34 tiger logger: hejsan 9509\n", 42) = 42 write(4, "Nov 27 18:36:34 tiger logger: hejsan 9510\n", 42) = 42 write(4, "Nov 27 18:36:34 tiger logger: hejsan 9511\n", 42) = 42
-----------------------------------------------------------------------------------------
I have experimented with different parameter settings, but am still losing logs at SIGHUP. I hope this information is enough if someone is able to diagnose the issue. If not, I can send my particular config etc on request. Any input on this is appreciated, has anyone else noticed this behaviour or is it maybe a configuration issue with this setup?
Thanks for the detailed explanation. I indeed forgot to save and restore some data buffers accross reloads. The patch is risky somewhat, so I did not apply it to the mainline yet. I performed some limited testing on my development laptop, I've made syslog-ng delay output on destination files, sent a couple of messages to it, sent a SIGHUP, some more messages, and it seems to write all buffered messages to the configured destination nicely. I've also checked for two cases of possible memory leaks, and syslog-ng freed what it had to free. I'd appreciate some additional testing. If there's an ACK from you, I apply the patch to mainline, where it can sit till the next release. The patch is attached to this mail, but I've also pushed it to git.balabit.hu, "remember-dw-file" branch. diff --git a/src/affile.c b/src/affile.c index b79074e..d1a9ab7 100644 --- a/src/affile.c +++ b/src/affile.c @@ -163,8 +163,6 @@ affile_sd_notify(LogPipe *s, LogPipe *sender, gint notify_code, gpointer user_da } } - - static gboolean affile_sd_init(LogPipe *s, GlobalConfig *cfg, PersistentConfig *persist) { @@ -268,7 +266,7 @@ affile_sd_new(gchar *filename, guint32 flags) } -typedef struct _AFFileDestWriter +struct _AFFileDestWriter { LogPipe super; AFFileDestDriver *owner; @@ -277,7 +275,13 @@ typedef struct _AFFileDestWriter time_t last_msg_stamp; time_t last_open_stamp; time_t time_reopen; -} AFFileDestWriter; +}; + +static gboolean +affile_dw_reapable(AFFileDestWriter *self) +{ + return ((LogWriter *) self->writer)->queue->length == 0; +} static gboolean affile_dw_init(LogPipe *s, GlobalConfig *cfg, PersistentConfig *persist) @@ -372,6 +376,18 @@ affile_dw_queue(LogPipe *s, LogMessage *lm, gint path_flags) } static void +affile_dw_set_owner(AFFileDestWriter *self, AFFileDestDriver *owner) +{ + if (self->owner) + log_pipe_unref(&self->owner->super.super); + log_pipe_ref(&owner->super.super); + self->owner = owner; + if (self->writer) + log_writer_set_options((LogWriter *) self->writer, &owner->writer_options); + +} + +static void affile_dw_free(LogPipe *s) { AFFileDestWriter *self = (AFFileDestWriter *) s; @@ -382,7 +398,7 @@ affile_dw_free(LogPipe *s) g_free(s); } -static LogPipe * +static AFFileDestWriter * affile_dw_new(AFFileDestDriver *owner, GString *filename) { AFFileDestWriter *self = g_new0(AFFileDestWriter, 1); @@ -400,7 +416,7 @@ affile_dw_new(AFFileDestDriver *owner, GString *filename) /* we have to take care about freeing filename later. This avoids a move of the filename. */ self->filename = filename; - return &self->super; + return self; } void @@ -494,6 +510,16 @@ affile_dd_set_overwrite_if_older(LogDriver *s, gint overwrite_if_older) self->overwrite_if_older = overwrite_if_older; } +static inline gchar * +affile_dd_format_persist_name(AFFileDestDriver *self) +{ + static gchar persist_name[1024]; + + g_snprintf(persist_name, sizeof(persist_name), "affile_dd_writers(%s)", self->filename_template->template->str); + return persist_name; +} + + static const gchar * affile_dd_format_stats_name(AFFileDestDriver *self, gboolean is_pipe) { @@ -513,7 +539,7 @@ affile_dd_reap_writers(gpointer key, gpointer value, gpointer user_data) AFFileDestDriver *self = (AFFileDestDriver *) user_data; AFFileDestWriter *dw = (AFFileDestWriter *) value; - if ((reap_now - dw->last_msg_stamp) >= self->time_reap) + if ((reap_now - dw->last_msg_stamp) >= self->time_reap && affile_dw_reapable(dw)) { msg_verbose("Destination timed out, reaping", evt_tag_str("template", self->filename_template->template->str), @@ -542,6 +568,25 @@ affile_dd_reap(gpointer s) return TRUE; } +/** + * affile_dd_reuse_writer: + * + * This function is called as a g_hash_table_foreach() callback to set the + * owner of each writer, previously connected to an AFileDestDriver instance + * in an earlier configuration. This way AFFileDestWriter instances are + * remembered accross reloads. + * + **/ +static void +affile_dd_reuse_writer(gpointer key, gpointer value, gpointer user_data) +{ + AFFileDestDriver *self = (AFFileDestDriver *) user_data; + AFFileDestWriter *writer = (AFFileDestWriter *) value; + + affile_dw_set_owner(writer, self); +} + + static gboolean affile_dd_init(LogPipe *s, GlobalConfig *cfg, PersistentConfig *persist) { @@ -576,30 +621,79 @@ affile_dd_init(LogPipe *s, GlobalConfig *cfg, PersistentConfig *persist) if ((self->flags & AFFILE_NO_EXPAND) == 0) { self->reap_timer = g_timeout_add_full(G_PRIORITY_LOW, self->time_reap * 1000 / 2, affile_dd_reap, self, NULL); + self->writer_hash = persist_config_fetch(persist, affile_dd_format_persist_name(self)); + if (self->writer_hash) + g_hash_table_foreach(self->writer_hash, affile_dd_reuse_writer, self); + } + else + { + self->writer = persist_config_fetch(persist, affile_dd_format_persist_name(self)); + affile_dw_set_owner(self->writer, self); } + + return TRUE; } + +/** + * This is registered as a destroy-notify callback for an AFFileDestWriter + * instance. It destructs and frees the writer instance. + **/ +static void +affile_dd_destroy_writer(gpointer value) +{ + AFFileDestWriter *writer = (AFFileDestWriter *) value; + log_pipe_deinit(&writer->super, NULL, NULL); + log_pipe_unref(&writer->super); +} + +/* + * This function is called as a g_hash_table_foreach_remove() callback to + * free the specific AFFileDestWriter instance in the hashtable. + */ static gboolean -affile_dd_remove_writers(gpointer key, gpointer value, gpointer user_data) +affile_dd_destroy_writer_hr(gpointer key, gpointer value, gpointer user_data) { - AFFileDestDriver *self = (AFFileDestDriver *) user_data; - LogPipe *writer = (LogPipe *) value; - - log_pipe_deinit(writer, self->cfg, NULL); - log_pipe_unref(writer); + affile_dd_destroy_writer(value); return TRUE; } +/** + * affile_dd_destroy_writer_hash: + * @value: GHashTable instance passed as a generic pointer + * + * Destroy notify callback for the GHashTable storing AFFileDestWriter instances. + **/ +static void +affile_dd_destroy_writer_hash(gpointer value) +{ + GHashTable *writer_hash = (GHashTable *) value; + + g_hash_table_foreach_remove(writer_hash, affile_dd_destroy_writer_hr, NULL); + g_hash_table_destroy(writer_hash); +} + static gboolean affile_dd_deinit(LogPipe *s, GlobalConfig *cfg, PersistentConfig *persist) { AFFileDestDriver *self = (AFFileDestDriver *) s; + /* writer & writer_hash cannot be non-NULL at the same time */ if (self->writer) - log_pipe_deinit(self->writer, cfg, NULL); - if (self->writer_hash) - g_hash_table_foreach_remove(self->writer_hash, affile_dd_remove_writers, self); + { + g_assert(self->writer_hash == NULL); + + persist_config_add(persist, affile_dd_format_persist_name(self), self->writer, affile_dd_destroy_writer); + self->writer = NULL; + } + else if (self->writer_hash) + { + g_assert(self->writer == NULL); + + persist_config_add(persist, affile_dd_format_persist_name(self), self->writer_hash, affile_dd_destroy_writer_hash); + self->writer_hash = NULL; + } if (self->reap_timer) g_source_remove(self->reap_timer); self->cfg = NULL; @@ -610,20 +704,20 @@ static void affile_dd_queue(LogPipe *s, LogMessage *msg, gint path_flags) { AFFileDestDriver *self = (AFFileDestDriver *) s; - LogPipe *next; + AFFileDestWriter *next; if (self->flags & AFFILE_NO_EXPAND) { if (!self->writer) { next = affile_dw_new(self, g_string_new(self->filename_template->template->str)); - if (next && log_pipe_init(next, self->cfg, NULL)) + if (next && log_pipe_init(&next->super, self->cfg, NULL)) { self->writer = next; } else { - log_pipe_unref(next); + log_pipe_unref(&next->super); next = NULL; } } @@ -651,9 +745,9 @@ affile_dd_queue(LogPipe *s, LogMessage *msg, gint path_flags) if (!next) { next = affile_dw_new(self, filename); - if (!log_pipe_init(next, self->cfg, NULL)) + if (!log_pipe_init(&next->super, self->cfg, NULL)) { - log_pipe_unref(next); + log_pipe_unref(&next->super); next = NULL; } else @@ -663,7 +757,7 @@ affile_dd_queue(LogPipe *s, LogMessage *msg, gint path_flags) g_string_free(filename, TRUE); } if (next) - log_pipe_queue(next, msg, path_flags); + log_pipe_queue(&next->super, msg, path_flags); else log_msg_drop(msg, path_flags); } @@ -673,10 +767,10 @@ affile_dd_free(LogPipe *s) { AFFileDestDriver *self = (AFFileDestDriver *) s; + /* NOTE: this must be NULL as deinit has freed it, otherwise we'd have circular references */ + g_assert(self->writer == NULL && self->writer_hash == NULL); + log_template_unref(self->filename_template); - log_pipe_unref(self->writer); - if (self->writer_hash) - g_hash_table_destroy(self->writer_hash); log_writer_options_destroy(&self->writer_options); log_drv_free_instance(&self->super); g_free(self); diff --git a/src/affile.h b/src/affile.h index 9268c6e..0f08e35 100644 --- a/src/affile.h +++ b/src/affile.h @@ -44,11 +44,13 @@ typedef struct _AFFileSourceDriver LogDriver *affile_sd_new(gchar *filename, guint32 flags); +typedef struct _AFFileDestWriter AFFileDestWriter; + typedef struct _AFFileDestDriver { LogDriver super; LogTemplate *filename_template; - LogPipe *writer; + AFFileDestWriter *writer; guint32 flags; uid_t file_uid; gid_t file_gid; diff --git a/src/logwriter.c b/src/logwriter.c index 955c333..3286da7 100644 --- a/src/logwriter.c +++ b/src/logwriter.c @@ -439,6 +439,13 @@ log_writer_reopen(LogPipe *s, FDWrite *newfd) return TRUE; } +void +log_writer_set_options(LogWriter *self, LogWriterOptions *options) +{ + self->options = options; +} + + LogPipe * log_writer_new(guint32 flags, LogPipe *control, LogWriterOptions *options) { diff --git a/src/logwriter.h b/src/logwriter.h index 3bcb58a..d959dbf 100644 --- a/src/logwriter.h +++ b/src/logwriter.h @@ -80,6 +80,7 @@ typedef struct _LogWriter LogWriterOptions *options; } LogWriter; +void log_writer_set_options(LogWriter *self, LogWriterOptions *options); void log_writer_format_log(LogWriter *self, LogMessage *lm, GString *result); gboolean log_writer_reopen(LogPipe *s, FDWrite *fd); LogPipe *log_writer_new(guint32 flags, LogPipe *control, LogWriterOptions *options); -- Bazsi
I downloaded a snapshot from: http://git.balabit.hu/?p=bazsi/syslog-ng-2.0.git;a=snapshot;h=a33ee30c1e059c.... It compiled ok, but gets a segmentation fault when initializing (it can display version and help info), but if I run with my cfg file it dies. Here's the last part of strace when i run it: open("/opt/syslog-ng-2.0.git/var/syslog-ng.persist", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3 fcntl64(3, F_GETFL) = 0x2 (flags O_RDWR) fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0 fcntl64(3, F_GETFD) = 0 fcntl64(3, F_SETFD, FD_CLOEXEC) = 0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 bind(3, {sa_family=AF_INET, sin_port=htons(514), sin_addr=inet_addr("0.0.0.0")}, 16) = 0 listen(3, 255) = 0 setsockopt(3, SOL_SOCKET, SO_KEEPALIVE, [0], 4) = 0 --- SIGSEGV (Segmentation fault) @ 0 (0) --- kill(31071, SIGSEGV) = 0 sigreturn() = ? (mask now []) --- SIGSEGV (Segmentation fault) @ 0 (0) --- +++ killed by SIGSEGV +++ Process 31071 detached The new 2.0.6 release runs fine on the same machine, but I may have done something wrong when I built the snapshot? To compile, I manually copied config.h.in, configure and install-sh from the 2.0.6 release to the base of the snapshot tree. After that, I ran: $ aclocal $ automake --add-missing $ ./configure --prefix=/opt/syslog-ng-2.0.git && make install // Martin ---------------------------------------------------------------------------------------------------------------------------- On Dec 1, 2007 5:15 PM, Balazs Scheidler <bazsi@balabit.hu> wrote: Thanks for the detailed explanation. I indeed forgot to save and restore some data buffers accross reloads. The patch is risky somewhat, so I did not apply it to the mainline yet. I performed some limited testing on my development laptop, I've made syslog-ng delay output on destination files, sent a couple of messages to it, sent a SIGHUP, some more messages, and it seems to write all buffered messages to the configured destination nicely. I've also checked for two cases of possible memory leaks, and syslog-ng freed what it had to free. I'd appreciate some additional testing. If there's an ACK from you, I apply the patch to mainline, where it can sit till the next release. The patch is attached to this mail, but I've also pushed it to git.balabit.hu, "remember-dw-file" branch.
On Mon, 2007-12-03 at 17:08 +0100, Martin Ervius wrote:
I downloaded a snapshot from: http://git.balabit.hu/?p=bazsi/syslog-ng-2.0.git;a=snapshot;h=a33ee30c1e059c.... It compiled ok, but gets a segmentation fault when initializing (it can display version and help info), but if I run with my cfg file it dies.
Here's the last part of strace when i run it:
open("/opt/syslog- ng-2.0.git/var/syslog-ng.persist", O_RDONLY| O_LARGEFILE) = -1 ENOENT (No such file or directory) socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3 fcntl64(3, F_GETFL) = 0x2 (flags O_RDWR) fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0 fcntl64(3, F_GETFD) = 0 fcntl64(3, F_SETFD, FD_CLOEXEC) = 0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 bind(3, {sa_family=AF_INET, sin_port=htons(514), sin_addr=inet_addr(" 0.0.0.0")}, 16) = 0 listen(3, 255) = 0 setsockopt(3, SOL_SOCKET, SO_KEEPALIVE, [0], 4) = 0 --- SIGSEGV (Segmentation fault) @ 0 (0) --- kill(31071, SIGSEGV) = 0 sigreturn() = ? (mask now []) --- SIGSEGV (Segmentation fault) @ 0 (0) --- +++ killed by SIGSEGV +++ Process 31071 detached
Hmm... I'd appreciatea backtrace using gdb: gdb /opt/syslog-ng-2.0.git/sbin/syslog-ng -c <core file> (gdb) bt full all assuming that you have debug information. -- Bazsi
Here is the backtrace, (running under ubuntu 7.10 i386): root@krait[0]syslog-ng-2.0.git# /opt/syslog-ng-2.0.git/sbin/syslog-ng Segmentation fault (core dumped) root@krait[139]syslog-ng-2.0.git# gdb /opt/syslog-ng-2.0.git/sbin/syslog-ng -c core GNU gdb 6.6-debian Copyright (C) 2006 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i486-linux-gnu"... Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1". warning: Can't read pathname for load map: Input/output error. Reading symbols from /lib/tls/i686/cmov/libnsl.so.1...done. Loaded symbols for /lib/tls/i686/cmov/libnsl.so.1 Reading symbols from /lib/tls/i686/cmov/librt.so.1...done. Loaded symbols for /lib/tls/i686/cmov/librt.so.1 Reading symbols from /lib/tls/i686/cmov/libc.so.6...done. Loaded symbols for /lib/tls/i686/cmov/libc.so.6 Reading symbols from /lib/tls/i686/cmov/libpthread.so.0...done. Loaded symbols for /lib/tls/i686/cmov/libpthread.so.0 Reading symbols from /lib/ld-linux.so.2...done. Loaded symbols for /lib/ld-linux.so.2 Reading symbols from /lib/tls/i686/cmov/libnss_compat.so.2...done. Loaded symbols for /lib/tls/i686/cmov/libnss_compat.so.2 Reading symbols from /lib/tls/i686/cmov/libnss_nis.so.2...done. Loaded symbols for /lib/tls/i686/cmov/libnss_nis.so.2 Reading symbols from /lib/tls/i686/cmov/libnss_files.so.2...done. Loaded symbols for /lib/tls/i686/cmov/libnss_files.so.2 Core was generated by `/opt/syslog-ng-2.0.git/sbin/syslog-ng'. Program terminated with signal 11, Segmentation fault. #0 0x08057bd5 in affile_dw_set_owner (self=0x0, owner=0x8098858) at affile.c:381 381 if (self->owner) (gdb) bt full #0 0x08057bd5 in affile_dw_set_owner (self=0x0, owner=0x8098858) at affile.c:381 No locals. #1 0x08058345 in affile_dd_init (s=0x8098858, cfg=0x8091048, persist=0x8098980) at affile.c:631 self = (AFFileDestDriver *) 0x8098858 #2 0x08054a8d in log_dest_group_init (s=0x8098938, cfg=0x8091048, persist=0x8098980) at dgroup.c:50 self = (LogDestGroup *) 0x8098938 p = (LogDriver *) 0x8098858 #3 0x080550b1 in log_center_init_component (key=0x8098968 "df_syslog-ng", value=0x8098938, self=0x8093000) at center.c:177 No locals. #4 0x08061152 in g_hash_table_foreach () No symbol table info available. #5 0x08055455 in log_center_init (s=0x8093000, cfg=0x8091048, persist=0x8098980) at center.c:262 self = (LogCenter *) 0x8093000 i = 2 j = 1 #6 0x0804c9f9 in cfg_init (cfg=0x8091048, persist=0x8098980) at cfg.c:212 regerr = 438 #7 0x0804b5a2 in main (argc=1, argv=0xbfdb0eb4) at main.c:411 cfg = (GlobalConfig *) 0x8091048 syslog_ng_options = {{name = 0x808198b "cfgfile", has_arg = 1, flag = 0x0, val = 102}, { name = 0x8081993 "pidfile", has_arg = 1, flag = 0x0, val = 112}, {name = 0x808199b "debug", has_arg = 0, flag = 0x0, val = 100}, {name = 0x80819a1 "syntax-only", has_arg = 0, flag = 0x0, val = 115}, { name = 0x80819ad "verbose", has_arg = 0, flag = 0x0, val = 118}, {name = 0x80819b5 "foreground", has_arg = 0, flag = 0x0, val = 70}, {name = 0x80819c0 "help", has_arg = 0, flag = 0x0, val = 104}, { name = 0x808189b "version", has_arg = 0, flag = 0x0, val = 86}, {name = 0x80819c5 "chroot", has_arg = 1, flag = 0x0, val = 67}, {name = 0x80819cc "user", has_arg = 1, flag = 0x0, val = 117}, { name = 0x80819d1 "group", has_arg = 1, flag = 0x0, val = 103}, {name = 0x80819d7 "stderr", has_arg = 0, flag = 0x0, val = 101}, {name = 0x80819de "yydebug", has_arg = 0, flag = 0x0, val = 121}, {name = 0x0, has_arg = 0, flag = 0x0, val = 0}} syntax_only = 0 log_to_stderr = 0 persist = (PersistentConfig *) 0x8098980 opt = -1 rc = -1208502272 (gdb) quit // Martin On Dec 3, 2007 6:12 PM, Balazs Scheidler <bazsi@balabit.hu> wrote:
On Mon, 2007-12-03 at 17:08 +0100, Martin Ervius wrote:
I downloaded a snapshot from:
http://git.balabit.hu/?p=bazsi/syslog-ng-2.0.git;a=snapshot;h=a33ee30c1e059c.... It compiled ok, but gets a segmentation fault when initializing (it can display version and help info), but if I run with my cfg file it dies.
Here's the last part of strace when i run it:
open("/opt/syslog- ng-2.0.git/var/syslog-ng.persist", O_RDONLY| O_LARGEFILE) = -1 ENOENT (No such file or directory) socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3 fcntl64(3, F_GETFL) = 0x2 (flags O_RDWR) fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0 fcntl64(3, F_GETFD) = 0 fcntl64(3, F_SETFD, FD_CLOEXEC) = 0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 bind(3, {sa_family=AF_INET, sin_port=htons(514), sin_addr=inet_addr(" 0.0.0.0")}, 16) = 0 listen(3, 255) = 0 setsockopt(3, SOL_SOCKET, SO_KEEPALIVE, [0], 4) = 0 --- SIGSEGV (Segmentation fault) @ 0 (0) --- kill(31071, SIGSEGV) = 0 sigreturn() = ? (mask now []) --- SIGSEGV (Segmentation fault) @ 0 (0) --- +++ killed by SIGSEGV +++ Process 31071 detached
Hmm... I'd appreciatea backtrace using gdb:
gdb /opt/syslog-ng-2.0.git/sbin/syslog-ng -c <core file> (gdb) bt full
all assuming that you have debug information.
-- Bazsi
_______________________________________________ syslog-ng maillist - syslog-ng@lists.balabit.hu https://lists.balabit.hu/mailman/listinfo/syslog-ng Frequently asked questions at http://www.campin.net/syslog-ng/faq.html
On Mon, 2007-12-03 at 18:58 +0100, Martin Ervius wrote:
Here is the backtrace, (running under ubuntu 7.10 i386):
root@krait[0]syslog-ng-2.0.git# /opt/syslog-ng-2.0.git/sbin/syslog-ng Segmentation fault (core dumped) root@krait[139]syslog-ng-2.0.git# gdb /opt/syslog- ng-2.0.git/sbin/syslog-ng -c core GNU gdb 6.6-debian Copyright (C) 2006 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i486-linux-gnu"... Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
warning: Can't read pathname for load map: Input/output error. Reading symbols from /lib/tls/i686/cmov/libnsl.so.1...done. Loaded symbols for /lib/tls/i686/cmov/libnsl.so.1 Reading symbols from /lib/tls/i686/cmov/librt.so.1...done. Loaded symbols for /lib/tls/i686/cmov/librt.so.1 Reading symbols from /lib/tls/i686/cmov/libc.so.6...done. Loaded symbols for /lib/tls/i686/cmov/libc.so.6 Reading symbols from /lib/tls/i686/cmov/libpthread.so.0...done. Loaded symbols for /lib/tls/i686/cmov/libpthread.so.0 Reading symbols from /lib/ld-linux.so.2...done. Loaded symbols for /lib/ld-linux.so.2 Reading symbols from /lib/tls/i686/cmov/libnss_compat.so.2...done. Loaded symbols for /lib/tls/i686/cmov/libnss_compat.so.2 Reading symbols from /lib/tls/i686/cmov/libnss_nis.so.2...done. Loaded symbols for /lib/tls/i686/cmov/libnss_nis.so.2 Reading symbols from /lib/tls/i686/cmov/libnss_files.so.2...done. Loaded symbols for /lib/tls/i686/cmov/libnss_files.so.2 Core was generated by `/opt/syslog-ng-2.0.git/sbin/syslog-ng'. Program terminated with signal 11, Segmentation fault. #0 0x08057bd5 in affile_dw_set_owner (self=0x0, owner=0x8098858) at affile.c:381 381 if (self->owner) (gdb) bt full #0 0x08057bd5 in affile_dw_set_owner (self=0x0, owner=0x8098858) at affile.c:381 No locals.
Hmm... I encountered a similar error in my tree when I forgot to "make clean" before recompiling. Can you check if you do a make clean and then recompile the above problem goes away? I also have Ubuntu 7.10 installed and it works for me: bazsi@bzorp:~/zwa/work/syslog-ng-2.0/syslog-ng$ git-rev-list --max-count=1 HEAD a33ee30c1e059c1c6dc9839047379f3f3a8dfa4d bazsi@bzorp:~/.zwa/install/syslog-ng-2.0$ sbin/syslog-ng -Fedvf etc/syslog-ng.conf Connection failed; error='No such file or directory (2)' Initiating connection failed, reconnecting; time_reopen='5' syslog-ng starting up; version='2.0.6' Termination requested via signal, terminating; syslog-ng shutting down; version='2.0.6' Closing log reader fd; fd='7' Error saving persistent configuration file; name='/home/bazsi/zwa/install/syslog-ng-2.0/var/syslog-ng.persist' Closing log reader fd; fd='5' Closing log reader fd; fd='4' -- Bazsi
I had done a 'make clean' before, but I think I found the problem when looking through the config parameters. Instead of my config file, I used the example one and it didn't segfault. I narrowed it down to the 'destination' parameters which seem to need a variable like ($HOST or $FACILITY) in the destination file (havent tried this version with other destinations than files). In my config I had destination path names like '/var/log/tcp.log'. When I changed that to '/var/log/$HOST_tcp.log' it didn't die anymore. Can you reproduce this behaviour? Apparently It doesn't matter where a variable is present on each line, but it needs one on every destination. I proceeded with testing the SIGHUP problem and as far as I can see your patch works perfectly. I tested as I had done before and there was no loss of logs. I even stress tested a bit by sending 10 HUP's per second while syslog-ng received five million rows (@25000 rows/sec) from another server. Everything still worked fine :) Thanks for your help on this! // Martin On Dec 3, 2007 7:09 PM, Balazs Scheidler <bazsi@balabit.hu> wrote:
On Mon, 2007-12-03 at 18:58 +0100, Martin Ervius wrote:
Here is the backtrace, (running under ubuntu 7.10 i386):
root@krait[0]syslog-ng-2.0.git# /opt/syslog-ng-2.0.git/sbin/syslog-ng Segmentation fault (core dumped) root@krait[139]syslog-ng-2.0.git# gdb /opt/syslog- ng-2.0.git/sbin/syslog-ng -c core GNU gdb 6.6-debian Copyright (C) 2006 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i486-linux-gnu"... Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
warning: Can't read pathname for load map: Input/output error. Reading symbols from /lib/tls/i686/cmov/libnsl.so.1...done. Loaded symbols for /lib/tls/i686/cmov/libnsl.so.1 Reading symbols from /lib/tls/i686/cmov/librt.so.1...done. Loaded symbols for /lib/tls/i686/cmov/librt.so.1 Reading symbols from /lib/tls/i686/cmov/libc.so.6...done. Loaded symbols for /lib/tls/i686/cmov/libc.so.6 Reading symbols from /lib/tls/i686/cmov/libpthread.so.0...done. Loaded symbols for /lib/tls/i686/cmov/libpthread.so.0 Reading symbols from /lib/ld-linux.so.2...done. Loaded symbols for /lib/ld-linux.so.2 Reading symbols from /lib/tls/i686/cmov/libnss_compat.so.2...done. Loaded symbols for /lib/tls/i686/cmov/libnss_compat.so.2 Reading symbols from /lib/tls/i686/cmov/libnss_nis.so.2...done. Loaded symbols for /lib/tls/i686/cmov/libnss_nis.so.2 Reading symbols from /lib/tls/i686/cmov/libnss_files.so.2...done. Loaded symbols for /lib/tls/i686/cmov/libnss_files.so.2 Core was generated by `/opt/syslog-ng-2.0.git/sbin/syslog-ng'. Program terminated with signal 11, Segmentation fault. #0 0x08057bd5 in affile_dw_set_owner (self=0x0, owner=0x8098858) at affile.c:381 381 if (self->owner) (gdb) bt full #0 0x08057bd5 in affile_dw_set_owner (self=0x0, owner=0x8098858) at affile.c:381 No locals.
Hmm... I encountered a similar error in my tree when I forgot to "make clean" before recompiling. Can you check if you do a make clean and then recompile the above problem goes away?
I also have Ubuntu 7.10 installed and it works for me:
bazsi@bzorp:~/zwa/work/syslog-ng-2.0/syslog-ng$ git-rev-list --max-count=1 HEAD a33ee30c1e059c1c6dc9839047379f3f3a8dfa4d bazsi@bzorp:~/.zwa/install/syslog-ng-2.0$ sbin/syslog-ng -Fedvf etc/syslog-ng.conf Connection failed; error='No such file or directory (2)' Initiating connection failed, reconnecting; time_reopen='5' syslog-ng starting up; version='2.0.6' Termination requested via signal, terminating; syslog-ng shutting down; version='2.0.6' Closing log reader fd; fd='7' Error saving persistent configuration file; name='/home/bazsi/zwa/install/syslog-ng-2.0/var/syslog-ng.persist' Closing log reader fd; fd='5' Closing log reader fd; fd='4'
-- Bazsi
_______________________________________________ syslog-ng maillist - syslog-ng@lists.balabit.hu https://lists.balabit.hu/mailman/listinfo/syslog-ng Frequently asked questions at http://www.campin.net/syslog-ng/faq.html
On Tue, 2007-12-04 at 19:32 +0100, Martin Ervius wrote:
I had done a 'make clean' before, but I think I found the problem when looking through the config parameters.
Instead of my config file, I used the example one and it didn't segfault. I narrowed it down to the 'destination' parameters which seem to need a variable like ($HOST or $FACILITY) in the destination file (havent tried this version with other destinations than files).
In my config I had destination path names like '/var/log/tcp.log'. When I changed that to '/var/log/$HOST_tcp.log' it didn't die anymore. Can you reproduce this behaviour? Apparently It doesn't matter where a variable is present on each line, but it needs one on every destination.
I proceeded with testing the SIGHUP problem and as far as I can see your patch works perfectly. I tested as I had done before and there was no loss of logs. I even stress tested a bit by sending 10 HUP's per second while syslog-ng received five million rows (@25000 rows/sec) from another server. Everything still worked fine :)
Thanks for narrowing this down. It was a possible NULL deref that I've fixed now. I've committed the patch to mainline, and I've also pushed it to our git repository. http://git.balabit.hu/?p=bazsi/syslog-ng-2.0.git;a=summary -- Bazsi
participants (2)
-
Balazs Scheidler
-
Martin Ervius