Re: [syslog-ng] Strange behavior with the option "mark"
Thank you for your answer. When I start the syslog-ng with the option "mark" sets to 0, no MARKS messages are sent (as written in the OSE 3.1 Administrator Guide). This problem appears only when the option "mark" is set from a value <> 0 to a value = 0 and when I reload the configuration with a HUP signal. Regards, Yann I. On Tue, 2010-08-03 at 10:58 +0200, Ilas, Yann wrote:
Hello everybody,
I'm currently using the application syslog-ng version 3.1.1 and there is an odd behavior with the "mark" option.
Here is the configuration file used for the test :
@version: 3.0
options { mark(7); };
source s_local { internal(); unix-stream( "/dev/log" ); };
destination d_f_msg_unknown { file ( /var/log/messages__unknown ); };
log { source (s_local); destination (d_f_msg_unknown); };
...and I start the syslog-ng application like this : # cd /opt/syslog-ng/sbin/ # ./syslog-ng -Fevd --foreground --no-caps --cfgfile=/opt/syslog-ng/etc/syslog-ng.conf.MARK --pidfile=/tmp/syslog-ng.conf.pid
The output of the log file : Aug 2 17:44:22 serveur01 -- MARK -- Aug 2 17:44:29 serveur01 -- MARK -- Aug 2 17:44:36 serveur01 -- MARK -- (...)
Then, I change the configuration file from "mark(7)" to "mark(0)" and send a HUP signal between "17:44:36" and "17:44:43" # kill -HUP $(cat /tmp/syslog-ng.conf.pid)
Here is the output (tail -f /var/log/messages__unknown) : Aug 2 17:44:36 serveur01 -- MARK -- <HUP signal> Aug 2 17:44:43 serveur01 -- MARK -- Aug 2 17:44:50 serveur01 -- MARK -- Aug 2 17:44:50 serveur01 -- MARK -- Aug 2 17:44:50 serveur01 -- MARK -- Aug 2 17:44:50 serveur01 -- MARK -- Aug 2 17:44:50 serveur01 -- MARK -- Aug 2 17:44:50 serveur01 -- MARK -- Aug 2 17:44:50 serveur01 -- MARK -- (...) ...then the server syslog-ng sends a lot of "MARK" messages. I have to stop the process with a "kill" or "ctrl+c".
I have the same behaviour if I use "mark_freq" instead of "mark". What's wrong with that option ? Did I miss something ?
well, we should recognize mark(0) the same as mark(-1) effectively disabling the mark feature. right now, mark(0) means that there's zero time between two mark messages, effectively generating one mark message per poll iteration, this is what you see. Or, does this happen only if you do a SIGHUP? Or the same happens when you start syslog-ng? -- Bazsi
Am Dienstag, 3. August 2010 16:01:39 schrieb Ilas, Yann:
Thank you for your answer.
When I start the syslog-ng with the option "mark" sets to 0, no MARKS messages are sent (as written in the OSE 3.1 Administrator Guide). This problem appears only when the option "mark" is set from a value <> 0 to a value = 0 and when I reload the configuration with a HUP signal.
One of our customers run into this problem too, but it seems to be not easy to reproduce it clearly and it has been usually resolved as WORKSFORME. Your hint, that it is required to change from <>0 to ==0 + reload may help to reproduce it, thanks! See also: https://bugzilla.balabit.com/show_bug.cgi?id=10 http://bugs.gentoo.org/228601 https://bugzilla.novell.com/show_bug.cgi?id=617111 I think it happens when one function fetching time gets another second value than the another one (just a change in tv_usec) and this statement results in a 0 timeout: *timeout = MAX((next_mark_target - tv.tv_sec) * 1000, 0); Changing it to *timeout = MAX((next_mark_target - tv.tv_sec) * 1000, 250); limits the number of the messages that may happen inside of 1 second when the event runs with timeout 0, what may fill /var partition with mark messages. The attached patch makes more than this (maybe a little bit too much). It also disables the watch function completely when mark_freq==0, what saves a lot of CPU ticks and makes sure, that the time is not fetched twice in one function with 2 compares. Balazs, please review it. Thanks! Gruesse / Regards, Marius Tomaschewski <mt@suse.de> <mt@novell.com> -- Server Technologies Team, SUSE LINUX Products GmbH, Nuernberg; GF: Markus Rex; HRB 16746 (AG Nuernberg) GPG/PGP public key fingerprint: DF17 271A AD15 006A 5BB9 6C96 CA2F F3F7 373A 1CC0
Am Donnerstag, 5. August 2010 22:05:06 schrieb Marius Tomaschewski: [...] BTW: It happens also with 2.0.9 (+ 100% CPU patches from later versions). Gruesse / Regards, Marius Tomaschewski <mt@suse.de> <mt@novell.com> -- Server Technologies Team, SUSE LINUX Products GmbH, Nuernberg; GF: Markus Rex; HRB 16746 (AG Nuernberg) GPG/PGP public key fingerprint: DF17 271A AD15 006A 5BB9 6C96 CA2F F3F7 373A 1CC0
On Thu, 2010-08-05 at 22:05 +0200, Marius Tomaschewski wrote:
Am Dienstag, 3. August 2010 16:01:39 schrieb Ilas, Yann:
Thank you for your answer.
When I start the syslog-ng with the option "mark" sets to 0, no MARKS messages are sent (as written in the OSE 3.1 Administrator Guide). This problem appears only when the option "mark" is set from a value <> 0 to a value = 0 and when I reload the configuration with a HUP signal.
One of our customers run into this problem too, but it seems to be not easy to reproduce it clearly and it has been usually resolved as WORKSFORME.
Your hint, that it is required to change from <>0 to ==0 + reload may help to reproduce it, thanks!
See also: https://bugzilla.balabit.com/show_bug.cgi?id=10 http://bugs.gentoo.org/228601 https://bugzilla.novell.com/show_bug.cgi?id=617111
I think it happens when one function fetching time gets another second value than the another one (just a change in tv_usec) and this statement results in a 0 timeout:
*timeout = MAX((next_mark_target - tv.tv_sec) * 1000, 0);
Changing it to *timeout = MAX((next_mark_target - tv.tv_sec) * 1000, 250);
limits the number of the messages that may happen inside of 1 second when the event runs with timeout 0, what may fill /var partition with mark messages.
The attached patch makes more than this (maybe a little bit too much). It also disables the watch function completely when mark_freq==0, what saves a lot of CPU ticks and makes sure, that the time is not fetched twice in one function with 2 compares.
Balazs, please review it. Thanks!
This patch seems to solve the issue for me, I've pushed it to 3.2 OSE branch, but can trivially be backported to earlier branches. diff --git a/src/afinter.c b/src/afinter.c index d631c3d..31435f7 100644 --- a/src/afinter.c +++ b/src/afinter.c @@ -154,6 +154,7 @@ afinter_source_watch_new(LogSource *afinter_source, gint mark_freq) { AFInterWatch *self = (AFInterWatch *) g_source_new(&afinter_source_watch_funcs, sizeof(AFInterWatch)); + next_mark_target = -1; self->mark_freq = mark_freq; self->afinter_source = afinter_source; g_source_set_callback(&self->super, (GSourceFunc) afinter_source_dispatch_msg, log_pipe_ref(&afinter_source->super), (GDestroyNotify) log_pipe_unref);
--- syslog-ng-2.0.9/src/afinter.c +++ syslog-ng-2.0.9/src/afinter.c 2010/07/12 08:38:35 @@ -56,20 +56,17 @@ afinter_source_prepare(GSource *source,
*timeout = -1;
- if (self->mark_freq > 0 && next_mark_target == -1) + if (self->mark_freq > 0) { g_source_get_current_time(source, &tv); - next_mark_target = tv.tv_sec + self->mark_freq; - } - - if (next_mark_target != -1) - { - g_source_get_current_time(source, &tv); - *timeout = MAX((next_mark_target - tv.tv_sec) * 1000, 0); - } - else - { - *timeout = -1; + if(next_mark_target == -1) + { + next_mark_target = tv.tv_sec + self->mark_freq; + } + if (next_mark_target != -1) + { + *timeout = MAX((next_mark_target - tv.tv_sec) * 1000, 250); + } }
If timeout is negative or zero, we really need to emit a MARK message here. The root cause was that next_mark_target wasn't reinitialized when SIGHUP is received.
return !g_queue_is_empty(internal_msg_queue); } @@ -78,10 +75,12 @@ static gboolean afinter_source_check(GSource *source) { GTimeVal tv; + AFInterWatch *self = (AFInterWatch *) source;
g_source_get_current_time(source, &tv);
- if (next_mark_target != -1 && next_mark_target <= tv.tv_sec) + if (self->mark_freq > 0 && + next_mark_target != -1 && next_mark_target <= tv.tv_sec) return TRUE; return !g_queue_is_empty(internal_msg_queue); }
if next_mark_target becomes -1 when being reinitialized, this is not needed.
@@ -94,13 +93,15 @@ afinter_source_dispatch(GSource *source, LogMessage *msg; gint path_flags = 0; GTimeVal tv; - + AFInterWatch *self = (AFInterWatch *) source; + g_source_get_current_time(source, &tv);
- if (next_mark_target != -1 && next_mark_target <= tv.tv_sec) + if (self->mark_freq > 0 && next_mark_target != -1 && next_mark_target <= tv.tv_sec) { msg = log_msg_new_mark(); path_flags = PF_FLOW_CTL_OFF; + next_mark_target = tv.tv_sec + self->mark_freq; } else { @@ -154,9 +155,12 @@ afinter_source_init(LogPipe *s, GlobalCo AFInterSource *self = (AFInterSource *) s;
/* the source added below references this logreader, it will be unref'd - when the source is destroyed */ - self->watch = afinter_source_watch_new(&self->super.super, cfg->mark_freq); - g_source_attach(self->watch, NULL); + when the source is destroyed */ + if(cfg->mark_freq > 0) + { + self->watch = afinter_source_watch_new(&self->super.super, cfg->mark_freq); + g_source_attach(self->watch, NULL); + }
This breaks the internal() source. The watch must always be enabled, because it is also responsible for generating the callbacks needed for the internal source. Anyway, thanks for the proposed patch, you can probably trivially backport the fix I've posted to 2.0 as well. -- Bazsi
participants (3)
-
Balazs Scheidler
-
Ilas, Yann
-
Marius Tomaschewski