[syslog-ng] Strange behavior with the option "mark"
Balazs Scheidler
bazsi at balabit.hu
Fri Aug 6 13:39:31 CEST 2010
On Thu, 2010-08-05 at 22:05 +0200, Marius Tomaschewski wrote:
> Am Dienstag, 3. August 2010 16:01:39 schrieb Ilas, Yann:
> > Thank you for your answer.
> >
> > When I start the syslog-ng with the option "mark" sets to 0, no MARKS
> > messages are sent (as written in the OSE 3.1 Administrator Guide).
> > This problem appears only when the option "mark" is set from a value <>
> > 0 to a value = 0 and when I reload the configuration with a HUP signal.
>
> One of our customers run into this problem too, but it seems to be not easy
> to reproduce it clearly and it has been usually resolved as WORKSFORME.
>
> Your hint, that it is required to change from <>0 to ==0 + reload may help
> to reproduce it, thanks!
>
> See also:
> https://bugzilla.balabit.com/show_bug.cgi?id=10
> http://bugs.gentoo.org/228601
> https://bugzilla.novell.com/show_bug.cgi?id=617111
>
> I think it happens when one function fetching time gets another second
> value than the another one (just a change in tv_usec) and this statement
> results in a 0 timeout:
>
> *timeout = MAX((next_mark_target - tv.tv_sec) * 1000, 0);
>
> Changing it to
> *timeout = MAX((next_mark_target - tv.tv_sec) * 1000, 250);
>
> limits the number of the messages that may happen inside of 1 second
> when the event runs with timeout 0, what may fill /var partition with
> mark messages.
>
> The attached patch makes more than this (maybe a little bit too much).
> It also disables the watch function completely when mark_freq==0, what
> saves a lot of CPU ticks and makes sure, that the time is not fetched
> twice in one function with 2 compares.
>
> Balazs, please review it. Thanks!
This patch seems to solve the issue for me, I've pushed it to 3.2 OSE
branch, but can trivially be backported to earlier branches.
diff --git a/src/afinter.c b/src/afinter.c
index d631c3d..31435f7 100644
--- a/src/afinter.c
+++ b/src/afinter.c
@@ -154,6 +154,7 @@ afinter_source_watch_new(LogSource *afinter_source, gint mark_freq)
{
AFInterWatch *self = (AFInterWatch *) g_source_new(&afinter_source_watch_funcs, sizeof(AFInterWatch));
+ next_mark_target = -1;
self->mark_freq = mark_freq;
self->afinter_source = afinter_source;
g_source_set_callback(&self->super, (GSourceFunc) afinter_source_dispatch_msg, log_pipe_ref(&afinter_source->super), (GDestroyNotify) log_pipe_unref);
> --- syslog-ng-2.0.9/src/afinter.c
> +++ syslog-ng-2.0.9/src/afinter.c 2010/07/12 08:38:35
> @@ -56,20 +56,17 @@ afinter_source_prepare(GSource *source,
>
> *timeout = -1;
>
> - if (self->mark_freq > 0 && next_mark_target == -1)
> + if (self->mark_freq > 0)
> {
> g_source_get_current_time(source, &tv);
> - next_mark_target = tv.tv_sec + self->mark_freq;
> - }
> -
> - if (next_mark_target != -1)
> - {
> - g_source_get_current_time(source, &tv);
> - *timeout = MAX((next_mark_target - tv.tv_sec) * 1000, 0);
> - }
> - else
> - {
> - *timeout = -1;
> + if(next_mark_target == -1)
> + {
> + next_mark_target = tv.tv_sec + self->mark_freq;
> + }
> + if (next_mark_target != -1)
> + {
> + *timeout = MAX((next_mark_target - tv.tv_sec) * 1000, 250);
> + }
> }
If timeout is negative or zero, we really need to emit a MARK message
here. The root cause was that next_mark_target wasn't reinitialized when
SIGHUP is received.
>
> return !g_queue_is_empty(internal_msg_queue);
> }
> @@ -78,10 +75,12 @@ static gboolean
> afinter_source_check(GSource *source)
> {
> GTimeVal tv;
> + AFInterWatch *self = (AFInterWatch *) source;
>
> g_source_get_current_time(source, &tv);
>
> - if (next_mark_target != -1 && next_mark_target <= tv.tv_sec)
> + if (self->mark_freq > 0 &&
> + next_mark_target != -1 && next_mark_target <= tv.tv_sec)
> return TRUE;
> return !g_queue_is_empty(internal_msg_queue);
> }
if next_mark_target becomes -1 when being reinitialized, this is not
needed.
>
> @@ -94,13 +93,15 @@ afinter_source_dispatch(GSource *source,
> LogMessage *msg;
> gint path_flags = 0;
> GTimeVal tv;
> -
> + AFInterWatch *self = (AFInterWatch *) source;
> +
> g_source_get_current_time(source, &tv);
>
> - if (next_mark_target != -1 && next_mark_target <= tv.tv_sec)
> + if (self->mark_freq > 0 && next_mark_target != -1 &&
> next_mark_target <= tv.tv_sec)
> {
> msg = log_msg_new_mark();
> path_flags = PF_FLOW_CTL_OFF;
> + next_mark_target = tv.tv_sec + self->mark_freq;
> }
> else
> {
> @@ -154,9 +155,12 @@ afinter_source_init(LogPipe *s, GlobalCo
> AFInterSource *self = (AFInterSource *) s;
>
> /* the source added below references this logreader, it will be
> unref'd
> - when the source is destroyed */
> - self->watch = afinter_source_watch_new(&self->super.super,
> cfg->mark_freq);
> - g_source_attach(self->watch, NULL);
> + when the source is destroyed */
> + if(cfg->mark_freq > 0)
> + {
> + self->watch = afinter_source_watch_new(&self->super.super,
> cfg->mark_freq);
> + g_source_attach(self->watch, NULL);
> + }
This breaks the internal() source. The watch must always be enabled,
because it is also responsible for generating the callbacks needed for
the internal source.
Anyway, thanks for the proposed patch, you can probably trivially
backport the fix I've posted to 2.0 as well.
--
Bazsi
More information about the syslog-ng
mailing list