[syslog-ng] load failures in afsocket and afsql

Martin Holste mcholste at gmail.com
Fri Dec 17 16:18:04 CET 2010


Ok, that all makes sense.  So what happens if you instantiate two
db-parser instances which both refer to the same patterndb XML file.
That should work, right?  Something like;

parser p_db_1 { db-parser(file("patterndb.xml")); };
parser p_db_2 { db-parser(file("patterndb.xml")): };
log { source(s_first); parser(p_db_1); destination(d_first); };
log { source(s_second); parser(p_db_2); destination(d_second); };


On Fri, Dec 17, 2010 at 8:39 AM, Balazs Scheidler <bazsi at balabit.hu> wrote:
> On Fri, 2010-12-17 at 14:51 +0100, Balazs Scheidler wrote:
>> On Thu, 2010-12-16 at 23:26 -0600, Martin Holste wrote:
>> > Congrats!  Go do some gambling because your karma has to be extremely
>> > positive at this point!
>> >
>> > On Thu, Dec 16, 2010 at 10:32 PM, Matthew Hall <mhall at mhcomputing.net> wrote:
>> > > I think this patch might well fix it.
>> > >
>> > > I tried it with the patch just now and it seemed to launch OK.
>> > >
>> > > Matthew.
>> > >
>> > > On Thu, Dec 16, 2010 at 08:20:44PM -0800, Matthew Hall wrote:
>> > >> Untested patch:
>> > >>
>> > >> --- /home/megahall/dbparser.c 2010-12-16 20:07:03.000000000 -0800
>> > >> +++ syslog-ng-3.2.1/modules/dbparser/dbparser.c       2010-12-16 20:14:59.000000000 -0800
>> > >> @@ -140,6 +140,11 @@
>> > >>    else
>> > >>      {
>> > >>        log_db_parser_reload_database(self);
>> > >> +      /* XXX: mhall: repair corruption of persistent config */
>> > >> +      if (self->db)
>> > >> +        {
>> > >> +          cfg_persist_config_add(cfg, log_db_parser_format_persist_name(self), self->db, (GDestroyNotify) pattern_db_free, FALSE);
>> > >> +        }
>> > >>      }
>> > >>
>> > >>    self->timer_tick_id = g_timeout_add_seconds(1, log_db_parser_timer_tick, self);
>> > >>
>> > >> Matthew.
>> > >>
>> > >> On Thu, Dec 16, 2010 at 08:06:43PM -0800, Matthew Hall wrote:
>> > >> > On Thu, Dec 16, 2010 at 04:55:09PM -0600, Martin Holste wrote:
>> > >> > > Keep fighting the good fight, hopefully you'll get some hints soon.
>> > >> > > You're well outside the range of my debugging-fu at this point.
>> > >> > >
>> > >> > > On Thu, Dec 16, 2010 at 3:34 PM, Matthew Hall <mhall at mhcomputing.net> wrote:
>> > >> > > > Getting closer.
>> > >> > > >
>> > >> > > > The problem goes away when the XML pattern DB is disabled. The problem
>> > >> > > > does not appear if the XML pattern db is used in one log {} per below.
>> > >> > > > Once it is used in two log {} blocks, KABOOM! I'm going to try debugging
>> > >> > > > the other half of this which writes to the persistent store to see if I
>> > >> > > > can sort out what's breaking the write. For what it's worth, the problem
>> > >> > > > shows up in both 32 and 64 bit.
>> > >> > > >
>> > >> > > > Matthew.
>> > >> >
>> > >> > I think the problem could be found now. In the log_db_parser_deinit,
>> > >> > there is a call to cfg_persist_config_add but there is no corresponding
>> > >> > call in log_db_parser_init.
>> > >> >
>> > >> > When the db_parser is referenced once in the config file, self->db is
>> > >> > NULL, so log_db_parser_reload_database is called to create the right
>> > >> > data structure. It's important to remember that this is going to set
>> > >> > self->db_file_inode and self->db_file_mtime.
>> > >> >
>> > >> > When the db_parser is referenced again in the config file, self->db
>> > >> > should be non NULL because the db_parser was supposed to be persisted.
>> > >> >
>> > >> > Bug 1) When we call cfg_persist_config_fetch but we get NULL again so we
>> > >> > call log_db_parser_reload_database again.
>> > >> >
>> > >> > Bug 2) (Unrelated to my issue) Even if we had stored the db_parser, if
>> > >> > we call stat, we just copy the new inode and mtime, and do not reload
>> > >> > the patterns. So configuration reloads will probably not refresh the
>> > >> > pattern DB.
>> > >> >
>> > >> > Now in my case when we go into log_db_parser_reload_database the second
>> > >> > time, we have a check if the DB file exists. If no we have an error.
>> > >> > Fair enough. But if yes, then we check if self->db_file_inode and
>> > >> > self->sb_file_mtime have changed, or not.
>> > >> >
>> > >> > If they have not changed, we return right away, without initializing the
>> > >> > self->db. But we have already destroyed the valid self->db pointer from
>> > >> > the first initialization, by replacing it with the retval from
>> > >> > cfg_persist_config_fetch, which was NULL because the config was not
>> > >> > persisted.
>> > >> >
>> > >> > Now we check self->db again at the end of log_db_parser_init where we
>> > >> > find it has become NULL. This we return a failing retval because we
>> > >> > never suceeded in initializing the log_db_parser to a non-NULL value.
>> > >> >
>> > >> > We pass this error many frames up the stack until we hit the "Error
>> > >> > initializing message pipeline" in log_center_init.
>> > >> >
>> > >> > This goes a few frames up, until we exit with an error code.
>> > >> >
>> > >> > I think the patch would be adding code to log_db_parser_init and/or
>> > >> > log_db_parser_reload_database, which calls cfg_persist_config_add. I am
>> > >> > going to try this Monday since I'm off tomorrow and see what I get.
>> > >> >
>> > >> > Could somebody else try making a config which references the EXACT SAME
>> > >> > patterndb file twice in two log {} blocks and see if it blows up for
>> > >> > them as well? I want to try to eliminate as many environment specific
>> > >> > issues as possible. I described the config you need in my previous mail.
>> > >> >
>> > >> > Regards,
>> > >> > Matthew.
>> > >> >
>> > >> > static gboolean log_db_parser_init(LogParser *s, GlobalConfig *cfg)
>> > >> > {
>> > >> >   LogDBParser *self = (LogDBParser *) s;
>> > >> >
>> > >> >   self->db = cfg_persist_config_fetch(cfg, ***log_db_parser_format_persist_name***(self));
>> > >> >   if (self->db)
>> > >> >     {
>> > >> >       struct stat st;
>> > >> >
>> > >> >       if (stat(self->db_file, &st) < 0)
>> > >> >         {
>> > >> >           msg_error("Error stating pattern database file, no automatic reload will be performed",
>> > >> >                     evt_tag_str("error", g_strerror(errno)),
>> > >> >                     NULL);
>> > >> >         }
>> > >> >       else
>> > >> >         {
>> > >> >           self->db_file_inode = st.st_ino;
>> > >> >           self->db_file_mtime = st.st_mtime;
>> > >> >         }
>> > >> >     }
>> > >> >   else
>> > >> >     {
>> > >> >       log_db_parser_reload_database(self);
>> > >> >     }
>> > >> >
>> > >> >   self->timer_tick_id = g_timeout_add_seconds(1, log_db_parser_timer_tick, self);
>> > >> >   return self->db != NULL;
>> > >> > }
>> > >> >
>>
>> First of all thanks for tracking this down, and sorry that
>>
>> 1) there's a bug in the first place
>> 2) I wasn't here to help out.
>>
>> But anyway, reading through code is always a useful experience, you'll
>> be able to resolve much more complex issues next time. :)
>>
>> About the patch:
>>
>> I'm afraid it is not the correct solution, and I don't see an easy way
>> forward, since you're using db-parser() in a way that I planned to
>> explicitly restrict, and in my 3.3 codebase there's an explicit check
>> that the same db-parser() instance cannot be used multiple times, but
>> that has not (yet) trickled to 3.2.
>>
>> But maybe we can come up with a better solution than my "restriction",
>> since I thought that it can be reasonably said that only one db-parser()
>> instance is needed in a configuration.
>>
>> But anyway, here's the root cause:
>>
>> 1) starting with 3.2 we do have correllation capabilities with
>> db-parser(), which also sports a feature to generate new messages.
>>
>> 2) I intend these messages to "appear" from the db-parser() instance,
>> e.g. they wouldn't be reposted from the internal() source as in 3.2, but
>> would original from the db-parser() where it is referenced from a log
>> path.
>>
>> 3) this means that a single db-parser() instance can have only _one_
>> outgoing side, or otherwise I wouldn't know where to post messages.
>>
>> The reason the db-parser() saves something to the "persist" store is
>> that it has to save the correllation state accross SIGHUPs, so that a
>> reload doesn't empty your complete correllation state.
>>
>> Using the same state in two independent db-parser() instances is
>> probably not what you want.
>>
>> I'll check you I could fix your use-case, but I need some thinking time.
>>
>> I just wanted to drop this mail so that you know that I'm alive, I was
>> just distracted by other things.
>>
>> Also, I appreciate your time you spent on diagnosing this issue. If you
>> spotted anything in the code that could have helped your diagnosis (e.g.
>> missing code comments) I'd be happy to get some feedback.
>>
>> Thanks.
>>
>
> And here comes the patch:
>
> commit 5f25ce47ab57774f8fe8df17bc96a006e535cb53
> Author: Balazs Scheidler <bazsi at balabit.hu>
> Date:   Fri Dec 17 15:38:00 2010 +0100
>
>    dbparser: don't initialize the same db-parser() instance multiple times
>
>    Even if the configuration references the same db-parser() from several
>    locations.
>
>    This caused a difficult to understand config initialization problem in
>    syslog-ng.
>
>    NOTE: the current behaviour of the db-parsers() correllation state is going
>    to change in 3.3. In 3.2, the state is shared in this case, in 3.3 each
>    reference will use a private, independent state.
>
>    Reported-By: Matthew Hall
>    Signed-off-by: Balazs Scheidler <bazsi at balabit.hu>
>
>
> I don't have a solution for 3.3 yet, but I've added an item to my todo
> list about that.
>
> Thanks again for tracking this down.
>
> --
> Bazsi
>
> ______________________________________________________________________________
> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
> Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
> FAQ: http://www.campin.net/syslog-ng/faq.html
>
>


More information about the syslog-ng mailing list