Ok, that all makes sense. So what happens if you instantiate two db-parser instances which both refer to the same patterndb XML file. That should work, right? Something like; parser p_db_1 { db-parser(file("patterndb.xml")); }; parser p_db_2 { db-parser(file("patterndb.xml")): }; log { source(s_first); parser(p_db_1); destination(d_first); }; log { source(s_second); parser(p_db_2); destination(d_second); }; On Fri, Dec 17, 2010 at 8:39 AM, Balazs Scheidler <bazsi@balabit.hu> wrote:
On Fri, 2010-12-17 at 14:51 +0100, Balazs Scheidler wrote:
On Thu, 2010-12-16 at 23:26 -0600, Martin Holste wrote:
Congrats! Go do some gambling because your karma has to be extremely positive at this point!
On Thu, Dec 16, 2010 at 10:32 PM, Matthew Hall <mhall@mhcomputing.net> wrote:
I think this patch might well fix it.
I tried it with the patch just now and it seemed to launch OK.
Matthew.
On Thu, Dec 16, 2010 at 08:20:44PM -0800, Matthew Hall wrote:
Untested patch:
--- /home/megahall/dbparser.c 2010-12-16 20:07:03.000000000 -0800 +++ syslog-ng-3.2.1/modules/dbparser/dbparser.c 2010-12-16 20:14:59.000000000 -0800 @@ -140,6 +140,11 @@ else { log_db_parser_reload_database(self); + /* XXX: mhall: repair corruption of persistent config */ + if (self->db) + { + cfg_persist_config_add(cfg, log_db_parser_format_persist_name(self), self->db, (GDestroyNotify) pattern_db_free, FALSE); + } }
self->timer_tick_id = g_timeout_add_seconds(1, log_db_parser_timer_tick, self);
Matthew.
On Thu, Dec 16, 2010 at 08:06:43PM -0800, Matthew Hall wrote:
On Thu, Dec 16, 2010 at 04:55:09PM -0600, Martin Holste wrote: > Keep fighting the good fight, hopefully you'll get some hints soon. > You're well outside the range of my debugging-fu at this point. > > On Thu, Dec 16, 2010 at 3:34 PM, Matthew Hall <mhall@mhcomputing.net> wrote: > > Getting closer. > > > > The problem goes away when the XML pattern DB is disabled. The problem > > does not appear if the XML pattern db is used in one log {} per below. > > Once it is used in two log {} blocks, KABOOM! I'm going to try debugging > > the other half of this which writes to the persistent store to see if I > > can sort out what's breaking the write. For what it's worth, the problem > > shows up in both 32 and 64 bit. > > > > Matthew.
I think the problem could be found now. In the log_db_parser_deinit, there is a call to cfg_persist_config_add but there is no corresponding call in log_db_parser_init.
When the db_parser is referenced once in the config file, self->db is NULL, so log_db_parser_reload_database is called to create the right data structure. It's important to remember that this is going to set self->db_file_inode and self->db_file_mtime.
When the db_parser is referenced again in the config file, self->db should be non NULL because the db_parser was supposed to be persisted.
Bug 1) When we call cfg_persist_config_fetch but we get NULL again so we call log_db_parser_reload_database again.
Bug 2) (Unrelated to my issue) Even if we had stored the db_parser, if we call stat, we just copy the new inode and mtime, and do not reload the patterns. So configuration reloads will probably not refresh the pattern DB.
Now in my case when we go into log_db_parser_reload_database the second time, we have a check if the DB file exists. If no we have an error. Fair enough. But if yes, then we check if self->db_file_inode and self->sb_file_mtime have changed, or not.
If they have not changed, we return right away, without initializing the self->db. But we have already destroyed the valid self->db pointer from the first initialization, by replacing it with the retval from cfg_persist_config_fetch, which was NULL because the config was not persisted.
Now we check self->db again at the end of log_db_parser_init where we find it has become NULL. This we return a failing retval because we never suceeded in initializing the log_db_parser to a non-NULL value.
We pass this error many frames up the stack until we hit the "Error initializing message pipeline" in log_center_init.
This goes a few frames up, until we exit with an error code.
I think the patch would be adding code to log_db_parser_init and/or log_db_parser_reload_database, which calls cfg_persist_config_add. I am going to try this Monday since I'm off tomorrow and see what I get.
Could somebody else try making a config which references the EXACT SAME patterndb file twice in two log {} blocks and see if it blows up for them as well? I want to try to eliminate as many environment specific issues as possible. I described the config you need in my previous mail.
Regards, Matthew.
static gboolean log_db_parser_init(LogParser *s, GlobalConfig *cfg) { LogDBParser *self = (LogDBParser *) s;
self->db = cfg_persist_config_fetch(cfg, ***log_db_parser_format_persist_name***(self)); if (self->db) { struct stat st;
if (stat(self->db_file, &st) < 0) { msg_error("Error stating pattern database file, no automatic reload will be performed", evt_tag_str("error", g_strerror(errno)), NULL); } else { self->db_file_inode = st.st_ino; self->db_file_mtime = st.st_mtime; } } else { log_db_parser_reload_database(self); }
self->timer_tick_id = g_timeout_add_seconds(1, log_db_parser_timer_tick, self); return self->db != NULL; }
First of all thanks for tracking this down, and sorry that
1) there's a bug in the first place 2) I wasn't here to help out.
But anyway, reading through code is always a useful experience, you'll be able to resolve much more complex issues next time. :)
About the patch:
I'm afraid it is not the correct solution, and I don't see an easy way forward, since you're using db-parser() in a way that I planned to explicitly restrict, and in my 3.3 codebase there's an explicit check that the same db-parser() instance cannot be used multiple times, but that has not (yet) trickled to 3.2.
But maybe we can come up with a better solution than my "restriction", since I thought that it can be reasonably said that only one db-parser() instance is needed in a configuration.
But anyway, here's the root cause:
1) starting with 3.2 we do have correllation capabilities with db-parser(), which also sports a feature to generate new messages.
2) I intend these messages to "appear" from the db-parser() instance, e.g. they wouldn't be reposted from the internal() source as in 3.2, but would original from the db-parser() where it is referenced from a log path.
3) this means that a single db-parser() instance can have only _one_ outgoing side, or otherwise I wouldn't know where to post messages.
The reason the db-parser() saves something to the "persist" store is that it has to save the correllation state accross SIGHUPs, so that a reload doesn't empty your complete correllation state.
Using the same state in two independent db-parser() instances is probably not what you want.
I'll check you I could fix your use-case, but I need some thinking time.
I just wanted to drop this mail so that you know that I'm alive, I was just distracted by other things.
Also, I appreciate your time you spent on diagnosing this issue. If you spotted anything in the code that could have helped your diagnosis (e.g. missing code comments) I'd be happy to get some feedback.
Thanks.
And here comes the patch:
commit 5f25ce47ab57774f8fe8df17bc96a006e535cb53 Author: Balazs Scheidler <bazsi@balabit.hu> Date: Fri Dec 17 15:38:00 2010 +0100
dbparser: don't initialize the same db-parser() instance multiple times
Even if the configuration references the same db-parser() from several locations.
This caused a difficult to understand config initialization problem in syslog-ng.
NOTE: the current behaviour of the db-parsers() correllation state is going to change in 3.3. In 3.2, the state is shared in this case, in 3.3 each reference will use a private, independent state.
Reported-By: Matthew Hall Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>
I don't have a solution for 3.3 yet, but I've added an item to my todo list about that.
Thanks again for tracking this down.
-- Bazsi
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.campin.net/syslog-ng/faq.html