[syslog-ng] syslog-ng hangs with high message volume to sqlite database
Patrick Hemmer
syslogng at stormcloud9.net
Fri Feb 10 06:38:12 CET 2012
Sent: Mon Feb 06 2012 19:28:36 GMT-0500 (EST)
From: Gergely Nagy <algernon at balabit.hu>
To: syslog-ng at lists.balabit.hu
Subject: Re: [syslog-ng] syslog-ng hangs with high message volume to
sqlite database
> Patrick Hemmer<syslogng at stormcloud9.net> writes:
>
>>> This looks interesting, and suspiciously similar to something I saw
>>> before. I'll see if I can track it down.
>>>
>>> By the way: "thread apply all backtrace full" is a very handy sequence
>>> to remember: it gets a full backtrace of all threads, so you don't have
>>> to switch between them and do a where each time.
>>>
>> Have you been able to make any progress on this?
> Nope, unfortunately I wasn't able to allocate time for this yet. But
> I'll look into it the next time I'm doing a batch of syslog-ng work
> (most probably this coming friday, perhaps earlier, if time permits).
>
>> I've got a cron job which is checking for this every few minutes and
>> `kill -9`ing syslog-ng if it sees it. Seems like the entire system will
>> hang if syslog-ng gets into this state. I'm guessing max pending socket
>> connections (on /dev/log) is reached, or the already existing sockets
>> fill up, or something. If I don't kill -9 syslog-ng before this happens I
>> can't even log in and am forced to do a power reset on the box.
> Aye, that's pretty much the same symptoms I saw (except I saw it with
> mongodb, and the bug that was in there doesn't exist in afsql, so it's
> probably something similar, yet different).
I think I may have solved this. It was driving me insane as the problem
had gotten even worse: I couldn't go 5 minutes without syslog-ng hanging.
When using flush_timeout, if there were messages pending commit and the
flush_timeout was reached, it wasn't releasing the lock before restarting
the loop. Immediately after the loop restarted it tried to take the
lock again, but since it hadn't released the last one, it just hung there.
Now, I'm not sure if this is really it or not. It seems to have solved
it, but it just seems a little bit too obvious, making me feel like I
don't understand what the code is doing there. But as mentioned, it does
seem to have solved the issue: I've gone a few hours now, where before
I couldn't go a few minutes.
--- syslog-ng-3.3.4.orig/modules/afsql/afsql.c 2011-11-12 07:48:47.000000000 -0500
+++ syslog-ng-3.3.4/modules/afsql/afsql.c 2012-02-09 23:25:02.544892824 -0500
@@ -890,6 +890,7 @@
{
afsql_dd_disconnect(self);
afsql_dd_suspend(self);
+ g_mutex_unlock(self->db_thread_mutex);
continue;
}
}
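
For anyone who wants to see the failure mode in isolation, here is a
minimal sketch of it. It is not the afsql.c code: run_loop() and its
unlock_before_continue flag are hypothetical names made up for
illustration, and it uses a plain pthread mutex rather than GLib's
g_mutex_* calls. It also uses pthread_mutex_trylock() so that the sketch
reports the stall instead of actually hanging the way a blocking lock
would.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a worker loop; NOT the actual afsql.c code.
 * Returns how many iterations managed to acquire the lock. */
int run_loop(int iterations, bool unlock_before_continue)
{
    pthread_mutex_t lock;
    bool held = false;
    int acquired = 0;

    assert(iterations >= 0);
    pthread_mutex_init(&lock, NULL);

    for (int i = 0; i < iterations; i++) {
        /* trylock fails if the mutex is still held from the previous
         * iteration; a blocking lock would hang forever at this point. */
        if (pthread_mutex_trylock(&lock) != 0)
            break;
        held = true;
        acquired++;

        /* Assumption for the sketch: the first pass hits the error path. */
        bool flush_failed = (i == 0);
        if (flush_failed) {
            if (unlock_before_continue) {
                /* The step the patch adds: release before restarting. */
                pthread_mutex_unlock(&lock);
                held = false;
            }
            continue;
        }

        /* Normal path: release the lock at the end of the iteration. */
        pthread_mutex_unlock(&lock);
        held = false;
    }

    /* Clean up so the mutex is unlocked before it is destroyed. */
    if (held)
        pthread_mutex_unlock(&lock);
    pthread_mutex_destroy(&lock);
    return acquired;
}
```

Under these assumptions, run_loop(3, false) gets the lock only once and
then stalls on the second pass, while run_loop(3, true) gets it on all
three passes.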
Note, this does not solve the issue I was getting where it would
complain with "error='5: database is locked', query='COMMIT'". This
still happens every now and then, but it does seem to recover eventually.