Sent: Mon Feb 06 2012 19:28:36 GMT-0500 (EST) From: Gergely Nagy <algernon@balabit.hu> To: syslog-ng@lists.balabit.hu Subject: Re: [syslog-ng] syslog-ng hangs with high message volume to sqlite database
Patrick Hemmer <syslogng@stormcloud9.net> writes:
This looks interesting, and suspiciously similar to something I saw before. I'll see if I can track it down.
By the way: "thread apply all backtrace full" is a very handy sequence to remember. It gets a full backtrace of all threads, so you don't have to switch between threads and run "where" in each one.
Have you been able to make any progress on this?

Nope, unfortunately I wasn't able to allocate time for this yet. But I'll look into it the next time I'm doing a batch of syslog-ng work (most probably this coming Friday, perhaps earlier if time permits).
I've got a cron job which checks for this every few minutes and `kill -9`ing syslog-ng if it sees it. It seems the entire system will hang if syslog-ng gets into this state. I'm guessing the maximum number of pending socket connections on /dev/log is reached, or the already existing sockets fill up, or something similar. If I don't kill -9 syslog-ng before this happens, I can't even log in and am forced to power-reset the box.

Aye, those are pretty much the same symptoms I saw (except I saw it with mongodb, and the bug that was in there doesn't exist in afsql, so it's probably something similar, yet different).

I think I may have solved this. It was driving me insane, as the problem had gotten even worse: I couldn't go 5 minutes without syslog-ng hanging. When using flush_timeout, if there were messages pending commit and the flush_timeout was reached, the code wasn't releasing the lock before restarting the loop. Immediately after the loop restarted, it tried to take the lock again, but since it hadn't released the previous one, it just hung there. Now, I'm not sure whether this is really it. It seems to have solved the problem, but it just seems a little too obvious, which makes me feel like I don't understand what the code is doing there. As mentioned, though, it does seem to have fixed the issue: I've now gone a few hours without a hang, where before I couldn't go a few minutes.
--- syslog-ng-3.3.4.orig/modules/afsql/afsql.c 2011-11-12 07:48:47.000000000 -0500
+++ syslog-ng-3.3.4/modules/afsql/afsql.c 2012-02-09 23:25:02.544892824 -0500
@@ -890,6 +890,7 @@
             {
               afsql_dd_disconnect(self);
               afsql_dd_suspend(self);
+              g_mutex_unlock(self->db_thread_mutex);
               continue;
             }
         }

Note: this does not solve the other issue I was seeing, where it would complain with "error='5: database is locked', query='COMMIT'". That still happens every now and then, but it does seem to recover eventually.
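To make the failure mode in the patch above concrete, here is a minimal sketch written with POSIX threads rather than GLib's GMutex. The Worker struct, commit_pending() and run_loop() are hypothetical stand-ins, not the real afsql code. The loop takes the lock at the top of each pass; on a failed flush it either does `continue` while still holding the lock (the original behaviour) or unlocks first (the one-line fix). The mutex is created as error-checking, so the would-be hang shows up as an EDEADLK return instead of freezing the process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the afsql worker state; not syslog-ng's real type. */
typedef struct {
  pthread_mutex_t db_thread_mutex;
  int pending_commit;   /* nonzero: messages are waiting for COMMIT */
} Worker;

/* Hypothetical: pretend the COMMIT triggered by flush_timeout always fails. */
static int commit_pending(Worker *w)
{
  (void) w;
  return 0;
}

/* Set up the worker with an error-checking mutex, so that relocking it from
 * the thread that already holds it returns EDEADLK instead of blocking. */
static void worker_init(Worker *w)
{
  pthread_mutexattr_t attr;
  int rc;

  pthread_mutexattr_init(&attr);
  pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
  rc = pthread_mutex_init(&w->db_thread_mutex, &attr);
  assert(rc == 0);
  (void) rc;
  pthread_mutexattr_destroy(&attr);
  w->pending_commit = 1;
}

/* Simplified worker loop. Returns 0 if every pass acquired the mutex, or the
 * error from pthread_mutex_lock (EDEADLK here) if a pass found it still held
 * by this thread. On an EDEADLK return the caller still owns the mutex. */
static int run_loop(Worker *w, int iterations, int unlock_before_continue)
{
  for (int i = 0; i < iterations; i++) {
    int rc = pthread_mutex_lock(&w->db_thread_mutex);
    if (rc != 0)
      return rc;

    if (w->pending_commit && !commit_pending(w)) {
      /* error path: disconnect + suspend would happen here */
      if (unlock_before_continue)
        pthread_mutex_unlock(&w->db_thread_mutex);   /* the fix */
      continue;                                      /* restart the loop */
    }

    pthread_mutex_unlock(&w->db_thread_mutex);
  }
  return 0;
}
```

Calling run_loop(&w, 3, 0) returns EDEADLK on the second pass, while run_loop(&w, 3, 1) completes all passes. With a normal (non-error-checking) mutex the same relock would simply block forever, which matches the hang. GLib documents relocking a GMutex from the thread that already holds it as undefined behaviour, which in practice can mean exactly this kind of deadlock.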