Re: [syslog-ng] syslog-ng hangs with high message volume to sqlite database

6 Feb 2012

Sent: Tue Jan 31 2012 05:11:05 GMT-0500 (EST)
From: Gergely Nagy <algernon@balabit.hu>
To: Syslog-ng users' and developers' mailing list <syslog-ng@lists.balabit.hu>
Subject: Re: [syslog-ng] syslog-ng hangs with high message volume to sqlite database
...
Patrick Hemmer <syslogng@stormcloud9.net> writes:
...
Here's some gdb info:
(gdb) info threads
    Id   Target Id         Frame
    2    Thread 0x34711563700 (LWP 6986) "syslog-ng" 0x0000034710b6afc4 in __lll_lock_wait () from /lib64/libpthread.so.0
* 1    Thread 0x34711566b00 (LWP 6979) "syslog-ng" 0x0000034710b6afc4 in __lll_lock_wait () from /lib64/libpthread.so.0
[...]
(gdb) where
#0  0x0000034710b6afc4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000034710b66459 in _L_lock_508 () from /lib64/libpthread.so.0
#2  0x0000034710b6627b in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x000003470dc97039 in afsql_dd_queue (s=0x349eb408f0, msg=0x349eb4b430, path_options=0x3add5269ef0, user_data=0x0) at afsql.c:1159
[...]
...
(gdb) thread 2
[Switching to thread 2 (Thread 0x34711563700 (LWP 6986))]
#0  0x0000034710b6afc4 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) where
#0  0x0000034710b6afc4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000034710b66459 in _L_lock_508 () from /lib64/libpthread.so.0
#2  0x0000034710b6627b in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x000003470dc9603e in afsql_dd_database_thread (arg=0x349eb408f0) at afsql.c:863
#4  0x00000347110f87f5 in worker_thread_func (st=0x349eb0d1e0) at misc.c:623
#5  0x0000034710df7dc6 in ?? () from /usr/lib64/libglib-2.0.so.0
#6  0x0000034710b63b2a in start_thread () from /lib64/libpthread.so.0
#7  0x00000347108af71d in clone () from /lib64/libc.so.6
This looks interesting, and suspiciously similar to something I saw
before. I'll see if I can track it down.
By the way: "thread apply all backtrace full" is a very handy sequence
to remember: it gets a full backtrace of all threads, so you don't have
to switch between them and do a where each time.
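
The same command can also be run non-interactively against a process
that is already hung, which is convenient from a script. A minimal
sketch, assuming gdb is installed; the helper name dump_all_threads and
the output path in the example are made up for illustration:

```shell
#!/bin/sh
# Dump a full backtrace of every thread of a running process without
# opening an interactive gdb session.
# dump_all_threads is a hypothetical helper name, not part of syslog-ng.
dump_all_threads() {
    pid="$1"    # PID of the process to inspect
    out="$2"    # file that receives the backtraces
    # -p attaches to the PID; -batch exits once the -ex commands have run.
    gdb -p "$pid" -batch -ex "thread apply all backtrace full" > "$out" 2>&1
}

# Example call, left commented out so sourcing the file has no side effects:
# dump_all_threads "$(pidof -s syslog-ng)" /tmp/syslog-ng-backtrace.txt
```

The resulting file can be attached to a bug report as-is.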
Have you been able to make any progress on this?

I've got a cron job which checks for this every few minutes and runs
`kill -9` on syslog-ng if it sees it. It seems the entire system will
hang if syslog-ng gets into this state. I'm guessing the maximum number
of pending socket connections (on /dev/log) is reached, or the already
existing sockets fill up, or something similar. If I don't kill -9
syslog-ng before this happens, I can't even log in and am forced to do a
power reset on the box.
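
A watchdog along these lines could look roughly like the sketch below.
The detection method is an assumption, not necessarily what the cron job
described here does: it supposes that a hung syslog-ng stops draining
/dev/log, so a probe sent with logger(1) blocks and timeout(1) gives up
on it. The names probe_syslog and watchdog, the 5-second limit, and the
file paths are all hypothetical. The backtrace is saved before the kill
so the evidence is not lost.

```shell
#!/bin/sh
# Watchdog sketch for cron.
# Assumption: when syslog-ng hangs, writes to /dev/log eventually block,
# so a probe message that does not finish in time signals the hang.

probe_syslog() {
    # timeout(1) exits with status 124 if logger is still running
    # after 5 seconds; any non-zero status is treated as a failure.
    timeout 5 logger -t syslog-watchdog "probe"
}

watchdog() {
    pid="$1"                                   # PID of syslog-ng
    out="${2:-/tmp/syslog-ng-hang.$pid.txt}"   # where to keep the backtrace
    if probe_syslog; then
        return 0                               # logging still works
    fi
    # Save a backtrace of all threads before destroying the process.
    gdb -p "$pid" -batch -ex "thread apply all backtrace full" > "$out" 2>&1
    kill -9 "$pid"
}

# Example call, left commented out so sourcing the file has no side effects:
# watchdog "$(pidof -s syslog-ng)"
#
# Example crontab entry (script path is hypothetical):
# */5 * * * * /usr/local/sbin/syslog-ng-watchdog.sh
```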

Thanks

-Patrick