Sent: Tue Jan 31 2012 05:11:05 GMT-0500 (EST)
From: Gergely Nagy <algernon@balabit.hu>
To: Syslog-ng users' and developers' mailing list <syslog-ng@lists.balabit.hu>
Subject: Re: [syslog-ng] syslog-ng hangs with high message volume to sqlite database
Patrick Hemmer <syslogng@stormcloud9.net> writes:
Here's some gdb info:

(gdb) info threads
  Id   Target Id         Frame
  2    Thread 0x34711563700 (LWP 6986) "syslog-ng" 0x0000034710b6afc4 in __lll_lock_wait () from /lib64/libpthread.so.0
* 1    Thread 0x34711566b00 (LWP 6979) "syslog-ng" 0x0000034710b6afc4 in __lll_lock_wait () from /lib64/libpthread.so.0
[...]
(gdb) where
#0  0x0000034710b6afc4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000034710b66459 in _L_lock_508 () from /lib64/libpthread.so.0
#2  0x0000034710b6627b in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x000003470dc97039 in afsql_dd_queue (s=0x349eb408f0, msg=0x349eb4b430, path_options=0x3add5269ef0, user_data=0x0) at afsql.c:1159
[...]

(gdb) thread 2
[Switching to thread 2 (Thread 0x34711563700 (LWP 6986))]
#0  0x0000034710b6afc4 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) where
#0  0x0000034710b6afc4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000034710b66459 in _L_lock_508 () from /lib64/libpthread.so.0
#2  0x0000034710b6627b in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x000003470dc9603e in afsql_dd_database_thread (arg=0x349eb408f0) at afsql.c:863
#4  0x00000347110f87f5 in worker_thread_func (st=0x349eb0d1e0) at misc.c:623
#5  0x0000034710df7dc6 in ?? () from /usr/lib64/libglib-2.0.so.0
#6  0x0000034710b63b2a in start_thread () from /lib64/libpthread.so.0
#7  0x00000347108af71d in clone () from /lib64/libc.so.6

This looks interesting, and suspiciously similar to something I saw before. I'll see if I can track it down.
By the way: "thread apply all backtrace full" is a very handy sequence to remember: it gets a full backtrace of all threads, so you don't have to switch between them and do a where each time.
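The same command can also be run non-interactively, which helps when the backtraces need to be captured from a script (a cron job, for example) rather than from a live gdb session. Below is a minimal sketch in Python, assuming gdb is installed and the caller is allowed to attach to the process. The helper names gdb_backtrace_argv and dump_all_threads are made up for this illustration; -p, -batch and -ex are standard gdb command-line options.

```python
from __future__ import annotations

import subprocess


def gdb_backtrace_argv(pid: int) -> list[str]:
    """Build a gdb command line that attaches to `pid`, prints a full
    backtrace of every thread, and then exits.

    Hypothetical helper for illustration; only the gdb options are real.
    """
    return [
        "gdb",
        "-p", str(pid),   # attach to the already running process
        "-batch",         # run the given commands, then exit
        "-ex", "thread apply all backtrace full",
    ]


def dump_all_threads(pid: int) -> str:
    """Run gdb against `pid` and return whatever it printed on stdout.

    Assumes gdb is on PATH; the process is paused while gdb is attached.
    """
    result = subprocess.run(
        gdb_backtrace_argv(pid),
        capture_output=True,
        text=True,
        check=False,  # a failed attach is reported by gdb, not raised here
    )
    return result.stdout
```

Calling dump_all_threads(pid) with the syslog-ng process ID and saving the returned text to a file before restarting the daemon would preserve the state of both threads for later analysis.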
Have you been able to make any progress on this? I've got a cron job which checks for this every few minutes and `kill -9`s syslog-ng if it sees it. It seems the entire system will hang if syslog-ng gets into this state. I'm guessing the maximum number of pending socket connections (on /dev/log) is reached, or the already existing sockets fill up, or something. If I don't kill -9 syslog-ng before this happens, I can't even log in and am forced to do a power reset on the box.

Thanks
-Patrick