[syslog-ng] syslog-ng hangs with high message volume to sqlite database

Patrick Hemmer syslogng at stormcloud9.net
Tue Jan 31 05:41:17 CET 2012


Sent: Sat Jan 28 2012 23:11:08 GMT-0500 (EST)
From: Patrick Hemmer <syslogng at stormcloud9.net>
To: syslog-ng at lists.balabit.hu
Subject: [syslog-ng] syslog-ng hangs with high message volume to sqlite 
database
> I seem to have run into a situation where syslog-ng (3.3.4) will hang
> when sending a high message volume to an sqlite3 database.
> When I run the `loggen` utility with a high message rate, syslog-ng
> writes out the following error:
> Jan 28 22:58:12 storm.feystorm.net syslog-ng[22839]: Error running SQL
> query; type='sqlite3', host='', port='', user='syslog-ng',
> database='/syslogtest/logs.sql3', error='5: database is locked',
> query='COMMIT'
> Jan 28 22:58:12 storm.feystorm.net syslog-ng[22839]: SQL transaction
> commit failed, rewinding backlog and starting again;
>
> Syslog-ng will sit stuck like this for a minute and then log the error
> again. Sometimes it'll write a few more entries to the database before
> it errors, but its never more than a few. I'm not sure if this issue is
> a problem with libdbi or syslog-ng, which is why I'm here and not the
> bug tracker. I came across this issue trying to track down another issue
> where syslog-ng seems to hang completely and will never recover (wont
> respond to anything other than a `kill -9`). Whether this is the same
> issue or not, I do not know.
> I was able to trigger this with the following loggen command:
> loggen -r 60 -I 1 --active-connections=2 -x ./socket
> This server is a fairly low power server (its a central log collector
> for my home network), so it you might have to significantly bump up the
> message rate to duplicate the issue.
>
>
> Config:
> @version: 3.3
>
> options {
>       keep_hostname(yes);
>       chain_hostnames(no);
>       normalize_hostnames(yes);
>       use_dns(no);
>       use_fqdn(yes);
>       stats_freq(0);
>       mark_freq(0);
> };
>
> source s_socket {
>       unix-stream("/syslogtest/socket");
> };
> source s_internal {
>       internal();
> };
> destination d_internal {
>       file("/syslogtest/internal.log");
> };
> log { source(s_internal); destination(d_internal); };
>
> destination d_sqlite {
>       sql(
>           type('sqlite3')
>           database("/syslogtest/logs.sql3")
>           table("logs")
>           columns("time", "time_r", "host", "facility", "priority",
> "program", "pid", "tag", "message")
>           values("$S_UNIXTIME", "$R_UNIXTIME", "$FULLHOST",
> "$FACILITY_NUM", "$LEVEL_NUM", "$PROGRAM", "$PID", "$DBTAG", "$MSG")
>           null("")
>           flags(explicit-commits)
>           flush_lines(10)
>           flush_timeout(200)
>       );
> };
> filter f_test1 {
>       message('^prg00000') or program('syslog-ng');
> };
> filter f_test2 {
>       message('^prg00001') or program('syslog-ng');
> };
> log { source(s_internal); source(s_socket); destination(d_sqlite); };
> #log { source(s_internal); source(s_socket); filter(f_test1);
> destination(d_sqlite); };
> #log { source(s_internal); source(s_socket); filter(f_test2);
> destination(d_sqlite); };
>
>
>
>
>
> Database schema:
> CREATE TABLE logs (id integer primary key autoincrement, time unsigned
> integer not null, time_r unsigned integer not null, host text not null,
> facility unsigned integer not null, priority unsigned integer not null,
> program text not null, pid unsigned integer, tag text, message text not
> null);
> CREATE INDEX host on logs (host, time);
> CREATE INDEX program on logs (program, time);
>
> ______________________________________________________________________________
> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
> Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
>

I finally got the original problem I was trying to track down to happen 
(had to wait for it after enabling debugging symbols). I'm not sure if 
theyre related, but they look like they are, so I'm going to attach it 
on to this one. The issue here is that syslog-ng hangs completely. 
Doesnt respond to anything other than `kill -9` (looks like 2 threads 
are in a deadlock).
Strace shows
Process 6979 attached - interrupt to quit
futex(0x349eb0d1b0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>

Where it just sits there forever.
Here's some gdb info:
(gdb) info threads
   Id   Target Id         Frame
   2    Thread 0x34711563700 (LWP 6986) "syslog-ng" 0x0000034710b6afc4 
in __lll_lock_wait () from /lib64/libpthread.so.0
* 1    Thread 0x34711566b00 (LWP 6979) "syslog-ng" 0x0000034710b6afc4 in 
__lll_lock_wait () from /lib64/libpthread.so.0
(gdb) where
#0  0x0000034710b6afc4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000034710b66459 in _L_lock_508 () from /lib64/libpthread.so.0
#2  0x0000034710b6627b in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x000003470dc97039 in afsql_dd_queue (s=0x349eb408f0, 
msg=0x349eb4b430, path_options=0x3add5269ef0, user_data=0x0) at afsql.c:1159
#4  0x00000347110d597f in log_pipe_queue (s=0x349eb408f0, 
msg=0x349eb4b430, path_options=0x3add526a040) at logpipe.h:288
#5  0x00000347110d5e9d in log_dest_group_queue (s=0x349eb40f10, 
msg=0x349eb4b430, path_options=0x3add526a040, user_data=0x0) at dgroup.c:97
#6  0x00000347110df3bc in log_pipe_queue (s=0x349eb40f10, 
msg=0x349eb4b430, path_options=0x3add526a040) at logpipe.h:288
#7  0x00000347110df6d5 in log_multiplexer_queue (s=0x349eb084e0, 
msg=0x349eb4b430, path_options=0x3add526a3e0, user_data=0x0) at logmpx.c:125
#8  0x00000347110d7b20 in log_pipe_queue (s=0x349eb084e0, 
msg=0x349eb4b430, path_options=0x3add526a3e0) at logpipe.h:288
#9  0x00000347110d7a9f in log_pipe_forward_msg (self=0x349eb44390, 
msg=0x349eb4b430, path_options=0x3add526a3e0) at logpipe.h:275
#10 0x00000347110d979c in log_filter_pipe_queue (s=0x349eb44390, 
msg=0x349eb4b430, path_options=0x3add526a3e0, user_data=0x0) at filter.c:698
#11 0x00000347110ed3ec in log_pipe_queue (s=0x349eb44390, 
msg=0x349eb4b430, path_options=0x3add526a3e0) at logpipe.h:288
#12 0x00000347110ed36b in log_pipe_forward_msg (self=0x349eb458b0, 
msg=0x349eb4b430, path_options=0x3add526a3e0) at logpipe.h:275
#13 0x00000347110ed724 in log_rewrite_queue (s=0x349eb458b0, 
msg=0x349eb4b430, path_options=0x3add526a3e0, user_data=0x0) at 
logrewrite.c:65
#14 0x00000347110ed3ec in log_pipe_queue (s=0x349eb458b0, 
msg=0x349eb4b430, path_options=0x3add526a3e0) at logpipe.h:288
#15 0x00000347110ed36b in log_pipe_forward_msg (self=0x349eb45620, 
msg=0x349eb4b430, path_options=0x3add526a3e0) at logpipe.h:275
#16 0x00000347110ed724 in log_rewrite_queue (s=0x349eb45620, 
msg=0x349eb4b430, path_options=0x3add526a3e0, user_data=0x0) at 
logrewrite.c:65
#17 0x00000347110df3bc in log_pipe_queue (s=0x349eb45620, 
msg=0x349eb4b430, path_options=0x3add526a3e0) at logpipe.h:288
#18 0x00000347110df6d5 in log_multiplexer_queue (s=0x349eb07840, 
msg=0x34708013350, path_options=0x3add526a680, user_data=0x0) at 
logmpx.c:125
#19 0x00000347110ffb93 in log_pipe_queue (s=0x349eb07840, 
msg=0x34708013350, path_options=0x3add526a680) at logpipe.h:288
#20 0x00000347110ffb12 in log_pipe_forward_msg (self=0x349eb39df0, 
msg=0x34708013350, path_options=0x3add526a680) at logpipe.h:275
#21 0x000003471110012d in log_source_group_queue (s=0x349eb39df0, 
msg=0x34708013350, path_options=0x3add526a680, user_data=0x0) at 
sgroup.c:102
#22 0x00000347110ee3fe in log_pipe_queue (s=0x349eb39df0, 
msg=0x34708013350, path_options=0x3add526a680) at logpipe.h:288
#23 0x00000347110ee37d in log_pipe_forward_msg (self=0x349eb39d10, 
msg=0x34708013350, path_options=0x3add526a680) at logpipe.h:275
#24 0x00000347110ee417 in log_pipe_queue (s=0x349eb39d10, 
msg=0x34708013350, path_options=0x3add526a680) at logpipe.h:292
#25 0x00000347110ee37d in log_pipe_forward_msg (self=0x349eb3b220, 
msg=0x34708013350, path_options=0x3add526a680) at logpipe.h:275
#26 0x00000347110ef1d4 in log_source_queue (s=0x349eb3b220, 
msg=0x34708013350, path_options=0x3add526a750, user_data=0x0) at 
logsource.c:290
#27 0x00000347110cc454 in log_pipe_queue (s=0x349eb3b220, 
msg=0x34708013350, path_options=0x3add526a750) at logpipe.h:288
#28 0x00000347110cc718 in afinter_source_post (s=0x349eb3b220) at 
afinter.c:115
#29 0x000003471111e184 in iv_event_run_pending_events (_dummy=0x0) at 
iv_event.c:67
#30 0x000003471111e6a5 in iv_event_raw_got_event (_this=0x34711566978) 
at iv_event_raw.c:82
#31 0x0000034711120ebe in iv_run_active_list (active=0x3add526ac90) at 
iv_main.c:219
#32 0x00000347111210e8 in iv_main () at iv_main.c:269
#33 0x00000347110f67de in main_loop_run () at mainloop.c:731
#34 0x000000349e8f5d08 in main (argc=1, argv=0x3add526ae18) at main.c:260
(gdb) thread 2
[Switching to thread 2 (Thread 0x34711563700 (LWP 6986))]
#0  0x0000034710b6afc4 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) where
#0  0x0000034710b6afc4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000034710b66459 in _L_lock_508 () from /lib64/libpthread.so.0
#2  0x0000034710b6627b in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x000003470dc9603e in afsql_dd_database_thread (arg=0x349eb408f0) at 
afsql.c:863
#4  0x00000347110f87f5 in worker_thread_func (st=0x349eb0d1e0) at misc.c:623
#5  0x0000034710df7dc6 in ?? () from /usr/lib64/libglib-2.0.so.0
#6  0x0000034710b63b2a in start_thread () from /lib64/libpthread.so.0
#7  0x00000347108af71d in clone () from /lib64/libc.so.6



I have a gcore from this as well, so I can get any additional info you need.


More information about the syslog-ng mailing list