[Bug 166] New: syslog-ng abort when using pgsql
https://bugzilla.balabit.com/show_bug.cgi?id=166

           Summary: syslog-ng abort when using pgsql
           Product: syslog-ng
           Version: 3.4.x
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: unspecified
         Component: syslog-ng
        AssignedTo: bazsi@balabit.hu
        ReportedBy: whille@163.com
Type of the Report: ---
   Estimated Hours: 0.0

syslog-ng aborts after running for a while, anywhere from several minutes to
an hour. The syslog-ng service restarts the process, so I am running the bare
process to debug:

/usr/local/sbin/syslog-ng -d -F > syslogdbg.log 2>&1

Last log output of syslog-ng itself:

Incoming log entry; line='<118>1 2012-03-02T18:54:00+08:00 signal signalingd - - [meta sequenceId="287054"] {"device_ext_ip":"172.18.195.218","to":"407edfb0-cbd5-4b5b-be63-2b3b632b0cc1","app":"d160fca0-3793-11e1-b86c-0800200c9a66","event_id":771,"chunk_type":"conf","chunk_number":1,"event_src_ip":"110.132.142.102","event_dst_ip":"110.132.142.100","devicemac":"222DB98C1600","devicename":"DCS-940","from":"F07D68078020"}\x0a'
Rewrite expression evaluation result; value='.SDATA.meta.sequenceId', new_value='4'
Filter rule evaluation begins; filter_rule='f_prg'
Filter node evaluation result; filter_result='match'
Filter rule evaluation result; filter_result='match', filter_rule='f_prg'
Filter rule evaluation begins; filter_rule='f_oq0'
Filter node evaluation result; filter_result='not-match', filter_type='CMP'
Filter rule evaluation result; filter_result='not-match', filter_rule='f_oq0'
Filter rule evaluation begins; filter_rule='f_oq1'
Filter node evaluation result; filter_result='not-match', filter_type='CMP'
Filter rule evaluation result; filter_result='not-match', filter_rule='f_oq1'
file logqueue-fifo.c: line 388 (log_queue_fifo_ack_backlog): assertion failed: (s->parallel_push_notify == NULL)

syslog-ng -V
syslog-ng 3.3.4
Installer-Version: 3.3.4
Revision: ssh+git://bazsi@git.balabit//var/scm/git/syslog-ng/syslog-ng-ose--mainline--3.3#master#5e44eb46b0d7b86b62f17698e2b6de875ac8d7c6
Compile-Date: Mar 5 2012 21:18:10
Default-Modules: affile,afprog,afsocket,afuser,basicfuncs,csvparser,dbparser,syslogformat,afsql
Available-Modules: afmongodb,dbparser,afsocket-tls,affile,basicfuncs,csvparser,syslogformat,dummy,afsocket,convertfuncs,afprog,afuser,afsql,confgen
Enable-Debug: off
Enable-GProf: off
Enable-Memtrace: off
Enable-IPv6: on
Enable-Spoof-Source: off
Enable-TCP-Wrapper: on
Enable-Linux-Caps: off
Enable-Pcre: off

--
Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching all bug changes.

https://bugzilla.balabit.com/show_bug.cgi?id=166

whille <whille@163.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |whille@163.com

--- Comment #1 from whille <whille@163.com> 2012-03-07 12:25:24 ---
Has anyone got a trace? I found many clues about parallel_push_notify to
debug.

https://bugzilla.balabit.com/show_bug.cgi?id=166

--- Comment #2 from Balazs Scheidler <bazsi@balabit.hu> 2012-03-18 14:00:27 ---
Hmm.. can you check if this workaround fixes this for you?

diff --git a/modules/afsql/afsql.c b/modules/afsql/afsql.c
index d3c72c9..5c04978 100644
--- a/modules/afsql/afsql.c
+++ b/modules/afsql/afsql.c
@@ -569,6 +569,17 @@ afsql_dd_commit_txn(AFSqlDestDriver *self, gboolean lock)
   success = afsql_dd_run_query(self, "COMMIT", FALSE, NULL);
   if (lock)
     g_mutex_lock(self->db_thread_mutex);
+
+  /* FIXME: this is a workaround because of the non-proper locking semantics
+   * of the LogQueue. It might happen that the _queue() method sees 0
+   * elements in the queue, while the thread is still busy processing the
+   * previous message. In that case arming the parallel push callback is
+   * not needed and will cause assertions to fail. This is ugly and should
+   * be fixed by properly defining the "blocking" semantics of the LogQueue
+   * object w/o having to rely on user-code messing with parallel push
+   * callbacks. */
+
+  log_queue_reset_parallel_push(self->queue);
   if (success)
     {
       log_queue_ack_backlog(self->queue, self->flush_lines_queued);
@@ -698,6 +709,15 @@ afsql_dd_insert_db(AFSqlDestDriver *self)
   else
     {
       g_mutex_lock(self->db_thread_mutex);
+
+      /* FIXME: this is a workaround because of the non-proper locking semantics
+       * of the LogQueue. It might happen that the _queue() method sees 0
+       * elements in the queue, while the thread is still busy processing the
+       * previous message. In that case arming the parallel push callback is
+       * not needed and will cause assertions to fail. This is ugly and should
+       * be fixed by properly defining the "blocking" semantics of the LogQueue
+       * object w/o having to rely on user-code messing with parallel push
+       * callbacks. */
       log_queue_reset_parallel_push(self->queue);
       success = log_queue_pop_head(self->queue, &msg, &path_options, (self->flags & AFSQL_DDF_EXPLICIT_COMMITS), FALSE);
       g_mutex_unlock(self->db_thread_mutex);

Thanks for letting me know.
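
The race described in the FIXME comment of the patch above can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyQueue, toy_queue_arm, toy_queue_reset, toy_queue_ack_backlog and toy_wakeup are hypothetical names invented for the illustration and are not syslog-ng's LogQueue API. The real code aborts through an assertion on parallel_push_notify, while the sketch returns -1 so the failing case can be observed without killing the process.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: "wake the consumer when a message arrives". */
typedef void (*ToyNotifyFunc)(void *user_data);

/* Hypothetical stand-in for a queue with a one-shot wakeup callback. */
typedef struct
{
  int length;            /* messages still waiting in the queue */
  int backlog;           /* messages popped but not yet acknowledged */
  ToyNotifyFunc notify;  /* armed callback, NULL when nothing is armed */
  void *user_data;
} ToyQueue;

/* Dummy wakeup callback; it does nothing in this sketch. */
void
toy_wakeup(void *user_data)
{
  (void) user_data;
}

/* A code path that sees 0 queued elements arms the callback. */
void
toy_queue_arm(ToyQueue *q, ToyNotifyFunc fn, void *user_data)
{
  q->notify = fn;
  q->user_data = user_data;
}

/* Disarm the callback; this models what the workaround does before acking. */
void
toy_queue_reset(ToyQueue *q)
{
  q->notify = NULL;
  q->user_data = NULL;
}

/* Acknowledge n backlog entries. Returns -1 where the modelled invariant
 * "no callback may be armed while acknowledging" is violated. */
int
toy_queue_ack_backlog(ToyQueue *q, int n)
{
  if (q->notify != NULL)
    return -1;
  assert(n >= 0 && n <= q->backlog);
  q->backlog -= n;
  return 0;
}
```

In this model, a queue with length 0 and backlog 1 represents the worker still being busy with the previous message. Arming the callback at that point makes the following acknowledgement fail, and resetting it first lets the acknowledgement go through.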

https://bugzilla.balabit.com/show_bug.cgi?id=166

Balazs Scheidler <bazsi@balabit.hu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|                            |FIXED
             Status|NEW                         |RESOLVED

--- Comment #3 from Balazs Scheidler <bazsi@balabit.hu> 2012-03-31 20:22:48 ---
Any news on the patch, perhaps?

I'm committing, since I think it is safe to do so. I'd like to implement a
better fix, but until that happens here's the commit:

commit 35a092d9f8fc6545bbb72958fa117ffdffc1192a
Author: Balazs Scheidler <bazsi@balabit.hu>
Date:   Sat Mar 31 20:22:30 2012 +0200

    afsql: another reset_parallel_push workaround

    Lacking a better alternative right now, this patch works around an
    abort happening in the SQL destination driver when the target is slow.

    It's a single line of code, with 20 lines of comment...

    Reported-By: whille <whille@163.com>
    Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>

https://bugzilla.balabit.com/show_bug.cgi?id=166

--- Comment #4 from whille <whille@163.com> 2012-04-01 00:33:47 ---
Hi,

I tested the patch yesterday. The syslog-ng server did not restart again, but
it hung. I sent 0.5M of fake logs, and after half an hour it had still not
finished. I checked the first pgsql DB: the table is empty, and the syslog-ng
server no longer outputs any log.

I have tested the patch twice with the same result, so I think this patch
still has problems with the pgsql DB.