[Bug 163] New: afmongo does not send log
https://bugzilla.balabit.com/show_bug.cgi?id=163

           Summary: afmongo does not send log
           Product: syslog-ng
           Version: 3.4.x
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: critical
          Priority: unspecified
         Component: syslog-ng
        AssignedTo: bazsi@balabit.hu
        ReportedBy: whille@163.com
Type of the Report: bug
   Estimated Hours: 0.0


I'm using syslog-ng to store logs in MongoDB, at about 2k logs/sec. After about half an hour, no more messages are sent out.

While trying to locate the problem, I set log_fifo_size(16), which makes the bug easy to reproduce. I found the related code:

    afmongodb_dd_queue(LogPipe *s, LogMessage *msg, const LogPathOptions *path_options, gpointer user_data)
    {
      ...
      g_mutex_lock(self->queue_mutex);
      self->last_msg_stamp = cached_g_current_time_sec ();
      queue_was_empty = log_queue_get_length(self->queue) == 0;
      g_mutex_unlock(self->queue_mutex);

      log_queue_push_tail(self->queue, msg, path_options);

      g_mutex_lock(self->suspend_mutex);
      if (queue_was_empty && !self->writer_thread_suspended)
        {
          g_mutex_lock(self->queue_mutex);
          log_queue_set_parallel_push(self->queue, 1, afmongodb_dd_queue_notify, self, NULL);
          g_mutex_unlock(self->queue_mutex);
        }
      g_mutex_unlock(self->suspend_mutex);
    }

and in afmongodb_worker_thread():

    afmongodb_worker_thread()
    {
      ...
      g_mutex_unlock(self->suspend_mutex);
      g_mutex_lock(self->queue_mutex);
      if (log_queue_get_length(self->queue) == 0)
        {
          g_cond_wait(self->writer_thread_wakeup_cond, self->queue_mutex);
        }
      g_mutex_unlock(self->queue_mutex);
      ...
    }

As I understand it, the overall process is: when the queue is empty, the worker waits for afmongodb_dd_queue_notify() to wake it up. But with multiple threads racing, the simple queue_was_empty check breaks down. Here's one situation:

    afmongodb_dd_queue()             // worker thread
    afmongodb_dd_queue()             afmongodb_worker_thread { while () { ...   // runs in a while loop
    afmongodb_dd_queue()             afmongodb_worker_thread { while () { ...
                                     afmongodb_worker_thread { while () { ...
                                     the queue is empty now:
                                     g_cond_wait(self->writer_thread_wakeup_cond, self->queue_mutex);
    log_queue_push_tail
    queue_was_empty is FALSE, so afmongodb_dd_queue_notify is not called.
    afmongodb_dd_queue()             // a message was pushed to the tail last time,
                                     // so queue_was_empty is still FALSE and the
                                     // worker is never woken up.

Instead, I'm trying to get the queue length after the push and test it as <= 1 rather than == 0, since a message has just been pushed to the tail. I wonder if it works, though it's a bit ugly:

    afmongodb_dd_queue()
    {
      ...
      if (!self->writer_thread_suspended)
        {
          g_mutex_lock(self->queue_mutex);
          if (log_queue_get_length(self->queue) <= 1)
            {
              log_queue_set_parallel_push(self->queue, 1, afmongodb_dd_queue_notify, self, NULL);
            }
          g_mutex_unlock(self->queue_mutex);
        }
    }

--
Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching all bug changes.
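The report above describes a lost wakeup: queue_was_empty is computed under queue_mutex, but the lock is released before log_queue_push_tail(), so the worker can drain the queue and block in g_cond_wait() in between, and the notification is never set up. A common way to avoid this class of race is to make the emptiness check, the push and the wakeup a single critical section, and to have the worker re-check its predicate under the same mutex before every wait. Below is a minimal sketch of that pattern in plain POSIX threads, assuming a single worker and a queue reduced to a counter. DemoQueue, demo_queue_push(), demo_worker() and demo_run() are hypothetical names, not syslog-ng's LogQueue API, and this is not necessarily how syslog-ng itself was fixed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a destination driver's queue: just a
 * message count guarded by one mutex, plus the condition variable the
 * worker sleeps on. Not syslog-ng code; all names are illustrative. */
typedef struct
{
  pthread_mutex_t lock;
  pthread_cond_t wakeup;
  long length;        /* messages waiting in the queue */
  long consumed;      /* messages the worker has "sent" */
  bool shutting_down;
} DemoQueue;

/* Producer side (the role of afmongodb_dd_queue): the emptiness check, the
 * push and the signal all happen in ONE critical section, so the worker
 * cannot go to sleep between the check and the push. */
static void
demo_queue_push(DemoQueue *q)
{
  pthread_mutex_lock(&q->lock);
  bool was_empty = (q->length == 0);
  q->length++;
  if (was_empty)
    pthread_cond_signal(&q->wakeup);
  pthread_mutex_unlock(&q->lock);
}

/* Worker side (the role of afmongodb_worker_thread): the predicate is
 * re-checked under the lock before every wait, so a signal that arrives
 * while the worker is busy sending can be missed harmlessly. */
static void *
demo_worker(void *arg)
{
  DemoQueue *q = arg;

  pthread_mutex_lock(&q->lock);
  for (;;)
    {
      while (q->length == 0 && !q->shutting_down)
        pthread_cond_wait(&q->wakeup, &q->lock);
      if (q->length == 0)
        break;                  /* shutting down and fully drained */
      q->length--;
      pthread_mutex_unlock(&q->lock);
      /* a real driver would insert the message into MongoDB here */
      pthread_mutex_lock(&q->lock);
      q->consumed++;
    }
  pthread_mutex_unlock(&q->lock);
  return NULL;
}

/* Pushes n messages from the calling thread and returns how many the
 * worker drained before shutting down. */
long
demo_run(long n)
{
  DemoQueue q = { .length = 0, .consumed = 0, .shutting_down = false };
  pthread_t worker;

  pthread_mutex_init(&q.lock, NULL);
  pthread_cond_init(&q.wakeup, NULL);
  pthread_create(&worker, NULL, demo_worker, &q);

  for (long i = 0; i < n; i++)
    demo_queue_push(&q);

  pthread_mutex_lock(&q.lock);
  q.shutting_down = true;
  pthread_cond_signal(&q.wakeup);
  pthread_mutex_unlock(&q.lock);

  pthread_join(worker, NULL);
  assert(q.length == 0);        /* the worker exits only once drained */
  pthread_cond_destroy(&q.wakeup);
  pthread_mutex_destroy(&q.lock);
  return q.consumed;
}
```

Called as demo_run(100000) from a small test program (linked with -pthread on older glibc), it should return 100000. The design point that matters is that was_empty is computed in the same critical section as the push, which is exactly what the original queue_was_empty lacked.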
--- Comment #1 from whille <whille@163.com>  2012-02-26 02:49:45 ---
Created an attachment (id=49)
 --> (https://bugzilla.balabit.com/attachment.cgi?id=49)
temp patch

I've tested my patch; it resolves the bug, though it's a bit rough. Please review.
Gergely Nagy <algernon@balabit.hu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |algernon@balabit.hu
         AssignedTo|bazsi@balabit.hu            |algernon@balabit.hu
Gergely Nagy <algernon@balabit.hu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
--- Comment #2 from Gergely Nagy <algernon@balabit.hu>  2012-02-27 08:08:13 ---
Would it be possible to generate a unified diff for the patch? It would make it much easier to apply (diff -u originalfile newfile).

The patch itself does look interesting; I'll see what I can do about it.

Thanks for the patch & the report!
--- Comment #3 from whille <whille@163.com>  2012-02-27 08:40:59 ---
Created an attachment (id=50)
 --> (https://bugzilla.balabit.com/attachment.cgi?id=50)
diff -u patch
whille <whille@163.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |whille@163.com

--- Comment #4 from whille <whille@163.com>  2012-02-27 08:42:02 ---
Sorry, I forgot to use diff -u to generate the patch.
Balazs Scheidler <bazsi@balabit.hu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bazsi@balabit.hu

--- Comment #5 from Balazs Scheidler <bazsi@balabit.hu>  2012-03-16 13:33:46 ---
Gergely, can you give an update on this? I don't know if I can simply apply the patch, or you'd like to comment on that.

Thanks.

PS: I think we'll need to get back to refactoring the queueing support for blocking destination drivers (such as the mongodb, sql and SMTP destinations); the same crop of bugs shows up in each because of the duplicated code. I'm not sure when I can look at that, though.
--- Comment #6 from Gergely Nagy <algernon@balabit.hu>  2012-03-23 13:49:01 ---
(In reply to comment #5)
> Gergely, can you give an update on this? I don't know if I can simply apply the patch, or you'd like to comment on that.

I finally had a little time to look at this issue, and the patch looks solid, and might even fix another issue I'm supposed to hunt down.

I'll send a patch to the list, so that we have a nice commit message too.
--- Comment #7 from Gergely Nagy <algernon@balabit.hu>  2012-03-23 14:17:45 ---
(In reply to comment #6)
> (In reply to comment #5)
> > Gergely, can you give an update on this? I don't know if I can simply apply the patch, or you'd like to comment on that.
>
> I finally had a little time to look at this issue, and the patch looks solid, and might even fix another issue I'm supposed to hunt down.
>
> I'll send a patch to the list, so that we have a nice commit message too.
Actually, there's a problem: I managed to deadlock the driver. Either the patch doesn't really fix the issue and only worked by luck, or I found something else.
Gergely Nagy <algernon@balabit.hu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|                            |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #8 from Gergely Nagy <algernon@balabit.hu>  2012-04-20 13:51:23 ---
This is supposed to be fixed in 3.3.5 (and in 3.4 git) - I can't reproduce the deadlock anymore.