Known memory leak issues in 3.3.4?
After considerable effort, I managed to get 3.3.4 built (using gcc 4.4.2, along with glib 2.29.92 and the requisite eventlog 0.2.12). The problem is that the resulting syslog-ng component grows continuously (at an alarming rate). Whereas my 3.0.5 syslog-ng process starts out running in just under 3MB, and then stabilizes at about 3.3MB, the 3.3.4 version starts out small, and then at the point where I killed it (about 2 hours after starting it), it had already grown to 277MB. I'm doing nothing fancy here (config shown below). No data transformations, no use of time normalization features, no DB components. In short, I'm not trying to use any of the many new/elaborate features that have been added over the past few years. I'm just "eating log input", writing it to handful of files and pipes, just like I've been doing since the syslog-ng 2.x days. As one of the regular problems that I see mentioned in various threads is "some new fix for a memory leak", I'm just trying to see if there's some known problem in the virgin 3.3.4 code, that might be causing my problems. (I.E. should I wait for 3.3.5, and try again?) FYI, my environment is Solaris 10 x86. Aside from having moved to the newer gcc compiler (in order to even attempt to compile the latest syslog-ng version), the only other major component change was the newer version of glib, which as I recall, was also necessary to resolve compilation issues with the new syslog-ng versions. Anyway... before I beat my head against the wall trying to figure out what to do next, I figured that I'd ask the easy question (about "known bugs") first, so that I don't spend any time chasing a problem that may already be fixed. (I did dig thru the mailing list, and saw at least one mention of a memory leak in 3.3.4, but it seemed to be related to a specific bit of functionality, that I didn't think applied in my situation.) (And just a side-comment, FWIW, I noticed that when I swapped the 3.3.4 version, for the 3.0.5 version, the host system immediately incurred what I estimate to be at least a 20% high CPU load that had been required by the 3.0.5 version. And when I reverted back to 3.0.5 a few hours later, I saw that happen in reverse. I'm not sure why the newest version requires more resources, again, when I'm not asking it to do anything new, but it definitely seems that it does.) Thanks for your time and input. Existing config file shown below. Note that other than the version change, the only other modification (from the file that I was using in 3.0.5) relates to the threading changes, which I made based upon Bazsi's recommendations in an email on 3/18, about how to avoid potential data loss on inbound UDP packets. Again, thanks for any help/input: @version:3.3 options { dir_perm(0755); perm(0640); group(wheel); chain_hostnames(no); keep_hostname(yes); log_fifo_size(41000); threaded(no); dns_cache_size(5000); dns_cache_expire(86400); dns_cache_expire_failed(86400); }; source any_udp { udp(flags(store-legacy-msghdr)); }; source any_tcp { tcp(port(601) max-connections(40) flags("store-legacy-msghdr", "threaded") use_dns(no)); }; destination SEC {pipe("/tmp/sec"); }; destination WIN-PIPE {pipe("/tmp/windows"); }; destination routers_log { file("/var/adm/log/routers.log" create_dirs(yes) flags("threaded")); }; destination ravlin_log { file("/var/adm/log/ravlin.log" create_dirs(yes) flags("threaded")); }; destination windows_log { file("/var/adm/log/windows.log" create_dirs(yes) flags("threaded")); }; destination workstation_log { file("/var/adm/log/workstation.log" create_dirs(yes) flags("threaded")); }; destination catch-all_log { file("/var/adm/log/catch-all.log" create_dirs(yes) flags("threaded")); }; destination test { file("/var/adm/log/test.log" create_dirs(yes) flags("threaded")); }; destination dev_null {}; destination secdevedc01 { udp("10.132.193.245" spoof_source(yes) flags("threaded")); }; destination engedc01 { udp("10.132.192.199" spoof_source(yes) flags("threaded")); }; destination egssiem { udp("10.10.36.157" spoof_source(yes) flags("threaded")); }; log { source(any_udp); destination(SEC); }; log { source(any_udp); destination(egssiem); }; filter f_4 { facility(syslog) and level(info..emerg); }; log { source(any_udp); filter(f_4); destination(WIN-PIPE); }; log { source(any_udp); filter(f_4); destination(windows_log); flags(final); }; filter f_1_acl { facility(local5) and level(debug..emerg) and match("SEC-6-IPACCESSLOG" value("MESSAGE")) ; }; filter f_1 { facility(local5) and level(debug..emerg); }; log { source(any_udp); filter(f_1_acl); destination(routers_log); flags(final); }; log { source(any_udp); filter(f_1); destination(engedc01); }; log { source(any_udp); filter(f_1); destination(routers_log); flags(final); }; filter SonicWallNoise { match("id=firewall" value("MESSAGE")) and filter(SonicWallMsgs); }; filter SonicWallMsgs { match("m=97" value("MESSAGE")) or match("m=98" value("MESSAGE")) or match("m=537" value("MESSAGE")); }; log { source(any_udp); filter(SonicWallNoise); destination(dev_null); flags(final); }; filter f_3 { facility(local0) and level(debug..emerg); }; log { source(any_udp); filter(f_3); destination(ravlin_log); flags(final); }; log { source(any_tcp); destination(workstation_log); flags(final); }; log { source(any_udp); destination(catch-all_log); flags(final); }; Marvin Nipper Director of Security Stream Global Services mailto:marvin.nipper@stream.com ph: (303) 670-2705 PGP Key ID: 0x8EE28551 (DSS/DH) 8C5D 403A D107 0A95 672B B637 BCF1 919A 8EE2 8551 This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden.
On Thursday 22 of March 2012, Marvin Nipper wrote:
After considerable effort, I managed to get 3.3.4 built (using gcc 4.4.2, along with glib 2.29.92 and the requisite eventlog 0.2.12). The problem is that the resulting syslog-ng component grows continuously (at an alarming rate). Whereas my 3.0.5 syslog-ng process starts out running in just under 3MB, and then stabilizes at about 3.3MB, the 3.3.4 version starts out small, and then at the point where I killed it (about 2 hours after starting it), it had already grown to 277MB.
I know at least about one memory leak that has been once fixed, but then reintroduced between 3.3.2 and 3.3.3, and I think is present since then. I've mentioned it on this list in February: https://lists.balabit.hu/pipermail/syslog-ng/2012-February/018334.html It would be really nice if developers could take a look at the code again. :) Cheers, -- Jakub Jankowski|shasta@toxcorp.com|http://toxcorp.com/ GPG: FCBF F03D 9ADB B768 8B92 BB52 0341 9037 A875 942D
Hello, On 03/22/2012 02:46 PM, Jakub Jankowski wrote:
On Thursday 22 of March 2012, Marvin Nipper wrote:
After considerable effort, I managed to get 3.3.4 built (using gcc 4.4.2, along with glib 2.29.92 and the requisite eventlog 0.2.12). The problem is that the resulting syslog-ng component grows continuously (at an alarming rate). Whereas my 3.0.5 syslog-ng process starts out running in just under 3MB, and then stabilizes at about 3.3MB, the 3.3.4 version starts out small, and then at the point where I killed it (about 2 hours after starting it), it had already grown to 277MB. I know at least about one memory leak that has been once fixed, but then reintroduced between 3.3.2 and 3.3.3, and I think is present since then. I've mentioned it on this list in February: https://lists.balabit.hu/pipermail/syslog-ng/2012-February/018334.html
It would be really nice if developers could take a look at the code again. :) Check this: https://bugzilla.balabit.com/show_bug.cgi?id=160 The mentioned patch is available at http://git.balabit.hu/?p=bazsi/syslog-ng-3.3.git;a=commit;h=3c14a264ab4c76b9... Bye,
-- Peter Czanik (CzP)<czanik@balabit.hu> BalaBit IT Security / syslog-ng upstream http://czanik.blogs.balabit.com/
And just like "magic", the universe looks normal again..... THANKS SO MUCH for the SUPER QUICK response. That seems to be the exact fix that I needed. Barring any other anomalies, it looks like I may have finally managed to make my way out of the 3.0.5 world. Have a great week! -----Original Message----- From: syslog-ng-bounces@lists.balabit.hu [mailto:syslog-ng-bounces@lists.balabit.hu] On Behalf Of Peter Czanik Sent: Thursday, March 22, 2012 7:56 AM To: Syslog-ng users' and developers' mailing list Subject: Re: [syslog-ng] Known memory leak issues in 3.3.4? Hello, On 03/22/2012 02:46 PM, Jakub Jankowski wrote:
On Thursday 22 of March 2012, Marvin Nipper wrote:
After considerable effort, I managed to get 3.3.4 built (using gcc 4.4.2, along with glib 2.29.92 and the requisite eventlog 0.2.12). The problem is that the resulting syslog-ng component grows continuously (at an alarming rate). Whereas my 3.0.5 syslog-ng process starts out running in just under 3MB, and then stabilizes at about 3.3MB, the 3.3.4 version starts out small, and then at the point where I killed it (about 2 hours after starting it), it had already grown to 277MB. I know at least about one memory leak that has been once fixed, but then reintroduced between 3.3.2 and 3.3.3, and I think is present since then. I've mentioned it on this list in February: https://lists.balabit.hu/pipermail/syslog-ng/2012-February/018334.html
It would be really nice if developers could take a look at the code again. :) Check this: https://bugzilla.balabit.com/show_bug.cgi?id=160 The mentioned patch is available at http://git.balabit.hu/?p=bazsi/syslog-ng-3.3.git;a=commit;h=3c14a264ab4c76b9... Bye,
-- Peter Czanik (CzP)<czanik@balabit.hu> BalaBit IT Security / syslog-ng upstream http://czanik.blogs.balabit.com/ ______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden.
On Thursday 22 of March 2012, Peter Czanik wrote:
On 03/22/2012 02:46 PM, Jakub Jankowski wrote:
I know at least about one memory leak that has been once fixed, but then reintroduced between 3.3.2 and 3.3.3, and I think is present since then. I've mentioned it on this list in February: https://lists.balabit.hu/pipermail/syslog-ng/2012-February/018334.html
It would be really nice if developers could take a look at the code again. :)
Check this: https://bugzilla.balabit.com/show_bug.cgi?id=160 The mentioned patch is available at http://git.balabit.hu/?p=bazsi/syslog-ng-3.3.git;a=commit;h=3c14a264ab4c76b9...
Unfortunately, this doesn't fix the memleak I'm seeing. I've just recreated my testbed and tried current 3.3 HEAD (using Gergely's tarball as source: http://packages.madhouse-project.org/syslog-ng/3.3/3.3.4/syslog-ng-3.3.4-201...) and can confirm "my" leak is still happening. Below are steps to reproduce it: Client is a loggen, started as: $ loggen --read-file access.random.10000.munged --active-connections 1 --interval 600 --syslog-proto <SERVERIP> 514 where access.random.10000.munged is a 10000-line file containing munged fragments of Apache accesslog: $ head access.random.10000.munged Sep 15 23:59:56 www142 accesslog: exyrqpxwva a.b.c.d - - [15/Sep/2011:23:59:56 +0200] Sep 15 23:59:57 www142 accesslog: mnufil a.b.c.d - - [15/Sep/2011:23:59:57 +0200] Sep 15 23:59:57 www142 accesslog: bvbgdknlvxibais a.b.c.d - - [15/Sep/2011:23:59:57 +0200] Sep 15 23:59:57 www83 accesslog: ttmpmlgf a.b.c.d - - [15/Sep/2011:23:59:57 +0200] Sep 15 23:59:57 www146 accesslog: igncmpwudj a.b.c.d - - [15/Sep/2011:23:59:57 +0200] Sep 15 23:59:57 www146 accesslog: pjecagrmuywulbro a.b.c.d - - [15/Sep/2011:23:59:57 +0200] Sep 15 23:59:57 www147 accesslog: ghjwjejsetvq a.b.c.d - - [15/Sep/2011:23:59:57 +0200] Sep 15 23:59:57 www22 accesslog: zcnalgemoowpunhtj a.b.c.d - - [15/Sep/2011:23:59:57 +0200] Sep 15 23:59:57 www146 accesslog: houclrfiukhbfofueill a.b.c.d - - [15/Sep/2011:23:59:57 +0200] Sep 15 23:59:57 www146 accesslog: kwqplgsulzkryhdnr a.b.c.d - - [15/Sep/2011:23:59:57 +0200] $ The most important thing are the random strings (which in real scenario are vhosts names). Destination filenames on server are based upon those strings. Server is configured like this: $ cat /etc/syslog-ng/syslog-ng.conf @version: 3.3 options { time_reap (3); log_fifo_size (100000); use_dns (no); use_fqdn (no); create_dirs (yes); keep_hostname (yes); log_msg_size(16384); # doesnt matter, with threaded(yes) it fails aswell threaded (no); owner ("root"); group ("root"); perm (0644); dir_owner ("root"); dir_group ("root"); dir_perm (0755); }; source s_web { syslog(ip(0.0.0.0) port(514) flags(no-multi-line) log_iw_size(6000) max-connections(30)); }; parser p_apache_split { csv-parser( columns("APACHE.ACCESS.VHOST", "APACHE.ACCESS.LOG_MESSAGE") flags(greedy) delimiters(" ") ); }; destination d_test { file("/logs/${APACHE.ACCESS.VHOST}-${R_YEAR}${R_MONTH}${R_DAY}" template("${APACHE.ACCESS.LOG_MESSAGE}\n")); }; log { source(s_web); parser(p_apache_split); destination(d_test); flags(final); }; $ Running it under valgrind like this: # G_SLICE=always-malloc valgrind --log-file=s3.3.4-fd3b-1.log \ --leak-check=full -v --show-reachable=yes --track-origins=yes \ syslog-ng -F --no-caps -p /var/run/syslog-ng.pid produces following log file: http://toxcorp.com/stuff/syslog-ng-leak/s3.3.4-fd3b-1.log I'm pretty sure this particular memory leak was fixed in 3.3.2, but then got reintroduced in this commit: http://git.balabit.hu/?p=bazsi/syslog-ng-3.3.git;a=commitdiff;h=c7070e2a6f1c... which was done to fix tcp() destination related crash after 3.3.2 was released. So, Bazsi, yes - it does reintroduce the leak :) (in reply to your comment in https://lists.balabit.hu/pipermail/syslog-ng/2011-November/017697.html ) Apart from the leak you can see in valgrind logfile, there is another symptom - some destination files are created empty: $ find /logs/ -type f | wc -l 10000 $ find /logs/ -type f -size 0 | wc -l 4410 $ find /logs/ -type f -size 0 | head -n 1 /logs/hcxjscbcnyoktilm-20120325 Even though there is a perfectly valid source record for them: $ grep hcxjscbcnyoktilm access.random.10000.munged Sep 16 00:00:05 www147 accesslog: hcxjscbcnyoktilm a.b.c.d - - [16/Sep/2011:00:00:05 +0200] $ The number of empty files corresponds to the number of valgrind reported errors: ==17931== 4410 errors in context 1 of 6: ==17931== Invalid read of size 8 ==17931== at 0x4E4BA05: log_dest_driver_deinit_method (driver.c:172) ==17931== by 0x6CFE559: affile_dd_deinit (affile.c:1001) ==17931== by 0x4E4AD33: log_dest_group_deinit (logpipe.h:254) ==17931== by 0x4E4650A: log_center_deinit (logpipe.h:254) ==17931== by 0x4E5F63F: main_loop_exit_finish (mainloop.c:585) ==17931== by 0x4E791A0: iv_run_timers (iv_timer.c:345) ==17931== by 0x4E777A5: iv_main (iv_main.c:252) ==17931== by 0x4E6023D: main_loop_run (mainloop.c:731) ==17931== by 0x40175D: main (main.c:260) I hope this can help you diagnose and fix this issue. HTH -- Jakub Jankowski|shasta@toxcorp.com|http://toxcorp.com/ GPG: FCBF F03D 9ADB B768 8B92 BB52 0341 9037 A875 942D
Just when I hit the Send button, I realized... :) On Sunday 25 of March 2012, Jakub Jankowski wrote:
Apart from the leak you can see in valgrind logfile, there is another symptom - some destination files are created empty:
Please disregard that part about empty files - this was caused by too low fd limit. I have re-uploaded valgrind log file without the fd-limits noise. Memleak is still present, of course. :) Sorry :) -- Jakub Jankowski|shasta@toxcorp.com|http://toxcorp.com/ GPG: FCBF F03D 9ADB B768 8B92 BB52 0341 9037 A875 942D
Jakub Jankowski <shasta@toxcorp.com> writes:
On Thursday 22 of March 2012, Peter Czanik wrote:
On 03/22/2012 02:46 PM, Jakub Jankowski wrote:
I know at least about one memory leak that has been once fixed, but then reintroduced between 3.3.2 and 3.3.3, and I think is present since then. I've mentioned it on this list in February: https://lists.balabit.hu/pipermail/syslog-ng/2012-February/018334.html
It would be really nice if developers could take a look at the code again. :)
Check this: https://bugzilla.balabit.com/show_bug.cgi?id=160 The mentioned patch is available at http://git.balabit.hu/?p=bazsi/syslog-ng-3.3.git;a=commit;h=3c14a264ab4c76b9...
Unfortunately, this doesn't fix the memleak I'm seeing. I've just recreated my testbed and tried current 3.3 HEAD (using Gergely's tarball as source: http://packages.madhouse-project.org/syslog-ng/3.3/3.3.4/syslog-ng-3.3.4-201...) and can confirm "my" leak is still happening.
Thanks for the detailed report, I was able to reproduce the problem (at least the leak - I didn't get any empty files), and am now looking into it. -- |8]
Jakub Jankowski <shasta@toxcorp.com> writes:
I'm pretty sure this particular memory leak was fixed in 3.3.2, but then got reintroduced in this commit: http://git.balabit.hu/?p=bazsi/syslog-ng-3.3.git;a=commitdiff;h=c7070e2a6f1c... which was done to fix tcp() destination related crash after 3.3.2 was released.
So, Bazsi, yes - it does reintroduce the leak :) (in reply to your comment in https://lists.balabit.hu/pipermail/syslog-ng/2011-November/017697.html )
I think I found the solution: the reason for the crash in afsocket, which Bazsi fixed with killing the log_queue_unref in lib/driver.h was that it was missing a log_queue_ref. We didn't unref more than we should, we reffed less! While I don't entirely understand the call chain, comparing how affile and afsocket did their queue magic, I prepared a patch that does not crash afsocket on reload, and stops the leak aswell. It also looks sane-ish to me, so I'm reasonably sure it's fine. The patch is attached, I tested it as much as I could: no crash, no leak, no empty files. If you could test it too, it would be most appreciated! -- |8]
On Friday 30 of March 2012, Gergely Nagy wrote:
Jakub Jankowski <shasta@toxcorp.com> writes:
I'm pretty sure this particular memory leak was fixed in 3.3.2, but then got reintroduced in this commit: http://git.balabit.hu/?p=bazsi/syslog-ng-3.3.git;a=commitdiff;h=c7070e2a6 f1c3a312260bcecf49d62028fef27ce which was done to fix tcp() destination related crash after 3.3.2 was released.
So, Bazsi, yes - it does reintroduce the leak :) (in reply to your comment in https://lists.balabit.hu/pipermail/syslog-ng/2011-November/017697.html )
I think I found the solution: the reason for the crash in afsocket, which Bazsi fixed with killing the log_queue_unref in lib/driver.h was that it was missing a log_queue_ref. We didn't unref more than we should, we reffed less!
While I don't entirely understand the call chain, comparing how affile and afsocket did their queue magic, I prepared a patch that does not crash afsocket on reload, and stops the leak aswell. It also looks sane-ish to me, so I'm reasonably sure it's fine.
The patch is attached, I tested it as much as I could: no crash, no leak, no empty files. If you could test it too, it would be most appreciated!
Thanks! With this patch I can no longer trigger this memory leak, indeed. I've forwarded it to my ex-coworkers who suffer from this on production servers (I'm not there anymore), so I hope to hear from them if it makes the leak go away there as well. I also hope others will test it too (especially Patrick H., who reported the tcp() crash). Ack from me on the memleak fix, though. As for the missing files - don't bother. It was a PEBKAC, and a completely unrelated issue. I needed to setup a testbed from scratch for this, and forgot to raise maximum fd limit before testing. So now fingers crossed for this patch to make it's way into 3.3 tree :) Thanks again, Gergely! Regards, -- Jakub Jankowski|shasta@toxcorp.com|http://toxcorp.com/ GPG: FCBF F03D 9ADB B768 8B92 BB52 0341 9037 A875 942D
On Fri, 2012-03-30 at 16:12 +0200, Gergely Nagy wrote:
Jakub Jankowski <shasta@toxcorp.com> writes:
I'm pretty sure this particular memory leak was fixed in 3.3.2, but then got reintroduced in this commit: http://git.balabit.hu/?p=bazsi/syslog-ng-3.3.git;a=commitdiff;h=c7070e2a6f1c... which was done to fix tcp() destination related crash after 3.3.2 was released.
So, Bazsi, yes - it does reintroduce the leak :) (in reply to your comment in https://lists.balabit.hu/pipermail/syslog-ng/2011-November/017697.html )
I think I found the solution: the reason for the crash in afsocket, which Bazsi fixed with killing the log_queue_unref in lib/driver.h was that it was missing a log_queue_ref. We didn't unref more than we should, we reffed less!
While I don't entirely understand the call chain, comparing how affile and afsocket did their queue magic, I prepared a patch that does not crash afsocket on reload, and stops the leak aswell. It also looks sane-ish to me, so I'm reasonably sure it's fine.
The patch is attached, I tested it as much as I could: no crash, no leak, no empty files. If you could test it too, it would be most appreciated!
differences between files attachment (0001-afsocket-Properly-release-our-queue-at-deinit.patch) From f15f8d91a28680092ba2ab0bd3ed64220d0ceba8 Mon Sep 17 00:00:00 2001 From: Gergely Nagy <algernon@balabit.hu> Date: Fri, 30 Mar 2012 16:00:17 +0200 Subject: [PATCH] afsocket: Properly release our queue at deinit
At deinit time, release the destination driver's queue explicitly. This allows us to re-add the implicit queue unref to log_dest_driver_release_queue().
This fixes the memory leak seen with, for example, file destinations.
Signed-off-by: Gergely Nagy <algernon@balabit.hu> --- lib/driver.h | 1 + modules/afsocket/afsocket.c | 3 +++ 2 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/lib/driver.h b/lib/driver.h index 7f7acbd..c805da5 100644 --- a/lib/driver.h +++ b/lib/driver.h @@ -188,6 +188,7 @@ log_dest_driver_release_queue(LogDestDriver *self, LogQueue *q) if (q) { self->queues = g_list_remove(self->queues, q); + log_queue_unref(q);
self->release_queue(self, q, self->release_queue_data); } diff --git a/modules/afsocket/afsocket.c b/modules/afsocket/afsocket.c index ae9c5c2..4e9002f 100644 --- a/modules/afsocket/afsocket.c +++ b/modules/afsocket/afsocket.c @@ -1217,6 +1217,9 @@ afsocket_dd_deinit(LogPipe *s) if (self->writer) log_pipe_deinit(self->writer);
+ log_dest_driver_release_queue(&self->super, log_writer_get_queue(self->writer));
this shouldn't be needed, as this is called automatically by log_dest_driver_deinit_method(), later in this function.
+ log_writer_set_queue(self->writer, NULL); +
This shouldn't be needed here.
if (self->flags & AFSOCKET_KEEP_ALIVE) { cfg_persist_config_add(cfg, afsocket_dd_format_persist_name(self, FALSE), self->writer, (GDestroyNotify) log_pipe_unref, FALSE);
You said, the issue was that we didn't put enough refs on the queue, that was the reason for the original crash. Can you describe where? Thanks. -- Bazsi
Hi, I think this patch solves both problems and addresses the original flaw in the LogDestDriver queue ref counting. Instead of working around it in afsocket. The code in affile is needed to avoid keeping the LogQueue instances of reaped files around, the same is not needed for the afsocket driver, which only has a single queue. commit 9064e909e8aef518ec3c073bccc1bf09da9a2c06 Author: Balazs Scheidler <bazsi@balabit.hu> Date: Sun Apr 1 09:42:58 2012 +0200 driver: fixed possible leak and use-after-free log_dest_driver_release_queue() was possibly leaking the LogQueue instances for file destinations when they got reaped. This was caused by an earlier patch that fixed a crash in reloads, more specifically this one: c7070e2a6f1c3a312260bcecf49d62028fef27ce This patch should fix both cases properly, the leak in the file destination driver and the original crash in the afsocket destination. Also, this patch fixes a use-after-free condition, the next member of a GList structure was referenced after it was removed from the list. Kudos to Jakub for the detailed bug report and Algernon for the origianl fix. Reported-By: Jakub Jankowski <shasta@toxcorp.com> Signed-off-by: Gergely Nagy <algernon@balabit.hu> Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> On Sat, 2012-03-31 at 22:42 +0200, Balazs Scheidler wrote:
On Fri, 2012-03-30 at 16:12 +0200, Gergely Nagy wrote:
Jakub Jankowski <shasta@toxcorp.com> writes:
I'm pretty sure this particular memory leak was fixed in 3.3.2, but then got reintroduced in this commit: http://git.balabit.hu/?p=bazsi/syslog-ng-3.3.git;a=commitdiff;h=c7070e2a6f1c... which was done to fix tcp() destination related crash after 3.3.2 was released.
So, Bazsi, yes - it does reintroduce the leak :) (in reply to your comment in https://lists.balabit.hu/pipermail/syslog-ng/2011-November/017697.html )
I think I found the solution: the reason for the crash in afsocket, which Bazsi fixed with killing the log_queue_unref in lib/driver.h was that it was missing a log_queue_ref. We didn't unref more than we should, we reffed less!
While I don't entirely understand the call chain, comparing how affile and afsocket did their queue magic, I prepared a patch that does not crash afsocket on reload, and stops the leak aswell. It also looks sane-ish to me, so I'm reasonably sure it's fine.
The patch is attached, I tested it as much as I could: no crash, no leak, no empty files. If you could test it too, it would be most appreciated!
differences between files attachment (0001-afsocket-Properly-release-our-queue-at-deinit.patch) From f15f8d91a28680092ba2ab0bd3ed64220d0ceba8 Mon Sep 17 00:00:00 2001 From: Gergely Nagy <algernon@balabit.hu> Date: Fri, 30 Mar 2012 16:00:17 +0200 Subject: [PATCH] afsocket: Properly release our queue at deinit
At deinit time, release the destination driver's queue explicitly. This allows us to re-add the implicit queue unref to log_dest_driver_release_queue().
This fixes the memory leak seen with, for example, file destinations.
Signed-off-by: Gergely Nagy <algernon@balabit.hu> --- lib/driver.h | 1 + modules/afsocket/afsocket.c | 3 +++ 2 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/lib/driver.h b/lib/driver.h index 7f7acbd..c805da5 100644 --- a/lib/driver.h +++ b/lib/driver.h @@ -188,6 +188,7 @@ log_dest_driver_release_queue(LogDestDriver *self, LogQueue *q) if (q) { self->queues = g_list_remove(self->queues, q); + log_queue_unref(q);
self->release_queue(self, q, self->release_queue_data); } diff --git a/modules/afsocket/afsocket.c b/modules/afsocket/afsocket.c index ae9c5c2..4e9002f 100644 --- a/modules/afsocket/afsocket.c +++ b/modules/afsocket/afsocket.c @@ -1217,6 +1217,9 @@ afsocket_dd_deinit(LogPipe *s) if (self->writer) log_pipe_deinit(self->writer);
+ log_dest_driver_release_queue(&self->super, log_writer_get_queue(self->writer));
this shouldn't be needed, as this is called automatically by log_dest_driver_deinit_method(), later in this function.
+ log_writer_set_queue(self->writer, NULL); +
This shouldn't be needed here.
if (self->flags & AFSOCKET_KEEP_ALIVE) { cfg_persist_config_add(cfg, afsocket_dd_format_persist_name(self, FALSE), self->writer, (GDestroyNotify) log_pipe_unref, FALSE);
You said, the issue was that we didn't put enough refs on the queue, that was the reason for the original crash. Can you describe where?
Thanks.
-- Bazsi
participants (5)
-
Balazs Scheidler
-
Gergely Nagy
-
Jakub Jankowski
-
Marvin Nipper
-
Peter Czanik