[syslog-ng] Memory leak reintroduced in 3.3.3. (was: Re: Known memory leak issues in 3.3.4?)

25 Mar 2012

      On Thursday 22 of March 2012, Peter Czanik wrote:
...
On 03/22/2012 02:46 PM, Jakub Jankowski wrote:
...
I know at least about one memory leak that has been once fixed, but then
reintroduced between 3.3.2 and 3.3.3, and I think is present since then.
I've mentioned it on this list in February:
https://lists.balabit.hu/pipermail/syslog-ng/2012-February/018334.html
It would be really nice if developers could take a look at the code
again. :)
Check this: https://bugzilla.balabit.com/show_bug.cgi?id=160 The
mentioned patch is available at
http://git.balabit.hu/?p=bazsi/syslog-ng-3.3.git;a=commit;h=3c14a264ab4c76b9...
Unfortunately, this doesn't fix the memleak I'm seeing. I've just recreated
my testbed and tried current 3.3 HEAD (using Gergely's tarball as source:
http://packages.madhouse-project.org/syslog-ng/3.3/3.3.4/syslog-ng-3.3.4-201...)
and can confirm "my" leak is still happening.

Below are steps to reproduce it:

Client is a loggen, started as:
$ loggen --read-file access.random.10000.munged --active-connections 1 --interval 600 --syslog-proto <SERVERIP> 514

where access.random.10000.munged is a 10000-line file containing munged fragments 
of Apache accesslog:
$ head access.random.10000.munged
Sep 15 23:59:56 www142 accesslog: exyrqpxwva a.b.c.d - - [15/Sep/2011:23:59:56 +0200]
Sep 15 23:59:57 www142 accesslog: mnufil a.b.c.d - - [15/Sep/2011:23:59:57 +0200]
Sep 15 23:59:57 www142 accesslog: bvbgdknlvxibais a.b.c.d - - [15/Sep/2011:23:59:57 +0200]
Sep 15 23:59:57 www83 accesslog: ttmpmlgf a.b.c.d - - [15/Sep/2011:23:59:57 +0200]
Sep 15 23:59:57 www146 accesslog: igncmpwudj a.b.c.d - - [15/Sep/2011:23:59:57 +0200]
Sep 15 23:59:57 www146 accesslog: pjecagrmuywulbro a.b.c.d - - [15/Sep/2011:23:59:57 +0200]
Sep 15 23:59:57 www147 accesslog: ghjwjejsetvq a.b.c.d - - [15/Sep/2011:23:59:57 +0200]
Sep 15 23:59:57 www22 accesslog: zcnalgemoowpunhtj a.b.c.d - - [15/Sep/2011:23:59:57 +0200]
Sep 15 23:59:57 www146 accesslog: houclrfiukhbfofueill a.b.c.d - - [15/Sep/2011:23:59:57 +0200]
Sep 15 23:59:57 www146 accesslog: kwqplgsulzkryhdnr a.b.c.d - - [15/Sep/2011:23:59:57 +0200]
$

The most important thing are the random strings (which in real scenario are vhosts names).
Destination filenames on server are based upon those strings.

Server is configured like this:

$ cat /etc/syslog-ng/syslog-ng.conf
@version: 3.3
options {
        time_reap (3);
        log_fifo_size (100000);
        use_dns (no);
        use_fqdn (no);
        create_dirs (yes);
        keep_hostname (yes);
        log_msg_size(16384);
        # doesnt matter, with threaded(yes) it fails aswell
        threaded (no);

        owner ("root");
        group ("root");
        perm (0644);
        dir_owner ("root");
        dir_group ("root");
        dir_perm (0755);
};
source s_web {
    syslog(ip(0.0.0.0) port(514) flags(no-multi-line) log_iw_size(6000) max-connections(30));
};
parser p_apache_split {
    csv-parser(
        columns("APACHE.ACCESS.VHOST", "APACHE.ACCESS.LOG_MESSAGE")
        flags(greedy)
        delimiters(" ")
    );
};
destination d_test { 
    file("/logs/${APACHE.ACCESS.VHOST}-${R_YEAR}${R_MONTH}${R_DAY}" 
        template("${APACHE.ACCESS.LOG_MESSAGE}\n"));
};
log { source(s_web); 
    parser(p_apache_split); 
    destination(d_test);
    flags(final);
};
$

Running it under valgrind like this:
# G_SLICE=always-malloc valgrind --log-file=s3.3.4-fd3b-1.log \
  --leak-check=full -v --show-reachable=yes --track-origins=yes  \
  syslog-ng -F --no-caps -p /var/run/syslog-ng.pid

produces following log file: http://toxcorp.com/stuff/syslog-ng-leak/s3.3.4-fd3b-1.log

I'm pretty sure this particular memory leak was fixed in 3.3.2,
but then got reintroduced in this commit:
http://git.balabit.hu/?p=bazsi/syslog-ng-3.3.git;a=commitdiff;h=c7070e2a6f1c...
which was done to fix tcp() destination related crash after 3.3.2 was released.

So, Bazsi, yes - it does reintroduce the leak :) (in reply to your comment in
https://lists.balabit.hu/pipermail/syslog-ng/2011-November/017697.html )

Apart from the leak you can see in valgrind logfile, there is another 
symptom - some destination files are created empty:
$ find /logs/ -type f | wc -l
10000
$ find /logs/ -type f -size 0 | wc -l
4410
$ find /logs/ -type f -size 0 | head -n 1
/logs/hcxjscbcnyoktilm-20120325

Even though there is a perfectly valid source record for them:
$ grep hcxjscbcnyoktilm access.random.10000.munged 
Sep 16 00:00:05 www147 accesslog: hcxjscbcnyoktilm a.b.c.d - - [16/Sep/2011:00:00:05 +0200]
$

The number of empty files corresponds to the number of valgrind
reported errors:

==17931== 4410 errors in context 1 of 6:
==17931== Invalid read of size 8
==17931==    at 0x4E4BA05: log_dest_driver_deinit_method (driver.c:172)
==17931==    by 0x6CFE559: affile_dd_deinit (affile.c:1001)
==17931==    by 0x4E4AD33: log_dest_group_deinit (logpipe.h:254)
==17931==    by 0x4E4650A: log_center_deinit (logpipe.h:254)
==17931==    by 0x4E5F63F: main_loop_exit_finish (mainloop.c:585)
==17931==    by 0x4E791A0: iv_run_timers (iv_timer.c:345)
==17931==    by 0x4E777A5: iv_main (iv_main.c:252)
==17931==    by 0x4E6023D: main_loop_run (mainloop.c:731)
==17931==    by 0x40175D: main (main.c:260)

I hope this can help you diagnose and fix this issue.

HTH

-- 
Jakub Jankowski|shasta@toxcorp.com|http://toxcorp.com/
GPG: FCBF F03D 9ADB B768 8B92 BB52 0341 9037 A875 942D