[syslog-ng] Fwd: Memleak in syslog-ng v3.19.1

Balazs Scheidler bazsi77 at gmail.com
Tue Jun 9 13:50:09 UTC 2020


Memory is allocated from the system in large chunks (by the libc
allocator), which is then allocated by syslog-ng in smaller blocks. Some of
these small blocks may still be in use in the large block which prevents
the libc allocator to return it to the system. Also, returning these large
chunks is done lazily and that is outside of the context of syslog-ng.

There are alternative allocators (jemalloc for instance but there might be
others), that exhibit this behavior to a lesser extent.

Also, like I mentioned, using multiple threads increases memory usage in a
number of ways, so disable them completely, that can save you a few megs.



On Tue, Jun 9, 2020, 13:09 Vidya Balakrishnan <vidyabalakrishnan96 at gmail.com>
wrote:

> Greetings,
>
> Thanks for the quick response.
>
> I quickly looked into the output of smaps before and after memory increase.
> There are actually two scenarios were we see an increase in the memory
> consumption as seen from the PS command as below
>
> *#ps | grep syslog*
>
> * PID    USER       VSZ    STAT  COMMAND*
> *23760  admin      26700    S           /sbin/syslog-ng
> --cfgfile=/var/syslog-ng.conf*
>
> *where VSZ -> virtual size*
>
> *Scenario 1:*
> We run syslog-ng in a home router which pushes the logs to a Graylog
> server periodically. Whenever there is a WAN connectivity issue in the
> router, it generates a log file which contains almost upto 4000 lines of
> logs. When the connectivity is up, these logs are pushed at once to the
> queue of syslog-ng as seen from the output of syslog-ng-ctl stats and then
> written to Graylog server. So when bulk log message containing upto 3000
> lines are pushed at once, we see an increase in memory from 26MB to 28MB
> and this value remains at 28MB even after the logs are pushed to the server
> successfully.
>
> *Output of syslog-ng-ctl stats before WAN connectivity is lost in the
> router:*
>
> dst.network;d_Log_Server#0;tcp,abcd.com:1514;a;dropped;0
>
> dst.network;d_Log_Server#0;tcp,abcd.com:1514;a;processed*;613*
>
> dst.network;d_Log_Server#0;tcp,abcd.com:1514;a;queued;*0*
>
> dst.network;d_Log_Server#0;tcp,abcd.com:1514;a;written;*613*
>
>
> *After WAN connection is restored to the router, WAN DOWN logs get written
> to the queue and queue size increases also memory consumption of syslog-ng
> increases by 2MB:*
>
> dst.network;d_Log_Server#0;tcp, abcd.com :1514;a;dropped;0
>
> dst.network;d_Log_Server#0;tcp, abcd.com :1514;a;processed;4538
>
> dst.network;d_Log_Server#0;tcp, abcd.com :1514;a;queued;*3877*
>
> dst.network;d_Log_Server#0;tcp, abcd.com :1514;a;written;661
>
>
> *After logs are written from queue to server, queue size comes back to 0,
> but the memory consumption which increased to 28MB does not come down to
> 26MB(which was the initial value at launch) :*
>
> dst.network;d_Log_Server#0;tcp, abcd.com :1514;a;dropped;0
>
> dst.network;d_Log_Server#0;tcp, abcd.com :1514;a;processed;5250
>
> dst.network;d_Log_Server#0;tcp, abcd.com :1514;a;queued;*0*
>
> dst.network;d_Log_Server#0;tcp, abcd.com :1514;a;written;5250
>
>
> *Scenario 2:  *
>
> When syslog-ng is continuously in use for about 3 days, we see an increase
> in memory from 26MB to 37MB all at once (please note that this is not a
> gradual increase, but an increase that occurs all of a sudden).
>
> During this scenario, the server is reachable and the processed and
> written message count is equal and queue size is 0.
>
>
> dst.network;d_Log_Server#0;tcp, abcd.com :1514;a;dropped;0
>
> dst.network;d_Log_Server#0;tcp, abcd.com :1514;a;processed;5000
>
> dst.network;d_Log_Server#0;tcp, abcd.com :1514;a;queued;*0*
>
> dst.network;d_Log_Server#0;tcp, abcd.com :1514;a;written;5000
>
>
> *Queries:*
>
> 1.From scenario 1, memory usage increases by 2MB (once in 5 tries) when
> 3000 log messages are dumped to the queue at once. Is this a potential
> memory leak ? Or is this behaviour expected ?
>
> 2.From scenario 2 even when the setup has proper WAN connectivity( no
> bulk logs are pushed at once to the queue)  there is an increase seen in
> memory usage of syslog-ng (from the output of PS command that has been
> pasted above). Is it because of memory leak in the code of syslog-ng or is
> this behaviour expected on running syslog-ng continuously for a few days?
>
>
> Could you please help resolve my confusion regarding the above two
> scenarios that have been described.
>
>
> Regards,
>
> Vidya Balakrishnan
>
> On Tue, Jun 9, 2020 at 2:40 PM Balazs Scheidler <bazsi77 at gmail.com> wrote:
>
>> Hi,
>>
>> Unfortunately I am not sure this is a leak from what you have posted. The
>> issues valgrind reported were one-off allocations caused by plugins being
>> loaded into syslog-ng.
>>
>> The numbers (26, 35 and even 45mb) is not conclusive either. Syslog-ng
>> uses internal data structures that are dynamically growing up to a point
>> and then stabilize. Also, syslog-ng uses threads and per thread instances
>> of these data structures, not to mention the libc's heap allocator that
>> also tends to use per thread areas. Similarly stacks used by threads also
>> get an allocation.
>>
>> You can get a glimpse of chunks being allocated to the process by
>> checking /proc/PID/maps, and syslog-ng also has some stats counters that
>> reflect its internal memory usage. (Scratch-buffers, etc).
>>
>> You can try to disable multi-threaded operation of syslog-ng and see if
>> that decreases usage.
>>
>> On Tue, Jun 9, 2020, 09:31 Vidya Balakrishnan <
>> vidyabalakrishnan96 at gmail.com> wrote:
>>
>>> Greetings,
>>>
>>> This mail is regarding Memory leak in Syslog-ng v3.19.1. The process
>>> gets launched at 26MB and then grows to 37MB or even upto 45MB.
>>>
>>> Pasting the configurations used below:
>>> @version: 3.19
>>> @include "scl.conf"
>>> source s_local {
>>>     #system();
>>>     #internal();
>>>     syslog(port(514) transport("udp"));
>>> };
>>> rewrite tag_SN {
>>>     set("ABC-XXXXXXX", value("HOST"));
>>> };
>>>
>>> destination d_Log_Server {
>>>     network("abcd.com"
>>>             port(514)
>>>             transport(tls)
>>>             tls( ca-dir("/etc/ssl") peer-verify("required-trusted") )
>>>             time-zone("+00:00")
>>>
>>> template("${R_YEAR}-${R_MONTH}-${R_DAY}T${R_HOUR}:${R_MIN}:${R_SEC}.${R_MSEC}${TZOFFSET}
>>> ${HOST} ${MSGHDR}${MESSAGE}\n")
>>>             );
>>> };
>>> log {
>>>     source(s_local);
>>>     rewrite(tag_SN);
>>>     destination(d_Log_Server);
>>> };
>>>
>>> Also, tried debugging using valgrind but did not resolve the issue.
>>>
>>> ==19671== Memcheck, a memory error detector
>>>
>>> ==19671== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
>>>
>>> ==19671== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright
>>> info
>>>
>>> ==19671== Command: /sbin/syslog-ng --cfgfile=/var/syslog-ng.conf
>>>
>>> ==19671== Parent PID: 2190
>>>
>>> ==19671==
>>>
>>> ==19671== Conditional jump or move depends on uninitialised value(s)
>>>
>>> ==19671==    at 0x401A8D4: index (in /lib/ld-linux.so.3)
>>>
>>> ==19671==    by 0x4007F33: expand_dynamic_string_token (in
>>> /lib/ld-linux.so.3)
>>>
>>> ==19671==
>>>
>>> ==19671== Conditional jump or move depends on uninitialised value(s)
>>>
>>> ==19671==    at 0x401A8D8: index (in /lib/ld-linux.so.3)
>>>
>>> ==19671==    by 0x4007F33: expand_dynamic_string_token (in
>>> /lib/ld-linux.so.3)
>>>
>>> ==19671==
>>>
>>> ==19671== Conditional jump or move depends on uninitialised value(s)
>>>
>>> ==19671==    at 0x4007F38: expand_dynamic_string_token (in
>>> /lib/ld-linux.so.3)
>>>
>>> ==19671==
>>>
>>> disInstr(arm): unhandled instruction: 0xEC510F1E
>>>
>>>                  cond=14(0xE) 27:20=197(0xC5) 4:4=1 3:0=14(0xE)
>>>
>>> ==19671== valgrind: Unrecognised instruction at address 0x4cd6e68.
>>>
>>> ==19671==    at 0x4CD6E68: _armv7_tick (in /lib/libcrypto.so.1.1)
>>>
>>> ==19671== Your program just tried to execute an instruction that Valgrind
>>>
>>> ==19671== did not recognise.  There are two possible reasons for this.
>>>
>>> ==19671== 1. Your program has a bug and erroneously jumped to a non-code
>>>
>>> ==19671==    location.  If you are running Memcheck and you just saw a
>>>
>>> ==19671==    warning about a bad jump, it's probably your program's
>>> fault.
>>>
>>> ==19671== 2. The instruction is legitimate but Valgrind doesn't handle
>>> it,
>>>
>>> ==19671==    i.e. it's Valgrind's fault.  If you think this is the case
>>> or
>>>
>>> ==19671==    you are not sure, please let us know and we'll try to fix
>>> it.
>>>
>>> ==19671== Either way, Valgrind will now raise a SIGILL signal which will
>>>
>>> ==19671== probably kill your program.
>>>
>>> ==19700== Invalid read of size 4
>>>
>>> ==19700==    at 0x401A8D4: index (in /lib/ld-linux.so.3)
>>>
>>> ==19700==    by 0x40145BB: dl_open_worker (in /lib/ld-linux.so.3)
>>>
>>> ==19700==  Address 0x4f86bb4 is 3 bytes after a block of size 41 alloc'd
>>>
>>> ==19700==    at 0x4847654: malloc (vg_replace_malloc.c:299)
>>>
>>> ==19700==    by 0x49B3BC3: g_malloc (gmem.c:94)
>>>
>>> ==19700==    by 0x49CEA17: g_strdup (gstrfuncs.c:363)
>>>
>>> ==19700==    by 0x49417BF: g_module_open (gmodule.c:543)
>>>
>>> ==19700==    by 0x489AFDB: plugin_dlopen_module_as_filename
>>> (plugin.c:252)
>>>
>>> ==19700==    by 0x489B8B3: plugin_dlopen_module_as_dir_and_filename
>>> (plugin.c:271)
>>>
>>> ==19700==    by 0x489B8B3: plugin_load_candidate_modules (plugin.c:457)
>>>
>>> ==19700==    by 0x4887017: cfg_lexer_preprocess (cfg-lexer.c:1068)
>>>
>>> ==19700==    by 0x4887017: cfg_lexer_lex (cfg-lexer.c:1110)
>>>
>>> ==19700==    by 0x48A876F: main_parse (cfg-grammar.c:3143)
>>>
>>> ==19700==    by 0x4887D8F: cfg_parser_parse (cfg-parser.c:369)
>>>
>>> ==19700==    by 0x48845BF: cfg_run_parser (cfg.c:504)
>>>
>>> ==19700==    by 0x48846B3: cfg_read_config (cfg.c:556)
>>>
>>> ==19700==    by 0x489766B: main_loop_read_and_init_config
>>> (mainloop.c:552)
>>>
>>> ==19700==
>>>
>>> ==19700== Conditional jump or move depends on uninitialised value(s)
>>>
>>> ==19700==    at 0x401A8D4: index (in /lib/ld-linux.so.3)
>>>
>>> ==19700==    by 0x40145BB: dl_open_worker (in /lib/ld-linux.so.3)
>>>
>>> ==19700==
>>>
>>> ==19700== Conditional jump or move depends on uninitialised value(s)
>>>
>>> ==19700==    at 0x401A8D8: index (in /lib/ld-linux.so.3)
>>>
>>> ==19700==    by 0x40145BB: dl_open_worker (in /lib/ld-linux.so.3)
>>>
>>> ==19700==
>>>
>>> ==19700== Conditional jump or move depends on uninitialised value(s)
>>>
>>> ==19700==    at 0x40145C0: dl_open_worker (in /lib/ld-linux.so.3)
>>>
>>> ==19700==
>>>
>>> ==19700== Conditional jump or move depends on uninitialised value(s)
>>>
>>> ==19700==    at 0x401AD9C: strcpy (in /lib/ld-linux.so.3)
>>>
>>> ==19700==    by 0x400A8D3: _dl_lookup_symbol_x (in /lib/ld-linux.so.3)
>>>
>>> ==19700==
>>>
>>> ==19700== Invalid read of size 4
>>>
>>> ==19700==    at 0x401AC60: strcpy (in /lib/ld-linux.so.3)
>>>
>>> ==19700==    by 0x400A8D3: _dl_lookup_symbol_x (in /lib/ld-linux.so.3)
>>>
>>> ==19700==  Address 0x5185254 is 0 bytes after a block of size 20 alloc'd
>>>
>>> ==19700==    at 0x4849EB4: realloc (vg_replace_malloc.c:785)
>>>
>>> ==19700==    by 0x49B3C9B: g_realloc (gmem.c:159)
>>>
>>> ==19700==    by 0x49FCAF3: _g_gnulib_vasnprintf (vasnprintf.c:5533)
>>>
>>> ==19700==    by 0x49FD263: _g_gnulib_vasprintf (printf.c:142)
>>>
>>> ==19700==    by 0x49F6B0B: g_vasprintf (gprintf.c:310)
>>>
>>> ==19700==    by 0x49CEB7F: g_strdup_vprintf (gstrfuncs.c:514)
>>>
>>> ==19700==    by 0x49CEBAB: g_strdup_printf (gstrfuncs.c:540)
>>>
>>> ==19700==    by 0x489AF7F: plugin_get_module_init_name (plugin.c:234)
>>>
>>> ==19700==    by 0x489B363: plugin_load_module (plugin.c:364)
>>>
>>> ==19700==    by 0x48ABCAB: pragma_parse (pragma-grammar.y:443)
>>>
>>> ==19700==    by 0x4887D8F: cfg_parser_parse (cfg-parser.c:369)
>>>
>>> ==19700==    by 0x4886E9B: cfg_lexer_parse_pragma (cfg-lexer.c:996)
>>>
>>> ==19700==    by 0x4886E9B: cfg_lexer_preprocess (cfg-lexer.c:1040)
>>>
>>> ==19700==    by 0x4886E9B: cfg_lexer_lex (cfg-lexer.c:1110)
>>>
>>> ==19700==
>>>
>>> ==19700== Conditional jump or move depends on uninitialised value(s)
>>>
>>> ==19700==    at 0x401AD80: strcpy (in /lib/ld-linux.so.3)
>>>
>>> ==19700==    by 0x400A8D3: _dl_lookup_symbol_x (in /lib/ld-linux.so.3)
>>>
>>> ==19700==
>>>
>>> ==19700== Use of uninitialised value of size 4
>>>
>>> ==19700==    at 0x401ACB8: strcpy (in /lib/ld-linux.so.3)
>>>
>>> ==19700==    by 0x400A8D3: _dl_lookup_symbol_x (in /lib/ld-linux.so.3)
>>>
>>> ==19700==
>>>
>>> ==19700== Invalid read of size 4
>>>
>>> ==19700==    at 0x401ADAC: strcpy (in /lib/ld-linux.so.3)
>>>
>>> ==19700==    by 0x400A8D3: _dl_lookup_symbol_x (in /lib/ld-linux.so.3)
>>>
>>> ==19700==  Address 0x533bdac is 3 bytes after a block of size 25 alloc'd
>>>
>>> ==19700==    at 0x4849EB4: realloc (vg_replace_malloc.c:785)
>>>
>>> ==19700==    by 0x49B3C9B: g_realloc (gmem.c:159)
>>>
>>> ==19700==    by 0x49FCAF3: _g_gnulib_vasnprintf (vasnprintf.c:5533)
>>>
>>> ==19700==    by 0x49FD263: _g_gnulib_vasprintf (printf.c:142)
>>>
>>> ==19700==    by 0x49F6B0B: g_vasprintf (gprintf.c:310)
>>>
>>> ==19700==    by 0x49CEB7F: g_strdup_vprintf (gstrfuncs.c:514)
>>>
>>> ==19700==    by 0x49CEBAB: g_strdup_printf (gstrfuncs.c:540)
>>>
>>> ==19700==    by 0x489AF7F: plugin_get_module_init_name (plugin.c:234)
>>>
>>> ==19700==    by 0x489B363: plugin_load_module (plugin.c:364)
>>>
>>> ==19700==    by 0x489B6BB: plugin_find (plugin.c:199)
>>>
>>> ==19700==    by 0x489921B: msg_format_options_init (msg-format.c:81)
>>>
>>> ==19700==    by 0x4891A9B: log_reader_options_init (logreader.c:728)
>>>
>>> ==19700==
>>>
>>> ==19700== Conditional jump or move depends on uninitialised value(s)
>>>
>>> ==19700==    at 0x401ADC4: strcpy (in /lib/ld-linux.so.3)
>>>
>>> ==19700==    by 0x400A8D3: _dl_lookup_symbol_x (in /lib/ld-linux.so.3)
>>>
>>> ==19700==
>>>
>>> ==19700== Conditional jump or move depends on uninitialised value(s)
>>>
>>> ==19700==    at 0x401A8D4: index (in /lib/ld-linux.so.3)
>>>
>>> ==19700==    by 0x4008BFB: _dl_map_object (in /lib/ld-linux.so.3)
>>>
>>> ==19700==
>>>
>>> ==19700== Conditional jump or move depends on uninitialised value(s)
>>>
>>> ==19700==    at 0x401A8D8: index (in /lib/ld-linux.so.3)
>>>
>>> ==19700==    by 0x4008BFB: _dl_map_object (in /lib/ld-linux.so.3)
>>>
>>> ==19700==
>>>
>>> ==19700== Conditional jump or move depends on uninitialised value(s)
>>>
>>> ==19700==    at 0x4008C00: _dl_map_object (in /lib/ld-linux.so.3)
>>>
>>> ==19700==
>>>
>>> ==19700== Syscall param epoll_pwait(sigmask) points to unaddressable
>>> byte(s)
>>>
>>> ==19700==    at 0x4BCC73C: epoll_pwait (in /lib/libc.so.6)
>>>
>>> ==19700==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
>>>
>>> ==19700==
>>>
>>> ==19671==
>>>
>>> ==19671== HEAP SUMMARY:
>>>
>>> ==19671==     in use at exit: 11,980 bytes in 55 blocks
>>>
>>> ==19671==   total heap usage: 179 allocs, 124 frees, 49,157 bytes
>>> allocated
>>>
>>> ==19671==
>>>
>>> ==19671== 419 (44 direct, 375 indirect) bytes in 1 blocks are definitely
>>> lost in loss record 44 of 46
>>>
>>> ==19671==    at 0x4847654: malloc (vg_replace_malloc.c:299)
>>>
>>> ==19671==    by 0x49B3BC3: g_malloc (gmem.c:94)
>>>
>>> ==19671==    by 0x488BD07: g_process_set_argv_space (gprocess.c:569)
>>>
>>> ==19671==    by 0x1127B: main (main.c:216)
>>>
>>> ==19671==
>>>
>>> ==19671== LEAK SUMMARY:
>>>
>>> ==19671==    definitely lost: 44 bytes in 1 blocks
>>>
>>> ==19671==    indirectly lost: 375 bytes in 10 blocks
>>>
>>> ==19671==      possibly lost: 0 bytes in 0 blocks
>>>
>>> ==19671==    still reachable: 11,561 bytes in 44 blocks
>>>
>>> ==19671==         suppressed: 0 bytes in 0 blocks
>>>
>>> ==19671== Reachable blocks (those to which a pointer was found) are not
>>> shown.
>>>
>>> ==19671== To see them, rerun with: --leak-check=full
>>> --show-leak-kinds=all
>>>
>>>
>>>
>>> The memory usage keeps increasing. There are no local disks used to
>>> store the logs messages. It is directly sent to the Graylog server.
>>>
>>> Could you please take a look at this issue and help resolving the same.
>>>
>>>
>>> Regards,
>>>
>>> Vidya Balakrishnan
>>>
>>> ______________________________________________________________________________
>>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
>>> Documentation:
>>> http://www.balabit.com/support/documentation/?product=syslog-ng
>>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
>>>
>>>
>> ______________________________________________________________________________
>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
>> Documentation:
>> http://www.balabit.com/support/documentation/?product=syslog-ng
>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
>>
>>
> ______________________________________________________________________________
> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
> Documentation:
> http://www.balabit.com/support/documentation/?product=syslog-ng
> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.balabit.hu/pipermail/syslog-ng/attachments/20200609/06ef2fd9/attachment-0001.html>


More information about the syslog-ng mailing list