Evan Rempel wrote:
Balazs Scheidler wrote:
On Mon, 2008-10-06 at 11:38 -0700, Evan Rempel wrote:
Balazs Scheidler wrote:
On Sun, 2008-10-05 at 13:49 -0700, Evan Rempel wrote:
Balazs Scheidler wrote:
> > We have cases where a syslog-ng instance buffers quite a bit of data (2GB or so)
> > before it is able to flush the data to the destination. Currently, the only way to
> > "free up" that RAM is to restart syslog-ng, which means that some log messages are
> > lost during the restart window.
>
> Both PE and OSE free up disk/memory when possible. If they don't, then that's
> probably a leak.

No, it isn't a memory leak, as the process can be months old and only grows when it
buffers messages.
So if your syslog-ng process is 2GB in size, and you didn't specify an overly large log_fifo_size(), it's probably because of a leak.
> I have a large log_fifo_size(), but the process starts small, grows when it needs to
> buffer, and does NOT get small again when the destination is available.

Hmm... that might be related to the malloc implementation. Which platform are you using?
> 2. The buffer space should be thrown away only after the message has been
> delivered to the destination.

I see, this is what should happen now.
> This does not appear to be happening. I am using OSE 2.0.8. Have there been memory
> fixes/changes since then?

Not that I know of.
> Is there a way to get buffer count statistics out? How many messages are in each
> fifo buffer?

In the experimental 3.0 branch, yes there is. In current 2.0, I'm afraid there's none.
> Is there a limit on the growth of syslog-ng? I mean, if it gets large once and then
> empties its buffers, it stays large, but does it start growing again, or does it stay
> steady at the size it was before?

I have not specifically tested this scenario, but we do track processes and memory
usage, and it appears that once the space is allocated, it does not grow unless a
larger set of messages gets buffered.
> Again, the platform matters a lot: how the local malloc implementation handles
> fragmentation, and whether it shrinks the process once memory is freed.

This is running on Redhat AS 3 (2.4 kernel) and Redhat AS 4 (2.6 kernel).
On the Redhat 4 system the glibc packages installed are:

  glibc-common-2.3.4-2.36
  glibc-devel-2.3.4-2.36
  glibc-headers-2.3.4-2.36
  glib2-2.4.7-1
  glibc-kernheaders-2.4-9.1.100.EL
Hmm... this version should be all right; at least in my experience, glibc tries to
release memory back to the system once the application has freed it using free().

Maybe it is some kind of memory fragmentation issue. How easy is this to reproduce?
Does it require months of running, or do you have a simple recipe to create it?
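To make the fragmentation point concrete, here is a minimal C sketch (not syslog-ng
code; the chunk count and size are arbitrary, malloc_trim() is glibc-specific, and
/proc/self/status is Linux-only). It shows that freed heap memory is only returned to
the kernel when it is not interleaved with live allocations:

/*
 * Minimal sketch (not syslog-ng code): allocate many small chunks, free
 * every other one, and watch the resident size.  The freed half cannot
 * be returned to the kernel while live chunks sit between the holes;
 * once everything is freed, malloc_trim() can hand the heap back.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <malloc.h>             /* malloc_trim() is glibc-specific */

#define CHUNKS     200000       /* arbitrary numbers, just for illustration */
#define CHUNK_SIZE 512

static void print_rss(const char *label)
{
    /* On Linux, VmRSS in /proc/self/status is the resident set size. */
    char line[256];
    FILE *f = fopen("/proc/self/status", "r");

    if (!f)
        return;
    while (fgets(line, sizeof(line), f))
        if (strncmp(line, "VmRSS:", 6) == 0)
            printf("%-18s %s", label, line);
    fclose(f);
}

int main(void)
{
    char **ptrs = malloc(CHUNKS * sizeof(*ptrs));
    int i;

    print_rss("start");

    for (i = 0; i < CHUNKS; i++) {
        ptrs[i] = malloc(CHUNK_SIZE);
        memset(ptrs[i], 'x', CHUNK_SIZE);   /* touch the pages so they count in RSS */
    }
    print_rss("all allocated");

    for (i = 0; i < CHUNKS; i += 2)
        free(ptrs[i]);                      /* free every other chunk */
    print_rss("half freed");                /* stays high: the holes can't be unmapped */

    for (i = 1; i < CHUNKS; i += 2)
        free(ptrs[i]);
    free(ptrs);
    malloc_trim(0);
    print_rss("all freed + trim");          /* now the heap can shrink */

    return 0;
}

A message queue that grows to 2GB and then drains can end up in a similar state if
even a small number of long-lived allocations are scattered through the heap.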
We are encountering this fairly easily in production, so I should be able to put
together and verify a recipe. It will take me a couple of days to "get to it", but I
will report back.
I thought that this deserved some attention...

Make a destination:

  destination erempel {
      pipe("/var/log/syslog.pipes/erempel" log_fifo_size(1000000) perm(0666));
  };

and set up some logging to it; I directed only *.debug stuff to mine:

  filter f_debug { priority(debug); };
  log { source(local); filter(f_debug); destination(erempel); flags(final); };

Start a "tail -f /var/log/syslog.pipes/erempel" and then generate some logs:

  ls -1 /usr/bin | logger -p user.debug

and I see the stuff pop out from the "tail -f".

Cancel the tail process and produce a whole bunch more logs by running

  ls -1 /usr/bin | logger -p user.debug

repeatedly while watching the size of the syslog-ng process:

  Tue Oct 7 08:56:09 PDT 2008 root 23265 1.7 0.1  4004  1396 ? Ss 08:53 0:02 /usr/local/sbin/syslog-ng
  Tue Oct 7 08:56:14 PDT 2008 root 23265 1.7 0.1  4004  1412 ? Ss 08:53 0:02 /usr/local/sbin/syslog-ng
  Tue Oct 7 08:56:19 PDT 2008 root 23265 1.7 0.1  4004  1412 ? Ss 08:53 0:02 /usr/local/sbin/syslog-ng
  Tue Oct 7 08:56:24 PDT 2008 root 23265 2.5 1.2 15524 12840 ? Ss 08:53 0:04 /usr/local/sbin/syslog-ng
  Tue Oct 7 08:56:29 PDT 2008 root 23265 3.6 3.1 35384 32668 ? Ss 08:53 0:06 /usr/local/sbin/syslog-ng
  Tue Oct 7 08:56:34 PDT 2008 root 23265 4.7 5.1 55904 53264 ? Ss 08:53 0:08 /usr/local/sbin/syslog-ng
  Tue Oct 7 08:56:40 PDT 2008 root 23265 5.7 7.0 75724 73100 ? Ss 08:53 0:10 /usr/local/sbin/syslog-ng
  Tue Oct 7 08:56:45 PDT 2008 root 23265 5.6 7.1 76504 73864 ? Ss 08:53 0:10 /usr/local/sbin/syslog-ng
  Tue Oct 7 08:56:50 PDT 2008 root 23265 6.8 7.1 76504 73864 ? Ss 08:53 0:13 /usr/local/sbin/syslog-ng
  Tue Oct 7 08:56:55 PDT 2008 root 23265 6.6 7.1 76504 73864 ? Ss 08:53 0:13 /usr/local/sbin/syslog-ng

The intent was to start the "tail -f" again to read all of the records and see if the
memory gets smaller, but when I start the tail -f again, I get NOTHING. Syslog-ng seems
to have closed the pipe destination and is NOT reopening it.

Fortunately, the author provided a HUP signal that will cause all outputs to be closed
and opened again. So I do a HUP, and only the LAST 10 records are displayed. Where did
all of the other records go? If I do

  ls -1 /usr/bin | logger -p user.debug

I see the full list from the "tail -f".

Back to memory: the memory never gets smaller. Now I am beginning to think that the
records got orphaned inside syslog-ng such that they never made it to the destination,
and as such are never candidates for memory reclaiming.

I think I have stumbled on a bug that loses records, and I would like to state again
that ALL destinations should follow the time_reopen() setting.

--
Evan Rempel
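As a footnote to the recipe above: the pipe part of this is plain POSIX FIFO behaviour.
Below is a minimal C sketch (not syslog-ng's code; the path is the one from the recipe,
and the FIFO is assumed to already exist) whose write() starts failing with EPIPE as
soon as the last reader goes away:

/*
 * Minimal sketch of FIFO writer behaviour (not syslog-ng code).
 * Run it, start and then stop a "tail -f" on the FIFO, and watch the
 * writer get EPIPE once the last reader has gone away.
 */
#include <stdio.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/var/log/syslog.pipes/erempel";  /* path from the recipe above */
    int fd, i;

    signal(SIGPIPE, SIG_IGN);           /* get EPIPE from write() instead of dying */

    fd = open(path, O_WRONLY);          /* blocks until a reader opens the FIFO */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    for (i = 0; ; i++) {
        char buf[64];
        int len = snprintf(buf, sizeof(buf), "message %d\n", i);

        if (write(fd, buf, len) < 0) {
            /* EPIPE: the last reader closed its end; the writer has to
             * close and re-open the FIFO before anything further can be
             * delivered. */
            perror("write");
            break;
        }
        sleep(1);
    }
    close(fd);
    return 0;
}

Once the writer hits EPIPE, nothing more gets through until it closes and reopens the
FIFO, which is presumably why the destination stays dead until a HUP, and why having it
honour time_reopen() would help here.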