Evan Rempel wrote:
Balazs Scheidler wrote:
On Mon, 2008-10-06 at 11:38 -0700, Evan Rempel wrote:
Balazs Scheidler wrote:
On Sun, 2008-10-05 at 13:49 -0700, Evan Rempel wrote:
Balazs Scheidler wrote:
> > We have cases where a syslog-ng instance buffers quite a bit of data (2GB or so)
> > before it is able to flush the data to the destination. Currently, the only way to
> > "free up" that RAM is to restart syslog-ng, which means that some log messages are
> > lost during the restart window.
>
> Both PE and OSE free up disk/memory when possible. If they don't, then that's
> probably a leak.

No, it isn't a memory leak, as the process can be months old and only grows when it
buffers messages.
So if your syslog-ng process is 2GB in size, and you didn't specify an overly large log_fifo_size(), it's probably because of a leak.
> I have a large log_fifo_size(), but the process starts small, grows when it needs to
> buffer, and does NOT get small again when the destination is available.

Hmm... that might be related to the malloc implementation. Which platform are you using?
> 2. The buffer space should be thrown away only after the message has been
> delivered to the destination.

I see, this is what should happen now.
> This does not appear to be happening. I am using OSE 2.0.8. Have there been memory
> fixes/changes since then?

Not that I know of.
> Is there a way to get buffer count statistics out? How many messages are in each
> fifo buffer?

In the experimental 3.0 branch, yes there is. In current 2.0, I'm afraid there's none.
> Is there a limit on the growth of syslog-ng? I mean, if it gets large once and then
> empties its buffers, it stays large, but does it start growing again, or does it stay
> steady at the size it was before?

I have not specifically tested this scenario, but we do track processes and memory
usage, and it appears that once the space is allocated, it does not grow unless a
larger set of messages gets buffered.
> Again, the platform matters a lot: how the local malloc implementation handles
> fragmentation, and whether it shrinks the process once memory is freed.

This is running on Redhat AS 3 (2.4 kernel) and Redhat AS 4 (2.6 kernel).
On the Redhat 4 system the glibc packages installed are:

  glibc-common-2.3.4-2.36
  glibc-devel-2.3.4-2.36
  glibc-headers-2.3.4-2.36
  glib2-2.4.7-1
  glibc-kernheaders-2.4-9.1.100.EL
Hmm... this version should be all right; at least in my experience, glibc tries to
release memory back to the system once the application has freed it using free().

Maybe it is some kind of memory fragmentation issue. How easy is this to reproduce?
Does it require months of running, or do you have a simple recipe to create it?
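To make the fragmentation point concrete, here is a minimal C sketch (not syslog-ng
code; the chunk count and size are arbitrary, malloc_trim() is glibc-specific, and
/proc/self/status is Linux-only). It shows that freed heap memory is only returned to
the kernel when it is not interleaved with live allocations:

/*
 * Minimal sketch (not syslog-ng code): allocate many small chunks, free
 * every other one, and watch the resident size.  The freed half cannot
 * be returned to the kernel while live chunks sit between the holes;
 * once everything is freed, malloc_trim() can hand the heap back.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <malloc.h>             /* malloc_trim() is glibc-specific */

#define CHUNKS     200000       /* arbitrary numbers, just for illustration */
#define CHUNK_SIZE 512

static void print_rss(const char *label)
{
    /* On Linux, VmRSS in /proc/self/status is the resident set size. */
    char line[256];
    FILE *f = fopen("/proc/self/status", "r");

    if (!f)
        return;
    while (fgets(line, sizeof(line), f))
        if (strncmp(line, "VmRSS:", 6) == 0)
            printf("%-18s %s", label, line);
    fclose(f);
}

int main(void)
{
    char **ptrs = malloc(CHUNKS * sizeof(*ptrs));
    int i;

    print_rss("start");

    for (i = 0; i < CHUNKS; i++) {
        ptrs[i] = malloc(CHUNK_SIZE);
        memset(ptrs[i], 'x', CHUNK_SIZE);   /* touch the pages so they count in RSS */
    }
    print_rss("all allocated");

    for (i = 0; i < CHUNKS; i += 2)
        free(ptrs[i]);                      /* free every other chunk */
    print_rss("half freed");                /* stays high: the holes can't be unmapped */

    for (i = 1; i < CHUNKS; i += 2)
        free(ptrs[i]);
    free(ptrs);
    malloc_trim(0);
    print_rss("all freed + trim");          /* now the heap can shrink */

    return 0;
}

A message queue that grows to 2GB and then drains can end up in a similar state if
even a small number of long-lived allocations are scattered through the heap.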
We are encountering this fairly easily in production, so I should be able to put
together and verify a recipe. It will take me a couple of days to "get to it", but I
will report back.
I thought that this deserved some attention...

Make a destination:

  destination erempel {
      pipe("/var/log/syslog.pipes/erempel" log_fifo_size(1000000) perm(0666));
  };

and set up some logging to it; I directed only *.debug stuff to mine:

  filter f_debug { priority(debug); };
  log { source(local); filter(f_debug); destination(erempel); flags(final); };

Start a "tail -f /var/log/syslog.pipes/erempel" and then generate some logs:

  ls -1 /usr/bin | logger -p user.debug

and I see the stuff pop out from the "tail -f".

Cancel the tail process and produce a whole bunch more logs by running

  ls -1 /usr/bin | logger -p user.debug

repeatedly while watching the size of the syslog-ng process:

  Tue Oct 7 08:56:09 PDT 2008 root 23265 1.7 0.1  4004  1396 ? Ss 08:53 0:02 /usr/local/sbin/syslog-ng
  Tue Oct 7 08:56:14 PDT 2008 root 23265 1.7 0.1  4004  1412 ? Ss 08:53 0:02 /usr/local/sbin/syslog-ng
  Tue Oct 7 08:56:19 PDT 2008 root 23265 1.7 0.1  4004  1412 ? Ss 08:53 0:02 /usr/local/sbin/syslog-ng
  Tue Oct 7 08:56:24 PDT 2008 root 23265 2.5 1.2 15524 12840 ? Ss 08:53 0:04 /usr/local/sbin/syslog-ng
  Tue Oct 7 08:56:29 PDT 2008 root 23265 3.6 3.1 35384 32668 ? Ss 08:53 0:06 /usr/local/sbin/syslog-ng
  Tue Oct 7 08:56:34 PDT 2008 root 23265 4.7 5.1 55904 53264 ? Ss 08:53 0:08 /usr/local/sbin/syslog-ng
  Tue Oct 7 08:56:40 PDT 2008 root 23265 5.7 7.0 75724 73100 ? Ss 08:53 0:10 /usr/local/sbin/syslog-ng
  Tue Oct 7 08:56:45 PDT 2008 root 23265 5.6 7.1 76504 73864 ? Ss 08:53 0:10 /usr/local/sbin/syslog-ng
  Tue Oct 7 08:56:50 PDT 2008 root 23265 6.8 7.1 76504 73864 ? Ss 08:53 0:13 /usr/local/sbin/syslog-ng
  Tue Oct 7 08:56:55 PDT 2008 root 23265 6.6 7.1 76504 73864 ? Ss 08:53 0:13 /usr/local/sbin/syslog-ng

The intent was to start the "tail -f" again to read all of the records and see if the
memory gets smaller, but when I start the tail -f again, I get NOTHING. Syslog-ng seems
to have closed the pipe destination and is NOT reopening it.

Fortunately, the author provided a HUP signal that will cause all outputs to be closed
and opened again. So I do a HUP, and only the LAST 10 records are displayed. Where did
all of the other records go? If I do

  ls -1 /usr/bin | logger -p user.debug

I see the full list from the "tail -f".

Back to memory: the memory never gets smaller. Now I am beginning to think that the
records got orphaned inside syslog-ng such that they never made it to the destination,
and as such are never candidates for memory reclaiming.

I think I have stumbled on a bug that loses records, and I would like to state again
that ALL destinations should follow the time_reopen() setting.

--
Evan Rempel
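As a footnote to the recipe above: the pipe part of this is plain POSIX FIFO behaviour.
Below is a minimal C sketch (not syslog-ng's code; the path is the one from the recipe,
and the FIFO is assumed to already exist) whose write() starts failing with EPIPE as
soon as the last reader goes away:

/*
 * Minimal sketch of FIFO writer behaviour (not syslog-ng code).
 * Run it, start and then stop a "tail -f" on the FIFO, and watch the
 * writer get EPIPE once the last reader has gone away.
 */
#include <stdio.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/var/log/syslog.pipes/erempel";  /* path from the recipe above */
    int fd, i;

    signal(SIGPIPE, SIG_IGN);           /* get EPIPE from write() instead of dying */

    fd = open(path, O_WRONLY);          /* blocks until a reader opens the FIFO */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    for (i = 0; ; i++) {
        char buf[64];
        int len = snprintf(buf, sizeof(buf), "message %d\n", i);

        if (write(fd, buf, len) < 0) {
            /* EPIPE: the last reader closed its end; the writer has to
             * close and re-open the FIFO before anything further can be
             * delivered. */
            perror("write");
            break;
        }
        sleep(1);
    }
    close(fd);
    return 0;
}

Once the writer hits EPIPE, nothing more gets through until it closes and reopens the
FIFO, which is presumably why the destination stays dead until a HUP, and why having it
honour time_reopen() would help here.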