On Tue, 2009-12-15 at 14:12 -0500, Doug Warner wrote:
On 12/15/2009 12:03 PM, Doug Warner wrote:
I'm running syslog-ng 3.0.4 OSE on Gentoo and have a central syslog host that accepts about 3 Mbps of log traffic via a TCP pipe. We've recently been having oom-killer problems and I finally tracked it down to the fact that we seem to be leaking objects in the size-4096 cache. Now, slabinfo is new to me so I could be interpreting this wrong, but restarting syslog-ng definitely reclaims this cache.
What data can I provide to help track down this problem?
I noticed that there seems to be a similar error in 3.1 beta, reported here: http://thread.gmane.org/gmane.comp.syslog-ng/8544
Attached is the valgrind output for the process after a couple of hours. Let me know if I can be of any more help.
slabinfo contains information about kernel memory allocations, which can certainly be attributed to syslog-ng when the kernel does an allocation on behalf of syslog-ng. But it would be important to know what kind of kernel objects are being allocated for syslog-ng.

One thing is sure: plain user-space memory allocated by syslog-ng will never show up in the size-4096 cache, since the kernel allocates pages directly when userspace programs need memory. An entry in the size-4096 slab cache is certainly a kmalloc()-ed object. Possibly a socket structure (but AFAIR it has its own slab).

Can you check:

* whether syslog-ng has too many open sockets (visible for example with lsof)
* whether it reads its input queue (although network packets would again not show up in the slab), using netstat -antp
* possibly the network statistics: netstat -ns

The 3.1 problem was completely 3.1 specific and has nothing to do with 3.0 (and in the meantime it was solved too, off-list).

-- 
Bazsi
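
To make the slab numbers easier to compare between runs, here is a minimal Python sketch that pulls one cache out of slabinfo text and converts it to bytes. It assumes the slabinfo 2.1 column layout (name, active_objs, num_objs, objsize, ...). The function name slab_usage and the sample figures are made up for illustration and are not taken from the affected host.

```python
def slab_usage(slabinfo_text, cache_name="size-4096"):
    """Return object counts and approximate bytes held by one slab cache.

    Assumes the slabinfo 2.1 layout, where each data line starts with:
        name active_objs num_objs objsize ...
    Returns None if the cache is not listed.
    """
    for line in slabinfo_text.splitlines():
        fields = line.split()
        # Header and comment lines never have the cache name as first field.
        if len(fields) >= 4 and fields[0] == cache_name:
            active_objs, num_objs, objsize = (int(f) for f in fields[1:4])
            return {
                "active_objs": active_objs,
                "num_objs": num_objs,
                "objsize": objsize,
                # Memory backing all allocated objects, in use or not.
                "bytes": num_objs * objsize,
            }
    return None


# Hypothetical sample, NOT real data from the host discussed in this thread.
SAMPLE = """slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
size-4096 1200 1250 4096 1 1 : tunables 24 12 8 : slabdata 1250 1250 0
size-2048 300 310 2048 2 1 : tunables 24 12 8 : slabdata 155 155 0
"""

usage = slab_usage(SAMPLE)
print(usage["num_objs"], "objects,", usage["bytes"], "bytes")
```

On a live host the same function could be fed the contents of /proc/slabinfo (usually readable only by root) at intervals, before and after restarting syslog-ng, to see whether the size-4096 figure keeps growing.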