[syslog-ng] pdbtool patternize update and my syslog-ng 3.2 branch

Peter Gyongyosi gyp at balabit.hu
Sun Sep 26 14:57:38 CEST 2010


  Hi,

On 09/24/2010 07:57 PM, Matthew Hall wrote:
> I wondered if the memory leaks you said existed in the old version had
> been fixed, you did not say one way or the other in your mail.

Most of the major memleaks are fixed, yes. Valgrind still shows some
problems I couldn't fix, but they're either minor (a few KB compared to
the ~1G test run I checked it with) or only occur at the end of the
process: pretty much the whole struct containing the patterns is leaked
when the program ends. As that only happens right before pdbtool exits,
I haven't really cared about it so far. I might fix it in the future
for the sake of general neatness, but it doesn't affect the memory
usage of the tool while it is running.

The bigger problem is that the memory usage of patternize, while linear
in the number of loglines, is still huge. It could be optimized here
and there, maybe even made 30-50% more efficient, and this is something
I'm planning to do, but the main problem is that it needs to read
everything into memory. I'm trying to figure out how to avoid this, or
at least how to make it degrade more gracefully when it runs out of
physical RAM instead of starting to swap, which slows things down
terribly. The core of the problem is that as it goes over the loglines,
it needs to be able to look up the already collected words/patterns to
find out which of them are frequent, so that it can create the final
patterns. Maybe it'd be possible to use some disk-backed solution that
writes things out when they no longer fit into physical memory, but it
wouldn't perform much better than swapping: "frequent words" in
loglines are really rare, so we'd end up touching the written-to-disk
part of the database all the time, which would ruin the performance...
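
To make the lookup problem a bit more concrete, here is a minimal Python
sketch of the general idea. It is not pdbtool's actual code: the
function names, the whitespace tokenization, the (position, word) keys,
the "*" wildcard and the `support` threshold are all assumptions made up
for the illustration.

```python
from collections import Counter


def frequent_words(lines, support):
    """Count (position, word) pairs; keep those seen at least `support` times.

    Hypothetical helper: every distinct pair has to stay in `counts`
    until the whole input is read, because only then do we know which
    ones are frequent.
    """
    counts = Counter()
    for line in lines:
        for pos, word in enumerate(line.split()):
            counts[(pos, word)] += 1
    frequent = {key for key, n in counts.items() if n >= support}
    return counts, frequent


def to_pattern(line, frequent, wildcard="*"):
    """Replace every word that is not frequent at its position with a wildcard."""
    return " ".join(
        word if (pos, word) in frequent else wildcard
        for pos, word in enumerate(line.split())
    )


logs = [
    "sshd accepted password for alice from 10.0.0.1",
    "sshd accepted password for bob from 10.0.0.2",
    "sshd accepted password for carol from 10.0.0.3",
]

counts, frequent = frequent_words(logs, support=2)
patterns = Counter(to_pattern(line, frequent) for line in logs)

for pattern, hits in patterns.items():
    print(hits, pattern)
```

Even in this toy input more than half of the counted entries are rare
words (user names, addresses) that never make it into a pattern, yet
each of them has to be looked up and kept around until the end. That is
why pushing the "cold" part of the table to disk would probably not
help much.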

Anyway, I'm just thinking out loud :) What I'm really trying to say here
is that unless some miracle happens, the memory usage won't improve
drastically, and that's because of a conceptual problem, not memory
leaks :(


> I also wonder if anybody at Balabit could tell me how to build a copy of
> your Git tree on RHEL 4 or RHEL 5. I get problems because the PCRE is
> too old but when I switch to new PCRE, PCRE will not build because the
> autotools and pkg-config are too old.
>
> It's a problem for me because unfortunately my company only supports
> RHEL here and otherwise I have to run it in an Ubuntu 10.04 or Debian VM
> with way too little memory for the tool to run right.
>
> Would it be possible to build a version of your tree for RHEL 4 or 5?

Regarding this I'll have to refer you to the other guys here -- I've
personally never tried to compile syslog-ng on anything but Ubuntu.
I've sent the code to our internal build system, but because patternize
introduces a new dependency (libuuid for generating the pattern ids) the
compilation failed, and I did not want to mess with the builders
without asking the guys managing them. I'll try to ask around tomorrow
and get you an RPM or at least a more usable answer with some tips :)

greets,
Peter


