[syslog-ng] pdbtool patternize update and my syslog-ng 3.2 branch
Peter Gyongyosi
gyp at balabit.hu
Sun Sep 26 14:57:38 CEST 2010
Hi,
On 09/24/2010 07:57 PM, Matthew Hall wrote:
> I wondered if the memory leaks you said existed in the old version had
> been fixed, you did not say one way or the other in your mail.
Most of the major memleaks are fixed, yes. Valgrind still shows some
problems I couldn't fix, but they're either minor (a couple of kilobytes
compared to the ~1G test run I checked it with) or only occur at the end
of the process: pretty much the whole struct containing the patterns is
leaked when the program ends. As it only happens right before pdbtool
exits, I haven't really cared about it so far. I might fix it in the
future for the sake of general neatness, but it doesn't affect the
memory usage of the tool.
The bigger problem is that the memory usage of patternize, while
linear in the number of loglines, is still huge. It could be
optimized here and there, maybe even made 30-50% more efficient, and
this is something I'm planning to do, but the main problem is that it
needs to read everything into memory. I'm trying to figure out how to
avoid this, or at least how to make it degrade more gracefully when
it runs out of physical RAM instead of starting to swap, which slows
things down terribly. The core of the problem is that as it goes over
the loglines, it needs to be able to look up the already collected
words/patterns to find out which of them are frequent, so that it can
create the final patterns. Maybe it'd be possible to use some disk-backed
solution that writes things out when they no longer fit into physical
memory, but it wouldn't perform much better than swapping: "frequent
words" in loglines are really rare, so we'd end up touching the
written-to-disk part of the database all the time, which would ruin the
performance...
Anyway, I'm just thinking out loud :) What I'm really trying to say here
is that unless some miracle happens, the memory usage won't improve
drastically, and that's because of a conceptual problem, not memory leaks :(
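To make the lookup problem a bit more concrete, here is a rough sketch of a two-pass "frequent word" clustering. It is only an illustration, not the actual patternize code: the function names, the whitespace tokenization and the `support` threshold are all assumptions made for the example. The point is that the first pass has to keep a counter for every distinct (position, word) pair it sees, even though only a small fraction of them turn out to be frequent.

```python
from collections import Counter


def frequent_words(lines, support):
    """Pass 1: count every (position, word) pair seen in the input.

    The counter needs an entry for each distinct pair, frequent or not,
    which is where most of the memory goes in this sketch.
    """
    counts = Counter()
    for line in lines:
        for pos, word in enumerate(line.split()):
            counts[(pos, word)] += 1
    # Only pairs seen in at least `support` lines count as frequent.
    return {key for key, n in counts.items() if n >= support}


def patternize(lines, support):
    """Pass 2: replace infrequent words with '*' and group equal patterns."""
    frequent = frequent_words(lines, support)
    clusters = Counter()
    for line in lines:
        pattern = tuple(
            word if (pos, word) in frequent else "*"
            for pos, word in enumerate(line.split())
        )
        clusters[pattern] += 1
    # Keep only the patterns that cover at least `support` lines.
    return {" ".join(p): n for p, n in clusters.items() if n >= support}


if __name__ == "__main__":
    sample = [
        "sshd accepted password for alice",
        "sshd accepted password for bob",
        "sshd accepted password for carol",
        "kernel out of memory",
    ]
    for pattern, n in patternize(sample, support=2).items():
        print(n, pattern)
```

In this sketch `lines` is iterated twice, so it has to be a list or something else that can be re-read. Every lookup in the second pass hits the table built in the first one, which is why pushing part of that table out to disk would probably not help much.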
> I also wonder if anybody at Balabit could tell me how to build a copy of
> your Git tree on RHEL 4 or RHEL 5. I get problems because the PCRE is
> too old but when I switch to new PCRE, PCRE will not build because the
> autotools and pkg-config are too old.
>
> It's a problem for me because unfortunately my company only supports
> RHEL here and otherwise I have to run it in an Ubuntu 10.04 or Debian VM
> with way too little memory for the tool to run right.
>
> Would it be possible to build a version of your tree for RHEL 4 or 5?
Regarding this I'll have to refer you to the other guys here -- I've
personally never tried to compile syslog-ng on anything but Ubuntu.
I've sent the code to our internal build system, but because patternize
introduces a new dependency (libuuid for generating the pattern ids) the
compilation failed, and I didn't want to mess with the builders
without asking the guys managing them. I'll try to ask around tomorrow
and get you an RPM or at least a more usable answer with some tips :)
greets,
Peter