Hi,

On 09/24/2010 07:57 PM, Matthew Hall wrote:
I wondered if the memory leaks you said existed in the old version had been fixed; you did not say one way or the other in your mail.
Most of the major memleaks are fixed, yes. Valgrind still shows some problems I couldn't fix, but they're either minor (a couple of KB compared to the ~1G test run I checked it with) or only occur at the end of the process: pretty much the whole struct containing the patterns is leaked when the program ends. As it only happens right before pdbtool exits, I didn't really care about it so far. I might fix it in the future for the sake of general neatness, but it shouldn't affect the memory usage of the tool.

The bigger problem is that the memory usage of patternize, while linear in the number of loglines, is still huge. It could be optimized here and there, maybe even made 30-50% more efficient, and this is something I'm planning to do, but the main problem is that it needs to read everything into memory. I'm trying to figure out how to avoid this, or at least how to make it degrade more gracefully when running out of physical RAM instead of starting to swap, which slows things down terribly.

The core of the problem is that as it goes over the loglines, it needs to be able to look up the already collected words/patterns to find out which of them are frequent, so that it can create the final patterns. Maybe it'd be possible to use some disk-backed solution that writes things out when they no longer fit into physical memory, but it wouldn't perform much better than swapping: "frequent words" in loglines are really rare, so we'd end up touching the written-to-disk part of the database all the time, which would ruin the performance...

Anyway, I'm just thinking out loud :) What I'm really trying to say here is that unless some miracle happens, the memory usage won't improve drastically, and that's because of a conceptual problem, not memory leaks :(
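
To make the lookup problem concrete, here is a minimal Python sketch of a frequent-word pass over loglines. It is not the actual patternize code: the function names, the `support` threshold and the choice to key words by their position in the line are all assumptions made for illustration. It only shows why every distinct word has to stay in a lookup table until all lines have been seen.

```python
from collections import Counter


def frequent_words(lines, support):
    """Count (position, word) pairs and keep those seen in >= `support` lines.

    Every distinct pair stays in `counts` until the input is exhausted,
    because a rare word may still turn out to be frequent later on.
    """
    counts = Counter()
    for line in lines:
        for pos, word in enumerate(line.split()):
            counts[(pos, word)] += 1
    return {key for key, n in counts.items() if n >= support}


def candidate_patterns(lines, support):
    """Replace infrequent words with '*' and keep patterns seen >= `support` times."""
    frequent = frequent_words(lines, support)
    patterns = Counter()
    for line in lines:
        pattern = tuple(
            word if (pos, word) in frequent else "*"
            for pos, word in enumerate(line.split())
        )
        patterns[pattern] += 1
    return {" ".join(p): n for p, n in patterns.items() if n >= support}


if __name__ == "__main__":
    sample = [
        "user alice logged in",
        "user bob logged in",
        "user carol logged in",
        "disk full",
    ]
    print(candidate_patterns(sample, support=2))
```

In this sketch most entries of `counts` belong to words that occur only once (user names, timestamps, IDs), yet each incoming line may hit any of them. That is the access pattern that would make a disk-backed table behave much like swapping.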
I also wonder if anybody at Balabit could tell me how to build a copy of your Git tree on RHEL 4 or RHEL 5. I get problems because the PCRE is too old but when I switch to new PCRE, PCRE will not build because the autotools and pkg-config are too old.
It's a problem for me because unfortunately my company only supports RHEL here and otherwise I have to run it in an Ubuntu 10.04 or Debian VM with way too little memory for the tool to run right.
Would it be possible to build a version of your tree for RHEL 4 or 5?
Regarding this I'll have to refer you to other guys here -- I've personally never tried to compile syslog-ng on anything but Ubuntu. I've sent the code in to our internal build system, but because patternize introduces a new dependency (libuuid, for generating the pattern ids) the compilation failed, and I did not want to mess with the builders without asking the guys managing them. I'll try to ask around tomorrow and get you an RPM, or at least a more usable answer with some tips :)

greets,
Peter