That would be awesome, I just can't code in C or C++. I would suppose, though, that an interested party could copy most of the CSV parser code and would only have to implement a function to sub-parse the equals-sign delimiter.
On Wed, Dec 8, 2010 at 2:47 PM, Balazs Scheidler <bazsi@balabit.hu> wrote:
Hi,
Although I really like the ideas floating around, the best way to address this issue is to write a welf parser plugin to syslog-ng which simply produces name-value pairs from the input, without having to pipe them out to an external process.
The overhead of the round-trip (pipe-write, pipe-read, process, pipe-write, pipe-read) is simply enormous.
And 3.2 already has plugins in place, so we only need someone volunteering to write a welf parser. :)
Something along the lines of:
parser { welf-parser(prefix(".welf")); };
This would turn every name=value pair in the input into a syslog-ng name-value pair prefixed with '.welf', e.g. name1=value1 would become an NV pair in syslog-ng with the name ${.welf.name1} and the value "value1".
Does that make sense? Or am I missing something?
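To make the proposed mapping concrete, here is a minimal sketch in Python. It is an illustration only: a real plugin would be written in C against the syslog-ng plugin interface, which is not shown here. The function name parse_welf, the regular expression, the handling of double-quoted values, and the choice to join prefix and name with a dot (taken from the ${.welf.name1} example) are all assumptions.

```python
import re

# Hypothetical helper, not part of syslog-ng. It matches name=value, where the
# value is either a double-quoted string or a run of non-space characters.
_PAIR_RE = re.compile(r'(\w[\w.-]*)=("(?:[^"\\]|\\.)*"|\S*)')


def parse_welf(message, prefix=".welf"):
    """Return a dict mapping prefixed names to values for each name=value pair."""
    pairs = {}
    for name, value in _PAIR_RE.findall(message):
        # Strip the surrounding quotes from a properly quoted value.
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        # Assumption: prefix and name are joined with a dot, as in ${.welf.name1}.
        pairs[f"{prefix}.{name}"] = value
    return pairs


if __name__ == "__main__":
    print(parse_welf('name1=value1 msg="hello world"'))
```

With this sketch, name1=value1 ends up under the key .welf.name1 with the value value1, which mirrors the NV pair described above.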
On Mon, 2010-12-06 at 13:01 -0700, Bill Anderson wrote:
On Dec 6, 2010, at 12:37 PM, Martin Holste wrote:
Agreed, Perl is plenty quick, hence my wondering about the actual volume. If it is too much for Perl I'd go w/C++.
From what I can tell, PCRE in Perl (or Python or whatever) is really close to C/C++ speeds because they're essentially using the same library and therefore mostly the same syscalls. I'd be really interested if anyone has benchmarks. I'd expect something like 10% better performance in C, but not much more, assuming that the vast majority of CPU time is spent on PCRE.
Yeah I was thinking the overhead might be in what is done, as opposed to just the RE portion. Of course, the OP script might be implemented rather differently. ;)
Personally, I'd make the last step routing back into syslog-ng with a source on a custom port and letting syslog-ng handle the writing to disk. That way you can still use macros such as timestamps, etc. Then again, that may be because I do that all the time. ;) A log statement that takes everything from the custom source and logs to a file should work beautifully; no need for filters, though you could still do additional processing if needed. That said, I'd also consider running a daemon that accepted all the input, formatted it, and then sent it to syslog-ng, pointing the clients at the custom daemon if that was possible.
One advantage to the daemon route is that it wouldn't *have* to reside on the same system.
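A rough sketch of the forwarding half of such a daemon, in Python. Everything specific here is an assumption: the address and port (127.0.0.1:5514) stand in for whatever custom UDP source syslog-ng is configured with, and reformat is a placeholder for the real formatting step.

```python
import socket

# Assumption: syslog-ng has a custom UDP source listening at this address.
SYSLOG_NG_ADDR = ("127.0.0.1", 5514)


def reformat(line):
    """Placeholder formatting step: collapse runs of whitespace into single spaces."""
    return " ".join(line.split())


def forward(lines, addr=SYSLOG_NG_ADDR):
    """Reformat each line and send it to syslog-ng; return the number of lines sent."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sent = 0
        for line in lines:
            payload = reformat(line)
            if payload:  # skip lines that are empty after reformatting
                sock.sendto(payload.encode("utf-8"), addr)
                sent += 1
        return sent
    finally:
        sock.close()
```

Because the daemon only needs a network path to the syslog-ng source, changing addr is all it takes to run it on a different host.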
Yep, you could definitely let Syslog-NG handle the last mile as well. I was trying to keep the scope as narrow as possible in my example.
I wonder if you could build an NFA state machine by conditionally looping output from a pattern-db parsed message into a source in Syslog-NG with a different pattern-db, depending on the previous output. Something like a token parser pdb that does an ESTRING up until " " and another one that only expects the key/val pair to be sent to it as the message. So it comes in as k1=v1 k2=v2 and the first kv gets gobbled up and then sent to another pdb source with a pdb which only matches if the message starts with certain terms. Then the rest of the original message is looped back to itself using @ANYSTRING@ to capture the remainder, that is, minus the kv which was sent to the kv pdb. It would keep recursively looping like that until there's no message left. If that all worked, your pattern db would be extremely simple as it would just be a pattern per key you were looking for, and order would no longer be an issue.
Maybe I'm nuts, but that sounds awesome to me. :D
Of course there's still the problem of demuxing the whole thing back into a coherent message, but I think that could be done a number of ways by passing the MSGID token with each part and using the new conditionals present in OSE 3.2.
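The loop described above can be simulated outside syslog-ng to see how many passes a message would need. The following Python sketch is only a model of the idea, not a pattern-db configuration: peel stands in for the ESTRING-up-to-space step, KNOWN_KEYS for the one-pattern-per-key database, and the store keyed by message ID for the demuxing step. All of these names are hypothetical.

```python
from collections import defaultdict

# Hypothetical stand-in for the "one pattern per key" pattern database.
KNOWN_KEYS = {"src", "dst", "proto"}


def peel(message):
    """Split off the first space-delimited token and return (token, remainder)."""
    token, _, rest = message.partition(" ")
    return token, rest.lstrip(" ")


def loop_message(msgid, message, store):
    """Re-submit the remainder until nothing is left; return the number of passes."""
    passes = 0
    while message:
        token, message = peel(message)
        passes += 1
        key, sep, value = token.partition("=")
        # Only tokens of the form key=value with a known key are kept.
        if sep and key in KNOWN_KEYS:
            store[msgid][key] = value  # reassemble the parts by message ID
    return passes


if __name__ == "__main__":
    store = defaultdict(dict)
    n = loop_message("m1", "dst=10.0.0.2 proto=tcp src=10.0.0.1 junk", store)
    print(n, dict(store["m1"]))
```

Each token costs one pass regardless of key order, so a message with N space-separated tokens is resubmitted N times in this model.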
Well, there is message correlation in 3.2.1, right? muahahaha
If OSE 3.3 can really do close to 1 million msgs/sec, then the overhead of resubmitting the same log many times may be bearable, especially with the threading.
True, the rate might be the downside to that mechanism. However, the terseness of the messages might make up for some of it.
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.campin.net/syslog-ng/faq.html
-- Bazsi