[syslog-ng] advice/assistance with parsing attempt requested

Wed Dec 8 22:33:34 CET 2010

That would be awesome, I just can't code in C or C++.  I would suppose,
though, that an interested party could copy most of the CSV parser
code and would only have to implement a function to sub-parse the
equals-sign delimiter.
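
For anyone who wants to see the sub-parsing step spelled out, here is a
minimal sketch in Python rather than C. It is not syslog-ng plugin code
and uses no syslog-ng API; the function name parse_welf and its prefix
argument are made up for illustration. It assumes pairs are separated by
whitespace and that a value may be wrapped in double quotes, and it joins
prefix and name with a dot, following the ${.welf.name1} example quoted
below.

```python
import re

# Assumption: a pair looks like name=value or name="quoted value",
# and pairs are separated by whitespace.
_PAIR_RE = re.compile(r'(\w[\w.-]*)=(?:"([^"]*)"|(\S*))')


def parse_welf(message, prefix=".welf"):
    """Return a dict mapping prefixed names to values (hypothetical helper)."""
    pairs = {}
    for match in _PAIR_RE.finditer(message):
        name, quoted, bare = match.groups()
        # Exactly one of the two value groups matched; prefer the quoted one.
        value = quoted if quoted is not None else bare
        pairs[prefix + "." + name] = value
    return pairs


if __name__ == "__main__":
    sample = 'id=firewall time="2010-12-08 22:33:34" fw=10.0.0.1 pri=6'
    for key, value in parse_welf(sample).items():
        print(key, "=>", value)
```

A real plugin would do the same walk over the message in C and store each
pair as a name-value pair on the log message instead of returning a dict.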

On Wed, Dec 8, 2010 at 2:47 PM, Balazs Scheidler <bazsi at balabit.hu> wrote:
>
> Hi,
>
> Although I really like the ideas floating around, the best way to
> address this issue is to write a welf parser plugin to syslog-ng which
> simply produces name-value pairs from the input, without having to pipe
> them out to an external process.
>
> The round-trip overhead (pipe-write, pipe-read, process, pipe-write,
> pipe-read) is simply enormous.
>
> And 3.2 already has plugins in place, so we only need someone
> volunteering to write a welf parser. :)
>
> Something along the lines of:
>
> parser { welf-parser(prefix(".welf")); };
>
> Which would turn every name=value pair in the input into a syslog-ng
> name-value pair prefixed with '.welf', e.g. name1=value1 would become an
> NV pair with the name ${.welf.name1} and the value "value1".
>
> Does that make sense? Or am I missing something?
>
> On Mon, 2010-12-06 at 13:01 -0700, Bill Anderson wrote:
>> On Dec 6, 2010, at 12:37 PM, Martin Holste wrote:
>>
>> >> Agreed, Perl is plenty quick, hence my wondering about the actual volume. If it is too much for Perl I'd go w/C++.
>> >
>> > From what I can tell, PCRE in Perl (or Python or whatever) is really
>> > close to C/C++ speeds because they're essentially using the same
>> > library and therefore mostly the same syscalls.  I'd be really
>> > interested if anyone has benchmarks.  I'd expect something like 10%
>> > better performance in C, but not much more, assuming that the vast
>> > majority of CPU time is spent on PCRE.
>>
>> Yeah I was thinking the overhead might be in what is done, as opposed to just the RE portion. Of course, the OP script might be implemented rather differently. ;)
>>
>>
>> >
>> >> Personally, I'd make the last step routing back into syslog-ng with a source on a custom port and letting syslog-ng handle the writing to disk. That way you can still use macros such as timestamps, etc. Then again, that may be because I do that all the time. ;) A log statement that takes everything from the custom source and logs to a file should work beautifully; no need for filters, though you could still do additional processing if needed. That said, I'd also consider running a daemon that accepted all the input, formatted it, and then sent it to syslog-ng, pointing the clients at the custom daemon if that was possible.
>> >>
>> >> One advantage to the daemon route is that it wouldn't *have* to reside on the same system.
>> >
>> > Yep, you could definitely let Syslog-NG handle the last mile as well.
>> > I was trying to keep the scope as narrow as possible in my example.
>> >
>> > I wonder if you could build an NFA state machine by conditionally
>> > looping output from a pattern-db parsed message into a source in
>> > Syslog-NG with a different pattern-db, depending on the previous
>> > output.  Something like a token parser pdb that does an ESTRING up
>> > until " " and another one that only expects the key/val pair to be
>> > sent to it as the message.  So it comes in as k1=v1 k2=v2 and the
>> > first kv gets gobbled up and then sent to another pdb source with a
>> > pdb which only matches if the message starts with certain terms.  Then
>> > the rest of the original message is looped back to itself using
>> > @ANYSTRING@ to capture the remainder, that is, minus the kv which was
>> > sent to the kv pdb.  It would keep recursively looping like that until
>> > there's no message left.  If that all worked, your pattern db would be
>> > extremely simple as it would just be a pattern per key you were
>> > looking for, and order would no longer be an issue.
>>
>> Maybe I'm nuts, but that sounds awesome to me. :D
>>
>> > Of course there's
>> > still the problem of demuxing the whole thing back into a coherent
>> > message, but I think that could be done a number of ways by passing
>> > the MSGID token with each part and using the new conditionals present
>> > in OSE 3.2.
>>
>> Well, there is message correlation in 3.2.1 right?  muahahaha
>>
>> > If OSE 3.3 can really do close to 1 million msgs/sec,
>> > then the overhead of resubmitting the same log many times may be
>> > bearable, especially with the threading.
>>
>> True the rate might be the downside to that mechanism. However, the terseness of the messages might make up for some of it.
>>
>>
>> ______________________________________________________________________________
>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
>> Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
>> FAQ: http://www.campin.net/syslog-ng/faq.html
>>
>>
>
> --
> Bazsi
>