[syslog-ng] advice/assistance with parsing attempt requested

Martin Holste mcholste at gmail.com
Mon Dec 6 19:15:06 CET 2010


Good points, Bill.

This is a cool challenge!

If the values can really come in any order and you don't necessarily
know all possible extra values ahead of time, then there's a good
chance that regexp is your only hope, through Perl or other means.
Pattern-db is really not set up to do this kind of thing, because the
order changes.

This must be pretty high volume, as I've got Perl doing regexp on
around 3-4k large messages per second with no problems.  If that's the
case, maybe you want a hybrid solution of some sort where you do some
of the formatting in pattern-db, but then output to Perl for the final
parsing and writing.

Another tactic might be to do multi-core processing with Perl by
having Syslog-NG pipe to a master Perl process which uses round-robin
load-balancing and the IO::AIO CPAN module to asynchronously send the
logs to child processes where the actual PCRE matches take place.
Something like:

Logs -> Syslog-NG -> Perl master -> AIO to Perl Child n -> write file to disk
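
A rough sketch of that fan-out, for illustration only. It swaps in plain
fork() and blocking pipes for IO::AIO, so it shows just the round-robin
hand-off, not the asynchronous part. The names spawn_children, dispatch,
the child-N.log files and the two keys in %keep are made up for the
example, and an in-memory filehandle stands in for the pipe from Syslog-NG.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;
use POSIX ();
use File::Temp qw(tempdir);

# Fork $n children. Each child reads lines from its own pipe, runs
# $handler on them, and appends any non-empty result to
# $dir/child-N.log. Returns the parent's write ends of the pipes.
sub spawn_children {
    my ($n, $dir, $handler) = @_;
    my @writers;
    for my $id (1 .. $n) {
        pipe(my $reader, my $writer) or die "pipe: $!";
        my $pid = fork();
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {                       # child
            close $writer;
            close $_ for @writers;             # write ends inherited from the parent
            open my $out, '>>', "$dir/child-$id.log" or POSIX::_exit(1);
            while (my $line = <$reader>) {
                my $kept = $handler->($line);
                print {$out} "$kept\n" if length $kept;
            }
            close $out;
            POSIX::_exit(0);                   # skip END blocks and inherited buffers
        }
        close $reader;                         # parent only writes
        $writer->autoflush(1);
        push @writers, $writer;
    }
    return @writers;
}

# Hand each input line to the next child in turn, then wait for them all.
sub dispatch {
    my ($in, @writers) = @_;
    my $next = 0;
    while (my $line = <$in>) {
        print { $writers[$next] } $line;
        $next = ($next + 1) % @writers;
    }
    close $_ for @writers;                     # EOF tells each child to finish
    1 while wait() != -1;
}

# Hypothetical filter: keep only the pairs whose key is in %keep.
my %keep = (namedparser3 => 1, namedparser4 => 1);
my $handler = sub {
    my ($line) = @_;
    my @kept;
    while ($line =~ /(\w+)=(\w+)/g) {
        push @kept, "$1=$2" if $keep{$1};
    }
    return join ' ', @kept;
};

# Demo: two children, three sample lines.
my $dir    = tempdir(CLEANUP => 1);
my $sample = "extra1=a namedparser3=b\nnamedparser4=c extra2=d\nextra3=e\n";
open my $in, '<', \$sample or die "open: $!";
dispatch($in, spawn_children(2, $dir, $handler));
for my $id (1, 2) {
    open my $fh, '<', "$dir/child-$id.log" or die "open: $!";
    print "child $id: $_" while <$fh>;
}
```

Because the writes are blocking, one slow child will stall the master once
its pipe fills up. That is the gap the asynchronous approach is meant to
close.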

Can you send a snippet of what your Perl script looks like?  One
regexp should be able to parse the message into an array, and a simple
hash lookup should be enough to toss the "extra" key/val pairs.
Here's how I would do it:

use strict;
use warnings;

# Keys whose key=value pairs should survive; everything else is dropped.
my %keep = ( namedparser1 => 1, namedparser2 => 1, namedparser3 => 1,
namedparser4 => 1, namedparser5 => 1);
my $test_msg = q{extra1=extravalue1 namedparser3=namedparser3value
extra2=extravalue2 namedparser4=namedparser4value
namedparser5=namedparser5value extra3=extravalue3 extra4=extravalue4};
# In list context, /g returns a flat list: key1, value1, key2, value2, ...
my @arr = $test_msg =~ /(\w+)=(\w+)/g;
my @kept;
# Walk the list two elements at a time: key at $i, value at $i+1.
for (my $i = 0; $i < @arr; $i += 2){
  if ($keep{ $arr[$i] }){
    push @kept, $arr[$i] . "=" . $arr[$i+1];
  }
}
print join(" ", @kept) . "\n";

Which should print the following as a single line:
namedparser3=namedparser3value namedparser4=namedparser4value
namedparser5=namedparser5value
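
For a live stream the same idea can run line by line instead of on one
test message. The following is only a sketch under a few assumptions: the
key names are the placeholders from the example, filter_pairs and
filter_stream are names made up here, and an in-memory filehandle stands
in for whatever Syslog-NG would pipe to the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Placeholder key names namedparser1 .. namedparser5 from the example.
my %keep = map { $_ => 1 } map { "namedparser$_" } 1 .. 5;

# Return the wanted key=value pairs of one message, in their original
# order, joined by single spaces.
sub filter_pairs {
    my ($msg) = @_;
    my @kept;
    while ($msg =~ /(\w+)=(\w+)/g) {
        push @kept, "$1=$2" if $keep{$1};
    }
    return join ' ', @kept;
}

# Filter every line read from $in and write non-empty results to $out.
sub filter_stream {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        my $kept = filter_pairs($line);
        print {$out} "$kept\n" if length $kept;
    }
}

# Demo input: three messages, the second one has nothing worth keeping.
my $sample = "extra1=extravalue1 namedparser3=namedparser3value extra2=extravalue2\n"
           . "extra3=extravalue3 extra4=extravalue4\n"
           . "namedparser5=namedparser5value namedparser1=namedparser1value\n";
open my $in, '<', \$sample or die "open: $!";
filter_stream($in, \*STDOUT);
# prints:
# namedparser3=namedparser3value
# namedparser5=namedparser5value namedparser1=namedparser1value
```

In a real script, filter_stream(\*STDIN, \*STDOUT) would replace the demo
lines. Note that \w+ stops at punctuation, so a value such as 10.0.0.1
would be cut to "10"; if the values contain no spaces, \S+ for the value
part may be a better fit.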

On Mon, Dec 6, 2010 at 8:38 AM, Bill Anderson
<Bill.Anderson at bodybuilding.com> wrote:
>
> On Dec 6, 2010, at 4:18 AM, <syslog-ng2010 at hushmail.com> wrote:
>
>> i've spent the better part of the past week reading and trying to
>> understand both the documentation and list posts trying to sort
>> this out, if anyone can offer some advice as to whether this is
>> possible or not and if so, what i'm doing wrong; i would really
>> appreciate it! …
>>
>> i have a simple enough task, or so i thought! i've got a syslog
>> stream being received by syslog-ng with too much data. what i'd
>> like to do is parse out pieces of the stream and write only those
>> to a file. the tricky part is that the order of the stream is very
>> variable so that sometimes the desired named parser preceding
>> strings and associated values are present and sometimes not.
>> furthermore, the extra data is also quite variable. can this
>> challenge even be addressed with syslog-ng ose? if so, can it be
>> done with patterndb without creating a pattern for EVERY variation
>> of possible streams?
>
> I believe you can use the parser and filter in combination to log on match essentially. With this you would only need to set up patterns for the possible combinations you actually want to log/reduce.
>
>> for clarification, we've tried to leverage an
>> external perl script which does this using regexs but, it seems
>> that it can't keep up with the stream, we only receive 10% of the
>> original events in the output. if this (external parsing script)
>> is the only way this can be done, we will continue our efforts to enhance
>> the external script but, if this is possible to be done natively
>> within syslog-ng, i'd rather do that.
>
> What is the volume of events here per-second and per-minute? Perl may not be the right tool for the job here (assuming you can't get it done natively in syslog-ng). If there are too many patterns for you to create you might consider sending the base matches for this to an external daemon that processes them and sends them back into syslog for storage. Then again, I'd be weighing the cost of patterns vs. external script or daemon. How much time to simply input the patterns? If you can do it in a script, you can have the script write your patterndb file for starters. Then there is the cost of adding new entries when they come around (assuming they do) vs. adding to the code.
>
> Another option, if you don't want to keep the extras might be to use a rewrite rule to remove extra1=extravalue1 prior to running the parser.
>
>
> Cheers,
> Bill
>
> --
> Bill Anderson, RHCE
> Linux Systems Engineer
> bill.anderson at bodybuilding.com
>
>
>
> ______________________________________________________________________________
> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
> Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
> FAQ: http://www.campin.net/syslog-ng/faq.html
>
>

