[syslog-ng] db-parser QSTRING delimiter limitations

Martin Holste mcholste at gmail.com
Mon Apr 20 04:07:56 CEST 2009


Marton,

Thanks!  I actually figured out the QSTRING and ESTRING thing yesterday
while taking a closer look through the example patterndb.xml in the web
location you previously referenced.  I hadn't yet figured out STRING, so
your explanation was very helpful.  In any case, I think that your
explanation will help a lot of other people in my situation and should serve
as some makeshift documentation until it goes into the site proper.

Now that I've got my patterns matching and extracting properly, I need a way
to mass produce patterns for all of the logs I want to parse.  I'm working
on a Perl script which receives a JSON object (submitted via CGI) which
contains an example log string, a list of offsets within the string that
should be extracted, and the macro names to assign those extracted strings.
It then parses that specification and does some simple PCRE matching to
decide whether to use QSTRING, IPv4, or NUMBER for the extraction/static
text anchoring and produces the pattern.  I've finished that part; now I'm
working on the code to manage this all in a database.  My plan is to have a
web console for allowing end users (who may not be regexp wizards) a place
to paste in an example log, highlight the parts they want to extract, name
them, and then submit them.  Then when the pattern is created, it will go
into a database that can be used to auto-generate a patterndb.xml file as
well as custom filters, templates, and destinations per pattern so that the
parsed messages can be inserted directly into an indexed database.
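
A minimal sketch of the generator step described above, written in Python
rather than Perl. The JSON field names ("log", "fields", "start", "end",
"name"), the build_pattern() helper and the short trailing anchor are all
hypothetical, and the sketch assumes that a literal @ in a pattern is
written as @@. It only illustrates choosing between IPv4, NUMBER and
QSTRING; it is not the actual script.

```python
import json
import re

IPV4_RE = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")
NUMBER_RE = re.compile(r"\d+")


def literal(text):
    # Assumption: '@' starts a parser, so a literal '@' is doubled.
    return text.replace("@", "@@")


def build_pattern(spec_json, anchor_len=4):
    """Turn an example log plus field offsets into a db-parser pattern."""
    spec = json.loads(spec_json)
    log = spec["log"]
    out, pos = [], 0
    for field in sorted(spec["fields"], key=lambda f: f["start"]):
        start, end, name = field["start"], field["end"], field["name"]
        value = log[start:end]
        if IPV4_RE.fullmatch(value):
            out.append(literal(log[pos:start]) + "@IPv4:%s@" % name)
            pos = end
        elif NUMBER_RE.fullmatch(value):
            out.append(literal(log[pos:start]) + "@NUMBER:%s@" % name)
            pos = end
        else:
            # QSTRING consumes the characters on both sides of the value.
            if start - 1 < pos or end >= len(log):
                raise ValueError("no delimiter around field %r" % name)
            opening, closing = log[start - 1], log[end]
            if closing in value:
                raise ValueError("end mark occurs inside field %r" % name)
            marks = opening if opening == closing else opening + closing
            out.append(literal(log[pos:start - 1])
                       + "@QSTRING:%s:%s@" % (name, marks))
            pos = end + 1
    # Anchor the pattern with a few characters of trailing static text.
    out.append(literal(log[pos:pos + anchor_len]))
    return "".join(out)


log = "Deny udp src OUTSIDE:10.0.0.0/1234 dst INSIDE:192.168.0.0/5678"


def field(name, text):
    start = log.index(text)
    return {"name": name, "start": start, "end": start + len(text)}


spec_json = json.dumps({"log": log, "fields": [
    field("FIREWALL.DENY_PROTO", "udp"),
    field("FIREWALL.DENY_O_INT", "OUTSIDE"),
    field("FIREWALL.DENY_SRCIP", "10.0.0.0"),
    field("FIREWALL.DENY_SRCPORT", "1234"),
]})
print(build_pattern(spec_json))
```

For the FWSM example this produces a pattern of the same shape as the
corrected one in the quoted reply below: QSTRING with a space as both
marks for the protocol, QSTRING from a space to a colon for the interface,
then IPv4 and NUMBER with no delimiter arguments.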

So, here's a bit of a strange follow-up question: how will having one log
destination per pattern affect overall Syslog-NG performance?  Is there some
other way to use the custom templates which implement the custom macros from
the db-parser() parser?  From what I could tell, the only place you can put
a template statement is in a destination statement.  So, in order to log the
custom macros, for each pattern there must be a unique filter, template, and
destination.  Is there some other strategy I could employ?  The only other
one I could think of would be to list every possible custom macro in one log
template and have a script use the ${.classifier.rule_id} to know what
it should take from Syslog-NG and put into the database.
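
The single-template strategy could look roughly like this on the script
side. This is only a sketch under assumptions: one shared template that
writes the rule id followed by every custom macro, tab-separated, with a
macro that the rule did not set expanding to an empty string. The column
order, the LOGIN.USER macro and rule id '2' are made up for illustration.

```python
import io

# Hypothetical column order of the one shared template.
COLUMNS = [
    "rule_id",
    "FIREWALL.DENY_PROTO",
    "FIREWALL.DENY_O_INT",
    "FIREWALL.DENY_SRCIP",
    "FIREWALL.DENY_SRCPORT",
    "LOGIN.USER",
]

# Which columns each rule actually fills in (hypothetical rule ids).
WANTED = {
    "1": ["FIREWALL.DENY_PROTO", "FIREWALL.DENY_O_INT",
          "FIREWALL.DENY_SRCIP", "FIREWALL.DENY_SRCPORT"],
    "2": ["LOGIN.USER"],
}


def select_fields(stream):
    """Yield (rule_id, fields) keeping only the columns the rule uses."""
    for line in stream:
        values = line.rstrip("\n").split("\t")
        if len(values) != len(COLUMNS):
            continue  # malformed line
        record = dict(zip(COLUMNS, values))
        wanted = WANTED.get(record["rule_id"])
        if wanted is None:
            continue  # unknown rule id
        yield record["rule_id"], {name: record[name] for name in wanted}


sample = io.StringIO(
    "1\tudp\tOUTSIDE\t10.0.0.0\t1234\t\n"
    "2\t\t\t\t\tmarton\n"
)
for rule_id, fields in select_fields(sample):
    print(rule_id, fields)
```

The trade-off is that every new macro widens the shared template and the
column list has to be kept in step with it.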

Thanks,

Martin

On Sun, Apr 19, 2009 at 3:59 PM, ILLES, Marton <illes.marton at balabit.hu> wrote:

> Hi,
>
> It is nice to hear that you are trying the db-parser. Let me try to help
> you with that; see my rather long answer below.
>
> On Fri, 2009-04-17 at 16:11 -0500, Martin Holste wrote:
> > Hi, I'm new to the list and syslog-ng in general.  I'm building a
> > centralized log collector and am very interested in the power of the
> > db-parser() parsing module.  It really has amazing potential, and I'm
> > eager to implement it.  I've been playing with it quite a bit with a
> > proof-of-concept to parse firewall logs from Cisco FWSM blades.  The
> > $MSGONLY part looks like this for a firewall deny:
> >
> > Deny udp src OUTSIDE:10.0.0.0/1234 dst INSIDE:192.168.0.0/5678 by
> > access-group "OUTSIDE" [0xb74026ad, 0x0]
> >
> > My working parser entry is thus:
> >
> > <patterndb version='1' pub_date='2009-04-17'>
> >   <program name='FWSM'>
> >     <pattern>%FWSM</pattern>
> >     <rule id='1' class='security'>
> >       <pattern>Deny@QSTRING:FIREWALL.DENY_PROTO: @src</pattern>
> >     </rule>
> >   </program>
> > </patterndb>
> >
> > This works great and returns udp and tcp in the ${FIREWALL.DENY_PROTO}
> > macro for logging, along with the ${.classifier.class} and
> > ${.classifier.rule_id} macros.
> >
> > However, when I try to parse out the interface, IP, and port numbers
> > from "OUTSIDE:10.0.0.0/1234" part, the delimiters fail to capture
> > correctly and the whole pattern misses.  Here's what I'm trying to do:
> >
> > <patterndb version='1' pub_date='2009-04-17'>
> >   <program name='FWSM'>
> >     <pattern>%FWSM</pattern>
> >     <rule id='1' class='security'>
> >       <pattern>Deny@QSTRING:FIREWALL.DENY_PROTO:
> > @src@QSTRING:FIREWALL.DENY_O_INT: @:@IPv4
> > $:FIREWALL.DENY_SRCIP:@/@NUMBER:FIREWALL.DENY_SRCPORT: @dst</pattern>
> >     </rule>
> >   </program>
> > </patterndb>
>
> > After much debugging, it appears that there is a problem using QSTRING
> > to match non-space-delimited parsing boundaries.  That is, you cannot
> > parse arbitrarily, you have to match on space boundaries.  Is this
> > true, or am I doing something wrong?  I even tried to parse the 'n'
> > out of the word 'Deny' with a pattern like <pattern>De@QSTRING:test:
> > @y</pattern> and that fails.  From the debug, it appears that unless
> > there is a space present, the radix key is off by one:
> >
> > Looking up node in the radix tree; i='0', nodelen='0', keylen='138',
> > root_key='', key='Deny udp src<snip></snip>'
> > Looking up node in the radix tree; i='2', nodelen='2', keylen='138',
> > root_key='De', key='Deny udp src<snip></snip>'
> >
> > It looks like the key for the second entry should be key='ny udp
> > src<snip></snip>' since the original 'De' match already hit.  I put a
> > lot of printf debugging statements in the code to see if I could
> > figure out what was going wrong, but I haven't been able to conclude
> > what the problem is yet, assuming arbitrary pattern delimiting was the
> > intended goal.  Is anyone able to successfully get db-parser() to
> > parse on arbitrary characters?
> >
> > Also, the source code refers to STRING and ESTRING, how are those
> > different from QSTRING?  It looked like ESTRING was probably just an
> > offset-based version of QSTRING.
>
>
> Short answer:
> The problem is with your pattern, try this one instead:
>
> Deny@QSTRING:FIREWALL.DENY_PROTO: @src@QSTRING:FIREWALL.DENY_O_INT:
> :@@IPv4$:FIREWALL.DENY_SRCIP@/@NUMBER:FIREWALL.DENY_SRCPORT@ dst
>
>
> Long answer:
> Let me explain the errors and how the parsers operate. All parsers take
> their arguments in the same way, with a few differences. In the simplest
> case you specify only the parser type, like this: @NUMBER@. This parses
> and matches a number without storing it in a variable or doing anything
> else with it.
>
> If you want to store the matched value in a variable that can be
> referenced later in a macro substitution, you can give the parser a
> name, like this: @NUMBER:mynumber@. The arguments of the parser are
> separated by a colon ":"; only the type argument is mandatory, the
> others are optional. The first two arguments mean the same for every
> parser type, while the third one has a different meaning for different
> parsers.
>
> Using the third argument you can customize how the parser should
> parse/match. The IPv4 and NUMBER parsers do not use the third argument;
> only STRING, ESTRING and QSTRING are affected.
>
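
The argument layout described above can be sketched as a small splitter.
split_parser() is a hypothetical helper, not part of syslog-ng, and it
assumes that only the first two colons separate arguments, so that the
third argument may itself contain a colon (as in the " :" marks of the
corrected pattern).

```python
def split_parser(spec):
    """Split '@TYPE:name:arg@' into (type, name, arg); missing parts are None."""
    if len(spec) < 2 or not (spec.startswith("@") and spec.endswith("@")):
        raise ValueError("not a parser reference: %r" % spec)
    parts = spec[1:-1].split(":", 2)
    parts += [None] * (3 - len(parts))
    return tuple(parts)


print(split_parser("@NUMBER@"))
print(split_parser("@NUMBER:mynumber@"))
print(split_parser("@QSTRING:FIREWALL.DENY_O_INT: :@"))
```
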
>
> The simplest one is STRING, which matches the given text char by char
> for as long as it sees alphanumeric characters. With the optional third
> argument, additional (non-alphanumeric) characters can be specified.
>
> Given the following MSG:
> "user=marton1234 group=admin"
>
> the "@STRING:mytext@" pattern would only match the string "user", as the
> = char is not alphanumeric. However, the "@STRING:mytext:=@" pattern
> would match "user=marton1234" and would stop at the whitespace. To match
> the whole MSG with the parser one would need the following pattern:
> "@STRING:mytext:= @", as it matches alphanumeric characters plus the =
> sign and the ' ' whitespace as well. Of course, normally one would use a
> better pattern to match the "user" and "group" parts separately, like
> this: "user=@STRING:user@ group=@STRING:group@"
>
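
The STRING behaviour described above can be emulated in a few lines. This
is a sketch of the semantics as explained in the reply, not the syslog-ng
implementation; match_string() is a hypothetical name, and Python's
str.isalnum() is assumed to be close enough to the parser's notion of
"alphanumeric" for ASCII logs.

```python
def match_string(text, extra=""):
    """Return the leading part of text that STRING would consume."""
    end = 0
    # Walk char by char while the char is alphanumeric or explicitly allowed.
    while end < len(text) and (text[end].isalnum() or text[end] in extra):
        end += 1
    return text[:end]


msg = "user=marton1234 group=admin"
print(match_string(msg))        # no third argument
print(match_string(msg, "="))   # third argument "="
print(match_string(msg, "= "))  # third argument "= "
```
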
> The QSTRING and ESTRING parsers take a slightly different and usually
> faster approach to the problem. Rather than checking each char one by
> one, they look for the delimiters. QSTRING stands for "quoted string", so
> it matches any text between quotation marks, which must be specified as
> the third argument of QSTRING. By default only one character needs to be
> specified, which is used as both the start and the end quotation mark,
> but it is possible to specify the starting and the ending marks
> separately.
>
> Now let's take the following MSG as an example:
> from='Marton <marci@server>'
>
> Using the "from=@QSTRING:mytext:'@" pattern, the mytext variable would
> hold the "Marton <marci@server>" text between the ' marks. In this case
> only one char was specified, and it was used as both the starting and
> the ending mark. However, it is possible to specify two chars to be used
> as the starting and ending marks, like this: "from=@QSTRING:mytext:' @".
> Now it would match from the ' char to the space char, so mytext would
> contain "Marton" only. A better example would be to match the text
> between <>, like this: "from='@STRING:name@ @QSTRING:address:<>@'", where
> name would contain "Marton", while the address variable would contain
> "marci@server".
>
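
A sketch of the QSTRING semantics just described, again as an emulation
rather than the real parser. match_qstring() is a hypothetical helper; it
assumes the marks are consumed but not stored in the variable, and that
the opening mark must be the very next character of the message.

```python
def match_qstring(text, marks):
    """Return (value, consumed) if text starts with a quoted string, else None."""
    opening, closing = marks[0], marks[-1]  # one char serves as both marks
    if not text.startswith(opening):
        return None
    end = text.find(closing, 1)
    if end == -1:
        return None
    return text[1:end], end + 1


rest = "'Marton <marci@server>'"  # what follows the literal "from="
print(match_qstring(rest, "'"))
print(match_qstring(rest, "' "))
print(match_qstring("yANYTHINGy", "y"))  # after the literal "De"
print(match_qstring("ny udp src", "y"))  # 'Deny ...' after the literal "De"
```

The last call returns None: after the literal "De", the message continues
with "n", not with the opening mark, which is why a QSTRING cannot pick
the 'n' out of 'Deny'.
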
> Using QSTRING is faster than plain STRING, but it is not always
> possible to use it, especially when the first character is not known in
> advance and we can only specify the last char. In this case the ESTRING
> parser is handy: it matches text up to an ending mark.
>
> To match the variable part of the previous example one would use the
> following pattern: "from='@ESTRING:mytext:'@". Now the first ' mark is
> matched as a literal string, and the ESTRING parser matches the
> remaining text up to the second ' mark.
>
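
ESTRING as described above can be emulated the same way. match_estring()
is a hypothetical helper, and the sketch assumes that the ending mark is
consumed but not stored, which is consistent with the example pattern
having no literal ' after the parser.

```python
def match_estring(text, end_mark):
    """Return (value, consumed) for the text up to end_mark, else None."""
    end = text.find(end_mark)
    if end == -1:
        return None
    return text[:end], end + len(end_mark)


rest = "Marton <marci@server>'"  # what follows the literal "from='"
print(match_estring(rest, "'"))
print(match_estring(rest, '"'))  # ending mark never appears
```
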
> Mind that the NUMBER and IPv4 parsers only match a number (consecutive
> numeric characters) or an IPv4 address in dotted notation. You cannot
> specify other delimiters or the like for these. (This, by the way, was
> the problem in your example pattern.)
>
> As for your other pattern: to match "De@QSTRING:test:y@" you would need
> a MSG like this: "DeyANYTHINGy".
>
> I hope this gives you a better overview of how the parsers operate.
> Also feel free to drop me a mail if you have any further problems.
>
> You can also download from the BalaBit website a patterndb for Cisco PIX
> messages and a patterndb converted (by a script and a little human
> interaction) from the logcheck database.
>
> cheers,
>
> Marton
> --
> Key fingerprint = F78C 25CA 5F88 6FAF EA21 779D 3279 9F9E 1155 670D