[syslog-ng] db-parser QSTRING delimiter limitations

ILLES, Marton illes.marton at balabit.hu
Sun Apr 19 22:59:42 CEST 2009


Hi,

It is nice to hear that you are trying out db-parser. Let me try to help
you with that; see my rather long answer below.

On Fri, 2009-04-17 at 16:11 -0500, Martin Holste wrote:
> Hi, I'm new to the list and syslog-ng in general.  I'm building a
> centralized log collector and am very interested in the power of the
> db-parser() parsing module.  It really has amazing potential, and I'm
> eager to implement it.  I've been playing with it quite a bit with a
> proof-of-concept to parse firewall logs from Cisco FWSM blades.  The
> $MSGONLY part looks like this for a firewall deny:
> 
> Deny udp src OUTSIDE:10.0.0.0/1234 dst INSIDE:192.168.0.0/5678 by
> access-group "OUTSIDE" [0xb74026ad, 0x0]
> 
> My working parser entry is thus:
> 
> <patterndb version='1' pub_date='2009-04-17'>
>   <program name='FWSM'>
>     <pattern>%FWSM</pattern>
>     <rule id='1' class='security'>
>       <pattern>Deny@QSTRING:FIREWALL.DENY_PROTO: @src</pattern>
>     </rule>
>   </program>
> </patterndb>
> 
> This works great and returns udp and tcp in the ${FIREWALL.DENY_PROTO}
> macro for logging, along with the ${.classifier.class} and
> ${.classifier.rule_id} macros.
> 
> However, when I try to parse out the interface, IP, and port numbers
> from "OUTSIDE:10.0.0.0/1234" part, the delimiters fail to capture
> correctly and the whole pattern misses.  Here's what I'm trying to do:
> 
> <patterndb version='1' pub_date='2009-04-17'>
>   <program name='FWSM'>
>     <pattern>%FWSM</pattern>
>     <rule id='1' class='security'>
>       <pattern>Deny@QSTRING:FIREWALL.DENY_PROTO:
> @src@QSTRING:FIREWALL.DENY_O_INT: @:@IPv4
> $:FIREWALL.DENY_SRCIP:@/@NUMBER:FIREWALL.DENY_SRCPORT: @dst</pattern>
>     </rule>
>   </program>
> </patterndb>

> After much debugging, it appears that there is a problem using QSTRING
> to match non-space-delimited parsing boundaries.  That is, you cannot
> parse arbitrarily, you have to match on space boundaries.  Is this
> true, or am I doing something wrong?  I even tried to parse the 'n'
> out of the word 'Deny' with a pattern like <pattern>De@QSTRING:test:
> @y</pattern> and that fails.  From the debug, it appears that unless
> there is a space present, the radix key is off by one:
> 
> Looking up node in the radix tree; i='0', nodelen='0', keylen='138',
> root_key='', key='Deny udp src<snip></snip>'
> Looking up node in the radix tree; i='2', nodelen='2', keylen='138',
> root_key='De', key='Deny udp src<snip></snip>'
> 
> It looks like the key for the second entry should be key='ny udp
> src<snip></snip>' since the original 'De' match already hit.  I put a
> lot of printf debugging statements in the code to see if I could
> figure out what was going wrong, but I haven't been able to conclude
> what the problem is yet, assuming arbitrary pattern delimiting was the
> intended goal.  Is anyone able to successfully get db-parser() to
> parse on arbitrary characters?
> 
> Also, the source code refers to STRING and ESTRING, how are those
> different from QSTRING?  It looked like ESTRING was probably just an
> offset-based version of QSTRING.


Short answer:
The problem is with your pattern, try this one instead:

Deny@QSTRING:FIREWALL.DENY_PROTO: @src@QSTRING:FIREWALL.DENY_O_INT: :@@IPv4$:FIREWALL.DENY_SRCIP@/@NUMBER:FIREWALL.DENY_SRCPORT@ dst


Long answer:
Let me explain the errors and how the parsers operate. All parsers take
their arguments in the same way, with a few differences. In the simplest
case you specify only the parser type, like this: @NUMBER@. This parses
and matches a number without storing it in a variable or doing anything
else special.

If you want to store the matched value in a variable that can be
referenced later in a macro substitution, you can give the parser a
name, like this: @NUMBER:mynumber@. The arguments of the parser are
separated by a colon ":"; only the type argument is mandatory, the
others are optional. The first two arguments mean the same for all
parser types, while the third has a different meaning for different
parsers.

With the third argument you can customize how the parser parses and
matches. The IPv4 and NUMBER parsers do not use the third argument; only
STRING, ESTRING and QSTRING do.
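
To make the argument layout concrete, here is a small Python sketch
that splits a parser reference into its type, name and third argument.
It is only an illustration of the syntax described above, not code from
syslog-ng; the function name split_parser_spec is made up, and the
choice to split on the first two colons only is an assumption made so
that a third argument such as " :" survives intact.

```python
def split_parser_spec(spec):
    """Split '@TYPE:name:arg@' into (type, name, arg).

    Illustrative only: name and arg come back as None when omitted.
    """
    if len(spec) < 2 or not (spec.startswith("@") and spec.endswith("@")):
        raise ValueError("a parser reference must be wrapped in @ signs")
    body = spec[1:-1]
    # Split on the first two colons only, so the third argument may
    # itself contain a colon (assumption, see the lead-in).
    parts = body.split(":", 2)
    parts += [None] * (3 - len(parts))
    ptype, name, arg = parts
    if not ptype:
        raise ValueError("the parser type is mandatory")
    return ptype, name or None, arg or None


print(split_parser_spec("@NUMBER@"))
print(split_parser_spec("@NUMBER:mynumber@"))
print(split_parser_spec("@QSTRING:FIREWALL.DENY_O_INT: :@"))
```

The last call returns " :" as the third argument, that is a space as the
starting mark and a colon as the ending mark.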


The simplest one is STRING, which matches the text character by
character for as long as it sees alphanumeric characters. The optional
third argument lets you specify additional (non-alphanumeric) characters
to accept.

Given the following MSG:
"user=marton1234 group=admin"

the "@STRING:mytext@" pattern would only match the string "user", as the
= char is not alphanumeric. However, the "@STRING:mytext:=@" pattern
would match "user=marton1234" and stop at the whitespace. To match the
whole MSG with the parser, one would need the following pattern:
"@STRING:mytext:= @", as it matches alphanumeric characters plus the =
sign and the ' ' whitespace as well. Of course, normally one would use a
better pattern to match the "user" and "group" parts separately, like
this: "user=@STRING:user@ group=@STRING:group@"
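
The STRING behaviour described above can be imitated in a few lines of
Python. This is a rough sketch of the matching rule only, not the actual
implementation; parse_string and its extra parameter are names invented
for the illustration.

```python
def parse_string(text, extra=""):
    """Return the longest prefix of text that consists of alphanumeric
    characters or characters listed in extra (the third argument)."""
    end = 0
    while end < len(text) and (text[end].isalnum() or text[end] in extra):
        end += 1
    return text[:end]


msg = "user=marton1234 group=admin"
print(parse_string(msg))        # like @STRING:mytext@
print(parse_string(msg, "="))   # like @STRING:mytext:=@
print(parse_string(msg, "= "))  # like @STRING:mytext:= @
```

The three calls print "user", "user=marton1234" and the whole message,
matching the three patterns discussed above.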

The QSTRING and ESTRING parsers take a somewhat different, and usually
faster, approach to the problem. Rather than checking the characters one
by one, they look for the delimiters. QSTRING stands for "quoted
string", so it matches any text between quotation marks, which must be
specified as the third argument of QSTRING. If only one character is
specified, it is used as both the starting and the ending quotation
mark, but it is also possible to specify the starting and ending marks
separately.

Now let's take the following MSG as an example:
from='Marton <marci@server>'

Using the "from=@QSTRING:mytext:'@" pattern, the mytext variable would
hold the "Marton <marci@server>" text between the ' marks. In this case
only one char was specified, and it was used as both the starting and
the ending mark. However, it is possible to specify two chars to be used
as the starting and ending marks, like this: "from=@QSTRING:mytext:' @".
Now it would match from the ' char to the space char, so mytext would
contain "Marton" only. A better example would be to match text between
<>, like this: "from='@STRING:name@ @QSTRING:address:<>@'", where name
would contain "Marton", while the address variable would contain
"marci@server".
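
A small Python sketch of the QSTRING rule may help here. It is only an
approximation of the behaviour described above (the text must begin with
the starting mark, and the value is whatever lies before the next ending
mark); parse_qstring is a made-up name and the code is not taken from
syslog-ng.

```python
def parse_qstring(text, marks):
    """Return (value, consumed_length), or None if text does not start
    with the starting mark or the ending mark is never found.

    marks is one char (same start and end mark) or two chars
    (start mark followed by end mark).
    """
    if len(marks) not in (1, 2):
        raise ValueError("give one or two quotation mark characters")
    start, end = marks[0], marks[-1]
    if not text.startswith(start):
        return None
    close = text.find(end, 1)
    if close == -1:
        return None
    # The marks themselves are consumed but not part of the value.
    return text[1:close], close + 1


quoted = "'Marton <marci@server>'"          # what follows "from="
print(parse_qstring(quoted, "'"))
print(parse_qstring(quoted, "' "))
print(parse_qstring("<marci@server>'", "<>"))
# After the literal "De" the input continues with "ny udp src ...",
# which does not start with a space, so a space-delimited QSTRING fails.
print(parse_qstring("ny udp src", " "))
```

The last call returns None, which is the reason a QSTRING placed in the
middle of the word "Deny" cannot match.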

Using QSTRING is faster than plain STRING, but it is not always
possible to use it, especially when the first character is not known in
advance and we can only specify the last char. In this case the ESTRING
parser is handy, as it matches text up to an ending mark.

To match the variable part of the previous example, one would use the
following pattern: "from='@ESTRING:mytext:'@". Now we match the first '
mark as a literal string, and the ESTRING parser matches the remaining
text up to the second ' mark.
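
The same example as a Python sketch, again only imitating the rule and
not quoting the real code. The name parse_estring is invented, and the
sketch assumes that the ending mark is consumed together with the value
but not stored in it.

```python
def parse_estring(text, end_mark):
    """Return (value, consumed_length) for the text up to end_mark,
    or None if end_mark does not occur."""
    pos = text.find(end_mark)
    if pos == -1:
        return None
    # Assumption: the ending mark is swallowed but not kept in the value.
    return text[:pos], pos + len(end_mark)


msg = "from='Marton <marci@server>'"
literal = "from='"                   # matched as plain text first
assert msg.startswith(literal)
rest = msg[len(literal):]
print(parse_estring(rest, "'"))      # like @ESTRING:mytext:'@
```

Here the value is "Marton <marci@server>", the same result as with
QSTRING, but only the closing ' had to be named in the parser.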

Mind that the NUMBER and IPv4 parsers only match a number (contiguous
numeric characters) or an IPv4 address in dotted notation. You cannot
specify other delimiters or the like for these. (This was, by the way,
the problem in your example pattern.)

To match your other pattern, "De@QSTRING:test:y@", you would need a MSG
like this: "DeyANYTHINGy".

I hope this gives you a better overview of how the parsers operate.
Also feel free to drop me a mail if you have any further problems.

You can also download from the BalaBit website a patterndb for Cisco PIX
messages and a patterndb converted (by a script and a little human
interaction) from the logcheck database.

cheers,

Marton
-- 
Key fingerprint = F78C 25CA 5F88 6FAF EA21 779D 3279 9F9E 1155 670D



