Hi,

It is nice to hear that you are trying the db-parser. Let me try to help you with that; see my rather long answer below.

On Fri, 2009-04-17 at 16:11 -0500, Martin Holste wrote:
Hi, I'm new to the list and to syslog-ng in general. I'm building a centralized log collector and am very interested in the power of the db-parser() parsing module. It really has amazing potential, and I'm eager to implement it. I've been experimenting with it quite a bit in a proof of concept that parses firewall logs from Cisco FWSM blades. The $MSGONLY part looks like this for a firewall deny:
Deny udp src OUTSIDE:10.0.0.0/1234 dst INSIDE:192.168.0.0/5678 by access-group "OUTSIDE" [0xb74026ad, 0x0]
My working parser entry is thus:
<patterndb version='1' pub_date='2009-04-17'>
  <program name='FWSM'>
    <pattern>%FWSM</pattern>
    <rule id='1' class='security'>
      <pattern>Deny@QSTRING:FIREWALL.DENY_PROTO: @src</pattern>
    </rule>
  </program>
</patterndb>
This works great and returns udp and tcp in the ${FIREWALL.DENY_PROTO} macro for logging, along with the ${.classifier.class} and ${.classifier.rule_id} macros.
However, when I try to parse out the interface, IP, and port numbers from the "OUTSIDE:10.0.0.0/1234" part, the delimiters fail to capture correctly and the whole pattern misses. Here's what I'm trying to do:
<patterndb version='1' pub_date='2009-04-17'>
  <program name='FWSM'>
    <pattern>%FWSM</pattern>
    <rule id='1' class='security'>
      <pattern>Deny@QSTRING:FIREWALL.DENY_PROTO: @src@QSTRING:FIREWALL.DENY_O_INT: @:@IPv4 $:FIREWALL.DENY_SRCIP:@/@NUMBER:FIREWALL.DENY_SRCPORT: @dst</pattern>
    </rule>
  </program>
</patterndb>
After much debugging, it appears that there is a problem using QSTRING to match non-space-delimited parsing boundaries. That is, you cannot parse arbitrarily; you have to match on space boundaries. Is this true, or am I doing something wrong? I even tried to parse the 'n' out of the word 'Deny' with a pattern like <pattern>De@QSTRING:test: @y</pattern>, and that fails. From the debug output, it appears that unless there is a space present, the radix key is off by one:
Looking up node in the radix tree; i='0', nodelen='0', keylen='138', root_key='', key='Deny udp src<snip>'
Looking up node in the radix tree; i='2', nodelen='2', keylen='138', root_key='De', key='Deny udp src<snip>'
It looks like the key for the second entry should be key='ny udp src<snip>', since the original 'De' match already hit. I put a lot of printf debugging statements in the code to see if I could figure out what was going wrong, but I haven't been able to pin down the problem yet, assuming arbitrary pattern delimiting was the intended goal. Is anyone able to successfully get db-parser() to parse on arbitrary characters?
Also, the source code refers to STRING and ESTRING. How are those different from QSTRING? It looked like ESTRING was probably just an offset-based version of QSTRING.
Short answer: the problem is with your pattern. Try this one instead:

Deny@QSTRING:FIREWALL.DENY_PROTO: @src@QSTRING:FIREWALL.DENY_O_INT: :@@IPv4:FIREWALL.DENY_SRCIP@/@NUMBER:FIREWALL.DENY_SRCPORT@ dst

Long answer: let me explain the errors and how the parsers operate. All parsers share the same syntax for specifying arguments, although there are some differences between them. In the simplest case you specify only the parser type, like this: @NUMBER@. This parses and matches a number without storing it in a variable or doing anything else with it. If you want to store the matched value in a variable that can be referenced later in a macro substitution, give the parser a name, like this: @NUMBER:mynumber@

The arguments of a parser are separated by a colon (":"). Only the type argument is mandatory; the others are optional. The first two arguments have the same meaning for every parser type, while the third one means something different for each parser: it lets you customize how the parser matches. The IPv4 and NUMBER parsers do not use the third argument; only STRING, ESTRING and QSTRING do.

The simplest one is STRING, which matches the text character by character for as long as it sees alphanumeric characters. With the optional third argument you can list additional (non-alphanumeric) characters to accept. Given the MSG "user=marton1234 group=admin", the "@STRING:mytext@" pattern would only match the string "user", as the = character is not alphanumeric. However, the "@STRING:mytext:=@" pattern would match "user=marton1234" and stop at the whitespace. To match the whole MSG with this parser you would need the "@STRING:mytext:= @" pattern, as it matches alphanumeric characters plus the = sign and the ' ' whitespace as well.
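To make the argument rules and the STRING behaviour above a bit more concrete, here is a rough Python sketch. It is only a model of the behaviour described in this mail, not the syslog-ng implementation (db-parser is written in C); the helper names split_parser_spec() and match_string() are hypothetical, and using Python's str.isalnum() as the definition of "alphanumeric" is an assumption (it also accepts non-ASCII letters).

```python
# Hypothetical sketch, NOT syslog-ng source code (db-parser is written in C).

def split_parser_spec(token):
    """Split '@TYPE[:name[:extra]]@' into a (type, name, extra) tuple."""
    body = token[1:-1]                # drop the surrounding '@' marks
    parts = body.split(":", 2)        # at most 3 arguments; the 3rd may contain ':'
    parts += [""] * (3 - len(parts))  # missing optional arguments become ""
    return tuple(parts)


def match_string(text, pos=0, extra_chars=""):
    """STRING: consume alphanumerics plus any character listed in extra_chars.

    Returns (value, new_pos), or None when not even one character matched.
    """
    end = pos
    while end < len(text) and (text[end].isalnum() or text[end] in extra_chars):
        end += 1
    if end == pos:
        return None
    return text[pos:end], end


msg = "user=marton1234 group=admin"
print(split_parser_spec("@NUMBER@"))           # ('NUMBER', '', '')
print(split_parser_spec("@STRING:mytext:=@"))  # ('STRING', 'mytext', '=')
print(match_string(msg))                       # ('user', 4)
print(match_string(msg, 0, "="))               # ('user=marton1234', 15)
print(match_string(msg, 0, "= "))              # ('user=marton1234 group=admin', 27)
```

Splitting on at most two colons matters for QSTRING, whose third argument can itself be a colon, as in the " :" marks of the corrected FWSM pattern.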
Of course, normally one would use a better pattern that matches the "user" and "group" parts separately, like this: "user=@STRING:user@ group=@STRING:group@"

The QSTRING and ESTRING parsers take a somewhat different, and usually faster, approach to the problem: rather than checking each character one by one, they look for the delimiters. QSTRING stands for "quoted string", so it matches any text between quotation marks, which must be specified as its third argument. Normally only one character is specified, which is used as both the starting and the ending mark, but it is possible to specify the starting and ending marks separately. Let's take the following MSG as an example: from='Marton <marci@server>'

Using the "from=@QSTRING:mytext:'@" pattern, the mytext variable would hold the "Marton <marci@server>" text between the ' marks. In this case only one character was specified, and it was used as both the starting and the ending mark. It is also possible to specify two characters as the starting and ending marks, like this: "from=@QSTRING:mytext:' @". Now the match runs from the ' character to the space character, so mytext would contain only "Marton". A better example is matching text between <> characters: with "from='@STRING:name@ @QSTRING:address:<>@'" the name variable would contain "Marton", while the address variable would contain "marci@server".

QSTRING is faster than plain STRING, but it cannot always be used, especially when the first character is not known in advance and we only want to specify the last one. In this case the ESTRING parser comes in handy: it matches text up to an ending mark. To match the variable part of the previous example you would use the following pattern: "from='@ESTRING:mytext:'@". Here the first ' mark is matched as a literal string, and the ESTRING parser matches the remaining text up to the second ' mark.

Note that the NUMBER and IPv4 parsers only match a number (a continuous run of digits) or an IPv4 address in dotted notation, respectively.
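The following Python sketch strings the parsers described above together into a tiny pattern matcher and runs the examples from this mail through it, including the corrected FWSM pattern (with the IPv4 parser written as @IPv4:FIREWALL.DENY_SRCIP@). Again, this is a hypothetical model and not syslog-ng code: the function names are made up, the pattern is matched against the beginning of the message (the patterns in this thread stop before the end of the message), a literal @ inside a pattern is not supported, and ESTRING consuming its ending mark without storing it is an assumption.

```python
import re

# Hypothetical model of the db-parser behaviour described in this mail;
# NOT syslog-ng source code. Every matcher takes (text, pos, extra) and
# returns (value, new_pos), or None when it does not match at pos.
NUMBER_RE = re.compile(r"[0-9]+")
IPV4_RE = re.compile(r"[0-9]{1,3}(?:\.[0-9]{1,3}){3}")


def match_string(text, pos, extra):
    # Alphanumerics plus any extra characters given as the third argument.
    end = pos
    while end < len(text) and (text[end].isalnum() or text[end] in extra):
        end += 1
    return (text[pos:end], end) if end > pos else None


def match_qstring(text, pos, quotes):
    # One mark: used for both ends; two marks: opening then closing.
    if not quotes or pos >= len(text) or text[pos] != quotes[0]:
        return None
    end = text.find(quotes[-1], pos + 1)
    if end == -1:
        return None
    # Both marks are consumed; only the text between them is stored.
    return text[pos + 1:end], end + 1


def match_estring(text, pos, end_mark):
    # Text up to the ending mark (assumed: mark consumed but not stored).
    if not end_mark:
        return None
    end = text.find(end_mark, pos)
    if end == -1:
        return None
    return text[pos:end], end + len(end_mark)


def match_regex(regex):
    # NUMBER and IPv4 ignore the third argument.
    def matcher(text, pos, _extra):
        m = regex.match(text, pos)
        return (m.group(), m.end()) if m else None
    return matcher


MATCHERS = {
    "STRING": match_string,
    "QSTRING": match_qstring,
    "ESTRING": match_estring,
    "NUMBER": match_regex(NUMBER_RE),
    "IPv4": match_regex(IPV4_RE),
}


def match_pattern(pattern, text):
    """Match pattern against the start of text; return named values or None."""
    values, pos, i = {}, 0, 0
    while i < len(pattern):
        if pattern[i] == "@":
            close = pattern.index("@", i + 1)  # end of the @TYPE:name:extra@ token
            parts = pattern[i + 1:close].split(":", 2)
            parts += [""] * (3 - len(parts))
            ptype, name, extra = parts
            result = MATCHERS[ptype](text, pos, extra)
            if result is None:
                return None
            value, pos = result
            if name:
                values[name] = value
            i = close + 1
        else:
            # Literal text runs until the next '@' (or the end of the pattern).
            nxt = pattern.find("@", i)
            literal = pattern[i:] if nxt == -1 else pattern[i:nxt]
            if not text.startswith(literal, pos):
                return None
            pos += len(literal)
            i += len(literal)
    return values


fwsm = ('Deny udp src OUTSIDE:10.0.0.0/1234 dst INSIDE:192.168.0.0/5678 '
        'by access-group "OUTSIDE" [0xb74026ad, 0x0]')
fixed = ("Deny@QSTRING:FIREWALL.DENY_PROTO: @src@QSTRING:FIREWALL.DENY_O_INT: :@"
         "@IPv4:FIREWALL.DENY_SRCIP@/@NUMBER:FIREWALL.DENY_SRCPORT@ dst")
print(match_pattern(fixed, fwsm))
print(match_pattern("De@QSTRING:test: @y", fwsm))          # None: 'n' is not a space
print(match_pattern("De@QSTRING:test:y@", "DeyANYTHINGy"))  # {'test': 'ANYTHING'}
msg = "from='Marton <marci@server>'"
print(match_pattern("from=@QSTRING:mytext:' @", msg))       # {'mytext': 'Marton'}
print(match_pattern("from='@ESTRING:mytext:'@", msg))       # {'mytext': 'Marton <marci@server>'}
```

In the corrected FWSM pattern the second QSTRING uses two marks, a space and a colon, so it stops right after "OUTSIDE:" and leaves "10.0.0.0/1234" for the IPv4 and NUMBER parsers.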
You cannot specify other delimiters or similar options for the NUMBER and IPv4 parsers. (This, by the way, was the problem in your example pattern.) To match your other pattern, "De@QSTRING:test:y@", you would need a MSG like this: "DeyANYTHINGy".

I hope this gives you a better overview of how the parsers operate. Feel free to drop me a mail if you have any further problems. You can also download from the BalaBit website a patterndb for Cisco PIX messages, and a patterndb converted from the logcheck database (by a script, with a little human interaction).

cheers,
Marton

-- 
Key fingerprint = F78C 25CA 5F88 6FAF EA21 779D 3279 9F9E 1155 670D