[syslog-ng] db-parser QSTRING delimiter limitations

ILLES, Marton illes.marton at balabit.hu
Mon Apr 20 19:43:28 CEST 2009


On Sun, 2009-04-19 at 21:07 -0500, Martin Holste wrote:
> Marton,
> 
> Thanks!  I actually figured out the QSTRING and ESTRING thing
> yesterday while taking a closer look through the example patterndb.xml
> in the web location you previously referenced.  I hadn't yet figured
> out STRING, so your explanation was very helpful.  In any case, I
> think that your explanation will help a lot of other people in my
> situation and should serve as some makeshift documentation until it
> goes into the site proper.
> 
> Now that I've got my patterns matching and extracting properly, I need
> a way to mass produce patterns for all of the logs I want to parse.
> I'm working on a Perl script which receives a JSON object (submitted
> via CGI) which contains an example log string, a list of offsets
> within the string that should be extracted, and the macro names to
> assign those extracted strings.  It then parses that specification and
> does some simple PCRE matching to decide whether to use QSTRING, IPv4,
> or NUMBER for the extraction/static text anchoring and produces the
> pattern.  I've finished that part, now I'm working on the code to
> manage this all in a database.  My plan is to have a web console for
> allowing end users (who may not be regexp wizards) a place to paste in
> an example log, highlight the parts they want to extract, name them,
> and then submit them.  Then when the pattern is created, it will go
> into a database that can be used to auto-generate a patterndb.xml file
> as well as custom filters, templates, and destinations per pattern so
> that the parsed messages can be inserted directly into an indexed
> database.
> 
> So, here's a bit of a strange follow-up question: how will having one
> log destination per pattern affect overall Syslog-NG performance?  Is
> there some other way to use the custom templates which implement the
> custom macros from the db-parser() parser?  From what I could tell,
> the only place you can put a template statement is in a destination
> statement.  So, in order to log the custom macros, for each pattern
> there must be a unique filter, template, and destination.  Is there
> some other strategy I could employ?  The only other one I could think
> of would be to list every possible custom macro in one log template
> and have a script use the ${.classifier.rule_id} to know what it
> should take from Syslog-NG and put into the database.
> 

Martin,

I am glad the parser part is working for you. I am uploading the
description to my blog, and I guess it will also be merged into our
documentation. The (semi-)automatic message->pattern converter seems like
an interesting idea. If you could share it with the list, I am sure many
people would be happy to use it.

Regarding your question on SQL integration: currently all you can do is
use filters and separate destinations for SQL logging. You can also use
macros in the SQL values and table name definitions, but if you have many
different log messages that you want to log into different table schemas,
then filters are the only current solution.

Depending on your SQL schema needs, you can have one table that fits all
of your messages; some columns just won't be filled for every row. It could
also work if you have similar messages with the same fields. I think
something like this would work (I just show the relevant parts):

destination d_sql {
	sql(
		database("logs");
		table("message_${.classifier.rule_id}_${R_YEAR}_${R_MONTH}_${R_DAY}");
		columns("timestamp int", "host text", "program text", "msg text", "field1", "field2");
		values("${R_UNIXTIME}", "${HOST}", "${PROGRAM}", "${MSG}", "${field1}", "${field2}");
	);
};

log {
	source(s_src);
	parser(p_pdb);
	destination(d_sql);
};

This way you will have field1, field2, etc. in separate columns, and
messages will go into separate tables by day and rule_id. This has the
drawback that every rule must use the same field names and the same number
of fields.
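
If the same field names work for every rule, the single-table variant
mentioned above could look roughly like this. This is only a sketch: the
destination name d_sql_all, the table name "messages" and the rule_id
column are my assumptions, and again only the relevant parts are shown:

```
destination d_sql_all {
	sql(
		database("logs");
		# One table for all rules; rule_id tells the rows apart.
		table("messages");
		columns("timestamp int", "host text", "program text", "rule_id text", "msg text", "field1", "field2");
		values("${R_UNIXTIME}", "${HOST}", "${PROGRAM}", "${.classifier.rule_id}", "${MSG}", "${field1}", "${field2}");
	);
};
```

Columns whose macro is not set by the matching rule would simply stay
empty for that row.
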
The other option, as you stated, is to use filters like this:

filter f_1 {
	match("RULE-ID"
		value(".classifier.rule_id")
		type("string")
		flags("prefix")
	);
};

Of course, having that many log statements with filters has an effect
on performance, so you probably need to minimize the number of different
filters and destinations. (Using flags(final) in the log statements is
probably a good option from a performance perspective.)
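
To make that concrete, here is a minimal sketch of one such log path. The
names f_rule_1 and d_sql_rule_1 are made up for illustration, "1" is the
rule id from your FWSM example, and d_sql_rule_1 is assumed to be an sql()
destination defined elsewhere with the columns that rule needs:

```
filter f_rule_1 {
	# Same form as f_1 above. With flags("prefix") the string "1"
	# would presumably also match rule ids such as "10".
	match("1" value(".classifier.rule_id") type("string") flags("prefix"));
};

log {
	source(s_src);
	parser(p_pdb);
	filter(f_rule_1);
	destination(d_sql_rule_1);
	# Messages matching this statement are not passed on to the
	# log statements that follow it.
	flags(final);
};
```

With one such log statement per rule, a message stops at the first
statement whose filter matches, so the most frequent rules are probably
best listed first.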

We also see that this is probably not the best solution, and that some
automatic table schema creation and filtering would be better. We are
planning to add a tagging capability to syslog-ng so that you can
attach any number of tags to messages. (Bazsi had a mail on that
earlier.) The idea is that you can attach tags to rules, and matching
messages would then carry those tags. Based on the tags you will be
able to create a table schema and field name mapping that can be used in
different parts of syslog-ng. We can then create a special SQL
destination that works on tags and, without any filters, sends the
messages with the parsed fields to an SQL database.

So this is the plan so far. Unfortunately I am a bit buried with other
stuff, so I cannot finish this right now, but it is on my todo list.
Meanwhile I am happy to hear comments on it.

cheers,

Marton


> 
> Martin
> 
> On Sun, Apr 19, 2009 at 3:59 PM, ILLES, Marton
> <illes.marton at balabit.hu> wrote:
>         Hi,
>         
>         It is nice to hear that you are trying the db-parser. Let me try to
>         help you with that; see my rather long answer below.
>         
>         
>         On Fri, 2009-04-17 at 16:11 -0500, Martin Holste wrote:
>         > Hi, I'm new to the list and syslog-ng in general.  I'm building a
>         > centralized log collector and am very interested in the power of the
>         > db-parser() parsing module.  It really has amazing potential, and I'm
>         > eager to implement it.  I've been playing with it quite a bit with a
>         > proof-of-concept to parse firewall logs from Cisco FWSM blades.  The
>         > $MSGONLY part looks like this for a firewall deny:
>         >
>         > Deny udp src OUTSIDE:10.0.0.0/1234 dst INSIDE:192.168.0.0/5678 by
>         > access-group "OUTSIDE" [0xb74026ad, 0x0]
>         >
>         > My working parser entry is thus:
>         >
>         > <patterndb version='1' pub_date='2009-04-17'>
>         >   <program name='FWSM'>
>         >     <pattern>%FWSM</pattern>
>         >     <rule id='1' class='security'>
>         >       <pattern>Deny@QSTRING:FIREWALL.DENY_PROTO:
>         @src</pattern>
>         >     </rule>
>         >   </program>
>         > </patterndb>
>         >
>         > This works great and returns udp and tcp in the ${FIREWALL.DENY_PROTO}
>         > macro for logging, along with the ${.classifier.class} and
>         > ${.classifier.rule_id} macros.
>         >
>         > However, when I try to parse out the interface, IP, and port numbers
>         > from "OUTSIDE:10.0.0.0/1234" part, the delimiters fail to capture
>         > correctly and the whole pattern misses.  Here's what I'm trying to do:
>         >
>         > <patterndb version='1' pub_date='2009-04-17'>
>         >   <program name='FWSM'>
>         >     <pattern>%FWSM</pattern>
>         >     <rule id='1' class='security'>
>         >       <pattern>Deny@QSTRING:FIREWALL.DENY_PROTO:
>         > @src@QSTRING:FIREWALL.DENY_O_INT: @:@IPv4
>         > $:FIREWALL.DENY_SRCIP:@/@NUMBER:FIREWALL.DENY_SRCPORT:
>         @dst</pattern>
>         >     </rule>
>         >   </program>
>         > </patterndb>
>         
>         > After much debugging, it appears that there is a problem using QSTRING
>         > to match non-space-delimited parsing boundaries.  That is, you cannot
>         > parse arbitrarily, you have to match on space boundaries.  Is this
>         > true, or am I doing something wrong?  I even tried to parse the 'n'
>         > out of the word 'Deny' with a pattern like <pattern>De@QSTRING:test:
>         > @y</pattern> and that fails.  From the debug, it appears that unless
>         > there is a space present, the radix key is off by one:
>         >
>         > Looking up node in the radix tree; i='0', nodelen='0', keylen='138',
>         > root_key='', key='Deny udp src<snip></snip>'
>         > Looking up node in the radix tree; i='2', nodelen='2', keylen='138',
>         > root_key='De', key='Deny udp src<snip></snip>'
>         >
>         > It looks like the key for the second entry should be key='ny udp
>         > src<snip></snip>' since the original 'De' match already hit.  I put a
>         > lot of printf debugging statements in the code to see if I could
>         > figure out what was going wrong, but I haven't been able to conclude
>         > what the problem is yet, assuming arbitrary pattern delimiting was the
>         > intended goal.  Is anyone able to successfully get db-parser() to
>         > parse on arbitrary characters?
>         >
>         > Also, the source code refers to STRING and ESTRING, how are those
>         > different from QSTRING?  It looked like ESTRING was probably just an
>         > offset-based version of QSTRING.
>         
>         
>         
>         Short answer:
>         The problem is with your pattern, try this one instead:
>         
>         Deny@QSTRING:FIREWALL.DENY_PROTO:
>         @src@QSTRING:FIREWALL.DENY_O_INT: :@@IPv4
>         $:FIREWALL.DENY_SRCIP@/@NUMBER:FIREWALL.DENY_SRCPORT@ dst
>         
>         
>         
>         Long answer:
>         Let me explain the errors and how the parsers operate. Basically all
>         parsers take their arguments in the same way, though there are some
>         differences. In the simplest scenario you specify only the parser
>         type, like this: @NUMBER@, which will parse and match a number
>         without storing it in a variable or doing anything else special.
>         
>         If you want to store the matched value in a variable that can be
>         referenced later in a macro substitution, you can specify a name for
>         the parser, like this: @NUMBER:mynumber@. The arguments of the parser
>         are separated by a colon ":"; only the type argument is mandatory,
>         the others are optional. The first two arguments are the same for all
>         parser types, while the third one has a different meaning for
>         different parsers.
>         
>         Using the third argument you can customize how the parser should
>         parse/match. The IPv4 and NUMBER parsers do not use the third
>         argument; only STRING, ESTRING and QSTRING are affected.
>         
>         
>         The simplest one is STRING, which matches the text char by char for
>         as long as it sees alphanumeric characters. With the optional third
>         argument, additional (non-alphanumeric) characters can be specified.
>         
>         Given the following MSG:
>         "user=marton1234 group=admin"
>         
>         the "@STRING:mytext@" pattern would only match the string "user", as
>         the = char is not alphanumeric. However, the "@STRING:mytext:=@"
>         pattern would match "user=marton1234" and would stop at the
>         whitespace. To match the whole MSG with the parser one would need to
>         use the following pattern: "@STRING:mytext:= @", as it would match
>         alphanumeric characters plus the = sign and the ' ' whitespace as
>         well. Of course normally one would use a better pattern to match the
>         "user" and "group" parts separately, like this:
>         "user=@STRING:user@ group=@STRING:group@"
>         
>         The QSTRING and ESTRING parsers take a somewhat different and usually
>         faster approach to the problem. Rather than checking each char one
>         by one, they look for the delimiters. QSTRING stands for "quoted
>         string", so it matches any text between quotation marks, which must
>         be specified as the third argument of QSTRING. If only one character
>         is specified, it is used as both the starting and the ending
>         quotation mark, but it is possible to specify the starting and the
>         ending marks separately.
>         
>         Now let's take the following MSG as an example:
>         from='Marton <marci@server>'
>         
>         Using the "from=@QSTRING:mytext:'@" pattern, the mytext variable
>         would hold the "Marton <marci@server>" text between the ' marks. In
>         this case only one char was specified, and it was used as both the
>         starting and the ending mark. However, it is possible to specify two
>         chars to be used as the starting/ending marks, like this:
>         "from=@QSTRING:mytext:' @". Now it would match from the ' char to the
>         space char, so mytext would contain "Marton " only. A better example
>         would be to match the text between <>, like this:
>         "from='@STRING:name@ @QSTRING:address:<>@'", where name would contain
>         "Marton", while the address variable would contain "marci@server".
>         
>         Using QSTRING is faster than plain STRING, but it is not always
>         possible to use it, especially when the first character is not known
>         in advance and we want to specify only the last char. In this case
>         the ESTRING parser is handy, as it matches text up to an ending mark.
>         
>         To match the variable part of the previous example one would use the
>         following pattern: "from='@ESTRING:mytext:'@". Now we match the first
>         ' mark as a literal string and match the remaining text up to the
>         second ' mark with the ESTRING parser.
>         
>         Mind that the NUMBER and IPv4 parsers only match a number (contiguous
>         numeric characters) or an IPv4 address in dotted notation. You cannot
>         specify other delimiters or such for these. (This was, by the way,
>         the problem in your example pattern.)
>         
>         To match your other pattern, "De@QSTRING:test:y@", you would need to
>         use a MSG like this: "DeyANYTHINGy".
>         
>         I hope I could give you a better overview on how the parsers
>         operate.
>         Also feel free to drop me a mail if you have any further
>         problem.
>         
>         Also you can download from BalaBit website a patterndb for
>         Cisco PIX
>         messages and a patterndb converted (by a script and little
>         human
>         interaction) from the logcheck database.
>         
>         cheers,
>         
>         Marton
>         --
>         Key fingerprint = F78C 25CA 5F88 6FAF EA21 779D 3279 9F9E 1155 670D
>         
>         
>         ______________________________________________________________________________
>         Member info:
>         https://lists.balabit.hu/mailman/listinfo/syslog-ng
>         Documentation:
>         http://www.balabit.com/support/documentation/?product=syslog-ng
>         FAQ: http://www.campin.net/syslog-ng/faq.html
>         
> 
> 
-- 
Key fingerprint = F78C 25CA 5F88 6FAF EA21 779D 3279 9F9E 1155 670D