[syslog-ng] patterndb - user defined parsers

Sun Dec 18 09:40:17 CET 2011

On Sat, 2011-11-26 at 22:27 -0800, Evan Rempel wrote:
> It would be useful to permit users to define parsers in the patterndb.
> For example, in our environment, policy requires a special set and order of characters for the
> accounts our administrators use to log into hosts and administer them. It would be useful to define
> a @SYSADMIN@ parser that would match only our sysadmin accounts.
> We could then use this parser in the patterndb to take some action such as sending
> a message to the administrators about the event.
> 
> Another example would be to create a parser for @LOCALIP@ that matches my organization's IP space.
> That way a set of rules can be defined using @LOCALIP@ for some kind of alerting, and then any
> organization could redefine @LOCALIP@ and use all of the goodness that some third party had created for
> monitoring logs like an intrusion protection system.
> 
> Current parsers can be described as
> 
> QSTRING
>   - match opening char
>   - while not closing char, keep looking
> 
> ESTRING
>   - while not end string, keep looking
> 
> NUMBER
>   - while digit keep looking
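The three descriptions above amount to simple scanning loops. Below is a minimal Python sketch that follows those descriptions, not db-parser()'s C code. It assumes each parser receives the message and a start offset and returns the offset just past its match, or None. It also assumes that the closing quote and the ESTRING end string are consumed. The function names are hypothetical.

```python
from typing import Optional


def parse_qstring(text: str, pos: int, quotes: str = '"') -> Optional[int]:
    """QSTRING: match the opening char, then scan until the closing char.

    `quotes` holds one char (same opening and closing) or two (opening, closing).
    """
    opening, closing = quotes[0], quotes[-1]
    if pos >= len(text) or text[pos] != opening:
        return None  # no opening char at this position
    end = text.find(closing, pos + 1)
    return None if end == -1 else end + 1  # consume the closing char too


def parse_estring(text: str, pos: int, stop: str) -> Optional[int]:
    """ESTRING: scan until the end string appears, then consume it (assumed)."""
    end = text.find(stop, pos)
    return None if end == -1 else end + len(stop)


def parse_number(text: str, pos: int) -> Optional[int]:
    """NUMBER: keep going while the current char is an ASCII digit."""
    end = pos
    while end < len(text) and text[end] in "0123456789":
        end += 1
    return end if end > pos else None  # require at least one digit
```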
> 
> So it seems that general parsers could be constructed from two styles of matching and
> then concatenating them together.
> 
> 1. While in set of characters [some list of characters]
> 2. While not in set of characters [some list of characters]
> 
> I would call these
> INSET, to match 1 or more of a set of characters, unless a #-# range were specified, in which case a count between the minimum and maximum would be required.
> OUTSET, to match 1 or more of anything except the characters, with the same optional #-# range.
> (perhaps a count of + or * could be used to specify 1 or more and 0 or more, respectively)
> 
> and then limit the count of such occurrences so that you could build the @IPv4@ parser as
> 
> @INSET::123456789:1@@INSET::0123456789:0-2@.@INSET::123456789:1@@INSET::0123456789:0-2@.@INSET::123456789:1@@INSET::0123456789:0-2@.@INSET::123456789:1@@INSET::0123456789:0-2@
> 
> and @NUMBER@ would be
> @INSET::123456789:1@@INSET::0123456789@
> 
> @FLOAT@ would be
> @INSET::0123456789.@
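A rough Python sketch of the proposed INSET/OUTSET semantics, applied to the three definitions above. It assumes the count suffix is empty or + (1 or more), * (0 or more), N (exactly N), or N-M, and that every step is greedy with no backtracking. The names parse_count, match_set and match_steps are hypothetical. Patterns are written as Python lists rather than @@ strings so that no tokenizer is needed.

```python
from typing import List, Optional, Tuple, Union


def parse_count(spec: str) -> Tuple[int, Optional[int]]:
    """Translate a count suffix into (min, max); max None means unbounded."""
    if spec in ("", "+"):
        return 1, None  # default: 1 or more
    if spec == "*":
        return 0, None  # 0 or more
    if "-" in spec:
        lo, hi = spec.split("-")
        return int(lo), int(hi)  # #-# range
    return int(spec), int(spec)  # exactly N


def match_set(text: str, pos: int, chars: str, count: str = "",
              negate: bool = False) -> Optional[int]:
    """INSET (negate=False) or OUTSET (negate=True): greedy, no backtracking."""
    lo, hi = parse_count(count)
    end = pos
    while (end < len(text) and (hi is None or end - pos < hi)
           and (text[end] in chars) != negate):
        end += 1
    return end if end - pos >= lo else None


# A pattern step is either a literal string or an INSET step (chars, count).
Step = Union[str, Tuple[str, str]]


def match_steps(text: str, steps: List[Step]) -> bool:
    """Run the steps in order; the whole text must be consumed."""
    pos = 0
    for step in steps:
        if isinstance(step, str):
            if not text.startswith(step, pos):
                return False
            pos += len(step)
        else:
            nxt = match_set(text, pos, step[0], step[1])
            if nxt is None:
                return False
            pos = nxt
    return pos == len(text)


# The quoted definitions, transcribed step by step.
OCTET: List[Step] = [("123456789", "1"), ("0123456789", "0-2")]
IPV4 = OCTET + ["."] + OCTET + ["."] + OCTET + ["."] + OCTET
NUMBER: List[Step] = [("123456789", "1"), ("0123456789", "")]
FLOAT: List[Step] = [("0123456789.", "")]
```

Running the definitions through the sketch shows the limits of plain character sets. IPV4 accepts 999.1.1.1 but rejects 10.0.0.1, because a zero octet starts outside 123456789. @NUMBER@ as written needs at least two digits unless its second step gets a * count. @FLOAT@ also accepts 1.2.3.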
> 
> Then a user could make
> <parser name="THOUSAND">@INSET::,:0-1@@INSET::0123456789:3@</parser>
> <parser name="MONEY">$@INSET::123456789:1-3@@THOUSAND:::*@.@INSET::0123456789:2@</parser>
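Composing THOUSAND into MONEY requires repeating a whole group (an optional comma plus three digits), not just a single character set. The following is a minimal Python sketch using small matcher combinators. It assumes greedy, non-backtracking steps. The helpers inset, lit, seq, repeat and full_match are hypothetical and not part of patterndb.

```python
from typing import Callable, Optional

# A matcher takes (text, pos) and returns the end offset of its match, or None.
Matcher = Callable[[str, int], Optional[int]]


def inset(chars: str, lo: int, hi: int) -> Matcher:
    """Greedy INSET: between lo and hi characters taken from `chars`."""
    def match(text: str, pos: int) -> Optional[int]:
        end = pos
        while end < len(text) and end - pos < hi and text[end] in chars:
            end += 1
        return end if end - pos >= lo else None
    return match


def lit(s: str) -> Matcher:
    """A literal string such as '$' or '.'."""
    def match(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return match


def seq(*parts: Matcher) -> Matcher:
    """Concatenation: every part must match, one after the other."""
    def match(text: str, pos: int) -> Optional[int]:
        for part in parts:
            nxt = part(text, pos)
            if nxt is None:
                return None
            pos = nxt
        return pos
    return match


def repeat(group: Matcher) -> Matcher:
    """Zero or more repetitions of a whole group (the '*' in @THOUSAND:::*@)."""
    def match(text: str, pos: int) -> Optional[int]:
        while True:
            nxt = group(text, pos)
            if nxt is None or nxt == pos:  # stop on failure or an empty match
                return pos
            pos = nxt
    return match


def full_match(m: Matcher, text: str) -> bool:
    return m(text, 0) == len(text)


DIGITS = "0123456789"
THOUSAND = seq(inset(",", 0, 1), inset(DIGITS, 3, 3))
MONEY = seq(lit("$"), inset("123456789", 1, 3), repeat(THOUSAND),
            lit("."), inset(DIGITS, 2, 2))
```

Because nothing backtracks in this sketch, $1234.56 is rejected. The first step greedily takes 123, and the remaining 4.56 can neither start a THOUSAND group nor match the literal dot.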
> 
> This is kind of like inventing regular expressions :-(
> 
> I'm not sure how well this fits into the radix tree matching structure, but I wanted to start this discussion.
> 
> Given the MONEY example, I think it is obvious that there needs to be a way to specify repeating groups of "something".

Some kind of parser definition would make perfect sense. There are some
technical problems to be resolved first though.

Right now, conflicts between rules are not resolved very well. If two
rules conflict on a parser (e.g. their prefix is the same and then two
different parsers are used at the same location), then db-parser()
evaluates them in order, and the first one that matches wins. If a later
parser in the winning rule then fails to match, no backtracking is done
to try the other rule.
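The conflict can be illustrated with a simplified Python model; this is not the radix tree code. Two hypothetical rules share the prefix "port ". Rule A continues with @NUMBER@ and the literal " open", and rule B continues with an @ESTRING@ ending at a space and the literal "closed". The function first_wins commits to the first parser that matches. By contrast, the hypothetical with_backtracking tries the next candidate whenever the rest of a rule fails.

```python
from typing import Callable, List, Optional, Tuple

Parser = Callable[[str, int], Optional[int]]
# (rule name, parser at the shared branch point, literal rest of the rule)
Candidate = Tuple[str, Parser, str]


def number(text: str, pos: int) -> Optional[int]:
    """Stand-in for @NUMBER@: one or more ASCII digits."""
    end = pos
    while end < len(text) and text[end] in "0123456789":
        end += 1
    return end if end > pos else None


def estring_space(text: str, pos: int) -> Optional[int]:
    """Stand-in for an @ESTRING@ ending at a space (the space is consumed)."""
    end = text.find(" ", pos)
    return None if end == -1 else end + 1


def first_wins(text: str, pos: int, candidates: List[Candidate]) -> Optional[str]:
    """Commit to the first parser that matches; never revisit the others."""
    for name, parser, rest in candidates:
        end = parser(text, pos)
        if end is not None:
            return name if text[end:] == rest else None
    return None


def with_backtracking(text: str, pos: int, candidates: List[Candidate]) -> Optional[str]:
    """If the rest of a rule fails, fall back to the next candidate."""
    for name, parser, rest in candidates:
        end = parser(text, pos)
        if end is not None and text[end:] == rest:
            return name
    return None


# Hypothetical rules "port @NUMBER:num@ open" and "port @ESTRING:name: @closed",
# both branching right after the shared prefix "port ".
RULES: List[Candidate] = [("A", number, " open"), ("B", estring_space, "closed")]
PREFIX_LEN = len("port ")
```

With the message "port 80 closed", NUMBER matches 80 and rule A's tail fails. At that point first_wins reports no match, whereas with_backtracking falls back to rule B.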

This should be resolved before adding a lot of different and perhaps
user defined parsers.

Also, instead of reinventing the wheel, I'd simply add a @REGEXP@
parser. It could of course become a performance bottleneck when hit,
but stuffing all arguments into a @@ expression is difficult to read and
maintain.
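As a rough illustration of the @REGEXP@ idea, here is a hypothetical parser built on Python's re module. The expression is compiled once and then matched at the parser's start offset with Pattern.match(text, pos). This thread does not specify the syntax of @REGEXP@. The IPv4 expression below is a common 0-255 octet regex, while MONEY mirrors the proposal quoted above.

```python
import re
from typing import Callable, Optional


def make_regexp_parser(expr: str) -> Callable[[str, int], Optional[int]]:
    """Hypothetical @REGEXP@ parser: compile once, match anchored at pos."""
    compiled = re.compile(expr)

    def parse(text: str, pos: int) -> Optional[int]:
        m = compiled.match(text, pos)  # match() only tries at offset pos
        return m.end() if m else None

    return parse


# One octet, 0-255, without leading zeros.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"
IPV4 = make_regexp_parser(rf"{OCTET}(?:\.{OCTET}){{3}}")
# Same shape as the proposed MONEY: $, 1-3 nonzero digits, groups, cents.
MONEY = make_regexp_parser(r"\$[1-9]{1,3}(?:,?[0-9]{3})*\.[0-9]{2}")
```

Each definition fits on one line, and the IPv4 expression captures the 0-255 octet range directly. Because the regex engine backtracks within the expression, $1234.56 matches: the first group settles on 1. That same backtracking is also one place where the performance cost can come from.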

-- 
Bazsi