[syslog-ng] Pattern Database first snapshot available

Fri Dec 18 17:44:48 CET 2009

On Fri, 2009-12-18 at 17:39 +0100, ILLES, Marton wrote:
> On Tue, 2009-12-15 at 13:00 -0600, Martin Holste wrote:
> > This is an awesome start, and I'm big into patterndb so this is really
> > encouraging.  Off the bat, I'd say that it would be more helpful if
> > the <values></values> tags were populated with the .dict values that
> > are being extracted so that you can construct output patterns
> > properly.
> 
> The <values></values> element can be used to specify additional values
> that you want to set but that do not appear in the message itself. For
> example, suppose you classify login messages and a certain message does
> not contain the username, but you know that it always reports a
> specific user. In this case you can use <values> to assign
> the .dict.username variable (for example) to that specific user, and
> later you can be sure that it exists.
> 
> I am still not sure if I completely understand your suggestion...
> 
> > Along with that, if you have a different name for every .dict value
> > extracted, it becomes labor-intensive to capture them in your output
> > template.  I prefer a method in which I have arbitrarily capped the
> > number of values to be extracted at six strings and six integers.  I
> > then label the values I extract as s0-s5 and i0-i5.  That way I only
> > need one template for all patterns extracted.  Separating the strings
> > and integers makes database insertion easy because my tables then look
> > like <header columns> MSG, pattern_class_id, pattern_rule_id, i0 ..
> > i5, s0 .. s5.  Now searching for fields becomes possible if you know
> > what field belongs to what pattern rule ID.  I also prefer to have the
> > rule IDs as integers to keep my DB columns smaller.
> 
> The reason for using UUIDs was to be able to provide globally unique
> IDs; simple integers would be hard to maintain. I was also thinking of
> using OIDs for IDs, but UUIDs were an easier choice. Technically you
> can use simple integers or any other string, as syslog-ng currently
> does not check the ID. I will think about it... :)
> 
> Using integers would also be better for DB indexing purposes. If
> you want to use integers, you can assign a <value
> name="my_id">42</value> to each pattern as a work-around and later use
> "my_id" in your templates.
> 
> > Here's an example for Cisco FWSM deny and NAT translation teardown
> > messages that I've been using:
> > 
> > <ruleset name="FWSM" id='2'>
> >                 <pattern>%FWSM</pattern>
> >                 <rules>
> >                         <rule provider="local" class='2' id='2'>
> >                                 <patterns>
> >                                         <pattern>Deny@QSTRING:i0: @src@QSTRING:s0: :@@IPv4:i1:@/@NUMBER:i2:@ dst@QSTRING:s1: :@@IPv4:i3:@/@NUMBER:i4:@ by access-group @QSTRING:s2:"@</pattern>
> >                                 </patterns>
> >                         </rule>
> >                         <rule provider="local" class='3' id='3'>
> >                                 <patterns>
> >                                         <pattern>Teardown@QSTRING:i0: @connection @NUMBER::@ for@QSTRING:s0: :@@IPv4:i1:@/@NUMBER:i2:@ to@QSTRING:s1: :@@IPv4:i3:@/@NUMBER:i4:@ duration@QSTRING:s2: @bytes @NUMBER:i5:@</pattern>
> >                                 </patterns>
> >                         </rule>
> >                 </rules>
> >         </ruleset>
> 
> I prefer using more meaningful names, because this way you can normalize
> your logs so that it won't matter whether it is a PIX, iptables, etc. log
> message: you can always refer to the source/destination address by
> its name. It requires storing different types of logs in different
> tables, but you can freely change your application without changing your
> log processing scripts.
> 
> You can also combine these two methods: use meaningful names in
> patterns and then use <values> to assign them to numbered values, like
> this:
> 
> <value name="s1">${.dict.source_ip}</value>
> 
> Of course it would require a bit more memory and CPU cycles, and
> you are free to name your values as you want. I think it is really a
> question of the patterns we try to build and distribute. Maybe I can add
> a rewrite mechanism to pdbtool which would rename the pattern names to
> numbered value names. This way we can publish patterns with
> meaningful names and anyone can later rename them to numbered
> names. Would it fit your needs?
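
The rename step proposed above could look roughly like the following
Python sketch. It is not part of pdbtool: rename_fields, the
INTEGER_PARSERS set and the field names in the example are made up for
illustration, and the rule that NUMBER and IPv4 values go to i0, i1, ...
while everything else goes to s0, s1, ... is only an assumption.

```python
import re

# Either a literal "@@" or a parser reference "@TYPE:name:params@".
FIELD = re.compile(
    r"@@|@(?P<type>[A-Za-z0-9]+):(?P<name>[^:@]*)(?::(?P<param>[^@]*))?@"
)

# Assumption: values of these parsers are stored in the integer columns.
INTEGER_PARSERS = {"NUMBER", "IPv4"}


def rename_fields(pattern):
    """Return (rewritten pattern, {meaningful name: numbered name})."""
    counters = {"i": 0, "s": 0}
    mapping = {}

    def replace(match):
        name = match.group("name")
        if not name:
            # Literal "@@" or an unnamed parser: leave it untouched.
            return match.group(0)
        if name not in mapping:
            kind = "i" if match.group("type") in INTEGER_PARSERS else "s"
            mapping[name] = f"{kind}{counters[kind]}"
            counters[kind] += 1
        param = match.group("param")
        suffix = "" if param is None else ":" + param
        return f"@{match.group('type')}:{mapping[name]}{suffix}@"

    return FIELD.sub(replace, pattern), mapping


# Hypothetical pattern with meaningful names.
named = ("Deny@QSTRING:protocol: @src@QSTRING:source_if: :"
         "@@IPv4:source_ip:@/@NUMBER:source_port:@")
renamed, mapping = rename_fields(named)
print(renamed)
print(mapping)
```

The sketch does not enforce the cap of six strings and six integers; a
real tool would have to decide what to do when a pattern has more fields.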

I guess it'd be simpler to reuse the numbered "match" support in
syslog-ng, the same one the regexps use. You can reference the matches as
$1 .. $255 and they are quite simple to use. I almost created a
patch for this, but in the end I didn't.

With the new NVTable code, it could even use the same memory and store
only a reference:

log_msg_set_match_indirect(msg, index, ...)
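
To illustrate what storing only a reference means, here is a toy Python
model. It is not the NVTable code and it does not show the real
arguments of log_msg_set_match_indirect(), which are elided above; the
Message class and the name/offset/length parameters are assumptions made
for the sketch, and the log line is made up.

```python
class Message:
    """Toy message: named values plus numbered matches kept as references."""

    def __init__(self):
        self.values = {}   # value name -> string
        self.matches = {}  # match index -> (value name, offset, length)

    def set_value(self, name, value):
        self.values[name] = value

    def set_match_indirect(self, index, name, offset, length):
        # Hypothetical signature: remember where the text lives, do not copy it.
        self.matches[index] = (name, offset, length)

    def get_match(self, index):
        # Resolve the reference only when the match is actually used.
        name, offset, length = self.matches[index]
        return self.values[name][offset:offset + length]


text = "Deny tcp src outside:10.0.0.1/1234"
msg = Message()
msg.set_value("MESSAGE", text)
# Match 1 points at "tcp" inside MESSAGE instead of holding its own copy.
msg.set_match_indirect(1, "MESSAGE", text.index("tcp"), len("tcp"))
print(msg.get_match(1))
```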

-- 
Bazsi