[syslog-ng] [patterndb] classification

Balazs Scheidler bazsi at balabit.hu
Mon Sep 6 10:48:06 CEST 2010


On Sat, 2010-09-04 at 20:40 -0500, Martin Holste wrote:
> > Multi-value N=V are evil. They kill log parsers and RDBMS :-) We did
> > think a lot about this conundrum of src_IP="10.10.1.2,10.10.1.3" and
> > might well recommend that it never happens. If we have to deaggregate
> > logs (thus exploding the volume) the whole thing would be a mess...
> 
> Yes, they are evil.  I was re-reading the recent thread "[syslog-ng]
> [announce] patterndb project," and I think we were in agreement that
> tags are still a good thing, though.  So, how do we store the
> multi-value N=V but also have the flexibility of tags?  My thought is
> maybe we go with a "primary" tag which is the class, and then the

What I'm thinking right now is to add a "tagdb" that is independent of
the patterndb database (although the two must work hand in hand).

This tagdb would define the tag hierarchy (basically, tags grouped into
bunches) and could perhaps also associate a type with each tag.

For example, Anton said that CEE is moving toward providing OAS
(=object, action, status) tag triplets for each log message. This type
information could be represented either with the hierarchy or with an
explicit "type" attribute.

For example (representing tag types with a hierarchy):

<tagdb>
  <bunch name="object">
    <tag name="flowevt"/>
  </bunch>
  <bunch name="status">
  </bunch>
  <bunch name="action">
    <tag name="secevt"/>
  </bunch>
</tagdb>


For example (representing tag types explicitly):

<tagdb>
  <bunch name="security">
    <tag type="object" name="flowevt"/>
    <tag type="action" name="secevt"/>
  </bunch>
  <bunch name="storage">
    <tag type="object" name="file"/>
    <tag type="object" name="database"/>
  </bunch>
  <tag type="class" name="violation"/>
  <tag type="class" name="security"/>
  <tag type="class" name="system"/>
  <tag type="class" name="unknown"/>
  <tag name="just-a-simple-tag-without-type"/>
</tagdb>

The two are more or less equivalent if a single tag can belong to
multiple bunches, which I guess it can. The difference is that
syslog-ng itself can use the "type" property of a tag more easily.
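To illustrate the equivalence, here is a rough Python sketch that reads
the hierarchy form and derives a tag-to-type mapping from bunch
membership. The tagdb format is only the proposal above, nothing
syslog-ng reads today, and the function name types_from_hierarchy is
made up for this example.

```python
import xml.etree.ElementTree as ET

# Hierarchy-style tagdb as proposed above (hypothetical format).
HIERARCHY = """
<tagdb>
  <bunch name="object">
    <tag name="flowevt"/>
  </bunch>
  <bunch name="status">
  </bunch>
  <bunch name="action">
    <tag name="secevt"/>
  </bunch>
</tagdb>
"""


def types_from_hierarchy(xml_text):
    """Map each tag name to the set of bunch names it appears in.

    In the hierarchy form the bunch name plays the role of the "type"
    attribute, so a tag listed in several bunches gets several types.
    """
    types = {}
    for bunch in ET.fromstring(xml_text).iter("bunch"):
        for tag in bunch.iter("tag"):
            types.setdefault(tag.get("name"), set()).add(bunch.get("name"))
    return types


tag_types = types_from_hierarchy(HIERARCHY)
```

With the typed form the same mapping is simply the "type" attribute of
each tag, so no bunch names need to be reserved.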

The behaviour of syslog-ng would be (typed tags):
  1) if a message is tagged with a tag of type "class", that tag would
     become .classifier.class
  2) patterndb could easily validate that each message gets an
     object/status/action tag
The behaviour of syslog-ng would be (hierarchy based tags):
  1) there would be built-in bunches that must exist
  2) based on the built-in bunches, syslog-ng could enforce the same
     rules as with typed tags

For some reason I rather like typed tags, even though they are somewhat
more bureaucratic: users/pattern authors should be free to create their
tags without limitation.

Opinions?

> <tags> can be output via macro $TAG.  ($TAG will contain all values in
> <tags>, right?) 

It is $TAGS and it already exists in 3.1.2. It expands to a
comma-separated list of tags without any further escaping (i.e. tags
should not contain spaces if your storage is a text file, otherwise the
files become really difficult to process later).
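A small Python sketch of why the missing escaping matters when the
output is parsed again. It only mimics the behaviour described above
(join with commas, no escaping). expand_tags and parse_tags are
illustrative helpers, not part of syslog-ng.

```python
def expand_tags(tags):
    """Join tags with commas and no escaping, as described for $TAGS."""
    return ",".join(tags)


def parse_tags(field):
    """Naive reader for a stored tag field: split on commas."""
    return field.split(",") if field else []


# Plain tags survive the round trip.
ok = parse_tags(expand_tags(["secevt", "flowevt"]))

# A tag containing the separator does not: it comes back as two tags.
broken = parse_tags(expand_tags(["sec,evt"]))

# A tag containing a space shifts the columns of a space-delimited line.
columns = ("host prog " + expand_tags(["flow evt"]) + " msg").split(" ")
```

The last case yields five columns instead of the expected four, which
is the kind of trouble a text-file destination runs into.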

>  So for the macro-based file name, you would only use
> file("/var/log/messages.${.classifier.class}.log") and do your tag
> grepping normally, where classifier.class would be the primary tag.  I
> think this would work out better in the long run than trying to
> concatenate tags for the class, because keeping track of the order
> would be complicated, and it would definitely be better than sticking
> to the logcheck's very limited range of class selections.


-- 
Bazsi


