[syslog-ng] [patterndb] classification

Fri Sep 3 22:03:28 CEST 2010

On Fri, 2010-09-03 at 12:35 -0700, Anton Chuvakin wrote:
> All,
> 
> > As you probably know one goal for patterndb is to implement message
> > classification.
> 
> First, I worry when I hear about building a new taxonomy for log
> messages from scratch when CEE (cee.mitre.org) is almost ready.
> An arch spec just went out:
> http://cee.mitre.org/docs/CEE_Architecture_Overview_May_2010.pdf

Last time I checked, nothing concrete had been published by CEE. But
I'll definitely read it.

However, after quickly browsing through the PDF I couldn't find the
taxonomy portion. Is this "almost ready" material available somewhere?

> 
> > E.g. in addition to extracting information from log messages, it also
> > associates a "class", later available in the "${.classifier.class}"
> > value.
> 
> That is useful but one class likely won't cut it as a lot of messages
> will be cross-class
> 
> > violation     - security violation
> > security      - other security events
> > system        - system information
> > unknown       - no rule matches
> 
> A message being both 'system' and 'security' is a very common
> situation. User logins - need I say more? :-)
> And telling 'violation' from 'security' is probably a lost cause.

Yeah, I know that. This was coming from logcheck and until now I didn't
mean to improve it in any way.

> 
> > On one hand, the tagging functionality (e.g. the ability to also
> > associate tags with each message) is superior to classes.
> 
> Absolutely, tag clouds would be a much better bet than a tree of categories.
> 
> > But it is difficult to do with tags (except for using filters and
> > different destinations), as there's no such functionality. Another problem
> > is that tags/classes are completely independent, in order to filter on the
> > class of the message, one would have to use a match() filter like this:
> 
> Actually, that is a positive - especially when you include custom tags,
> like regulatory relevance or relevance to a particular unit inside
> the organization.
> 
> > My conclusion is that classes are better when used in templates,
> > tags are better when filtering. The two should be merged somehow.
> 
> Tags can be organized in 'bunches' that serve as classes.

You mean, every tag would belong to a bunch and a given message could
only be part of a single bunch?

Thus any single tag would indicate the bunch the message belongs to?

Or, I might be completely missing something. 
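
To make the first reading above concrete, here is a minimal sketch in Python. It assumes every tag belongs to exactly one bunch and derives a message's "class" from its tags. The tag names, the bunch names and both helper functions are hypothetical illustrations; none of this is patterndb or syslog-ng functionality.

```python
# Hypothetical tag -> bunch table; these names are illustrative only and do
# not come from patterndb or syslog-ng.
TAG_BUNCHES = {
    "login": "security",
    "auth-failure": "security",
    "dns-query": "system",
    "service-start": "system",
}


def bunches_for(tags):
    """Return the sorted list of bunches that the given tags belong to.

    Tags missing from the table are ignored.
    """
    return sorted({TAG_BUNCHES[tag] for tag in tags if tag in TAG_BUNCHES})


def single_class(tags):
    """Return the bunch name if all known tags agree on one bunch.

    Returns None for a message with no known tags or with tags from
    several bunches (the cross-class case).
    """
    found = bunches_for(tags)
    return found[0] if len(found) == 1 else None
```

Under this assumption a message tagged only "dns-query" maps to a single bunch, while a message carrying both "login" and "service-start" spans two bunches and has no single class, which is the cross-class case raised earlier in the thread.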

> 
> > 3) drop the class stuff and implement a macro trick that
> > makes it possible to use tags in macro context
> 
> I'd avoid hard-coded classes altogether and go with all tags, possibly
> organized in "classes of tags" or bunches or whatever.

> 
> > On an independent matter, the set of classes may need some thought. As
> 
> Ah, that's because it will fail - multi-mapping will kill it. This was
> pretty much our starting point in CEE as many of us spent time doing
> it at SIEM players. So, SIEM vendors have been trying to build HUGE
> trees of events and ultimately they became unwieldy. Tags will be more
> manageable and simple relationships can be established between them.
> 
> > probably needs to be expanded. Last time I got patterns for
> > DNS queries, and although I could shove them into "system", right now I
> > feel that the point of classification is to categorize events by
> 
> Well, now multiply it by roughly 120,000 event types that leading
> SIEM vendors categorized over the years and you'd know you don't want
> that :-)

Right.

> 
> > "importance", in a similar spirit to syslog severity, but one that works
> > even if the application developer uses a bogus severity when sending
> > syslog messages.
> 
> Importance is a HUGE challenge. Not sure what to add to this one as it
> is largely an unsolved problem due to very different contexts for
> message analysis. Even a mere 'connection established' can be 10 of 10
> for somebody in some circumstances. One can try to glue importance to
> tags (like exploit > connection) and not to individual messages; it
> might work sometimes.

Hmm... good idea.
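
A rough sketch of what gluing importance to tags could look like, in Python. It assumes a message's importance is the highest importance among its tags, and that a site can override the per-tag values for its own context. The scores, the tag names and the function are hypothetical and are not part of syslog-ng.

```python
# Hypothetical per-tag importance on a 1..10 scale; the values are made up
# for illustration, only the ordering exploit > connection is from the thread.
TAG_IMPORTANCE = {
    "connection": 2,
    "login": 4,
    "exploit": 9,
}

# Importance used for unknown tags and for messages without tags.
DEFAULT_IMPORTANCE = 1


def message_importance(tags, overrides=None):
    """Return the highest importance among the message's tags.

    `overrides` is an optional dict of site-specific values that take
    precedence over the defaults, since the same tag can matter more in
    some environments than in others.
    """
    table = {**TAG_IMPORTANCE, **(overrides or {})}
    return max(
        (table.get(tag, DEFAULT_IMPORTANCE) for tag in tags),
        default=DEFAULT_IMPORTANCE,
    )
```

With these assumed values a message tagged both "connection" and "exploit" takes the importance of "exploit", and a site for which connections are critical can pass something like {"connection": 10} as the override.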

-- 
Bazsi