[syslog-ng] [patterndb] classification

Fri Sep 3 21:35:42 CEST 2010

All,

> As you probably know one goal for patterndb is to implement message
> classification.

First, I worry when I hear about building a new taxonomy for log
messages from scratch when CEE (cee.mitre.org) is almost ready.
An arch spec just went out:
http://cee.mitre.org/docs/CEE_Architecture_Overview_May_2010.pdf

> E.g. in addition to extracting information from log messages, it also
> associates a "class", later available in the "${.classifier.class}"
> value.

That is useful, but one class likely won't cut it, as a lot of messages
will be cross-class.

> violation     - security violation
> security      - other security events
> system        - system information
> unknown       - no rule matches

Messages that are both 'system' and 'security' are very common. User
logins - need I say more? :-)
And telling 'violation' from 'security' is probably a lost cause.

> One one hand, the tagging functionality (e.g. the ability to also
> associate tags with each message) is superior to classes.

Absolutely, tag clouds would be a much better bet than a tree of categories.

> But it is difficult to do with tags (except for using filters and
> different destinations), as there's no such functionality. Another problem
> is that tags/classes are completely independent, in order to filter on the
> class of the message, one would have to use a match() filter like this:

Actually, that is a positive - especially when you include custom tags,
like regulatory relevance or relevance to a particular unit inside
the organization.

> My conclusion is that classes are better when used in templates,
> tags are better when filtering. The two should be merged somehow.

Tags can be organized in 'bunches' that serve as classes.

> 3) drop the class stuff and implement a macro trick that
> makes it possible to use tags in macro context

I'd avoid hard-coded classes altogether and go with all tags, possibly
organized in "classes of tags" or bunches or whatever.
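
To make the "bunches" idea concrete, here is a rough sketch in Python
(not syslog-ng configuration). The tag names, the bunch names and the
classes_for() helper are all made up for illustration; the only point
is that a class is a named set of tags, so one message can land in
several classes at once.

```python
# Hypothetical tag bunches: each "class" is just a named set of tags.
# None of these names come from patterndb or CEE; they are examples only.
TAG_BUNCHES = {
    "security": {"login", "auth_failure", "exploit"},
    "system": {"login", "service_start", "dns_query"},
}


def classes_for(tags):
    """Return the sorted names of every bunch sharing a tag with the message."""
    tags = set(tags)
    return sorted(name for name, bunch in TAG_BUNCHES.items() if bunch & tags)


# A user login is tagged once but belongs to two bunches.
login_classes = classes_for(["login"])
# A message with no known tags belongs to none, which plays the role of "unknown".
unknown_classes = classes_for(["something_else"])
```

A message with no matching tags simply gets an empty list, so there is
no need for a separate hard-coded "unknown" class.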

> On an independent matter, the set of classes may need some thought. As

Ah, that's because it will fail - multi-mapping will kill it. This was
pretty much our starting point in CEE as many of us spent time doing
it at SIEM players. So, SIEM vendors have been trying to build HUGE
trees of events and ultimately they became unwieldy. Tags will be more
manageable and simple relationships can be established between them.

> probably needs to be expanded. Last time I got patterns for
> DNS queries, and although I could shove them into "system", right now I
> feel that the point of classification is to categorize events by

Well, now multiply it by roughly 120,000 event types that leading
SIEM vendors categorized over the years and you'd know you don't want
that :-)

> "importance", in a similar spirit to syslog severity, but one that works
> even if the application developer uses a bogus severity when sending
> syslog messages.

Importance is a HUGE challenge. Not sure what to add to this one as it is
largely an unsolved problem due to very different contexts for message
analysis. Even a mere 'connection established' can be 10 of 10 for
somebody in some circumstances. One can try to glue importance to tags
(like exploit > connection) and not to individual messages; it might
work sometimes.
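
A minimal sketch of gluing importance to tags, again in Python and
purely illustrative: the 1-10 scale, the scores and the
message_importance() helper are my assumptions, not anything that
exists in syslog-ng. The per-site overrides are there because the
same tag can matter very differently in different contexts.

```python
# Hypothetical per-tag importance on a 1-10 scale; the values are invented.
TAG_IMPORTANCE = {"exploit": 9, "auth_failure": 6, "connection": 2}
DEFAULT_IMPORTANCE = 1  # assumed score for tags nobody has rated


def message_importance(tags, overrides=None):
    """Score a message as the highest importance among its tags.

    overrides lets a site re-rate tags for its own context.
    """
    scores = dict(TAG_IMPORTANCE)
    scores.update(overrides or {})
    return max(
        (scores.get(tag, DEFAULT_IMPORTANCE) for tag in tags),
        default=DEFAULT_IMPORTANCE,
    )


# exploit > connection, so the exploit tag decides the score.
default_score = message_importance(["connection", "exploit"])
# A site where every connection matters can override the rating.
site_score = message_importance(["connection"], overrides={"connection": 10})
```

Taking the maximum is only one possible rule; it still will not solve
the context problem, it just moves the tuning from messages to tags.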

Best,
--
Dr. Anton Chuvakin
Site: http://www.chuvakin.org
Blog: http://www.securitywarrior.org
LinkedIn: http://www.linkedin.com/in/chuvakin
Consulting: http://www.securitywarriorconsulting.com
Twitter: @anton_chuvakin
Google Voice: +1-510-771-7106