On Fri, 2010-09-03 at 12:35 -0700, Anton Chuvakin wrote:
All,
As you probably know one goal for patterndb is to implement message classification.
First, I worry when I hear about building a new taxonomy for log messages from scratch when CEE (cee.mitre.org) is almost ready. An arch spec just went out: http://cee.mitre.org/docs/CEE_Architecture_Overview_May_2010.pdf
Last I checked, nothing concrete had been published from CEE. But I'll definitely read it. However, quickly browsing through the PDF, I couldn't find the taxonomy portion. Is this "almost ready" stuff available somewhere?
E.g. in addition to extracting information from log messages, it also associates a "class", later available in the "${.classifier.class}" value.
That is useful, but one class likely won't cut it, as a lot of messages will be cross-class.
violation - security violation
security - other security events
system - system information
unknown - no rule matches
A message being both 'system' and 'security' is a very common situation. User logins - need I say more? :-) And telling 'violation' from 'security' is probably a lost cause.
Yeah, I know that. This was coming from logcheck and until now I didn't mean to improve it in any way.
On one hand, the tagging functionality (e.g. the ability to also associate tags with each message) is superior to classes.
Absolutely, tag clouds would be a much better bet than a tree of categories.
But it is difficult to do with tags (except for using filters and different destinations), as there's no such functionality. Another problem is that tags/classes are completely independent; in order to filter on the class of the message, one would have to use a match() filter like this:
Actually, that is a positive - especially when you include custom tags, like regulatory relevance or relevance to a particular unit inside the organization.
My conclusion is that classes are better when used in templates, tags are better when filtering. The two should be merged somehow.
Tags can be organized in 'bunches' that serve as classes.
You mean, every tag would belong to a bunch and a given message could only be part of a single bunch? Thus any single tag would indicate the bunch the message belongs to? Or, I might be completely missing something.
3) drop the class stuff and implement a macro trick that makes it possible to use tags in macro context
I'd avoid hard-coded classes altogether and go with all tags, possibly organized in "classes of tags" or bunches or whatever.
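To make the "bunches of tags" idea concrete, here is a minimal Python sketch, not anything patterndb implements. The bunch names reuse the class names from this thread, while the individual tags and the `bunches_for` helper are assumptions made up for illustration. It shows that a single tag (a user login, say) can sit in several bunches at once, which a single hard-coded class cannot express.

```python
# Hypothetical bunches: each one is just a named set of tags.
# The tag names are illustrative only, not part of patterndb.
TAG_BUNCHES = {
    "security": {"login", "auth-failure", "exploit"},
    "system": {"login", "service-start", "dns-query"},
}


def bunches_for(tags):
    """Return the sorted names of every bunch sharing at least one tag with the message."""
    message_tags = set(tags)
    return sorted(
        name for name, members in TAG_BUNCHES.items() if members & message_tags
    )


# A login is both a system and a security event.
print(bunches_for(["login"]))
# A DNS query only falls into the system bunch.
print(bunches_for(["dns-query"]))
```

A message whose tags match no bunch yields an empty list, which plays the role of the "unknown" class.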
On an independent matter, the set of classes may need some thought. As
Ah, that's because it will fail - multi-mapping will kill it. This was pretty much our starting point in CEE as many of us spent time doing it at SIEM players. So, SIEM vendors have been trying to build HUGE trees of events and ultimately they became unwieldy. Tags will be more manageable and simple relationships can be established between them.
probably needs to be expanded. Last time I got patterns for DNS queries, and although I could shove them into "system", right now I feel that the point of classification is to categorize events by
Well, now multiply it by the roughly 120,000 event types that leading SIEM vendors categorized over the years and you'd know you don't want that :-)
Right.
"importance", in a similar spirit to syslog severity, but one that works even if the application developer uses a bogus severity when sending syslog messages.
Importance is a HUGE challenge. Not sure what to add to this one, as it is largely an unsolved problem due to very different contexts for message analysis. Even a mere 'connection established' can be 10 out of 10 for somebody in some circumstances. One can try to glue importance to tags (like exploit > connection) and not to individual messages; it might work sometimes.
Hmm... good idea. -- Bazsi
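The suggestion of gluing importance to tags rather than to individual messages could look roughly like the following Python sketch. The tag names, the numeric scores, and the `message_importance` helper are all assumptions for illustration; nothing here reflects an existing patterndb feature. Per-site overrides are included because, as noted above, even a plain connection can be 10 out of 10 in some contexts.

```python
# Hypothetical per-tag importance scores on a 1-10 scale (exploit > connection).
TAG_IMPORTANCE = {"exploit": 9, "auth-failure": 6, "connection": 2}
DEFAULT_IMPORTANCE = 1  # assumed score for tags nobody has rated


def message_importance(tags, overrides=None):
    """Score a message as the highest importance among its tags.

    `overrides` lets a site re-rate tags for its own context.
    """
    scores = dict(TAG_IMPORTANCE)
    if overrides:
        scores.update(overrides)
    return max(
        (scores.get(tag, DEFAULT_IMPORTANCE) for tag in tags),
        default=DEFAULT_IMPORTANCE,
    )


# The most important tag wins.
print(message_importance(["exploit", "connection"]))
# A site that cares deeply about connections can override the default rating.
print(message_importance(["connection"], overrides={"connection": 10}))
```

Taking the maximum is only one possible rule; it keeps the score independent of whatever severity the application put in the syslog message.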