Re: [syslog-ng] [patterndb] classification

3 Sep 2010

On Fri, 2010-09-03 at 12:35 -0700, Anton Chuvakin wrote:
...
All,
...
As you probably know one goal for patterndb is to implement message
classification.
First, I worry when I hear about building a new taxonomy for log
messages from scratch when CEE (cee.mitre.org) is almost ready.
An arch spec just went out:
http://cee.mitre.org/docs/CEE_Architecture_Overview_May_2010.pdf
Last I checked, there was nothing concrete published from CEE. But
I'll definitely read it.

However, quickly browsing through the PDF, I couldn't find the taxonomy
portion. Is this "almost ready" stuff available somewhere?
...
...
E.g. in addition to extracting information from log messages, it also
associates a "class", later available in the "${.classifier.class}"
value.
That is useful but one class likely won't cut it as a lot of messages
will be cross-class
...
violation     - security violation
security      - other security events
system        - system information
unknown       - no rule matches
A message being both 'system' and 'security' is a very common situation.
User logins - need I say more? :-)
And telling 'violation' from 'security' is probably a lost cause.
Yeah, I know that. This was coming from logcheck and until now I didn't
mean to improve it in any way.
...
...
On one hand, the tagging functionality (e.g. the ability to also
associate tags with each message) is superior to classes.
Absolutely, tag clouds would be a much better bet than a tree of categories.
...
But it is difficult to do with tags (except for using filters and
different destinations), as there's no such functionality. Another problem
is that tags/classes are completely independent; in order to filter on the
class of the message, one would have to use a match() filter like this:
Actually, that is a positive - especially when you include custom tags,
like regulatory relevance or relevance to a particular unit inside
the organization.
...
My conclusion is that classes are better when used in templates,
tags are better when filtering. The two should be merged somehow.
Tags can be organized in 'bunches' that serve as classes.
You mean, every tag would belong to a bunch and a given message could
only be part of a single bunch?

Thus any single tag would indicate the bunch the message belongs to?

Or, I might be completely missing something.
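
One possible reading of the 'bunches' idea, sketched in Python purely for
illustration: each tag belongs to exactly one bunch, a message may carry
tags from several bunches, and a single class is derived only when a
template needs one. The tag names, the priority order, and both helper
functions are hypothetical; nothing here is patterndb functionality.

```python
# Hypothetical tag -> bunch mapping; none of these names come from patterndb.
TAG_TO_BUNCH = {
    "login": "security",
    "auth-failure": "violation",
    "session-open": "system",
    "dns-query": "system",
}

# Assumed priority used only when a single class must be chosen.
BUNCH_PRIORITY = ["violation", "security", "system"]


def bunches_for(tags):
    """Return every bunch implied by a message's tags, in priority order."""
    found = {TAG_TO_BUNCH[t] for t in tags if t in TAG_TO_BUNCH}
    return [b for b in BUNCH_PRIORITY if b in found]


def primary_class(tags, default="unknown"):
    """Collapse the bunches to one class, e.g. for use in a template."""
    found = bunches_for(tags)
    return found[0] if found else default


# A user login is both a 'security' and a 'system' event.
login_tags = ["login", "session-open"]
print(bunches_for(login_tags))
print(primary_class(login_tags))
```

Under this reading a message is not limited to a single bunch; the
single-class view is just a projection of the tag set.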
...
...
3) drop the class stuff and implement a macro trick that
makes it possible to use tags in macro context
I'd avoid hard-coded classes altogether and go with all tags, possibly
organized in "classes of tags" or bunches or whatever.
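
As a rough illustration of option 3 (tags usable in macro context), the
Python sketch below expands a placeholder to the comma-joined tag list
when a template is rendered. The `$tags` placeholder, the `render` helper,
and the sample message are assumptions made for this example; they do not
describe how syslog-ng implements or would implement this.

```python
from string import Template


def render(template, fields, tags):
    """Render a template, exposing the tag set as a hypothetical $tags value."""
    values = dict(fields)
    # Sort so the expansion is stable regardless of set ordering.
    values["tags"] = ",".join(sorted(tags))
    return Template(template).safe_substitute(values)


line = render(
    "$host [$tags] $text",
    {"host": "gw1", "text": "session opened"},
    {"login", "system"},
)
print(line)
```

With something like this, a template gets the same information a class
would give it, without a hard-coded class list.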

...
...
On an independent matter, the set of classes may need some thought. As
Ah, that's because it will fail - multi-mapping will kill it. This was
pretty much our starting point in CEE as many of us spent time doing
it at SIEM players. So, SIEM vendors have been trying to build HUGE
trees of events and ultimately they became unwieldy. Tags will be more
manageable and simple relationships can be established between them.
...
probably needs to be expanded. Last time I got patterns for
DNS queries, and although I could shove them into "system", right now I
feel that the point of classification is to categorize events by
Well, now multiply it by roughly 120,000 event types that leading
SIEM vendors categorized over the years and you'd know you don't want
that :-)
Right.
...
...
"importance", in a similar spirit to syslog severity, but one that works
even if the application developer uses a bogus severity when sending
syslog messages.
Importance is a HUGE challenge. Not sure what to add to this one as it is
largely an unsolved problem due to very different contexts for message
analysis. Even a mere 'connection established' can be 10 of 10 for
somebody in some circumstances. One can try to glue importance to tags
(like exploit > connection) and not to individual messages; it might
work sometimes.
Hmm... good idea.
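
A minimal sketch of gluing importance to tags rather than to individual
messages, written in Python for illustration only. The scores, the tag
names, and the `overrides` parameter (standing in for site-specific
context) are all assumptions, not an existing patterndb feature.

```python
# Hypothetical per-tag scores; "exploit > connection" as in the example above.
TAG_IMPORTANCE = {"exploit": 9, "login": 4, "connection": 2}


def importance(tags, overrides=None, default=0):
    """Score a message as the highest importance among its tags.

    `overrides` lets a site re-rank tags for its own context.
    """
    scores = {**TAG_IMPORTANCE, **(overrides or {})}
    return max((scores.get(t, default) for t in tags), default=default)


print(importance({"exploit", "connection"}))
print(importance({"connection"}))
# A site where any established connection is critical:
print(importance({"connection"}, overrides={"connection": 10}))
```

Taking the maximum is just one simple choice; it keeps the ordering
between tags without needing a score for every individual message.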

-- 
Bazsi

Balazs Scheidler