Bazsi's blog: Syslog-ng correlation

Syslog-ng correlation

I think we've reached an important milestone with syslog-ng: log message correlation has been added to db-parser(). As you probably know, dbparser and its sister project patterndb are able to transform unstructured syslog messages into a normalized format: the human-readable string content becomes a set of name-value pairs. The problem is that in a lot of cases a message lacks one or two details that are really needed to understand it, and this information usually comes in a follow-up message.

For example, one message in the postfix logs contains the sender address, while the recipient information comes in the next message. It is easy to see that in most cases you want the information in sender,recipient pairs. Another example is sshd, where the authentication failure comes in one message and the exact reason for the failure comes in the next.

Currently what you can do with syslog-ng is put the separate messages into two SQL tables and join them at query time. This gets ugly quite fast: increased storage needs, the hassle of managing two tables instead of one, not to mention the longer time needed to query the database. Sometimes the sole reason for creating SQL tables in this case is to perform the correlation; otherwise you'd be happier with a CSV file.

And that's what became possible now with the latest git commit of syslog-ng 3.2. The idea is simple: when a patterndb rule matches, you can tell syslog-ng to remember that message by adding it to a correlation state. This state is identified by information extracted from the message, which serves as a unique session identifier. When the next line comes in, you can reference the information stored earlier.

Basically, the correlation state is a list of log messages associated with a session id. To add a new message to this state, you need a store rule:

<rule id="…">
  <patterns>
    <pattern>foo session: @STRING:sessionid@, param: @STRING:param@</pattern>
  </patterns>
  <store id="$sessionid" timeout="60"/>
</rule>

The id attribute of the store element specifies a template containing any syslog-ng name-value pairs, probably extracted from the current message itself.

When the final information comes in, you can use the join attribute of the values tag:

<rule id="…">
  <patterns>
    <pattern>bar session: @STRING:sessionid@</pattern>
  </patterns>
  <values join="$sessionid">
    <value name="param">${param}@1</value>
  </values>
</rule>

Here the join attribute specifies the session to look up (which must match in the two messages), and if there's a match, all messages stored in the correlation state become available when evaluating the name-value pairs associated with the current message.

The key here is the new "@1" syntax in the template string, appended to a name-value pair reference. After the "@" character, you can reference a message in the correlation state by specifying the index backward from the current message. This way @0 is the current message, @1 is the one prior to the current one, @2 is the one before that, and so on.

There are more complex ways to use/query the contents of the correlation state, but those will appear in a follow-up post. Stay tuned!
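To make the store/join mechanics concrete, here is a minimal Python sketch of a correlation state as described in the post. It is only a model of the behaviour, not syslog-ng code: the class `CorrelationState`, its methods, the `lookup` helper and the dictionary-based messages are all made up for illustration. How the timeout is counted (here: seconds since the last stored message) and what a reference to a missing message expands to (here: an empty string) are assumptions, since the post does not say.

```python
import time


class CorrelationState:
    """Toy model: session id -> list of stored messages (oldest first)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._sessions = {}  # session id -> (expiry time, [messages])

    def _live_messages(self, session_id):
        """Return the stored messages, dropping the session if it expired."""
        entry = self._sessions.get(session_id)
        if entry is None:
            return []
        expiry, messages = entry
        if self._clock() >= expiry:
            del self._sessions[session_id]
            return []
        return messages

    def store(self, session_id, message, timeout):
        """Model of <store id=... timeout=...>: remember a parsed message."""
        messages = self._live_messages(session_id)
        messages.append(message)
        # Assumption: storing a message restarts the session's timeout.
        self._sessions[session_id] = (self._clock() + timeout, messages)

    def join(self, session_id, current):
        """Model of <values join=...>: stored messages plus the current one."""
        return self._live_messages(session_id) + [current]


def lookup(context, name, back=0):
    """Model of ${name}@N: N counts backward, @0 is the current message."""
    if back < 0 or back >= len(context):
        return ""  # assumption: a missing message expands to nothing
    return context[-1 - back].get(name, "")


# First message: "foo session: abc, param: 42" is stored under its session id.
state = CorrelationState()
state.store("abc", {"sessionid": "abc", "param": "42"}, timeout=60)

# Second message: "bar session: abc" joins the session and reads ${param}@1.
context = state.join("abc", {"sessionid": "abc"})
print(lookup(context, "param", back=1))  # prints 42
```

The clock is injectable only so that the expiry logic can be exercised without waiting a minute.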
This is powerful stuff! I'm really looking forward to exploring all of the ways it can be used. The Postfix usage example is a great one. One clarification: is $sessionid autogenerated by Syslog-NG, or do we have to create it ourselves using tuples from the messages? I'm assuming it'll use something akin to the $SEQNUM macro.

One other question: can you join an entire preceding message by using something like ${MSG}@0? If so, it would be great if there were a built-in to say all preceding messages, like ${MSG}@-1 or @ALL or something, but I guess that's getting a little more app-level than I'd prefer.

On Wed, Sep 29, 2010 at 3:26 AM, Balazs Scheidler <bazsi@balabit.hu> wrote:
> Syslog-ng correlation
> [...]
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.campin.net/syslog-ng/faq.html
On Wed, 2010-09-29 at 11:16 -0500, Martin Holste wrote:
> This is powerful stuff! I'm really looking forward to exploring all of the ways it can be used. The Postfix usage example is a great one. One clarification: is $sessionid autogenerated by Syslog-NG, or do we have to create it ourselves using tuples from the messages? I'm assuming it'll use something akin to the $SEQNUM macro.

You have to generate it. In the example above I've parsed a value out of the log message as $sessionid, but you can use more complex values too: $HOST:$PID is usually good, but in other cases the log message contains an explicit session id (the postfix message id, for example).
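As a rough illustration of choosing a session id, the sketch below builds a key the way a $HOST:$PID template would, and prefers an explicit id when the message carries one. The helper `session_key` and the field name `queue_id` are hypothetical stand-ins; in syslog-ng itself this is just a template string in the rule, not a function.

```python
def session_key(msg):
    """Pick a correlation key for a parsed message (a dict of name-value pairs)."""
    # Prefer an explicit session id if the parser extracted one.
    # "queue_id" is a made-up field name standing in for a postfix message id.
    explicit = msg.get("queue_id")
    if explicit:
        return explicit
    # Otherwise fall back to the equivalent of the "$HOST:$PID" template.
    return f"{msg['HOST']}:{msg['PID']}"
```

Whatever is chosen, both the storing rule and the joining rule have to arrive at the same string for the lookup to succeed.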
> One other question, can you join an entire preceding message by using something like ${MSG}@0?

@0 is the current message, but template functions get a chance to work with the complete correlation state, thus $(grep) iterates over all messages. We do have $(echo), but that's intentionally as simple as possible: it only uses the last message.

Hmm... something like $(grep ("a" == "a") $MSG) would do the trick, but may not be the most intuitive to write.

Ahh. I've just noticed that I didn't blog about $(grep) and $(if), but they certainly do what they imply:

$(grep filter template1 template2 template3...)

searches for messages in the current correlation state that match the filter expression "filter", and evaluates the templates on them.

$(if filter foo bar)

If filter is true, this results in foo, otherwise bar.

Filters got extended too: now you can use simple comparison operators a la perl. Numeric comparison is the same as in C (<, <=, ==, >=, >, !=), string comparison is the same as in perl: lt, le, eq, ge, gt, ne.

You can compare templates, e.g. "$FACILITY_NUM" > "5"
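A small Python model of the two ideas above: numeric versus string comparison, and $(grep) walking every message in the correlation state. None of this is syslog-ng's implementation. The names `compare` and `grep` are invented here, templates are simplified to plain field names, the filter is passed as a Python callable, and the sketch assumes numeric operators only ever see strings that parse as numbers (how syslog-ng treats other input is not covered in this thread).

```python
import operator

# Numeric operators compare the expanded values as numbers (as in C)...
NUMERIC_OPS = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
               ">=": operator.ge, ">": operator.gt, "!=": operator.ne}
# ...while the perl-style operators compare them as strings.
STRING_OPS = {"lt": operator.lt, "le": operator.le, "eq": operator.eq,
              "ge": operator.ge, "gt": operator.gt, "ne": operator.ne}


def compare(left, op, right):
    """Evaluate `left op right` on two already-expanded template strings."""
    if op in NUMERIC_OPS:
        # Assumption: both sides are numeric strings.
        return NUMERIC_OPS[op](float(left), float(right))
    return STRING_OPS[op](left, right)


def grep(context, predicate, *names):
    """Model of $(grep filter t1 t2 ...): values from every matching message."""
    return [[msg.get(name, "") for name in names]
            for msg in context if predicate(msg)]
```

The difference between the two operator families shows up with values like "10" and "5": `compare("10", ">", "5")` is true because 10 is greater than 5, while `compare("10", "gt", "5")` is false because the string "10" sorts before "5".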
> If so, it would be great if there were a built-in to say all preceding messages, like ${MSG}@-1 or @ALL or something, but I guess that's getting a little more app-level than I'd prefer.

It depends on whether you want all name-value pairs, or just a single name-value pair. We just have to come up with names for the various functions. They are all possible and simple to do.

-- Bazsi
On Thu, 2010-09-30 at 14:19 -0500, Martin Holste wrote:
>> you have to generate it. in the example above I've parsed a value out of the log message as $sessionid, but you can use more complex values like:
>
> Ok, got it.
>
>> ahh. I've just noticed that I didn't blog about $(grep) and $(if), but they certainly do what they imply:
>>
>> $(grep filter template1 template2 template3...)
>>
>> searches for messages in the current correlation state matched by filter expression "filter" and evaluating the templates.
>>
>> $(if filter foo bar)
>
> Cool!

Thanks. I take this as a compliment. :) In fact I do like template functions a lot. If only I had a scripting engine embedded into syslog-ng to make extending it really easy. But anyway, writing a template function in C is as easy as possible.

>> If filter is true results in foo, otherwise bar.
>
> So, an example statement might be:
>
> $(if ${useracct}1 == "?" ${useracct}="unknown" ${useracct}=${useracct}
>
> Is that right? I guess I'm not understanding foo and bar in your example.

The foo and bar parts are what the $(if) construct expands to if the result of the filter evaluation is true or false, respectively. I'm afraid I can't understand your example.

>> Filters got extended too, now you can use simple comparison operators a'la perl: numeric comparison is the same as C (<, <=, ==, >=, >, !=), string comparison is the same as perl: lt, le, eq, ge, gt, ne
>>
>> You can compare templates, e.g. "$FACILITY_NUM" > "5"
>
> Coupled with the inet_aton functionality you've already added, this would mean you could do filtering based on IP ranges, right?

Yes.

-- Bazsi
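The arithmetic behind IP range filtering can be sketched in Python: once a dotted-quad address is turned into a single integer, a range check is just two numeric comparisons. This uses Python's standard `socket.inet_aton`, not syslog-ng; how the inet_aton functionality is spelled inside a syslog-ng filter is not shown in this thread, and the helper names `ip_to_int` and `in_range` are made up for the example.

```python
import socket
import struct


def ip_to_int(address):
    """Convert a dotted-quad IPv4 address to an unsigned 32-bit integer."""
    # socket.inet_aton returns 4 bytes in network (big-endian) order.
    return struct.unpack("!I", socket.inet_aton(address))[0]


def in_range(address, first, last):
    """True if first <= address <= last, compared numerically."""
    return ip_to_int(first) <= ip_to_int(address) <= ip_to_int(last)


print(in_range("192.168.1.57", "192.168.1.0", "192.168.1.255"))  # prints True
```

The comparison has to be numeric: as strings, "192.168.1.57" and "192.168.1.255" would be ordered character by character, which gives the wrong answer for ranges.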
On Wed, 2010-10-06 at 09:38 -0500, Martin Holste wrote:
>> Thanks. I take this as a compliment. :) In fact I do like template functions a lot. If only I had a scripting engine embedded into syslog-ng to make extending it really easy.
>
> My vote would be for embedding a Perl interpreter, though Lua seems to be the more fashionable embed these days.
>
>> the foo and bar parts are what the $(if) construct expands to if the result of the filter evaluation is true / false respectively.
>
> Can you give an example? I'm not on the same page with you.

Let's say you want to assign the class of a given message based on whether the username is root or something else.

<value name=".classifier.class">$(if "${usracct.username}" == "root" violation system)</value>

-- Bazsi
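Read as ordinary code, $(if) behaves like a conditional expression. The following Python sketch mirrors the classifier example; `template_if` and `classify` are illustrative names only, and the message is modelled as a plain dictionary of name-value pairs.

```python
def template_if(condition, if_true, if_false):
    """Model of $(if filter foo bar): foo when the filter holds, else bar."""
    return if_true if condition else if_false


def classify(msg):
    """Mirror of the .classifier.class example: root logins are violations."""
    username = msg.get("usracct.username", "")
    return template_if(username == "root", "violation", "system")


print(classify({"usracct.username": "root"}))  # prints violation
```

Any username other than root, including a missing one in this model, ends up in the "system" class.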
Ok, got it. Now what about applying to other variables like this:

<value name="usracct.username">$(if "${usracct.username}" == "root" "root" "normal user")</value>

Or additional embedded conditionals (MySQL-style) like this:

<value name="usracct.username">$(if "${usracct.username}" == "root" $(if "${usracct.username}" == "joe" "admin" "normal user") "normal user")</value>

On Sat, Oct 16, 2010 at 5:23 AM, Balazs Scheidler <bazsi@balabit.hu> wrote:
> Let's say you want to assign the class of a given message based on whether the username is root or something else.
>
> <value name=".classifier.class">$(if "${usracct.username}" == "root" violation system)</value>
On Sat, 2010-10-16 at 09:24 -0500, Martin Holste wrote:
> Ok, got it. Now what about applying to other variables like this:
>
> <value name="usracct.username">$(if "${usracct.username}" == "root" "root" "normal user")</value>
>
> Or additional embedded conditionals (MySQL-style) like this:
>
> <value name="usracct.username">$(if "${usracct.username}" == "root" $(if "${usracct.username}" == "joe" "admin" "normal user") "normal user")</value>

This is also possible, of course.

-- Bazsi
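Nesting works the same way as nesting conditional expressions in any language. The Python sketch below models the nested expression exactly as quoted, plus one variant; `template_if` and both function names are invented for the illustration. Note that in the quoted form the inner test for "joe" is only reached when the username is already "root", so it can never produce "admin". The variant, which moves the inner test to the else branch, is only a guess at what may have been intended.

```python
def template_if(condition, if_true, if_false):
    """Model of $(if filter foo bar): foo when the filter holds, else bar."""
    return if_true if condition else if_false


def label_as_written(username):
    """The nested expression exactly as quoted in the message above."""
    return template_if(
        username == "root",
        template_if(username == "joe", "admin", "normal user"),
        "normal user",
    )


def label_variant(username):
    """Hypothetical variant: test for "joe" only when the user is not root."""
    return template_if(
        username == "root",
        "root",
        template_if(username == "joe", "admin", "normal user"),
    )
```

With the variant, "root" maps to "root", "joe" to "admin", and everyone else to "normal user".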
participants (2)
- Balazs Scheidler
- Martin Holste