RFC: groupingby parser

13 Nov 2015

Hi,

I have just created a pull request for the groupingby() parser that I am
seeking feedback for:

https://github.com/balabit/syslog-ng/pull/785

The commit message is pretty detailed on what it does, but let me reproduce
it here for simplicity.

Any feedback is appreciated. Thanks.

dbparser: add groupingby() parser

This patch adds a new parser that can perform simple correlation on log
messages, for example when multiple input log messages describe the same
event.

In a way it is similar to the SQL GROUP BY operation, where an aggregate of
a set of input records can be calculated.

The major difference between SQL GROUP BY and groupingby() is that the
former _always_ operates on an enumerable list of records, whereas
groupingby() works on a stream of data.

groupingby() produces related groups by using a sliding window on time,
i.e. it can be specified how far back in time we need to look to group
related messages together.
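
To make the difference from a batch GROUP BY concrete, here is a minimal
Python sketch of time-bounded grouping on a stream. It is a toy model and
not the syslog-ng implementation: the class and method names are
hypothetical, and it assumes that the window of a group starts at its first
message, which the description above does not specify.

```python
class StreamGrouper:
    """Toy model of time-bounded grouping on a stream (not syslog-ng code)."""

    def __init__(self, timeout):
        self.timeout = timeout   # seconds to keep a group open
        self.groups = {}         # key -> (time of first message, [messages])

    def expire(self, now):
        """Close and return every group whose window has elapsed."""
        done = []
        for key in list(self.groups):
            first_seen, messages = self.groups[key]
            # Assumption: the window is measured from the first message.
            if now - first_seen >= self.timeout:
                done.append((key, messages))
                del self.groups[key]
        return done

    def feed(self, now, key, message):
        """Add one message; return the groups that completed in the meantime."""
        done = self.expire(now)
        self.groups.setdefault(key, (now, []))[1].append(message)
        return done


grouper = StreamGrouper(timeout=10)
grouper.feed(0, "a", "a1")
grouper.feed(1, "b", "b1")
grouper.feed(2, "a", "a2")
completed = grouper.feed(12, "c", "c1")   # groups "a" and "b" have timed out
```

Unlike GROUP BY, the grouper never sees the whole input: a group is only
known to be complete once its window has passed.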

As a specific use-case, let's see Linux audit logs. A single audit event
tends to be broken into several lines. These lines are usually pretty close
in time, however there might be multiple events logged at around the same
time, and their lines get mixed up in the output.

The example below is the audit log for an ntpdate execution:

    type=SYSCALL msg=audit(1440927434.124:40347): arch=c000003e syscall=59 success=yes exit=0 a0=7f121cef0b88 a1=7f121cef0c00 a2=7f121e690d98 a3=2 items=2 ppid=4312 pid=4347 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="ntpdate" exe="/usr/sbin/ntpdate" key=(null)
    type=EXECVE msg=audit(1440927434.124:40347): argc=3 a0="/usr/sbin/ntpdate" a1="-s" a2="ntp.ubuntu.com"
    type=CWD msg=audit(1440927434.124:40347):  cwd="/"
    type=PATH msg=audit(1440927434.124:40347): item=0 name="/usr/sbin/ntpdate" inode=2006003 dev=08:01 mode=0100755 ouid=0 ogid=0 rdev=00:00 nametype=NORMAL
    type=PATH msg=audit(1440927434.124:40347): item=1 name="/lib64/ld-linux-x86-64.so.2" inode=5243184 dev=08:01 mode=0100755 ouid=0 ogid=0 rdev=00:00 nametype=NORMAL
    type=PROCTITLE msg=audit(1440927434.124:40347): proctitle=2F62696E2F7368002F7573722F7362696E2F6E7470646174652D64656269616E002D73

These lines are connected by their second field: in every one of them, msg
equals audit(1440927434.124:40347).
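
As an illustration of what such a grouping key looks like, the Python
sketch below pulls the msg field out of audit lines with a regular
expression and collects the record types per key. The regular expression
and the helper name are assumptions made for this example, and the third
input line is a made-up record standing in for a different event.

```python
import re
from collections import defaultdict

# Matches the "msg=audit(<timestamp>:<serial>)" field of an audit record.
AUDIT_MSG = re.compile(r"\bmsg=(audit\([^)]*\))")


def audit_key(line):
    """Return the audit(...) identifier of a line, or None if it has none."""
    match = AUDIT_MSG.search(line)
    return match.group(1) if match else None


lines = [
    'type=EXECVE msg=audit(1440927434.124:40347): argc=3 '
    'a0="/usr/sbin/ntpdate" a1="-s" a2="ntp.ubuntu.com"',
    'type=CWD msg=audit(1440927434.124:40347):  cwd="/"',
    # Made-up record belonging to another event, for contrast only.
    'type=CWD msg=audit(1440927434.130:40348):  cwd="/tmp"',
]

# Collect the record type (first field) of every line under its key.
groups = defaultdict(list)
for line in lines:
    groups[audit_key(line)].append(line.split()[0])
```

The first two lines end up under the same key, the third one under its own.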

This can be processed by the groupingby() parser in much the same way that
db-parser() performs correlation.

These are the options for groupingby():

  * key(): specifies the key for the grouping, i.e. the value that must be
    the same for all messages in the group

  * scope(): specifies one of three values: "global", "host", "process",
    with the same meaning as in db-parser(): whether to apply grouping to
    all messages received by syslog-ng (global), only to messages coming
    from the same host (host), or only to messages from the same
    process/pid combination (process).

  * where(): specifies a filter condition; messages not matching the filter
    will NOT be added to the group.  where() only has access to a single
    message, the current one being processed.

  * having(): specifies a filter condition that must match in order for the
    group to generate an aggregate message. having() has access to the
    entire group through the "context".

  * timeout(): specifies the maximum time to wait for all messages in the
    group to arrive. After this time, the group is assumed to be complete
    and aggregation is triggered.

  * aggregate(): this specifies the aggregate message that's going to be
    generated when the group is complete.
     - tags():
     - value():
     - inherit-mode():

  * inject-mode(): how the aggregate message is injected into the syslog-ng
    message routing, can be one of: "pass-through", "internal".

  * trigger(): triggers the closure of the group by matching an incoming
    message. If the filter condition specified here matches the incoming
    message, the aggregate message is calculated and emitted, and the group
    is discarded from the state table.
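
To show how where(), having(), trigger() and aggregate() relate to each
other, here is a small Python model of the message flow described in the
list above. It is only a sketch under assumptions: the class, its callbacks
and the dict-based messages are hypothetical, timeout() and scope() are left
out, and it assumes that the message matching trigger() is itself added to
the group before the group is closed.

```python
class Grouping:
    """Toy model of the option semantics above (not syslog-ng code)."""

    def __init__(self, key, aggregate, where=None, having=None, trigger=None):
        self.key = key                                 # message -> group key
        self.aggregate = aggregate                     # context -> aggregate
        self.where = where or (lambda msg: True)       # sees one message
        self.having = having or (lambda ctx: True)     # sees the whole group
        self.trigger = trigger or (lambda msg: False)  # closes the group
        self.contexts = {}                             # state table

    def close(self, group_key):
        """Discard the group; emit an aggregate only if having() matches."""
        context = self.contexts.pop(group_key)
        return self.aggregate(context) if self.having(context) else None

    def feed(self, msg):
        """Process one message; return an aggregate or None."""
        if not self.where(msg):
            return None            # filtered out, never joins a group
        group_key = self.key(msg)
        self.contexts.setdefault(group_key, []).append(msg)
        if self.trigger(msg):
            return self.close(group_key)
        return None


grouping = Grouping(
    key=lambda m: m["id"],
    where=lambda m: "id" in m,
    having=lambda ctx: any(m["type"] == "SYSCALL" for m in ctx),
    trigger=lambda m: m["type"] == "PROCTITLE",
    aggregate=lambda ctx: {"id": ctx[0]["id"],
                           "types": [m["type"] for m in ctx]},
)

results = [grouping.feed(m) for m in (
    {"type": "SYSCALL", "id": "40347"},
    {"type": "NOISE"},                   # no id: rejected by where()
    {"type": "EXECVE", "id": "40347"},
    {"type": "PROCTITLE", "id": "40347"},
)]
```

Only the last call returns an aggregate. In this model a timeout would
simply call close() on the expired group.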

A few use-cases where this can be useful:
  * Linux audit logs
  * postfix logs

Signed-off-by: Balazs Scheidler <balazs.scheidler@balabit.com>

-- 
Bazsi

Scheidler, Balázs
