Hi,

I have just created a pull request for the groupingby() parser, and I am seeking feedback on it:

https://github.com/balabit/syslog-ng/pull/785

The commit message is pretty detailed on what it does, but let me reproduce it here for simplicity.

Any feedback is appreciated. Thanks.

dbparser: add groupingby() parser

This patch adds a new parser that can perform simple correlation on log messages, e.g. when multiple input log messages describe the same event.

In a way it is similar to the SQL GROUP BY operation, where an aggregate of a set of input records can be calculated. The major difference between SQL GROUP BY and groupingby() is that the former _always_ operates on an enumerable list of records, whereas groupingby() works on a stream of data.

groupingby() produces related groups by using a sliding window on time, i.e. it can be specified how far back in time we need to look to group related messages together.

As a specific use-case, let's see Linux audit logs. Audit events tend to be broken into several lines, generated as a list of lines. These tend to be pretty close in time, however there might be multiple events logged at around the same time, which get mixed up in the output.

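To illustrate the idea of grouping a stream by key with a time limit, here is a minimal Python sketch. It is not the code from the patch: the function name group_stream, the (timestamp, key, text) record shape, and the choice to measure the timeout from the first message of a group using message timestamps are all assumptions made for the illustration.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record shape: (timestamp in seconds, grouping key, message text)
Record = Tuple[float, str, str]


def group_stream(records: Iterable[Record], timeout: float) -> Iterator[Tuple[str, List[str]]]:
    """Yield (key, messages) for each group once it is considered complete.

    Assumption: a group is complete when `timeout` seconds have passed since
    its first message, judged by the timestamp of the record being processed.
    """
    # key -> (timestamp of the first message, messages collected so far)
    open_groups: Dict[str, Tuple[float, List[str]]] = {}

    for ts, key, text in records:
        # Close every group that has been open for at least `timeout` seconds.
        expired = [k for k, (first_seen, _) in open_groups.items()
                   if ts - first_seen >= timeout]
        for k in expired:
            _, messages = open_groups.pop(k)
            yield k, messages

        # Add the current message to its group, opening a new one if needed.
        if key in open_groups:
            open_groups[key][1].append(text)
        else:
            open_groups[key] = (ts, [text])

    # End of input: emit whatever is still open.
    for k, (_, messages) in open_groups.items():
        yield k, messages


if __name__ == "__main__":
    # Two interleaved events; the keys and texts are made up for the demo.
    stream = [
        (0.00, "40347", "type=SYSCALL"),
        (0.01, "40348", "type=SYSCALL"),
        (0.02, "40347", "type=EXECVE"),
        (0.03, "40348", "type=CWD"),
        (0.04, "40347", "type=CWD"),
    ]
    for key, messages in group_stream(stream, timeout=2.0):
        print(key, messages)
```

Unlike SQL GROUP BY, the sketch never sees the whole input at once: it only keeps the groups that are still open, which is what makes it usable on an unbounded stream.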
The example below is the audit log for an ntpdate execution:

  type=SYSCALL msg=audit(1440927434.124:40347): arch=c000003e syscall=59 success=yes exit=0 a0=7f121cef0b88 a1=7f121cef0c00 a2=7f121e690d98 a3=2 items=2 ppid=4312 pid=4347 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="ntpdate" exe="/usr/sbin/ntpdate" key=(null)
  type=EXECVE msg=audit(1440927434.124:40347): argc=3 a0="/usr/sbin/ntpdate" a1="-s" a2="ntp.ubuntu.com"
  type=CWD msg=audit(1440927434.124:40347): cwd="/"
  type=PATH msg=audit(1440927434.124:40347): item=0 name="/usr/sbin/ntpdate" inode=2006003 dev=08:01 mode=0100755 ouid=0 ogid=0 rdev=00:00 nametype=NORMAL
  type=PATH msg=audit(1440927434.124:40347): item=1 name="/lib64/ld-linux-x86-64.so.2" inode=5243184 dev=08:01 mode=0100755 ouid=0 ogid=0 rdev=00:00 nametype=NORMAL
  type=PROCTITLE msg=audit(1440927434.124:40347): proctitle=2F62696E2F7368002F7573722F7362696E2F6E7470646174652D64656269616E002D73

These lines are connected by their 2nd field, msg, which equals audit(1440927434.124:40347) in every line. This can be processed by the groupingby() parser in a similar way to how db-parser() does correlation.

These are the options for groupingby():

* key(): specifies the key for the grouping, i.e. the value that must be the same for all messages in the group.
* scope(): specifies one of three values: "global", "host", "process", meaning the same as in db-parser; whether to apply grouping for all messages received by syslog-ng (global), only messages coming from the same host (host), or the same process/pid combination (process).
* where(): specifies a filter condition; messages not matching the filter will NOT be added to the group. where() only has access to a single message, the current one being processed.
* having(): specifies a filter condition that must match in order for the group to generate an aggregate message. having() has access to the entire group through the "context".
* timeout(): specifies the maximum time to wait for all messages in the group to arrive. After this time, the group is assumed to be complete and aggregation is triggered.
* aggregate(): specifies the aggregate message that is going to be generated when the group is complete.
  - tags():
  - value():
  - inherit-mode():
* inject-mode(): how the aggregate message is injected into the syslog-ng message routing, can be one of: "pass-through", "internal".
* trigger(): triggers the closure of the group by matching an incoming message. If the filter condition specified here matches the incoming message, it will cause the aggregate message to be calculated and emitted, and the group to be discarded from the state table.

A few use-cases where this can be useful:

* Linux audit logs
* postfix logs

Signed-off-by: Balazs Scheidler <balazs.scheidler@balabit.com>

--
Bazsi