<div dir="ltr"><div><div><div>Hi,<br><br></div>I have just created a pull request for the groupingby() parser, and I am seeking feedback on it:<br><br><a href="https://github.com/balabit/syslog-ng/pull/785">https://github.com/balabit/syslog-ng/pull/785</a><br><br></div>The commit message is pretty detailed on what it does, but let me reproduce it here for convenience.<br><br></div>Any feedback is appreciated. Thanks.<br><div><br>dbparser: add groupingby() parser<br><br>This patch adds a new parser that can perform simple correlation on log<br>messages, e.g. when multiple input log messages describe the same event.<br><br>In a way it is similar to the SQL GROUP BY operation, where an aggregate of<br>a set of input records can be calculated.<br><br>The major difference between SQL GROUP BY and groupingby() is that the former<br>_always_ operates on an enumerable list of records, whereas groupingby()<br>works on a stream of data.<br><br>groupingby() produces related groups by using a sliding window on time, i.e. <br>it can be specified how far back in time we need to look to group related<br>messages together.<br><br>As a specific use-case, let's see Linux audit logs. Linux audit events tend to<br>be broken into several records, written out as a list of lines. 
These tend to be<br>pretty close in time, however there might be multiple events logged at<br>around the same time, which get mixed up in the output.<br><br>The example below is the audit log for an ntpdate execution:<br><br> type=SYSCALL msg=audit(1440927434.124:40347): arch=c000003e syscall=59 success=yes exit=0 a0=7f121cef0b88 a1=7f121cef0c00 a2=7f121e690d98 a3=2 items=2 ppid=4312 pid=4347 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="ntpdate" exe="/usr/sbin/ntpdate" key=(null)<br> type=EXECVE msg=audit(1440927434.124:40347): argc=3 a0="/usr/sbin/ntpdate" a1="-s" a2="<a href="http://ntp.ubuntu.com">ntp.ubuntu.com</a>"<br> type=CWD msg=audit(1440927434.124:40347): cwd="/"<br> type=PATH msg=audit(1440927434.124:40347): item=0 name="/usr/sbin/ntpdate" inode=2006003 dev=08:01 mode=0100755 ouid=0 ogid=0 rdev=00:00 nametype=NORMAL<br> type=PATH msg=audit(1440927434.124:40347): item=1 name="/lib64/ld-linux-x86-64.so.2" inode=5243184 dev=08:01 mode=0100755 ouid=0 ogid=0 rdev=00:00 nametype=NORMAL<br> type=PROCTITLE msg=audit(1440927434.124:40347): proctitle=2F62696E2F7368002F7573722F7362696E2F6E7470646174652D64656269616E002D73<br><br>These lines are connected by their 2nd field: in each of them, msg equals audit(1440927434.124:40347).<br><br>This can be processed by the groupingby() parser in a similar way to how<br>db-parser() does correlation.<br><br>These are the options for groupingby():<br><br> * key(): specifies the key for the grouping, i.e. the value that must be the <br> same for all messages in the group<br><br> * scope(): specifies one of three values: "global", "host", "process", meaning <br> the same as in db-parser; whether to apply grouping for all messages<br> received by syslog-ng (global), only messages coming from the same host<br> (host), or the same process/pid combination (process).<br><br> * where(): specifies a filter condition; messages not matching the filter<br> will NOT be added to the group. 
where() only has access to a single<br> message, the current one being processed.<br><br> * having(): specifies a filter condition that must match in order for the <br> group to generate an aggregate message. having() has access to the<br> entire group through the "context".<br><br> * timeout(): specifies the maximum time to wait for all messages in the<br> group to arrive. After this time, the group is assumed to be complete<br> and aggregation is triggered.<br><br> * aggregate(): specifies the aggregate message that's going to be<br> generated when the group is complete.<br> - tags():<br> - value():<br> - inherit-mode():<br><br> * inject-mode(): how the aggregate message is injected into the syslog-ng<br> message routing; can be one of: "pass-through", "internal".<br><br> * trigger(): triggers the closure of the group by matching an incoming<br> message. If the filter condition specified here matches the incoming<br> message, it will cause the aggregate message to be calculated and emitted,<br> and the group to be discarded from the state table.<br><br>A few use-cases where this can be useful:<br> * Linux audit logs<br> * postfix logs<br><br>Signed-off-by: Balazs Scheidler &lt;<a href="mailto:balazs.scheidler@balabit.com">balazs.scheidler@balabit.com</a>&gt;<br><br><div><div><br clear="all"><div><div><div class="gmail_signature"><div dir="ltr">-- <br>Bazsi<br></div></div></div>
</div></div></div></div></div>