[syslog-ng] [RFC]: Pattern matching & correlation ideas

Peter Gyongyosi gyp at balabit.hu
Tue Sep 11 15:02:09 CEST 2012


Hi,

On 09/07/2012 08:26 PM, Gergely Nagy wrote:
> Peter Gyongyosi <gyp at balabit.hu> writes:
>
>> 1) the lisp-y syntax
> Yep, it is different, because of two factors: I like lisp, and I started
> coding the PoC in Clojure, and having a compatible syntax made the
> prototyping much much faster.
>
> But as I said in the RFC, I understand the syntax may not be easy for
> non-lispy folk, so the whole compiler business is being coded with this
> in mind: the parser is entirely separate from the rest. As long as
> there's a parser to translate the source to an intermediate format,
> we'll be fine, the rest of the toolchain will handle it.
>
> Right now, I have clojure macros that translate a DSL to an
> intermediate format, which gets further translated into a lower level
> representation (this is where the "(match)" stuff gets analyzed), which
> is then optimised, eliminating unused stuff, combining others, and so on
> and so forth, and in the end, the final step turns it into C.
>
> I also have a Lua and a Guile generator PoC'd up, so it is entirely
> possible to compile down to another, dynamic language, which can then be
> embedded in syslog-ng, and voila, no compiler is necessary!

I think that'd be great and it's a must.

>
> But I digress.
>
>> http://logstash.net/docs/1.1.1/filters/grok.
> Haven't looked at it in detail yet, but JSON has similar disadvantages:
> instead of parentheses, you'll have a ton of {} and [].
>
> Having had a second look at some of the recipes... eeep, no, thank
> you. It has the same feel as the current patterndb, except instead of an
> XML container, it's JSON. The fundamental problem still remains: it uses
> format-string-like syntax. That's the most horrible, inconvenient and
> inflexible thing ever invented.
>
> (Did I mention that I passionately hate format strings? Not just when
> they're used for parsing, but for formatting too.)

I think this is where our main differences come in: I do not hate format 
strings and I think they're quite readable and compact. But I have to 
agree that your updated syntax in the example below is easier to read.

>
>> I want to write patterns and not code or a huge XML. The actual
>> container format just needs to get out my way as much as possible.
> Yeah, understandable. While playing with the PoC, I came to the
> conclusion that the current language is too verbose. Thankfully, because
> it's all a bunch of clojure macros, I could build further macros to
> abstract away a bunch of things, and without *any* change to the code, I
> was able to rewrite this patterndb rule:
>
>        <rule provider='patterndb' id='4dd5a329-da83-4876-a431-ddcb59c2858c' class='system'>
>          <patterns>
>            <pattern>Accepted @ESTRING:usracct.authmethod: @for @ESTRING:usracct.username: @from @ESTRING:usracct.device: @port @ESTRING:: @@ANYSTRING:usracct.service@</pattern>
>          </patterns>
>          <values>
>            <value name='usracct.type'>login</value>
>            <value name='usracct.sessionid'>$PID</value>
>            <value name='usracct.application'>$PROGRAM</value>
>            <value name='secevt.verdict'>ACCEPT</value>
>          </values>
>          <tags>
>            <tag>usracct</tag>
>            <tag>secevt</tag>
>          </tags>
>        </rule>
>
> To this:
>
> (defruleset "4dd5a329-da83-4876-a431-ddcb59c2858c"
>    {:class :system
>     :provider :PoC}
>
>    (with-pattern "Accepted " (word :usracct.authmethod) " for "
>                  (word :usracct.username) " from "
>                  (word :usracct.device) " port "
>                  (string :usracct.service)
>      (do->
>        (set! :usracct.type "login"
>              :usracct.sessionid "$PID"
>              :usracct.application "$PROGRAM"
>              :secevt.verdict "ACCEPT")
>        (tag! :usracct :secevt))))
>
> There ain't that many parentheses anymore, and I think it's sufficiently
> clear even for those who don't speak a bit of lisp. Just read it as-is,
> and you'll pretty much know what the ruleset does.

OK, I'm convinced, I could live with such a syntax. If I were to design 
it, I'd create something JSON-y instead of Lisp-y, but I think it's just 
a matter of personal preference (and the fact that it's been about a 
decade since I've done anything with Lisp whereas I have to handle 
something JSON-like weekly). But that's just the two of us: what do
others think?

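
To make the JSON-y idea a bit more concrete, here is a rough Python sketch of how the rule quoted above might look as JSON, together with a tiny matcher for it. The key names ("pattern", "set", "tags", "word", "string") and the sample log line are my own assumptions for illustration; this is not an existing syslog-ng or patterndb format, and the $PID / $PROGRAM macros are simply left unexpanded.

```python
import json

# Hypothetical JSON rendering of the rule; all key names are illustrative
# assumptions, not an existing syslog-ng format.
RULE_JSON = """
{
  "id": "4dd5a329-da83-4876-a431-ddcb59c2858c",
  "class": "system",
  "pattern": ["Accepted ", {"word": "usracct.authmethod"}, " for ",
              {"word": "usracct.username"}, " from ",
              {"word": "usracct.device"}, " port ",
              {"string": "usracct.service"}],
  "set": {"usracct.type": "login",
          "usracct.sessionid": "$PID",
          "usracct.application": "$PROGRAM",
          "secevt.verdict": "ACCEPT"},
  "tags": ["usracct", "secevt"]
}
"""


def match(rule, message):
    """Return extracted values and tags, or None if the message does not match."""
    pos, values = 0, {}
    for element in rule["pattern"]:
        if isinstance(element, str):
            # Literal text has to appear at the current position.
            if not message.startswith(element, pos):
                return None
            pos += len(element)
        elif "word" in element:
            # Capture everything up to the next space (or the end of the message).
            end = message.find(" ", pos)
            end = len(message) if end == -1 else end
            if end == pos:
                return None
            values[element["word"]] = message[pos:end]
            pos = end
        elif "string" in element:
            # Capture the rest of the message.
            values[element["string"]] = message[pos:]
            pos = len(message)
        else:
            return None
    if pos != len(message):
        return None
    values.update(rule["set"])  # macros such as $PID are not expanded here
    return {"values": values, "tags": list(rule["tags"])}


rule = json.loads(RULE_JSON)
result = match(rule, "Accepted password for root from 10.0.0.1 port 22 ssh2")
```

The literal strings and the parser objects sit in one flat list, so the container stays out of the way, which is all I really ask of it.
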
>> 3) What about pattern hierarchy == efficient matching?
>>
>> Your proposal allows the user to define complex conditions for a
>> pattern match. On the other hand, the patterns we have right now work
>> in a way that allows us to organize them in a radix tree and use a
>> greedy, non-backtracking algorithm for matching which makes this
>> procedure incredibly fast.
> That's where the optimisation step comes in. In due time, I will be able
> to teach the optimiser to use a radix tree whenever possible, and only
> fall back when the complexity demands that.
>
>> Whereas if we'd allow more complex conditions, we'd need to fall back
>> to a linear matching: if we have 5000 patterns, we'd have to match
>> each and every pattern to each incoming message. Which is slow.
> Indeed. Which is why the language is limited enough to allow the
> optimiser to (reasonably easily) figure out what algorithm to
> use. I do not want to limit complexity just because that makes it possible to
> write less efficient - or even horribly inefficient - parsers. Sometimes
> that is necessary, and I want to allow complex patterns too, while
> maintaining the ability to generate very fast code for the simple ones.
>
> As an example, it is entirely possible to translate simpler rulesets
> from my language to patterndb. If a ruleset can be translated to
> patterndb syntax, then the same algorithms can be used too. Perhaps I
> can even reuse the already existing code...
>
> Or, as an intermediate step in the PoC, I can teach my generator to emit
> patterndb rules instead of C, if what I wrote is expressible that
> way. :)

Oh, I haven't thought of that, although it is indeed doable. I like the
idea of automatic optimization.
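
Just to check that I understand the idea, here is a minimal Python sketch of such a "patterndb if expressible, otherwise fall back" step. The element representation (plain strings for literals, ("word", name) and ("string", name) tuples for parsers) and the names to_patterndb and compile_rule are my own assumptions, not your PoC's actual intermediate format. The mapping of a word to @ESTRING:name: @ and of a trailing string to @ANYSTRING:name@ follows the patterndb rule quoted earlier in this thread.

```python
class NotExpressible(Exception):
    """Raised when a rule needs something this sketch cannot map to patterndb."""


def to_patterndb(elements):
    """Translate a list of pattern elements into a patterndb pattern string."""
    out = []
    skip_space = False  # set after an ESTRING, which consumes its stop character
    for i, element in enumerate(elements):
        if isinstance(element, str):
            # Drop the leading space already consumed by the preceding ESTRING.
            text = element[1:] if skip_space else element
            skip_space = False
            out.append(text.replace("@", "@@"))  # a literal @ is written as @@
        elif element[0] == "word":
            nxt = elements[i + 1] if i + 1 < len(elements) else None
            if not (isinstance(nxt, str) and nxt.startswith(" ")):
                raise NotExpressible("word must be followed by a literal space")
            out.append("@ESTRING:%s: @" % element[1])
            skip_space = True
        elif element[0] == "string":
            if i != len(elements) - 1:
                raise NotExpressible("string must be the last element")
            out.append("@ANYSTRING:%s@" % element[1])
        else:
            raise NotExpressible("unsupported element: %r" % (element,))
    return "".join(out)


def compile_rule(elements):
    """Pick the fast path when possible, otherwise keep the rule for a fallback matcher."""
    try:
        return ("patterndb", to_patterndb(elements))
    except NotExpressible:
        return ("fallback", elements)


simple = ["Accepted ", ("word", "usracct.authmethod"), " for ",
          ("word", "usracct.username"), " from ",
          ("word", "usracct.device"), " port ",
          ("string", "usracct.service")]
complex_rule = ["id=", ("regex", "id", "[0-9]+")]  # hypothetical unsupported parser

simple_result = compile_rule(simple)
complex_result = compile_rule(complex_rule)
```

Rules that land on the "patterndb" side could go into the existing radix tree unchanged, and only the "fallback" ones would need the slower, more general matching.
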


>> We still have to give an easy-to-use solution for users who simply
>> want to write patterns which they later use for filtering. The current
>> XML syntax is tedious to use, I agree, but what you suggest is, in my
>> opinion, even more so.
> Eee, we'll see. I don't really see that many people writing patterndb
> rules. I think I could count all of them on two hands, and one hand
> would be BalaBit employees.
>
> The way to make pattern writing easier, is not really the language
> itself (it does help if it is not cryptic; both grok and patterndb
> are. Compact, but cryptic), but the provided tools. Give people good
> tools, and they won't care the least bit about what language the tool
> produces as output.
>
> Which brings me to another benefit of using a Clojure-compatible syntax
> for the PoC: it's easy to manipulate from Clojure *AND* ClojureScript
> too. It wouldn't be too hard to knock up a little web app that presents
> you with a bunch of logs, and you can interactively develop patterns,
> without ever having to look at the code produced under the hood.
>
> Same could be done with Grok or PatternDB too, I suppose, but I'm not
> going to touch either from an application running in the browser.

Yes, you're absolutely right, although I don't really see what
difference the underlying format makes -- but if it makes it more
likely that you (or someone else) would come up with such a tool, then
that's a great plus in itself.

greets,
Peter



