[syslog-ng] [RFC]: Pattern matching & correlation ideas

Gergely Nagy algernon at balabit.hu
Tue Sep 11 17:28:41 CEST 2012


Peter Gyongyosi <gyp at balabit.hu> writes:

>>> http://logstash.net/docs/1.1.1/filters/grok.
>> Haven't looked at it in detail yet, but JSON has similar disadvantages:
>> instead of parentheses, you'll have a ton of {} and [].
>>
>> Having had a second look at some of the recipes... eeep, no, thank
>> you. It has the same feel as the current patterndb, except instead of an
>> XML container, it's JSON. The fundamental problem still remains: it uses
>> format-string-like syntax. That's the most horrible, inconvenient and
>> inflexible thing ever invented.
>>
>> (Did I mention that I passionately hate format strings? Not just when
>> they're used for parsing, but for formatting too.)
>
> I think this is where our main differences come in: I do not hate format 
> strings and I think they're quite readable and compact. But I have to 
> agree that your updated syntax in the example below is easier to read.

Well, format strings are fine and all up until a point. Once you try to
shovel all kinds of things into them, they start to get more and more
complex, and then it becomes a terrible choice.

As in, they're perfectly fine for output, where they make it easy to
translate strings. But when matching patterns... not so much.

Nevertheless, I suppose it's up to one's own preferences.

>> (defruleset "4dd5a329-da83-4876-a431-ddcb59c2858c"
>>    {:class :system
>>     :provider :PoC}
>>
>>    (with-pattern "Accepted " (word :usracct.authmethod) " for "
>>                  (word :usracct.username) " from "
>>                  (word :usracct.device) " port "
>>                  (string :usracct.service)
>>      (do->
>>        (set! :usracct.type "login"
>>              :usracct.sessionid "$PID"
>>              :usracct.application "$PROGRAM"
>>              :secevt.verdict "ACCEPT")
>>        (tag! :usracct :secevt))))
>>
>> There ain't that many parentheses anymore, and I think it's sufficiently
>> clear even for those who don't speak a bit of lisp. Just read it as-is,
>> and you'll pretty much know what the ruleset does.
>
> OK, I'm convinced, I could live with such a syntax. If I were to design 
> it, I'd create something JSON-y instead of Lisp-y, but I think it's just 
> a matter of personal preference (and the fact that it's been about a 
> decade since I've done anything with Lisp whereas I have to handle 
> something JSON-like weekly). But that's just the two of us: what do
> others think?

Well, JSON-like isn't much different:

{"ruleset": {"id": "4dd5a329-da83-4876-a431-ddcb59c2858c",
             "class": "system",
             "provider": "PoC",
             "rules": [{"pattern": ["Accepted ",
                                   {"usracct.authmethod": "word"},
                                   " for ",
                                   {"usracct.username": "word"},
                                   " from ",
                                   {"usracct.device": "word"},
                                   " port ",
                                    {"usracct.service": "string"}],
                        "actions": [{"set": {"usracct.type": "login",
                                            "usracct.sessionid": "$PID",
                                            "usracct.application": "$PROGRAM",
                                            "secevt.verdict": "ACCEPT"},
                                    "tag": ["usracct", "secevt"]}]
                       }]
            }
}

Or something along those lines...

Writing a parser that turns this into the very same AST is about 15
minutes of work. (At the moment, the internal AST can be serialized to
and from JSON trivially, with about 3 lines of code, but the AST is far
more verbose.)
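
To make the parsing step concrete, here is a minimal Python sketch (the
PoC itself is in Clojure) that turns the "pattern" part of the JSON form
above into a tiny AST and round-trips it through JSON. The node shapes
("literal", "capture") and the name pattern_to_ast are made up for
illustration; the PoC's real AST is more verbose and is not shown in
this thread.

```python
import json

# Hypothetical AST shape, NOT the PoC's real one: a pattern becomes a list
# of ("literal", text) and ("capture", parser, field) tuples.
def pattern_to_ast(pattern):
    ast = []
    for element in pattern:
        if isinstance(element, str):
            ast.append(("literal", element))
        else:
            # A capture is a one-key object: {"field.name": "parser"}
            (field, parser), = element.items()
            ast.append(("capture", parser, field))
    return ast

RULE_JSON = """
{"pattern": ["Accepted ", {"usracct.authmethod": "word"},
             " for ", {"usracct.username": "word"},
             " from ", {"usracct.device": "word"},
             " port ", {"usracct.service": "string"}]}
"""

ast = pattern_to_ast(json.loads(RULE_JSON)["pattern"])

# Round-trip through JSON; note that tuples come back as lists.
restored = json.loads(json.dumps(ast))
```

The same function could be fed by any other front end, as long as that
front end can produce the list of literals and captures.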

Thing is, the input doesn't matter much. I like lisp-y, because I like
lisp, and the PoC is in Clojure, and that also gives me a lot more
power: I can use Clojure functions and macros, thereby reducing
copywaste within my rulesets, without having to extend the DSL itself.

But one could just as well write an input parser that turns patterndb,
grok or whatever else you can think of into our internal AST, just like
we can output pretty much anything that supports all the stuff described
by the rulesets.

>> Or, as an intermediate step in the PoC, I can teach my generator to emit
>> patterndb rules instead of C, if what I wrote is expressible that
>> way. :)
>
> Oh, I haven't thought of that, although it is indeed doable. I like the
> idea of automatic optimization.

FWIW, my PoC can generate patterndb rules, and soon enough, it will be
able to read them too.
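
As a rough illustration of the generating direction, the Python sketch
below emits a patterndb pattern string from a small AST. The AST shape
and the name to_patterndb are hypothetical, and so is the mapping: it
assumes "word" can be expressed as an ESTRING parser delimited by a
space and "string" as ANYSTRING. Since ESTRING consumes its delimiter,
the sketch drops the leading space of the literal that follows, and it
refuses anything it cannot express that way.

```python
# Hypothetical AST for the ruleset pattern quoted earlier in the thread.
AST = [
    ("literal", "Accepted "),
    ("capture", "word", "usracct.authmethod"),
    ("literal", " for "),
    ("capture", "word", "usracct.username"),
    ("literal", " from "),
    ("capture", "word", "usracct.device"),
    ("literal", " port "),
    ("capture", "string", "usracct.service"),
]

def to_patterndb(ast):
    out = []
    eat_space = False  # True right after an ESTRING that consumed a space
    for node in ast:
        if node[0] == "literal":
            text = node[1]
            if eat_space:
                if not text.startswith(" "):
                    raise ValueError("a word must be followed by a space")
                text = text[1:]
                eat_space = False
            # A literal '@' is written as '@@' in patterndb patterns.
            out.append(text.replace("@", "@@"))
        else:
            _, parser, field = node
            if eat_space:
                raise ValueError("adjacent captures are not expressible")
            if parser == "word":
                out.append("@ESTRING:%s: @" % field)
                eat_space = True
            elif parser == "string":
                out.append("@ANYSTRING:%s@" % field)
            else:
                raise ValueError("no patterndb mapping for %r" % parser)
    if eat_space:
        raise ValueError("trailing word has no delimiter to stop at")
    return "".join(out)

rule = to_patterndb(AST)
print(rule)
```

Rules that fall outside this mapping raise an error instead of being
silently mistranslated, which matches the "if it is expressible that
way" caveat.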

C will be considerably harder, but I'm progressing with that too.

> Yes, you're absolutely right, although I don't really see what
> difference the underlying format makes -- but if it makes it more
> likely that you (or someone else) would come up with such a tool then
> it's a great plus by itself.

If the format is easier to handle programmatically, then it's easier to
make a tool to fiddle with it, imo. Patterndb is - I believe - hard to
handle. It's not hard to generate from another format, mind you, but to
parse it, and interpret it... that's a tough one.

I plan to write a trivial interpreter along with my PoC, which will be
slow and inefficient, but enough to show how easy it is to work with the
format.
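
For a sense of what such a trivial interpreter might look like, here is
a naive Python sketch (not the PoC's code) that walks a pattern AST
left to right over a message. The AST shape, the name match_message and
the sample log line are assumptions made for illustration: "word" is
taken to mean a run of non-whitespace characters, and "string" the rest
of the message.

```python
# Hypothetical AST for the ruleset pattern quoted earlier in the thread.
AST = [
    ("literal", "Accepted "),
    ("capture", "word", "usracct.authmethod"),
    ("literal", " for "),
    ("capture", "word", "usracct.username"),
    ("literal", " from "),
    ("capture", "word", "usracct.device"),
    ("literal", " port "),
    ("capture", "string", "usracct.service"),
]

def match_message(ast, message):
    """Return the captured fields, or None if the message does not match."""
    fields = {}
    pos = 0
    for node in ast:
        if node[0] == "literal":
            if not message.startswith(node[1], pos):
                return None
            pos += len(node[1])
            continue
        _, parser, field = node
        if parser == "word":
            # Assumed meaning: one or more non-whitespace characters.
            end = pos
            while end < len(message) and not message[end].isspace():
                end += 1
            if end == pos:
                return None
        elif parser == "string":
            # Assumed meaning: everything up to the end of the message.
            end = len(message)
        else:
            raise ValueError("unknown parser %r" % parser)
        fields[field] = message[pos:end]
        pos = end
    # The whole message must be consumed for the rule to match.
    return fields if pos == len(message) else None

fields = match_message(
    AST, "Accepted publickey for joe from 192.0.2.1 port 22 ssh2")
```

Applying the set! and tag! actions would then be a matter of updating
the returned dictionary when the match succeeds.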

-- 
|8]


