Peter Gyongyosi <gyp@balabit.hu> writes:
http://logstash.net/docs/1.1.1/filters/grok. Haven't looked at it in detail yet, but JSON has similar disadvantages: instead of parentheses, you'll have a ton of {} and [].
Having had a second look at some of the recipes... eeep, no, thank you. It has the same feel as the current patterndb, except instead of an XML container, it's JSON. The fundamental problem still remains: it uses format-string-like syntax. That's the most horrible, inconvenient and inflexible thing ever invented.
(Did I mention that I passionately hate format strings? Not just when they're used for parsing, but for formatting too.)
I think this is where our main differences come in: I do not hate format strings and I think they're quite readable and compact. But I have to agree that your updated syntax in the example below is easier to read.
Well, format strings are fine and all up until a point. Once you try to shovel all kinds of things into them, they start to get more and more complex, and then they become a terrible choice. As in, they're perfectly fine for output, where they make it easy to translate strings. But when matching patterns... not so much. Nevertheless, I suppose it's up to one's own preferences.
(defruleset "4dd5a329-da83-4876-a431-ddcb59c2858c"
  {:class :system :provider :PoC}
  (with-pattern "Accepted " (word :usracct.authmethod)
                " for " (word :usracct.username)
                " from " (word :usracct.device)
                " port " (string :usracct.service)
    (do->
      (set! :usracct.type "login"
            :usracct.sessionid "$PID"
            :usracct.application "$PROGRAM"
            :secevt.verdict "ACCEPT")
      (tag! :usracct :secevt))))
There ain't that many parentheses anymore, and I think it's sufficiently clear even for those who don't speak a bit of lisp. Just read it as-is, and you'll pretty much know what the ruleset does.
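To make the "just read it as-is" point concrete, here is a rough Python sketch of how such a pattern could be interpreted against a log line. It is not the PoC's code: the tuple-based node shapes, the `interpret` name, and the meaning given to `word` (a run of non-whitespace) and `string` (the rest of the line) are all assumptions made for illustration, and `$PID` / `$PROGRAM` are copied verbatim rather than expanded.

```python
# Hypothetical node shapes: ("literal", text), ("word", field), ("string", field).
PATTERN = [
    ("literal", "Accepted "), ("word", "usracct.authmethod"),
    ("literal", " for "), ("word", "usracct.username"),
    ("literal", " from "), ("word", "usracct.device"),
    ("literal", " port "), ("string", "usracct.service"),
]
SET = {"usracct.type": "login", "usracct.sessionid": "$PID",
       "usracct.application": "$PROGRAM", "secevt.verdict": "ACCEPT"}
TAGS = ["usracct", "secevt"]


def interpret(pattern, message):
    """Return {"fields": ..., "tags": ...} on a match, or None otherwise."""
    pos = 0
    fields = {}
    for kind, value in pattern:
        if kind == "literal":
            # Literal text must appear exactly at the current position.
            if not message.startswith(value, pos):
                return None
            pos += len(value)
        elif kind == "word":
            # Assumption: a word is a non-empty run of non-whitespace.
            end = pos
            while end < len(message) and not message[end].isspace():
                end += 1
            if end == pos:
                return None
            fields[value] = message[pos:end]
            pos = end
        elif kind == "string":
            # Assumption: a string swallows the rest of the message.
            fields[value] = message[pos:]
            pos = len(message)
        else:
            raise ValueError("unknown node kind: %r" % (kind,))
    if pos != len(message):
        return None
    # Stand-in for the do-> block: set the fixed values, then attach the tags.
    fields.update(SET)
    return {"fields": fields, "tags": list(TAGS)}


result = interpret(PATTERN, "Accepted password for alice from 192.0.2.1 port 22 ssh2")
```

With this sample line, `result["fields"]["usracct.username"]` is `"alice"`, and a line that does not start with `Accepted ` yields `None`.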
OK, I'm convinced, I could live with such a syntax. If I were to design it, I'd create something JSON-y instead of Lisp-y, but I think it's just a matter of personal preference (and the fact that it's been about a decade since I've done anything with Lisp whereas I have to handle something JSON-like weekly). But that's just the two of us: what do others think?
Well, JSON-like isn't much different:

  {"ruleset": {
     "id": "4dd5a329-da83-4876-a431-ddcb59c2858c",
     "class": "system",
     "provider": "PoC",
     "rules": [
       {"pattern": ["Accepted ", {"usracct.authmethod": "word"},
                    " for ", {"usracct.username": "word"},
                    " from ", {"usracct.device": "word"},
                    " port ", {"usracct.service": "string"}],
        "actions": [{"set": {"usracct.type": "login",
                             "usracct.sessionid": "$PID",
                             "usracct.application": "$PROGRAM",
                             "secevt.verdict": "ACCEPT"},
                     "tag": ["usracct", "secevt"]}]
       }]
  }}

Or something along those lines... Writing a parser that turns this into the very same AST is about 15 minutes of work. (At the moment, the internal AST can be serialized to and from JSON trivially, with about 3 lines of code, but the AST is far more verbose.)

Thing is, the input doesn't matter much. I like lisp-y, because I like lisp, and the PoC is in Clojure, and that also gives me a lot more power: I can use Clojure functions and macros, thereby reducing copywaste within my rulesets, without having to extend the DSL itself. But we could just as well write an input parser that turns patterndb, grok or whatever else you can think of into our internal AST, just like we can output pretty much anything that supports all the stuff described by the rulesets.
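As a rough illustration of how small such a JSON front end could be, here is a Python sketch that reads a ruleset in the shape shown above and produces a simple node list. The PoC itself is in Clojure and its real AST is not shown in this thread, so the tuple-based nodes and the `parse_pattern` / `parse_ruleset` names are purely hypothetical.

```python
import json

# A trimmed ruleset in the JSON shape discussed above.
EXAMPLE = """
{"ruleset": {
   "id": "4dd5a329-da83-4876-a431-ddcb59c2858c",
   "class": "system",
   "provider": "PoC",
   "rules": [
     {"pattern": ["Accepted ", {"usracct.authmethod": "word"},
                  " for ", {"usracct.username": "word"},
                  " from ", {"usracct.device": "word"},
                  " port ", {"usracct.service": "string"}],
      "actions": [{"set": {"usracct.type": "login",
                           "secevt.verdict": "ACCEPT"},
                   "tag": ["usracct", "secevt"]}]}]}}
"""


def parse_pattern(parts):
    """Turn the JSON pattern list into (kind, value) nodes (hypothetical AST)."""
    nodes = []
    for part in parts:
        if isinstance(part, str):
            nodes.append(("literal", part))
        else:
            # A one-entry object such as {"usracct.username": "word"}.
            field, kind = next(iter(part.items()))
            nodes.append((kind, field))
    return nodes


def parse_ruleset(text):
    """Parse the JSON text into a dict holding metadata and parsed rules."""
    doc = json.loads(text)["ruleset"]
    rules = [
        {"pattern": parse_pattern(rule["pattern"]),
         "actions": rule.get("actions", [])}
        for rule in doc["rules"]
    ]
    return {"id": doc["id"], "class": doc["class"],
            "provider": doc["provider"], "rules": rules}


ast = parse_ruleset(EXAMPLE)
```

Here `ast["rules"][0]["pattern"]` starts with `("literal", "Accepted ")` followed by `("word", "usracct.authmethod")`, which is the same information the lisp-y form carries.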
Or, as an intermediate step in the PoC, I can teach my generator to emit patterndb rules instead of C, if what I wrote is expressible that way. :)
Oh, I haven't thought of that, although it is indeed doable. I like the idea of automatic optimization.
FWIW, my PoC can generate patterndb rules, and soon enough, it will be able to read them too. C will be considerably harder, but I'm progressing with that too.
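For a feel of what emitting patterndb could involve, here is a small Python sketch that turns a node list into a patterndb pattern string. It is not the PoC's generator: the node shapes are hypothetical, and mapping `word` to `@ESTRING@` (delimited by the first character of the following literal) and `string` to `@ANYSTRING@` is an assumption about how the two would line up. A `word` that is not followed by a literal is rejected as not expressible this way.

```python
def to_patterndb(nodes):
    """Render (kind, value) nodes as a patterndb pattern string.

    Assumptions: "word" maps to @ESTRING@ and "string" maps to @ANYSTRING@.
    A delimiter character that itself needs escaping is not handled here.
    """
    out = []
    skip = 0  # leading chars of the next literal already used as a delimiter
    for i, (kind, value) in enumerate(nodes):
        if kind == "literal":
            # A literal "@" has to be doubled in patterndb patterns.
            out.append(value[skip:].replace("@", "@@"))
            skip = 0
        elif kind == "word":
            nxt = nodes[i + 1] if i + 1 < len(nodes) else None
            if nxt is None or nxt[0] != "literal" or not nxt[1]:
                raise ValueError("word without a following literal is not expressible")
            # ESTRING consumes its delimiter, so drop it from the next literal.
            out.append("@ESTRING:%s:%s@" % (value, nxt[1][0]))
            skip = 1
        elif kind == "string":
            out.append("@ANYSTRING:%s@" % value)
        else:
            raise ValueError("unknown node kind: %r" % (kind,))
    return "".join(out)


NODES = [
    ("literal", "Accepted "), ("word", "usracct.authmethod"),
    ("literal", " for "), ("word", "usracct.username"),
    ("literal", " from "), ("word", "usracct.device"),
    ("literal", " port "), ("string", "usracct.service"),
]
rule = to_patterndb(NODES)
```

For the login example, `rule` comes out as `Accepted @ESTRING:usracct.authmethod: @for @ESTRING:usracct.username: @from @ESTRING:usracct.device: @port @ANYSTRING:usracct.service@`. Wrapping that in the XML container, and going the other way (reading patterndb), is left out of the sketch.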
Yes, you're absolutely right, although I don't really see what difference the underlying format makes -- but if it makes it more likely that you (or someone else) would come up with such a tool then it's a great plus by itself.
If the format is easier to handle programmatically, then it's easier to make a tool to fiddle with it, imo. Patterndb is - I believe - hard to handle. It's not hard to generate from another format, mind you, but to parse it, and interpret it... that's a tough one. I plan to write a trivial interpreter along with my PoC, which will be slow and inefficient, but enough to show how easy it is to work with the format. -- |8]