Hi,
On 09/07/2012 08:26 PM, Gergely Nagy wrote:
Peter Gyongyosi <gyp@balabit.hu> writes:
1) the lisp-y syntax
Yep, it is different, because of two factors: I like lisp, and I started coding the PoC in Clojure, and having a compatible syntax made the prototyping much much faster.
But as I said in the RFC, I understand the syntax may not be easy for non-lispy folk, so the whole compiler business is being coded with this in mind: the parser is entirely separate from the rest. As long as there's a parser to translate the source to an intermediate format, we'll be fine, the rest of the toolchain will handle it.
Right now, I have clojure macros that translate a DSL to an intermediate format, which gets further translated into a lower level representation (this is where the "(match)" stuff gets analyzed). That representation is then optimised (eliminating unused stuff, combining others, and so on and so forth), and the final step turns it into C.
I also have a Lua and a Guile generator PoC'd up, so it is entirely possible to compile down to another, dynamic language, which can then be embedded in syslog-ng, and voila, no compiler is necessary!
I think that'd be great and it's a must.
But I digress.
http://logstash.net/docs/1.1.1/filters/grok. Haven't looked at it in detail yet, but JSON has similar disadvantages: instead of parentheses, you'll have a ton of {} and [].
Having had a second look at some of the recipes... eeep, no, thank you. It has the same feel as the current patterndb, except instead of an XML container, it's JSON. The fundamental problem still remains: it uses format-string-like syntax. That's the most horrible, inconvenient and inflexible thing ever invented.
(Did I mention that I passionately hate format strings? Not just when they're used for parsing, but for formatting too.)
I think this is where our main differences come in: I do not hate format strings and I think they're quite readable and compact. But I have to agree that your updated syntax in the example below is easier to read.
I want to write patterns and not code or a huge XML. The actual container format just needs to get out of my way as much as possible.
Yeah, understandable. While playing with the PoC, I came to the conclusion that the current language is too verbose. Thankfully, because it's all a bunch of clojure macros, I could build further macros to abstract away a bunch of things, and without *any* change to the code, I was able to rewrite this patterndb rule:
<rule provider='patterndb' id='4dd5a329-da83-4876-a431-ddcb59c2858c' class='system'>
  <patterns>
    <pattern>Accepted @ESTRING:usracct.authmethod: @for @ESTRING:usracct.username: @from @ESTRING:usracct.device: @port @ESTRING:: @@ANYSTRING:usracct.service@</pattern>
  </patterns>
  <values>
    <value name='usracct.type'>login</value>
    <value name='usracct.sessionid'>$PID</value>
    <value name='usracct.application'>$PROGRAM</value>
    <value name='secevt.verdict'>ACCEPT</value>
  </values>
  <tags>
    <tag>usracct</tag>
    <tag>secevt</tag>
  </tags>
</rule>
To this:
(defruleset "4dd5a329-da83-4876-a431-ddcb59c2858c"
  {:class :system :provider :PoC}
  (with-pattern "Accepted " (word :usracct.authmethod)
                " for " (word :usracct.username)
                " from " (word :usracct.device)
                " port " (string :usracct.service)
    (do-> (set! :usracct.type "login"
                :usracct.sessionid "$PID"
                :usracct.application "$PROGRAM"
                :secevt.verdict "ACCEPT")
          (tag! :usracct :secevt))))
There ain't that many parentheses anymore, and I think it's sufficiently clear even for those who don't speak a bit of lisp. Just read it as-is, and you'll pretty much know what the ruleset does.
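To make the reading of that ruleset concrete, here is a minimal Python sketch of how such a pattern could be evaluated. It is not the PoC: the helper names (`word`, `string`, `match`, `apply_rule`) are hypothetical, and it assumes `word` captures up to the next space and `string` captures the rest of the message. Macros such as `$PID` are kept as literal strings rather than expanded.

```python
# Hypothetical helpers; none of these names come from the PoC itself.
def word(name):
    """Capture a non-empty run of characters up to the next space (assumed)."""
    return ("word", name)

def string(name):
    """Capture everything up to the end of the message (assumed)."""
    return ("string", name)

def match(pattern, message):
    """Walk the pattern left to right; return captured values or None."""
    values, pos = {}, 0
    for element in pattern:
        if isinstance(element, str):
            # Literal text must appear exactly at the current position.
            if not message.startswith(element, pos):
                return None
            pos += len(element)
            continue
        kind, name = element
        if kind == "word":
            end = message.find(" ", pos)
            if end == -1:
                end = len(message)
            if end == pos:
                return None  # an empty word does not count as a match
        else:
            end = len(message)
        values[name] = message[pos:end]
        pos = end
    return values if pos == len(message) else None

def apply_rule(rule, message):
    """On a match, add the fixed values and tags, like set! and tag! above."""
    values = match(rule["pattern"], message)
    if values is None:
        return None
    values.update(rule["set"])
    return {"values": values, "tags": list(rule["tags"])}

SSH_LOGIN = {
    "pattern": ["Accepted ", word("usracct.authmethod"),
                " for ", word("usracct.username"),
                " from ", word("usracct.device"),
                " port ", string("usracct.service")],
    "set": {"usracct.type": "login", "usracct.sessionid": "$PID",
            "usracct.application": "$PROGRAM", "secevt.verdict": "ACCEPT"},
    "tags": ["usracct", "secevt"],
}

result = apply_rule(SSH_LOGIN,
                    "Accepted password for alice from 10.0.0.1 port 22 ssh2")
```

Under these assumed semantics the trailing `string` captures everything after " port ", so `usracct.service` ends up holding the port number as well as the service name.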
OK, I'm convinced, I could live with such a syntax. If I were to design it, I'd create something JSON-y instead of Lisp-y, but I think it's just a matter of personal preference (and the fact that it's been about a decade since I've done anything with Lisp whereas I have to handle something JSON-like weekly). But that's just the two of us: what do others think?
3) What about pattern hierarchy == efficient matching?
Your proposal allows the user to define complex conditions for a pattern match. On the other hand, the patterns we have right now work in a way that allows us to organize them in a radix tree and use a greedy, non-backtracking algorithm for matching, which makes this procedure incredibly fast.
That's where the optimisation step comes in. In due time, I will be able to teach the optimiser to use a radix tree whenever possible, and only fall back when the complexity demands that.
Whereas if we'd allow more complex conditions, we'd need to fall back to a linear matching: if we have 5000 patterns, we'd have to match each and every pattern to each incoming message. Which is slow.
Indeed. Which is why the language is limited enough to allow the optimiser to (reasonably easily) figure out what algorithm to use. I do not want to limit complexity merely because it makes it possible to write less efficient, or even horribly inefficient, parsers. Sometimes that is necessary, and I want to allow complex patterns too, while maintaining the ability to generate very fast code for the simple ones.
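The difference between tree-based and linear matching can be illustrated with a small Python sketch. It is neither the patterndb implementation nor the PoC: it uses an uncompressed character trie for brevity (a radix tree would merge single-child chains), the class and function names are hypothetical, and `word` is assumed to capture up to the next space. The point is that one greedy walk over the message selects the rule, so the cost depends on the message length rather than on the number of patterns.

```python
# Hypothetical sketch of greedy, non-backtracking matching over a shared tree.
def word(name):
    return ("word", name)

class Node:
    def __init__(self):
        self.literals = {}   # next character -> Node
        self.parser = None   # (capture name, Node) or None
        self.rule_id = None  # set when a pattern ends at this node

def insert(root, pattern, rule_id):
    """Add one pattern (literal strings and word captures) to the tree."""
    node = root
    for element in pattern:
        if isinstance(element, str):
            for ch in element:
                node = node.literals.setdefault(ch, Node())
        else:
            _, name = element
            if node.parser is None:
                node.parser = (name, Node())
            elif node.parser[0] != name:
                raise ValueError("conflicting capture names at the same position")
            node = node.parser[1]
    node.rule_id = rule_id

def lookup(root, message):
    """Single left-to-right pass; literals win over parsers, no backtracking."""
    node, pos, values = root, 0, {}
    while pos < len(message):
        ch = message[pos]
        if ch in node.literals:
            node = node.literals[ch]
            pos += 1
        elif node.parser is not None:
            name, nxt = node.parser
            end = message.find(" ", pos)
            if end == -1:
                end = len(message)
            if end == pos:
                return None
            values[name] = message[pos:end]
            node, pos = nxt, end
        else:
            return None
    return (node.rule_id, values) if node.rule_id is not None else None

root = Node()
insert(root, ["Accepted ", word("method"), " for ", word("user")], "login")
insert(root, ["Failed ", word("method"), " for ", word("user")], "failure")
```

With 5000 patterns the tree gets wider, but `lookup` still touches each character of the message at most once. A linear matcher would instead try every pattern in turn against every incoming message.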
As an example, it is entirely possible to translate simpler rulesets from my language to patterndb. If a ruleset can be translated to patterndb syntax, then the same algorithms can be used too. Perhaps I can even reuse the already existing code...
Or, as an intermediate step in the PoC, I can teach my generator to emit patterndb rules instead of C, if what I wrote is expressible that way. :)
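As a rough illustration of that translation, the Python sketch below turns a simple pattern into a patterndb pattern string, or returns None when the pattern is not expressible within the sketch's limits. The function name `to_patterndb` and the tuple representation are hypothetical. It assumes a `word` followed by a literal starting with a space maps to `@ESTRING:name: @` (which consumes that space), and a trailing `string` maps to `@ANYSTRING:name@`. Literals containing `@` are rejected because escaping is left out.

```python
# Hypothetical translator from a simple pattern to patterndb pattern syntax.
def word(name):
    return ("word", name)

def string(name):
    return ("string", name)

def to_patterndb(pattern):
    """Return a patterndb pattern string, or None if not expressible here."""
    out = []
    drop_delimiter = False  # True right after an ESTRING that ate a space
    for i, element in enumerate(pattern):
        if isinstance(element, str):
            if "@" in element:
                return None  # escaping is left out of this sketch
            out.append(element[1:] if drop_delimiter else element)
            drop_delimiter = False
            continue
        kind, name = element
        nxt = pattern[i + 1] if i + 1 < len(pattern) else None
        if kind == "word":
            # ESTRING needs a stop string; only a following space is handled.
            if not (isinstance(nxt, str) and nxt.startswith(" ")):
                return None
            out.append("@ESTRING:%s: @" % name)
            drop_delimiter = True
        elif kind == "string":
            if nxt is not None:
                return None  # only supported as the last element
            out.append("@ANYSTRING:%s@" % name)
        else:
            return None
    return "".join(out)

emitted = to_patterndb(["Accepted ", word("usracct.authmethod"),
                        " for ", word("usracct.username"),
                        " from ", word("usracct.device"),
                        " port ", string("usracct.service")])
```

Anything the function cannot express returns None, which is where a generator would fall back to its other back ends.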
Oh, I haven't thought of that, although it is indeed doable. I like the idea of automatic optimization.
We still have to give an easy-to-use solution for users who simply want to write patterns which they later use for filtering. The current XML syntax is tedious to use, I agree, but what you suggest is, in my opinion, even more so.
Eee, we'll see. I don't really see that many people writing patterndb rules. I think I could count all of them on two hands, and one hand would be BalaBit employees.
The way to make pattern writing easier is not really the language itself (it does help if it is not cryptic; both grok and patterndb are: compact, but cryptic), but the provided tools. Give people good tools, and they won't care the least bit about what language the tool produces as output.
Which brings me to another benefit of using a Clojure-compatible syntax for the PoC: it's easy to manipulate from Clojure *AND* ClojureScript too. It wouldn't be too hard to knock up a little web app that presents you with a bunch of logs, and you can interactively develop patterns, without ever having to look at the code produced under the hood.
Same could be done with Grok or PatternDB too, I suppose, but I'm not going to touch either from an application running in the browser.
Yes, you're absolutely right, although I don't really see what difference the underlying format makes -- but if it makes it more likely that you (or someone else) would come up with such a tool, then it's a great plus by itself.
greets,
Peter