RFC: value-pairs key rewrite framework, part N+1
Hi! After my former mails (see the thread starting at http://thread.gmane.org/gmane.comp.syslog-ng/11355/focus=11421), I'd like to do a recap, and ask for comments, as I came up with a few new ideas. The purpose of the value-pairs key rewrite framework is to make it possible to apply various transformations to the keys we selected with VP. So that we can add or remove prefixes, replace parts of the string, and so on. As of this writing, code to do this exists on one of my branches, but it hasn't been touched for a while, and I planned to update it in the near future. And that's when it dawned on me, that perhaps the syntax isn't all that great. To show why I think that, let's see an example first: value-pairs( scope("everything") rekey( add-prefix(".secevt" "events") add-prefix(".classifier" "syslog-ng") shift(".sdata.*" 1) replace("." "_") ) ); This will add an "events" prefix to each key that starts with ".secevt", so that ".secevt.verdict" becomes "events.secevt.verdict"; similary, ".classifier.class" becomes "syslog-ng.classifier.class"; keys that match '.sdata.*' get shifted to the right, removing the dot. And all remaining dots at the begininng of a key will get replaced by an underscore instead. This kinda makes sense, and I could even massage the syntax into format-json: $(format-json --scope everything --rekey --add-prefix .secevt=events --addprefix .classifier=syslog-ng --shift .sdata.*=1 --replace .=_ --end-rekey) However, this syntax has the downside of transformations being global: I can't choose subset of my keys, and apply a list of transformations on those and only those. Once I made a transformation, any transformations in the list afterwards will see the transformed key. So I can't easily say: "take all the keys starting with '.sdata.', shift them 6 chars, then replace any key names that start with 'win' with 'lose', and finally prefix them with 'whatever'". With the current syntax, that's next to impossible to do sanely. 
It's also not all that intuitive.. So I came up with a different syntax: wiring rekey into the key() option of value-pairs! That way, we already selected a subset to work on, and the transformations would apply to only those. (This could be combined with the global syntax aswell, though) So it'd look something like this: value-pairs( scope("everything") key(".secevt.*" rekey(add-prefix("events"))) key(".classifier.*" rekey(add-prefix("syslog-ng"))) key(".sdata.*" rekey(shift(1))) key(".*" rekey(replace ("." "_"))) ); This would achieve the exact same effect as the example above, but with a clearer syntax, perhaps. It would also mean that key rewriting can be described at the same place where the key is selected to begin with. The downside of this is that it'd be a bit harder to come up with a syntax for format-json that mimics the config file syntax. So, instead of trying to do that and end up with something horrible, I have another proposal: lets make value-pairs a top-level citizen, so that it joins the ranks of filter{} and rewrite{} and the like! That way, we could turn the following ugly thing: destination d_structured { mongodb( value-pairs( scope("everything") key(".secevt.*" rekey(add-prefix("events"))) key(".classifier.*" rekey(add-prefix("syslog-ng"))) key(".sdata.*" rekey(shift(1))) key(".*" rekey(replace ("." "_"))) ) ); file("/var/log/structured.json" template("$(format-json <repeat the above stuff, but with format-json syntax>)\n")); }; Into this beauty: valuepairs vp_example { scope(everything); key(".secevt.*" rekey(add-prefix("events"))); key(".classifier.*" rekey(add-prefix("syslog-ng"))); key(".sdata.*" rekey(shift(1))); key(".*" rekey(replace ("." 
"_"))); }; destination d_structured { mongodb(value-pairs(vp_example)); file("/var/log/structured.json" template("$(format-json --with-config vp_example)\n")); }; This would have the nice consequence of not having to keep two parsers in sync: format-json would only have a --with-config option, and nothing else. So, in the end, this whole boils down to two questions: * What do you think about allowing (or even moving) rekey() inside key() options? * What do you think about introducing a top-level valuepairs element, and dropping the format-json argument parsing stuff? -- |8]
On Tue, 2011-10-04 at 10:56 +0200, Gergely Nagy wrote:
Hi!
After my earlier mails (see the thread starting at http://thread.gmane.org/gmane.comp.syslog-ng/11355/focus=11421), I'd like to do a recap and ask for comments, as I have come up with a few new ideas.
The purpose of the value-pairs key rewrite framework is to make it possible to apply various transformations to the keys selected with value-pairs (VP), so that we can add or remove prefixes, replace parts of the string, and so on.
As of this writing, code to do this exists on one of my branches, but it hasn't been touched for a while, and I plan to update it in the near future. That's when it dawned on me that perhaps the syntax isn't all that great.
To show why I think that, let's see an example first:
value-pairs(
  scope("everything")
  rekey(
    add-prefix(".secevt" "events")
    add-prefix(".classifier" "syslog-ng")
    shift(".sdata.*" 1)
    replace("." "_")
  )
);
This will add an "events" prefix to each key that starts with ".secevt", so that ".secevt.verdict" becomes "events.secevt.verdict"; similarly, ".classifier.class" becomes "syslog-ng.classifier.class"; keys that match '.sdata.*' get shifted by one character, which removes the leading dot. Finally, any key that still begins with a dot gets that dot replaced by an underscore.
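To make the ordering behaviour concrete, here is a minimal Python model of the global rekey() list above. It is a sketch under stated assumptions, not the code on the branch: add-prefix() and replace() are assumed to match on a literal key prefix, shift() on a glob (approximated with Python's fnmatch, which may differ in details from syslog-ng's own glob matching), and shift(N) is assumed to drop the first N characters. The function names and sample keys are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical model of the global rekey() list: every rule is tried on every
# key, and each rule sees the key as already rewritten by the rules before it.

def add_prefix(match_prefix, new_prefix):
    # Prepend new_prefix to keys that start with match_prefix.
    def op(key):
        return new_prefix + key if key.startswith(match_prefix) else key
    return op

def shift(glob, n):
    # Drop the first n characters of keys matching glob (assumed semantics).
    def op(key):
        return key[n:] if fnmatchcase(key, glob) else key
    return op

def replace(old_prefix, new_prefix):
    # Replace a leading old_prefix with new_prefix (prefix-only replacement).
    def op(key):
        if key.startswith(old_prefix):
            return new_prefix + key[len(old_prefix):]
        return key
    return op

def rekey(keys, rules):
    out = []
    for key in keys:
        for rule in rules:
            key = rule(key)  # later rules see the transformed key
        out.append(key)
    return out

rules = [
    add_prefix(".secevt", "events"),
    add_prefix(".classifier", "syslog-ng"),
    shift(".sdata.*", 1),
    replace(".", "_"),
]

print(rekey([".secevt.verdict", ".classifier.class", ".sdata.meta", ".program"], rules))
```

Swapping shift() and replace() changes the result: `.sdata.meta` becomes `_sdata.meta`, because the shift() glob no longer matches the already-rewritten key.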
This kinda makes sense, and I could even massage the syntax into format-json:
$(format-json --scope everything --rekey --add-prefix .secevt=events --add-prefix .classifier=syslog-ng --shift .sdata.*=1 --replace .=_ --end-rekey)
However, this syntax has the downside that transformations are global: I can't choose a subset of my keys and apply a list of transformations to those and only those. Once a transformation is made, every transformation later in the list sees the transformed key. So I can't easily say: "take all the keys starting with '.sdata.', shift them 6 chars, then replace any key names that start with 'win' with 'lose', and finally prefix them with 'whatever'". With the current syntax, that's next to impossible to do sanely.
It's also not all that intuitive.
So I came up with a different syntax: wiring rekey into the key() option of value-pairs! That way, we have already selected a subset to work on, and the transformations would apply only to those.
(This could be combined with the global syntax as well, though.)
So it'd look something like this:
value-pairs(
  scope("everything")
  key(".secevt.*" rekey(add-prefix("events")))
  key(".classifier.*" rekey(add-prefix("syslog-ng")))
  key(".sdata.*" rekey(shift(1)))
  key(".*" rekey(replace("." "_")))
);
This would achieve the exact same effect as the example above, but with a clearer syntax, perhaps. It would also mean that key rewriting can be described at the same place where the key is selected to begin with.
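For comparison, a minimal Python sketch of the per-key() variant. It assumes (the proposal does not say this) that each key is rewritten only by the first key() glob that matches its *original* name, and that the rewrite chain attached to that key() runs only on that key. Glob matching is again approximated with fnmatch, shift(N) is assumed to drop N leading characters, replace() is assumed to be prefix-only, and all names are hypothetical.

```python
from fnmatch import fnmatchcase

def add_prefix(prefix):
    return lambda key: prefix + key

def shift(n):
    # Drop the first n characters (assumed semantics).
    return lambda key: key[n:]

def replace(old, new):
    # Prefix-only replacement (assumed semantics).
    return lambda key: (new + key[len(old):]) if key.startswith(old) else key

def rekey_scoped(keys, key_rules):
    """key_rules: ordered list of (glob, [ops]); the first glob that matches wins."""
    out = []
    for key in keys:
        new_key = key
        for glob, ops in key_rules:
            if fnmatchcase(key, glob):      # match against the original name
                for op in ops:
                    new_key = op(new_key)   # the chain stays inside this key()
                break
        out.append(new_key)
    return out

rules = [
    (".secevt.*", [add_prefix("events")]),
    (".classifier.*", [add_prefix("syslog-ng")]),
    (".sdata.*", [shift(1)]),
    (".*", [replace(".", "_")]),
]

print(rekey_scoped([".secevt.verdict", ".classifier.class", ".sdata.meta", ".program"], rules))
```

First-match-wins is only one possible answer to what should happen when a key matches several key() globs (`.secevt.verdict` also matches `.*`); under that assumption the output matches the global version.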
The downside of this is that it'd be a bit harder to come up with a syntax for format-json that mimics the config file syntax.
So, instead of trying to do that and ending up with something horrible, I have another proposal: let's make value-pairs a top-level citizen, so that it joins the ranks of filter{}, rewrite{} and the like!
That way, we could turn the following ugly thing:
destination d_structured {
  mongodb(
    value-pairs(
      scope("everything")
      key(".secevt.*" rekey(add-prefix("events")))
      key(".classifier.*" rekey(add-prefix("syslog-ng")))
      key(".sdata.*" rekey(shift(1)))
      key(".*" rekey(replace("." "_")))
    )
  );
  file("/var/log/structured.json"
       template("$(format-json <repeat the above stuff, but with format-json syntax>)\n"));
};
Into this beauty:
valuepairs vp_example {
  scope(everything);
  key(".secevt.*" rekey(add-prefix("events")));
  key(".classifier.*" rekey(add-prefix("syslog-ng")));
  key(".sdata.*" rekey(shift(1)));
  key(".*" rekey(replace("." "_")));
};
destination d_structured {
  mongodb(value-pairs(vp_example));
  file("/var/log/structured.json"
       template("$(format-json --with-config vp_example)\n"));
};
This would have the nice consequence of not having to keep two parsers in sync: format-json would only have a --with-config option, and nothing else.
So, in the end, this whole thing boils down to two questions:
* What do you think about allowing (or even moving) rekey() inside key() options?
* What do you think about introducing a top-level valuepairs element, and dropping the format-json argument parsing stuff?
We've discussed this IRL and came to the conclusion that it is very handy to allow key-rewrite to be applied on a per-glob basis (e.g. to associate the rewrite function with the set specified by --key).
We've decided against introducing the top-level value-pairs element in the configuration, and instead came up with a possible command-line-like syntax. Something along the lines of:
$(format-json --key .cee.* --rewrite replace .cee=Event)
--
Bazsi
Balazs Scheidler <bazsi@balabit.hu> writes:
On Tue, 2011-10-04 at 10:56 +0200, Gergely Nagy wrote:
So I came up with a different syntax: wiring rekey into the key() option of value-pairs! That way, we have already selected a subset to work on, and the transformations would apply only to those.
(This could be combined with the global syntax as well, though.)
So it'd look something like this:
value-pairs(
  scope("everything")
  key(".secevt.*" rekey(add-prefix("events")))
  key(".classifier.*" rekey(add-prefix("syslog-ng")))
  key(".sdata.*" rekey(shift(1)))
  key(".*" rekey(replace("." "_")))
);
Doing this proved to be more difficult than originally anticipated, so I ended up with something in between (which will eventually be turned into the syntax above):
value-pairs(
  scope(everything)
  rekey(".cee.*"
    shift(4)
    add-prefix("Events")
    replace("Events.move_me_to_the_top" "moved_to_the_top")
  )
  rekey(".classifier.*" add-prefix("syslog-ng"))
  rekey(".sdata.*" shift(1))
  rekey(".*" replace("." "_"))
);
We've discussed this IRL and came to the conclusion that it is very handy to allow key-rewrite to be applied on a per-glob basis (e.g. to associate the rewrite function with the set specified by --key).
This is now (partially) done on my feature/3.4/value-pairs/rekey branch. Since it's still a work in progress, I'm not including the merged patches yet; instead, here is a pointer to a diff between 3.4 master and my branch:
https://github.com/algernon/syslog-ng/compare/algernon:upstream/mirror/3.4.....
Something along the lines of:
$(format-json --key .cee.* --rewrite replace .cee=Event)
This is not done yet, either. I'll make key() take a glob first, then proceed with the commandline support.
--
|8]
Gergely Nagy <algernon@balabit.hu> writes:
We've discussed this IRL and came to the conclusion that it is very handy to allow key-rewrite to be applied on a per-glob basis (e.g. to associate the rewrite function with the set specified by --key).
This is now (partially) done on my feature/3.4/value-pairs/rekey branch. Since it's still a work in progress, I'm not including the merged patches yet; instead, here is a pointer to a diff between 3.4 master and my branch:
https://github.com/algernon/syslog-ng/compare/algernon:upstream/mirror/3.4.....
The branch has now been updated a bit, and I ran into a silly issue that currently prevents me from wiring rekey() into key(): key() is used to add extra elements to the set, so turning it into a match-only thing is not really an option. I.e., if I want to rewrite every key and prefix them with "foo.", I currently do this:
value-pairs(scope(dot-nv-pairs)
  rekey("*" add-prefix("foo."))
);
However, if I keep key()'s current behaviour of adding stuff to the set, and wire rekey into it:
value-pairs(scope(dot-nv-pairs)
  key("*" rekey(add-prefix("foo.")))
);
this will do something completely different: it will also include EVERY key, despite our scope. There would be no way to rewrite everything that is already *in* the scope, and nothing else. So I either change the behaviour of key(), which I wouldn't want to do, or I keep rekey() separate. I believe keeping rekey() separate is the better and more flexible option.
Something along the lines of:
$(format-json --key .cee.* --rewrite replace .cee=Event)
This is not done yet, either. I'll make key() take a glob first, then proceed with the commandline support.
Similarly, this will turn into:
$(format-json --key .cee.* --rekey .cee.* replace .cee=Event)
(--key includes it, --rekey rewrites the keys.) Perhaps a bit more verbose, but mostly backwards compatible.
Anyway, the current tip of my feature/3.4/value-pairs/rekey branch also modifies the behaviour of key(): it now accepts a glob, and will include every key in the set that matches the glob, unless it is excluded by a later exclude(). The key() and exclude() options are now evaluated in order, and the last one wins. So value-pairs(key(".cee.*") exclude(".*")) will end up with an empty set.
--
|8]
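The in-order, last-one-wins evaluation of key() and exclude() described above can be sketched as follows. It is a minimal Python model, not the branch's code: each pattern is checked against the key in configuration order, the last key() or exclude() that matches decides, and a key that no pattern matches keeps whatever the scope decided (that fallback is an assumption). fnmatch approximates the glob matching, and the names are hypothetical.

```python
from fnmatch import fnmatchcase

def included(key, in_scope, patterns):
    """patterns: ordered list of ("key" | "exclude", glob) pairs."""
    decision = in_scope  # assumed fallback when no pattern matches
    for kind, glob in patterns:
        if fnmatchcase(key, glob):
            decision = (kind == "key")  # the last matching pattern wins
    return decision

def select(all_keys, scope, patterns):
    return sorted(k for k in all_keys if included(k, scope(k), patterns))

def no_scope(key):
    return False  # start from an empty set

all_keys = [".cee.msg", ".cee.tag", ".sdata.meta", "HOST"]

# value-pairs(key(".cee.*") exclude(".*"))
print(select(all_keys, no_scope, [("key", ".cee.*"), ("exclude", ".*")]))

# value-pairs(exclude(".*") key(".cee.*"))
print(select(all_keys, no_scope, [("exclude", ".*"), ("key", ".cee.*")]))
```

The first call yields an empty list, as in the example above; reversing the order keeps only the `.cee.` keys.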
Gergely Nagy <algernon@balabit.hu> writes:
Something along the lines of:
$(format-json --key .cee.* --rewrite replace .cee=Event)
This is not done yet, either. I'll make key() take a glob first, then proceed with the commandline support.
Similarly, this will turn into:
$(format-json --key .cee.* --rekey .cee.* replace .cee=Event)
(--key includes it, --rekey rewrites the keys)
Ladies and gentlemen, I present to you the final piece of the value-pairs key rewrite patchset:
https://github.com/algernon/syslog-ng/compare/algernon:upstream/mirror/3.4.....
Now it is possible to create a config like the following:
source s_cee { tcp(port(12345) flags(no-parse)); };
parser p_cee { json-parser(prefix(".cee.")); };
template t_cee { template("$(format-json --key .cee.* --rekey .cee.* --shift 4)\n"); };
destination d_json { file("/var/log/cee.json" template(t_cee)); };
log { source(s_cee); parser(p_cee); destination(d_json); };
And we should get back the same JSON that entered. More or less, anyway: neither the JSON parser nor format-json can handle nested objects yet, but that will be the next step.
The current state of the key rewriting branch is, in my opinion, pretty good. It could perhaps use a little cleanup here and there, and then it's merge-ready as far as I can see. However, the scratch-buffer patch needs to go in first, and I'll then rebase the rekey work on top of it. At the moment, the rekey branch is still using non-thread-safe GStrings.
--
|8]
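A rough Python model of what the pipeline above does to a flat JSON object. It assumes that json-parser(prefix(".cee.")) stores each top-level member under `.cee.` plus its name, and that `--shift 4` drops the first four characters of each selected key, as in the shift() examples earlier in the thread; nested objects are left out because neither side handled them yet. The function names are hypothetical and the sample message is made up.

```python
import json
from fnmatch import fnmatchcase

def json_parser(message, prefix):
    # Flat objects only: store each member under prefix + name.
    return {prefix + name: value for name, value in json.loads(message).items()}

def format_json(pairs, key_glob, rekey_glob, shift):
    out = {}
    for key, value in pairs.items():
        if not fnmatchcase(key, key_glob):
            continue  # --key: only matching pairs are included
        if fnmatchcase(key, rekey_glob):
            key = key[shift:]  # --rekey GLOB --shift N (assumed semantics)
        out[key] = value
    return json.dumps(out, sort_keys=True)

pairs = json_parser('{"msg": "hello", "pid": 42}', ".cee.")
print(format_json(pairs, ".cee.*", ".cee.*", 4))
```

Under these assumptions the member names come back with a leading dot (`.msg`, `.pid`), since `.cee` is four characters long and the dot after it survives the shift.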
Gergely Nagy <algernon@balabit.hu> writes:
Ladies and gentlemen, I present to you the final piece of the value-pairs key rewrite patchset:
Well, that wasn't that final, after all! After talking it through with Bazsi, a few changes were still made:
* rekey() no longer exists on its own; it is now a sub-option of key() and uses the same glob. The command-line parser received similar treatment.
The result is still available on my feature/3.4/value-pairs/rekey branch, and the complete diff between 3.4 and the branch is at the following location:
https://github.com/algernon/syslog-ng/compare/algernon:upstream/mirror/3.4.....
So, we can now write stuff like this:
value-pairs(key(".cee.*" rekey(shift(4) add-prefix("Event") replace("Event.foo" "foo"))))
Or, the same thing in template function syntax:
$(format-json --key .cee.* --rekey --shift 4 --add-prefix Event --replace Event.foo=foo)
I think this makes the syntax easier and more straightforward. Any comments, criticism or ideas are welcome! I plan to fold my branch into at most a few distinct commits, and then request a merge sometime next week.
--
|8]
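To illustrate how the template-function syntax above maps onto the same rules as the key(... rekey(...)) form, here is a small Python sketch that parses the argument list into (glob, [operations]) pairs and applies them. The argument handling is an assumption based only on the example: `--key` starts a new glob, `--rekey` opens a rewrite chain for it, `--shift N` drops N leading characters, and `--replace OLD=NEW` is a prefix replacement. It is not the branch's parser, and the sample keys are hypothetical.

```python
from fnmatch import fnmatchcase

REWRITE_OPTIONS = ("--shift", "--add-prefix", "--replace")

def parse(args):
    """Parse e.g. "--key G --rekey --shift 4" into [(glob, [(option, value), ...])]."""
    rules = []
    in_rekey = False
    i = 0
    while i < len(args):
        opt = args[i]
        if opt == "--key":
            rules.append((args[i + 1], []))
            in_rekey = False
            i += 2
        elif opt == "--rekey":
            if not rules:
                raise ValueError("--rekey needs a preceding --key")
            in_rekey = True
            i += 1
        elif opt in REWRITE_OPTIONS:
            if not in_rekey:
                raise ValueError(opt + " must follow --key GLOB --rekey")
            rules[-1][1].append((opt, args[i + 1]))
            i += 2
        else:
            raise ValueError("unknown option: " + opt)
    return rules

def apply_op(key, opt, value):
    if opt == "--shift":
        return key[int(value):]
    if opt == "--add-prefix":
        return value + key
    old, new = value.split("=", 1)  # --replace OLD=NEW, prefix only
    return new + key[len(old):] if key.startswith(old) else key

def rekey(key, rules):
    for glob, ops in rules:
        if fnmatchcase(key, glob):
            for opt, value in ops:
                key = apply_op(key, opt, value)
            break  # assumption: the first matching --key wins
    return key

args = "--key .cee.* --rekey --shift 4 --add-prefix Event --replace Event.foo=foo".split()
rules = parse(args)
print([rekey(k, rules) for k in [".cee.foo", ".cee.bar", "HOST"]])
```

Here `.cee.foo` goes through `.foo` and `Event.foo` before the prefix replacement turns it into `foo`, while `.cee.bar` stops at `Event.bar` and `HOST` is left alone.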
participants (2)
- Balazs Scheidler
- Gergely Nagy