RFC: value-pairs key rewrite framework, part N+1

4 Oct 2011

      Hi!

After my earlier mails (see the thread starting at
http://thread.gmane.org/gmane.comp.syslog-ng/11355/focus=11421), I'd
like to do a recap and ask for comments, as I have come up with a few
new ideas.

The purpose of the value-pairs key rewrite framework is to make it
possible to apply various transformations to the keys selected with
value-pairs (VP), so that we can add or remove prefixes, replace parts
of a key name, and so on.

As of this writing, code for this exists on one of my branches, but it
hasn't been touched for a while, and I planned to update it in the near
future. That's when it dawned on me that perhaps the syntax isn't all
that great.

To show why I think that, let's look at an example first:

value-pairs(
 scope("everything")
 rekey(
   add-prefix(".secevt" "events")
   add-prefix(".classifier" "syslog-ng")
   shift(".sdata.*" 1)
   replace("." "_")
 )
);

This will add an "events" prefix to each key that starts with ".secevt",
so that ".secevt.verdict" becomes "events.secevt.verdict"; similarly,
".classifier.class" becomes "syslog-ng.classifier.class". Keys that
match '.sdata.*' get shifted by one character, which removes their
leading dot. Finally, any remaining key that still begins with a dot
gets that dot replaced by an underscore.
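
To pin down the ordering problem, here is a minimal Python sketch (an illustration only, not syslog-ng code) of the global semantics described above. It assumes that add-prefix matches a plain string prefix, that shift drops the first N characters of keys matching a glob, and that replace rewrites a leading occurrence of its first argument; the function names and the ".custom.field" key are made up for the example.

```python
# Sketch of the *global* rekey() semantics: every rule is tried on every key,
# in order, and each rule sees the key as rewritten by the rules before it.
# Rule names mirror the config example; their exact behaviour is assumed.
from fnmatch import fnmatchcase


def add_prefix(match, prefix):
    """Prepend `prefix` to keys that start with the plain string `match`."""
    def rule(key):
        return prefix + key if key.startswith(match) else key
    return rule


def shift(pattern, n):
    """Drop the first `n` characters of keys matching the glob `pattern`."""
    def rule(key):
        return key[n:] if fnmatchcase(key, pattern) else key
    return rule


def replace(old, new):
    """Rewrite a leading `old` into `new`; other keys pass through unchanged."""
    def rule(key):
        return new + key[len(old):] if key.startswith(old) else key
    return rule


def rekey(key, rules):
    for rule in rules:
        key = rule(key)  # later rules only ever see the rewritten key
    return key


GLOBAL_RULES = [
    add_prefix(".secevt", "events"),
    add_prefix(".classifier", "syslog-ng"),
    shift(".sdata.*", 1),
    replace(".", "_"),
]

for original in (".secevt.verdict", ".classifier.class", ".sdata.foo", ".custom.field"):
    print(original, "->", rekey(original, GLOBAL_RULES))
```

Running it prints "events.secevt.verdict", "syslog-ng.classifier.class", "sdata.foo" and "_custom.field". The final replace only spares the first three keys because earlier rules already rewrote them: the rules are coupled through the rewritten key, not through what was originally selected.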

This kind of makes sense, and I could even massage the syntax into
format-json:

$(format-json --scope everything --rekey --add-prefix .secevt=events --add-prefix .classifier=syslog-ng --shift .sdata.*=1 --replace .=_ --end-rekey)

However, this syntax has a downside: the transformations are global. I
can't choose a subset of my keys and apply a list of transformations to
those and only those. Once a transformation is made, every
transformation later in the list sees the transformed key. So I can't
easily say: "take all the keys starting with '.sdata.', shift them 6
characters, then replace any key names that start with 'win' with
'lose', and finally prefix them with 'whatever'", because once shifted,
those keys no longer start with '.sdata.', and the later rules cannot
tell them apart from any other key. With the current syntax, that's next
to impossible to do sanely.

It's also not all that intuitive.

So I came up with a different syntax: wiring rekey into the key() option
of value-pairs! That way, key() has already selected a subset to work
on, and the transformations would apply only to those keys.

(This could be combined with the global syntax as well, though.)

So it'd look something like this:

value-pairs(
 scope("everything")
 key(".secevt.*" rekey(add-prefix("events")))
 key(".classifier.*" rekey(add-prefix("syslog-ng")))
 key(".sdata.*" rekey(shift(1)))
 key(".*" rekey(replace ("." "_")))
);

This would achieve exactly the same effect as the example above, but
with a clearer syntax, perhaps. It would also mean that key rewriting is
described right where the keys are selected.
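
The per-key() variant can be sketched the same way. Again this is a hypothetical Python illustration, and it assumes something the proposal does not spell out: each key is handled by the first key() selector whose glob matches its original name, and only that selector's chain runs.

```python
# Sketch of rekey() chains attached to key() selectors. Assumed semantics:
# each key is handled by the FIRST selector whose glob matches its original
# name, and only that selector's chain is applied; unmatched keys are kept.
from fnmatch import fnmatchcase


def add_prefix(prefix):
    return lambda key: prefix + key


def shift(n):
    return lambda key: key[n:]


def replace(old, new):
    # Rewrite a leading `old` into `new`; keys not starting with it are kept.
    return lambda key: new + key[len(old):] if key.startswith(old) else key


def rekey(key, selectors):
    for pattern, chain in selectors:
        if fnmatchcase(key, pattern):
            for transform in chain:
                key = transform(key)
            return key  # first match wins: later selectors never see this key
    return key


SELECTORS = [
    (".secevt.*", [add_prefix("events")]),
    (".classifier.*", [add_prefix("syslog-ng")]),
    (".sdata.*", [shift(1)]),
    (".*", [replace(".", "_")]),
]

for original in (".secevt.verdict", ".classifier.class", ".sdata.foo", ".custom.field", "HOST"):
    print(original, "->", rekey(original, SELECTORS))
```

Under that assumption the output matches the global example (keys matched by no selector, such as "HOST", pass through untouched), and a longer chain under ".sdata.*" would never leak onto keys outside that selector.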

The downside of this is that it'd be a bit harder to come up with a
syntax for format-json that mimics the config file syntax.

So, instead of trying to do that and ending up with something horrible,
I have another proposal: let's make value-pairs a top-level citizen, so
that it joins the ranks of filter{}, rewrite{} and the like!

That way, we could turn the following ugly thing:

destination d_structured {
 mongodb(
  value-pairs(
   scope("everything")
   key(".secevt.*" rekey(add-prefix("events")))
   key(".classifier.*" rekey(add-prefix("syslog-ng")))
   key(".sdata.*" rekey(shift(1)))
   key(".*" rekey(replace ("." "_")))
  )
 );
 file("/var/log/structured.json" template("$(format-json <repeat the above stuff, but with format-json syntax>)\n"));
};

Into this beauty:

valuepairs vp_example {
 scope(everything);
 key(".secevt.*" rekey(add-prefix("events")));
 key(".classifier.*" rekey(add-prefix("syslog-ng")));
 key(".sdata.*" rekey(shift(1)));
 key(".*" rekey(replace ("." "_")));
};

destination d_structured {
 mongodb(value-pairs(vp_example));
 file("/var/log/structured.json" template("$(format-json --with-config vp_example)\n"));
};

This would have the nice consequence of not having to keep two parsers
in sync: format-json would only have a --with-config option, and nothing
else.
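
To make the "one parser" argument concrete, here is a hypothetical Python sketch (none of these names exist in syslog-ng): the valuepairs block is parsed once into an object stored by name, and both consumers resolve that same object, so format-json only has to understand --with-config NAME.

```python
# Hypothetical sketch: a top-level `valuepairs NAME { ... };` block is parsed
# once and stored by name; both the mongodb() destination and format-json
# look it up, so format-json needs no option parser beyond --with-config.
from dataclasses import dataclass, field


@dataclass
class ValuePairs:
    scope: str
    keys: list = field(default_factory=list)  # (glob, rekey chain) pairs


VALUEPAIRS = {}  # name -> parsed ValuePairs definition


def define_valuepairs(name, definition):
    if name in VALUEPAIRS:
        raise ValueError(f"duplicate valuepairs block: {name}")
    VALUEPAIRS[name] = definition


def lookup(name):
    try:
        return VALUEPAIRS[name]
    except KeyError:
        raise ValueError(f"unknown valuepairs block: {name}") from None


def format_json_args(argv):
    """The only option format-json would still parse: --with-config NAME."""
    if len(argv) != 2 or argv[0] != "--with-config":
        raise ValueError("usage: $(format-json --with-config NAME)")
    return lookup(argv[1])


define_valuepairs("vp_example", ValuePairs(scope="everything"))
mongodb_vp = lookup("vp_example")  # what mongodb(value-pairs(vp_example)) would resolve
json_vp = format_json_args(["--with-config", "vp_example"])
assert mongodb_vp is json_vp  # one definition, no second parser to keep in sync
```

The design choice is simply that option parsing moves out of format-json: whatever the config grammar grows into, the template function keeps a single, stable argument.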

So, in the end, this whole thing boils down to two questions:

* What do you think about allowing (or even moving) rekey() inside key()
  options?
* What do you think about introducing a top-level valuepairs element,
  and dropping the format-json argument parsing stuff?

-- 
|8]

Gergely Nagy
