[RFC] value-pairs and key rewriting

10 May 2011

      Hi!

Now that value-pairs() is in 3.3, it's time to dig up an idea Bazsi and
I were discussing way back when we first talked about value-pairs: a way
to change the keys in a value-pairs set, without the need to explicitly
specify them all using pair().

It's actually easier to explain this by explaining the need behind this
feature: there's the MongoDB destination, and by default, SDATA goes
under the "sdata" key, somewhat like this:

{
 "sdata": {
   "test": "value"
 }
}

Now, if I'd rather have those values under, say, "sd", I can't do that
with the current driver, because I can't tell value-pairs() that "sdata"
should be mapped to "sd" instead. The best I can do is exclude
".SDATA.*", and then either use "$SDATA" and post-process it, or list
all the .SDATA.* keys explicitly. Neither option is good enough.

So, I propose that we should have a way to remedy this problem, and this
remedy should be called "rekey()".

The way I imagine it is something like this:

value-pairs (
  scope("selected-macros" "nv-pairs")
  rekey(
    regexp("^\.SDATA\.(.*)" "sd.$1")
    prefix(".secevt.*" "events")
    prefix("[A-Z]*" "syslog.")
  )
)

This would do the following:

- Any key that begins with ".SDATA." will have that part replaced with
  "sd."
- Keys matching ".secevt.*" (shell glob, not regexp) will be prefixed
  with "events". Thus ".secevt.verdict" would become
  "events.secevt.verdict".
- Keys that start with an uppercase letter (as a shell glob, "[A-Z]*"
  matches one uppercase character followed by anything) would be
  prefixed with "syslog.", thus "HOST" would become "syslog.HOST".

The transformations would be applied to the raw set of keys, in the
order they're listed in the configuration file. Initially, only
regexp() and prefix() would be implemented, with the possibility of
adding more if the need arises.
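
To make the proposed semantics concrete, here is a minimal Python sketch
of the rule chain. It is not syslog-ng code: the helper names (regexp,
prefix, rekey) only mirror the proposed option names, the sample values
are made up, and it assumes each rule sees the key as rewritten by the
rules before it, which the proposal does not spell out. Python writes
the backreference as \1 where the config example uses $1.

```python
import re
from fnmatch import fnmatchcase


def regexp(pattern, replacement):
    """Hypothetical rule: rewrite a key with a regular expression."""
    compiled = re.compile(pattern)

    def rule(key):
        return compiled.sub(replacement, key, count=1)

    return rule


def prefix(glob, text):
    """Hypothetical rule: prepend text to keys matching a shell glob."""

    def rule(key):
        return text + key if fnmatchcase(key, glob) else key

    return rule


def rekey(pairs, rules):
    """Apply every rule, in order, to each key of a name-value set."""
    result = {}
    for key, value in pairs.items():
        for rule in rules:
            # Assumption: rules are chained, so later rules see the
            # output of earlier ones.
            key = rule(key)
        result[key] = value
    return result


# The three rules from the configuration example above.
RULES = [
    regexp(r"^\.SDATA\.(.*)", r"sd.\1"),
    prefix(".secevt.*", "events"),
    prefix("[A-Z]*", "syslog."),
]

# Made-up input, for illustration only.
pairs = {".SDATA.test": "value", ".secevt.verdict": "ok", "HOST": "localhost"}
renamed = rekey(pairs, RULES)
print(renamed)
```

With this input the keys come out as "sd.test", "events.secevt.verdict"
and "syslog.HOST", matching the list above.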

This would also solve another problem I encountered recently: if the
value-pairs() result set contains both "$SDATA" and the ".SDATA.*" keys
(which is the case if one specifies scope("selected-macros" "nv-pairs")
and the incoming message has structured data), then we'll have a key
conflict in the MongoDB destination, because internally "foo.bar" gets
translated to (using JSON notation):

{ "foo": { "bar": ... } }

Now, in the case of SDATA, this translates to something like the
following:

{
 "SDATA": "[foo=bar]", // $SDATA
 "SDATA": {
   "foo": "bar" // $SDATA.foo
 }
}

This is because the MongoDB destination strips the leading dot at the
moment (because that would be invalid too), and we end up with
conflicting types: one string, and one object. The driver does not
support overriding right now, so this is a problem.
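
The conflict can be reproduced with a small Python sketch. This is not
the MongoDB driver's code: to_document is a hypothetical helper that
strips one leading dot, splits the key on dots and nests the parts, and
raising ValueError is just one way to model a driver that cannot
override an existing value.

```python
def to_document(pairs):
    """Hypothetical helper: turn dotted keys into a nested document."""
    doc = {}
    for key, value in pairs.items():
        # Strip the single leading dot, as described above.
        stripped = key[1:] if key.startswith(".") else key
        parts = stripped.split(".")
        node = doc
        for part in parts[:-1]:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                # A string already lives where an object is needed.
                raise ValueError(f"key conflict at {part!r}")
            node = child
        leaf = parts[-1]
        if leaf in node:
            # An object (or value) already lives where the string goes.
            raise ValueError(f"key conflict at {leaf!r}")
        node[leaf] = value
    return doc


# Structured data alone nests cleanly.
nested = to_document({".SDATA.foo": "bar"})

# Adding $SDATA itself collides with the nested object.
error = None
try:
    to_document({"SDATA": "[foo=bar]", ".SDATA.foo": "bar"})
except ValueError as exc:
    error = str(exc)

print(nested)
print(error)
```

Renaming either side before the document is built, for example turning
".SDATA.foo" into "sd.foo", removes the collision in this model.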

I could, of course, change the driver to replace the dot with an
underscore, but that would be costlier than the current stripping, and
would still be ugly, in my opinion.

It's much nicer to allow the users to rewrite the keys instead, or
prefix them.

That's about how far I got with thinking for now. Critique, comments and
ideas would be most appreciated.

(PS: This is, of course, strictly 3.4 material, as 3.3 is in a feature
freeze)

-- 
|8]

Gergely Nagy
