Yep, I think you're on the right track in that some rewriting will definitely be necessary for Mongo. I'm a bit concerned with performance, but Mongo will probably be the bottleneck when things don't fit in RAM anyway.

On Tue, May 10, 2011 at 2:07 PM, Gergely Nagy <algernon@balabit.hu> wrote:
Hi!
Now that value-pairs() is in 3.3, it's time to dig up an idea Bazsi and I were discussing way back when we first talked about value-pairs: a way to change the keys in a value-pairs set, without the need to explicitly specify them all using pair().
It's actually easier to explain this by explaining the need behind this feature: there's the MongoDB destination, and by default, SDATA goes under the "sdata" key, somewhat like this:
{ "sdata": { "test": "value" } }
Now, if I'd rather have those values under, say, "sd", I can't do that with the current driver, because I can't tell value-pairs() that "sdata" should be mapped to "sd" instead. The best I can do is exclude ".SDATA.*" and then either use "$SDATA" and post-process it, or list all the .SDATA.* keys explicitly. Neither is good enough.
So, I propose that we should have a way to remedy this problem, and this remedy should be called "rekey()".
The way I imagine it is something like this:
value-pairs (
  scope("selected-macros" "nv-pairs")
  rekey(
    regexp("^\.SDATA\.(.*)" "sd.$1")
    prefix(".secevt.*" "events")
    prefix("[A-Z]*" "syslog.")
  )
)
This would do the following:
- Any key that begins with ".SDATA." will have that part replaced with "sd."
- Keys matching ".secevt.*" (shell glob, not regexp) will be prefixed with "events". Thus ".secevt.verdict" would become "events.secevt.verdict".
- Keys that are all uppercase would be prefixed with "syslog.", thus "HOST" would become "syslog.HOST"
The transformations would be applied to the raw set of keys, in the order they're listed in the configuration file. Initially, regexp() and prefix() would be implemented only, with the possibility of adding more, if the need arises.
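To make the intended semantics concrete, here is a rough Python model of the proposal. It is only a sketch, not syslog-ng code: the function names mirror the proposed options, and it assumes that each rule sees the key as rewritten by the rules before it, that regexp() uses "$1"-style backreferences, and that prefix() uses case-sensitive shell globs as implemented by Python's fnmatch.

```python
import fnmatch
import re


def regexp(pattern, replacement):
    """Build a rule that rewrites keys matching a regular expression."""
    compiled = re.compile(pattern)
    # Translate the "$1" style from the proposal into Python's "\g<1>".
    py_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    return lambda key: compiled.sub(py_replacement, key, count=1)


def prefix(glob, text):
    """Build a rule that prepends text to keys matching a shell glob."""
    return lambda key: text + key if fnmatch.fnmatchcase(key, glob) else key


def rekey(pairs, rules):
    """Apply every rule, in configuration order, to each key (assumed chaining)."""
    result = {}
    for key, value in pairs.items():
        for rule in rules:
            key = rule(key)
        result[key] = value
    return result


# The three rules from the example configuration, in the same order.
RULES = [
    regexp(r"^\.SDATA\.(.*)", "sd.$1"),
    prefix(".secevt.*", "events"),
    prefix("[A-Z]*", "syslog."),
]

if __name__ == "__main__":
    sample = {".SDATA.test": "value", ".secevt.verdict": "accept", "HOST": "localhost"}
    print(rekey(sample, RULES))
```

One thing this model makes visible: under fnmatch semantics the glob "[A-Z]*" matches any key that starts with an uppercase letter, which is broader than "all uppercase". Whether that matters depends on how the real prefix() ends up matching.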
This would also solve another problem I encountered recently: if the value-pairs() result set contains both "$SDATA" and "$SDATA.*" (which is the case if one specifies scope("selected-macros" "nv-pairs") and the incoming message has structured data), then we'll have a key conflict in the MongoDB destination, because internally "foo.bar" gets translated to (using JSON notation):
{ "foo": { "bar": ... } }
Now, in the case of SDATA, this translates to something like the following:
{
  SDATA: "[foo=bar]", // $SDATA
  SDATA: {
    "foo": "bar" // $SDATA.foo
  }
}
This is because the MongoDB destination strips the leading dot at the moment (because that would be invalid too), and we end up with conflicting types: one string, and one object. The driver does not support overriding right now, so this is a problem.
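The conflict can be reproduced with a small Python model of the key translation. This is not the driver's code, just a sketch under the assumptions stated above: one leading dot is stripped, the rest of the key is split on dots into nested objects, and nothing is ever overridden. The names nest() and KeyConflict are made up for the illustration.

```python
class KeyConflict(Exception):
    """Raised when a key would need to be both a plain value and an object."""


def nest(pairs):
    """Turn flat dotted keys into a nested document, refusing to override."""
    doc = {}
    for key, value in pairs.items():
        # Strip a single leading dot, as the destination is described to do.
        if key.startswith("."):
            key = key[1:]
        parts = key.split(".")
        node = doc
        for part in parts[:-1]:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                raise KeyConflict(f"{part!r} already holds a plain value")
            node = child
        leaf = parts[-1]
        if leaf in node:
            raise KeyConflict(f"{leaf!r} is already set")
        node[leaf] = value
    return doc


if __name__ == "__main__":
    print(nest({"foo.bar": "baz"}))
    try:
        nest({"SDATA": "[foo=bar]", ".SDATA.foo": "bar"})
    except KeyConflict as err:
        print("conflict:", err)
```

With rekey() in front of this step, the ".SDATA.*" keys could be moved under a different top-level name (for example "sd") before nesting, so the string and the object would no longer compete for the same key.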
I could, of course, change the driver to replace the dot with an underscore, but that would be costlier than the current stripping, and would still be ugly, in my opinion.
It's much nicer to allow the users to rewrite the keys instead, or prefix them.
That's about how far I got with thinking for now. Critique, comments and ideas would be most appreciated.
(PS: This is, of course, strictly 3.4 material, as 3.3 is in a feature freeze)
-- |8]