Hi!

A while ago, I posted a proposal about key rewriting for value-pairs. Today I'm happy to announce that I have some half-baked code, and a couple of ideas on how to proceed. Below, I'll share a few technical details about the current implementation, its limits, and my idea of the way forward.

To reiterate, this is roughly the syntax I described in the earlier proposal:

    value-pairs (
      scope("selected-macros" "nv-pairs")
      rekey(
        regexp("^\.SDATA\.(.*)" "sd.$1")
        prefix(".secevt.*" "events")
        prefix("[A-Z]*" "syslog.")
      )
    )

As of this writing, this is what's implemented:

    value-pairs(
      scope("everything")
      rekey(
        add_prefix(".classifier.*" ".syslog-ng")
        shift(".sdata.*" 1)
        add_prefix(".*" "private")
      )
    )

After constructing the full scope of keys to work with, value-pairs() iterates over them, and applies all the rekey transformations in the order listed. The two available transformation functions are:

* add-prefix(glob, prefix): matches keys against a shell glob, and adds a prefix to the ones that match.
* shift(glob, amount): matches keys against a shell glob, and shifts the ones that match by the given number of bytes, that is, it cuts that many bytes off the start of the key.

This means that given a structure like the following:

    {
      ".classifier.rule_id": "foobar",
      ".sdata.foo": "bar",
      ".sdata.bar": "baz",
      ".my-stuff.this": "that"
    }

the keys are rewritten like this:

* .classifier.rule_id is first transformed to .syslog-ng.classifier.rule_id (due to the first rule). It doesn't match the second rule, and the third transforms it again, to private.syslog-ng.classifier.rule_id.
* .sdata.foo doesn't match the first rule. The second rule transforms it to sdata.foo, and after that it doesn't match the last rule anymore, because it no longer starts with a dot.
* .sdata.bar goes the same way as .sdata.foo, and ends up as sdata.bar.
* .my-stuff.this only matches the third rule, so it is transformed into private.my-stuff.this.

That's about it!

In the near future, I want to implement two more transformation functions:

* replace(prefix, new_prefix): takes two strings, and if a key starts with prefix, that part is replaced by new_prefix (the two can be of different lengths).
* regexp(pattern, replace): does pretty much the same as replace(), but instead of matching on a prefix, it does a whole PCRE match and replace.

Performance
===========

Performance is not the greatest, but I haven't measured yet, so everything below should be taken with a grain of salt.

Key rewriting has inevitable costs, ones that we can't easily get around: we need to match each key we work with against a pattern (or at least a prefix, in the best case), and then apply a transformation, which most often will result in extra memory allocations.
I tried to limit allocations to a minimum, and to cache and look up instead, whenever possible. But that has a cost too, even if it is slightly less than always allocating memory for the same transformations.

There are probably a few ways in which performance could be (and will be) improved, but at the moment, the focus is on bringing the full set of features in, and cleaning up the mess I made afterwards. Only then, when the mess is gone, will I start to think about making it as fast as possible.

Implementation
==============

At the current stage of this work, the implementation is a bit messy and inefficient, but it's not all that horrible, in my opinion.

It works by calling vp_transform_apply(vp, key) on every key that is inserted into the final scope. If no transformations were specified, this function returns immediately, and we go on as if nothing happened. If we do have transformations, then each matching one gets applied in turn, until we reach the end (I might introduce an optional "final" flag for the transformation functions, so that 'final' flagged transformations will short-circuit the loop if they match a key).

The transformation functions try their best not to duplicate strings or allocate memory:

* shift() simply returns the same pointer it received, just shifted N bytes (if N is less than 0, the whole string is returned, but otherwise no attempt is made to verify that the string is long enough to shift N bytes - yet).
* add-prefix() tries to look up the key in an internal hash table of already transformed keys, and adds the new transformation there if one wasn't found.

This makes shift() far lighter on resources than add-prefix(): it doesn't need any memory allocation at all!

When syslog-ng is shutting down, value_pairs_free() will call the ->destroy callbacks of the various transformers, which are responsible for freeing up the transformer-specific structures (e.g., add-prefix's hash table).
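
To make the flow above easier to follow, here is a rough model of it in Python rather than C. It is only a sketch under a few assumptions: the class names and transform_apply() are made up for illustration and are not the names used in the actual code, fnmatchcase() stands in for whatever glob matcher the C code uses, and a plain dict plays the role of add-prefix's hash table.

```python
from fnmatch import fnmatchcase


class Shift:
    """Model of shift(glob, amount): drop the first `amount` characters."""

    def __init__(self, glob, amount):
        self.glob = glob
        self.amount = amount

    def matches(self, key):
        # Assumption: fnmatchcase() approximates the shell glob matching.
        return fnmatchcase(key, self.glob)

    def apply(self, key):
        # A negative amount leaves the key untouched, as described above.
        if self.amount < 0:
            return key
        # The C version just moves a pointer; slicing is the closest analogue.
        return key[self.amount:]


class AddPrefix:
    """Model of add-prefix(glob, prefix), with a per-transformer cache."""

    def __init__(self, glob, prefix):
        self.glob = glob
        self.prefix = prefix
        self.cache = {}  # stands in for the internal hash table

    def matches(self, key):
        return fnmatchcase(key, self.glob)

    def apply(self, key):
        # Look the key up first, and only build a new string on a miss.
        new_key = self.cache.get(key)
        if new_key is None:
            new_key = self.prefix + key
            self.cache[key] = new_key
        return new_key


def transform_apply(transforms, key):
    """Apply every matching transformation to `key`, in the order listed."""
    for transform in transforms:  # an empty list falls straight through
        if transform.matches(key):
            key = transform.apply(key)
    return key


# The rekey() rules and the sample structure from earlier in this mail.
rekey = [
    AddPrefix(".classifier.*", ".syslog-ng"),
    Shift(".sdata.*", 1),
    AddPrefix(".*", "private"),
]
pairs = {
    ".classifier.rule_id": "foobar",
    ".sdata.foo": "bar",
    ".sdata.bar": "baz",
    ".my-stuff.this": "that",
}
rewritten = {transform_apply(rekey, key): value for key, value in pairs.items()}
```

With these rules, rewritten ends up with the keys private.syslog-ng.classifier.rule_id, sdata.foo, sdata.bar and private.my-stuff.this. Note that each transformation sees the output of the previous one, which is why the order of the rules matters.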

There's quite a bit to improve still, though: in order to support transformation functions that expect something other than a shell glob as their first argument, the matching must be abstracted away as well, among other things.

I'm also thinking about rewriting the current - quite hackish - ValuePairTransformer structure into something that resembles object oriented design instead: we'd have a basic ValuePairsTransformer, from which the various transformation functions would inherit. We'd end up with pretty much the same thing, just in a cleaner design.

For the adventurous types, the code is available from my git repo at git://git.balabit.hu/algernon/syslog-ng-3.3.git on the vp/rekey branch. While this is 3.4 material, it's on my 3.3 branch for now, because Bazsi's 3.4 tree doesn't have some of the latest 3.3 stuff, which I indirectly or directly depend on (e.g., all the newish value-pairs fixes and enhancements), and I didn't feel like cherry-picking the good stuff.

--
|8]