[syslog-ng] RFC: value-pairs() improvements, and the way forward
Balazs Scheidler
bazsi at balabit.hu
Sat Feb 25 17:01:02 CET 2012
On Fri, 2012-02-10 at 15:20 +0100, Gergely Nagy wrote:
> Hi!
>
> As is my custom when I get stuck thinking about something that is probably
> trivial for those more versed in algorithmic thinking, I turn to the
> list and dump my ideas, in the hope of borrowing some of the good ideas you
> lot usually have. And unlike most times, I'll try to be short and to the
> point for a change.
>
> Some of you might have heard about this value-pairs thing that found its
> way into syslog-ng 3.3 and received some neat new features in 3.4, but
> as a reminder, let me say a few words about it: it's a reasonably
> convenient, simple, yet flexible way to build a set of key-value pairs
> from all the data syslog-ng managed to collect about a message and parse
> out of it. In 3.4, one can even apply various transformations to the keys
> to massage them into a format suitable for applications outside of
> syslog-ng.
>
> As of this writing, the mongodb destination driver and the format-json
> template function are the two users of this feature, and a few more
> users will likely see the light of day in the not too distant future.
>
> However, there are multiple issues with the current implementation:
>
> - It's far slower than it needs to be.
> - It's hard to sort the keys alphabetically (which would be useful for
> format-welf, for example, or anywhere else where we want the order of
> keys to remain constant between messages)
> - It would be very, very inconvenient to enhance the implementation to
> handle nested structures. And we do need nested structures:
> the json parser (in 3.4) already transforms nested structures into an
> internal, flat format (dot-separated names) that could, in theory, be
> expanded back into the original, or something very close to it, but
> value-pairs does not support that.
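The ordering problem in the second point can be illustrated with a minimal sketch in plain C. It stands in for the sorted traversal a GTree would provide by simply sorting an array with qsort(); struct kv_pair and format_sorted_kv() are hypothetical names, not syslog-ng code, and the space-separated key=value output only approximates a WELF-style line (values are not quoted or escaped).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* A flat key-value pair, roughly what value-pairs() hands to a consumer. */
struct kv_pair
{
  const char *key;
  const char *value;
};

/* qsort() comparator: orders pairs by key, byte by byte (strcmp). */
static int
kv_pair_cmp(const void *a, const void *b)
{
  const struct kv_pair *pa = a;
  const struct kv_pair *pb = b;

  return strcmp(pa->key, pb->key);
}

/* Sorts the pairs in place, then renders them into buf as
 * space-separated key=value tokens (values are not escaped).
 * Returns 0 on success, -1 if buf is too small. */
int
format_sorted_kv(struct kv_pair *pairs, size_t n, char *buf, size_t size)
{
  size_t used = 0;

  if (size == 0)
    return -1;
  buf[0] = '\0';
  qsort(pairs, n, sizeof(pairs[0]), kv_pair_cmp);
  for (size_t i = 0; i < n; i++)
    {
      int written = snprintf(buf + used, size - used, "%s%s=%s",
                             i > 0 ? " " : "", pairs[i].key, pairs[i].value);

      if (written < 0 || (size_t) written >= size - used)
        return -1;
      used += (size_t) written;
    }
  return 0;
}
```

Whatever order the pairs arrive in, sorting by key means two messages carrying the same keys always emit them in the same order: the pairs PROGRAM=sshd, HOST=web1, MSG=hello render as `HOST=web1 MSG=hello PROGRAM=sshd`.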
>
> So, I want to change the *implementation* of value-pairs to fix all of
> the problems above. The syntax in the config file *will* remain the same
> (though both mongodb and format-json will probably gain an extra
> parameter that controls whether we want a flat structure or a nested
> object structure).
>
> As a first step, I will replace the GHashTables (basically a key-value
> store optimised for random access) with a GTree (a sorted key-value
> store optimised for ordered traversal, which is pretty much what we
> need).
>
> This has the advantage of being a bit more efficient than GHashTables
> and of gaining us an ordered key list for free.
>
> The hard part is writing the flat format => tree conversion and making
> it efficient. I have some crude code for it, which iterates through a
> sorted key list, splits each key up by dots, finds the appropriate node
> in the tree (if any), and adds a new child node. This means we iterate
> through the original key list only once (this is good), but we have to
> perform a lot of lookups in another tree for each key (this is bad).
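The conversion described above can be sketched in plain C roughly as follows. This is a minimal sketch under stated assumptions, not the crude code mentioned in the mail: the names (vp_node, vp_tree_insert(), vp_tree_render()) are hypothetical, children are kept in a linked list and found by linear scan rather than in a GTree, only the standard library is used, and freeing the tree is left out for brevity.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* A node of the nested structure: a name, an optional leaf value and a
 * singly linked list of children kept in insertion order. */
struct vp_node
{
  char *name, *value;
  struct vp_node *children, *last_child, *next;
};

static char *
vp_strndup(const char *s, size_t len)
{
  char *copy = malloc(len + 1);

  memcpy(copy, s, len);
  copy[len] = '\0';
  return copy;
}

/* Scans the parent's children for a matching segment name and appends a
 * new child if there is none; this scan is the per-segment lookup cost. */
static struct vp_node *
vp_node_find_or_add(struct vp_node *parent, const char *name, size_t len)
{
  struct vp_node *child;

  for (child = parent->children; child; child = child->next)
    if (strlen(child->name) == len && memcmp(child->name, name, len) == 0)
      return child;
  child = calloc(1, sizeof(*child));
  child->name = vp_strndup(name, len);
  if (parent->last_child)
    parent->last_child->next = child;
  else
    parent->children = child;
  parent->last_child = child;
  return child;
}

/* Splits key on '.', walks (and extends) the tree from root, and stores a
 * copy of value on the node of the last segment. */
void
vp_tree_insert(struct vp_node *root, const char *key, const char *value)
{
  struct vp_node *node = root;
  const char *seg = key;

  for (;;)
    {
      const char *dot = strchr(seg, '.');
      size_t len = dot ? (size_t) (dot - seg) : strlen(seg);

      node = vp_node_find_or_add(node, seg, len);
      if (!dot)
        break;
      seg = dot + 1;
    }
  free(node->value);
  node->value = vp_strndup(value, strlen(value));
}

/* printf-style append to the NUL-terminated string in buf (truncating). */
static void
appendf(char *buf, size_t size, const char *fmt, ...)
{
  size_t used = strlen(buf);
  va_list ap;

  va_start(ap, fmt);
  if (used + 1 < size)
    vsnprintf(buf + used, size - used, fmt, ap);
  va_end(ap);
}

/* Renders node's children as a JSON-like object (no escaping). A node with
 * children becomes an object, so a value stored on it is not shown. */
void
vp_tree_render(const struct vp_node *node, char *buf, size_t size)
{
  const struct vp_node *child;

  appendf(buf, size, "{");
  for (child = node->children; child; child = child->next)
    {
      appendf(buf, size, "\"%s\":", child->name);
      if (child->children)
        vp_tree_render(child, buf, size);
      else
        appendf(buf, size, "\"%s\"", child->value ? child->value : "");
      if (child->next)
        appendf(buf, size, ",");
    }
  appendf(buf, size, "}");
}
```

Inserting the sorted keys a.b = 1, a.c.d = 2 and e = 3 into an empty root and rendering it into an empty buffer gives `{"a":{"b":"1","c":{"d":"2"}},"e":"3"}`. Every key restarts its walk at the root, which is where the repeated lookups come from.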
>
> Now, I can optimise this a bit by using the ordered property of the
> original list: store the path to the last inserted node, and try to
> use that for speedier backtracking. But that's something I'll probably
> do in a second version of the update.
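The path-caching idea can be sketched the same way, again as a hypothetical, standard-library-only illustration: vp_builder, vp_builder_insert() and VP_MAX_DEPTH are made-up names, and node names and values point into the caller's strings to keep it short. The builder remembers the previous key and the nodes on its path. For each new key it counts the leading segments shared with the previous key, resumes from that node, and only performs lookups for the remaining segments. Because those lookups still happen, the result stays correct even for unsorted input; sorting just makes the shared prefixes longer.

```c
#include <stdlib.h>
#include <string.h>

#define VP_MAX_DEPTH 32 /* assumed upper bound on key segments */

/* Names and values point into the caller's strings, which must outlive
 * the tree (an assumption that keeps the sketch short). */
struct vp_node
{
  const char *name, *value;
  size_t name_len;
  struct vp_node *children, *last_child, *next;
};

/* Counts calls to vp_node_find_or_add(), i.e. child-list scans. */
static unsigned long vp_lookups;

static struct vp_node *
vp_node_find_or_add(struct vp_node *parent, const char *name, size_t len)
{
  struct vp_node *child;

  vp_lookups++;
  for (child = parent->children; child; child = child->next)
    if (child->name_len == len && memcmp(child->name, name, len) == 0)
      return child;
  child = calloc(1, sizeof(*child));
  child->name = name;
  child->name_len = len;
  if (parent->last_child)
    parent->last_child->next = child;
  else
    parent->children = child;
  parent->last_child = child;
  return child;
}

/* Insertion state: the previous key and the nodes along its path,
 * path[0] being the root. */
struct vp_builder
{
  const char *prev_key;
  struct vp_node *path[VP_MAX_DEPTH + 1];
};

void
vp_builder_init(struct vp_builder *b, struct vp_node *root)
{
  memset(b, 0, sizeof(*b));
  b->path[0] = root;
}

/* Leading segments that a and b share, each followed by '.' in both:
 * "a.b.c" and "a.b.d" share 2 ("a" and "b"). */
static size_t
shared_segments(const char *a, const char *b)
{
  size_t segs = 0;

  for (size_t i = 0; a[i] != '\0' && a[i] == b[i]; i++)
    segs += (a[i] == '.');
  return segs;
}

/* Starts from the deepest node shared with the previous key instead of
 * the root. Returns -1 if key has more than VP_MAX_DEPTH segments. */
int
vp_builder_insert(struct vp_builder *b, const char *key, const char *value)
{
  size_t depth = 0, nsegs = 1;
  const char *seg = key;
  struct vp_node *node;

  for (const char *p = key; *p; p++)
    nsegs += (*p == '.');
  if (nsegs > VP_MAX_DEPTH)
    return -1;
  if (b->prev_key)
    depth = shared_segments(key, b->prev_key);
  for (size_t i = 0; i < depth; i++)
    seg = strchr(seg, '.') + 1; /* skip segments already on the path */
  node = b->path[depth];
  for (;;)
    {
      const char *dot = strchr(seg, '.');
      size_t len = dot ? (size_t) (dot - seg) : strlen(seg);

      node = vp_node_find_or_add(node, seg, len);
      b->path[++depth] = node;
      if (!dot)
        break;
      seg = dot + 1;
    }
  node->value = value;
  b->prev_key = key;
  return 0;
}
```

For the sorted keys a.b.c, a.b.d, a.e and f this performs 6 child-list scans, instead of the 9 needed when every key starts again at the root.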
>
> I believe this method would be reasonably efficient, but if anyone has a
> better suggestion, please let me know.
>
I think this sounds perfectly reasonable.
--
Bazsi