[syslog-ng] RFC: value-pairs() improvements, and the way forward

Gergely Nagy algernon at balabit.hu
Fri Feb 10 15:20:06 CET 2012


Hi!

As is my custom when I get stuck thinking about something that is
probably trivial for those more versed in algorithmic thinking, I turn to
the list and dump my ideas, in the hope of borrowing some of the good
ideas you lot usually have. And unlike most times, I'll try to be short
and to the point for a change.

Some of you might have heard about this value-pairs thing that found its
way into syslog-ng 3.3, and received some neat new features in 3.4, but
as a reminder, let me say a few words about it: it's a reasonably
convenient, simple, yet flexible way to build a set of key-value pairs
from all the data syslog-ng managed to collect about and parse out of a
message. In 3.4, one can even apply various transformations to the keys,
to massage them into a format suitable for applications outside of
syslog-ng.

As of this writing, the mongodb destination driver and the format-json
template function are the two users of this feature, and a few more
users will likely see the light of day in the not too distant future.

However, there are multiple issues with the current implementation:

- It's far slower than it needs to be.
- It's hard to sort the keys alphabetically (which would be useful for
  format-welf, for example, or anywhere else where we want the order of
  keys to remain constant between messages).
- It would be very, very inconvenient to enhance the implementation to
  be able to handle nested structures. And we do need nested structures:
  the json parser (in 3.4) already transforms nested structures into an
  internal format that could, in theory, be expanded back to the
  original, or something very close, but value-pairs does not support
  that.

So, I want to change the *implementation* of value-pairs, to fix all of
the problems above. The syntax in the config file *will* remain the same
(though both mongodb and format-json will probably grow an extra
parameter that controls whether we want a flat structure or a nested
object structure).

As a first step, I will replace the GHashTables (a GHashTable is a
key-value store, optimised for random access, basically) with GTrees (a
GTree is a sorted key-value store, optimised for ordered traversal, which
is pretty much what we need).

This has the advantage of being a bit more efficient than GHashTables,
and gaining us an ordered key list for free.
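
A minimal sketch of the "ordered key list for free" point, in Python
rather than C with GLib. The SortedPairs class and its methods are
hypothetical names made up for illustration, and the sorted list kept
with bisect is only a stand-in for a real balanced tree such as GTree
(insertion into a Python list is linear, which a tree would avoid):

```python
import bisect


class SortedPairs:
    """Key-value store that always iterates in sorted key order."""

    def __init__(self):
        self._keys = []    # keys, kept sorted at all times
        self._values = {}  # key -> value

    def insert(self, key, value):
        # Only new keys need a slot in the sorted key list;
        # re-inserting an existing key just overwrites its value.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def items(self):
        # Ordered traversal needs no separate sorting step.
        return [(key, self._values[key]) for key in self._keys]


pairs = SortedPairs()
pairs.insert("PROGRAM", "sshd")
pairs.insert("HOST", "localhost")
pairs.insert("MESSAGE", "hello")
ordered_keys = [key for key, _ in pairs.items()]
```

Whatever order the pairs arrive in, the traversal order stays the same
between messages, which is what something like format-welf would want.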

The hard part is writing the flat format => tree conversion, and making
it efficient. I have some crude code to do it, which iterates through a
sorted key list, splits the key up by dots, finds the appropriate node
in the tree, if any, and adds a new child node. This means we iterate
through the original key list only once (this is good), but we have to
perform a lot of lookups in another tree for each key (this is bad).
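
The conversion described above could be sketched roughly like this, in
Python rather than C. This is not the actual crude code, only an
illustration: the function name flat_to_tree is hypothetical, nested
dicts stand in for the tree, and it assumes that no key is both a value
and a prefix of another key (it raises an error if that happens):

```python
def flat_to_tree(pairs):
    """Turn {"a.b.c": value} style pairs into nested dicts."""
    root = {}
    # Single pass over the sorted key list.
    for key in sorted(pairs):
        *parents, leaf = key.split(".")
        node = root
        # For every key, walk down from the root again:
        # one lookup per path component.
        for part in parents:
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                raise ValueError(f"{key!r} nests under a plain value")
        node[leaf] = pairs[key]
    return root


example = flat_to_tree({
    "json.user.name": "bob",
    "json.user.id": "42",
    "HOST": "localhost",
})
```

The repeated walk from the root is the "lot of lookups" part: every key
pays for its full depth, even when the previous key sat right next to it.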

Now, I can optimise this a bit: since the original list is ordered, I
can store the path to the last inserted node and try to use that for
speedier backtracking. But that's something I'll probably do in a second
version of the update.
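
One way the remembered-path idea might look, again as a Python sketch
under assumptions rather than the planned implementation. The function
name flat_to_tree_with_path is hypothetical, nested dicts stand in for
the tree, and keys that are both a value and a prefix of another key are
rejected. It relies on the keys being sorted, so that all keys sharing a
dotted prefix are adjacent:

```python
def flat_to_tree_with_path(pairs):
    """Like a naive flat-to-tree pass, but reuses the previous key's path."""
    root = {}
    path = []       # parent components of the last inserted key
    stack = [root]  # stack[i] is the node reached by following path[:i]
    for key in sorted(pairs):
        *parents, leaf = key.split(".")
        # How many leading components are shared with the previous key?
        common = 0
        while (common < len(parents) and common < len(path)
               and parents[common] == path[common]):
            common += 1
        # Backtrack to the deepest shared node instead of the root.
        del path[common:]
        del stack[common + 1:]
        # Descend only through the components that differ.
        for part in parents[common:]:
            child = stack[-1].setdefault(part, {})
            if not isinstance(child, dict):
                raise ValueError(f"{key!r} nests under a plain value")
            path.append(part)
            stack.append(child)
        stack[-1][leaf] = pairs[key]
    return root


example = flat_to_tree_with_path({
    "json.user.name": "bob",
    "json.user.id": "42",
    "json.host": "h",
    "HOST": "localhost",
})
```

For "json.user.id" followed by "json.user.name", the second key shares
its whole parent path with the first, so no tree lookup is needed before
storing the value.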

I believe this method would be reasonably efficient, but if anyone has a
better suggestion, please let me know.

-- 
|8]


