Hi!

It's been a while since I came to the list with value-pairs(), and the
last time was a syntax RFC. This time, after almost two months of barely
doing a thing with it, I'm able to present some code!

The downside is that the code is horribly inefficient at this time, but
it's good enough to introduce the API (as can be used by destination
drivers), as that part of it should be fairly complete by now.

A lengthy description will follow - I hope I won't bore anyone out of
their skull. If I do, apologies, I'll try to be more entertaining next
time!

To refresh our memories (mine included), the syntax we're trying to
implement is something like the following:

,----
| destination d_mongo {
|   mongodb(
|     value-pairs(
|       scope(selected_macros nv_pairs)
|       exclude("R_*")
|       exclude("S_*")
|       exclude("HOST_FROM")
|       exclude("MSG")
|       pair("test" "test: ${loggen.runid}")
|     )
|     database("test")
|     collection("syslog")
|   );
| };
`----

We'd like this to include every possible name-value pair, except the
ones starting with R_ or S_, plus MSG and HOST_FROM. As a bonus, we'll
add a custom key: "test" (with a custom template; more about that
later).

The purpose of this is to be able to store a whole lot of (usually
structured) data that is either dynamically generated (by patterndb) or
cannot be easily predicted at configuration time (like SDATA), without
the need to explicitly list all keys in a template.

The code is available from the 'work/value-pairs/base' branch of my git
tree at git://git.balabit.hu/algernon/syslog-ng-3.3.git (or browsable
online at:
http://git.balabit.hu/?p=algernon/syslog-ng-3.3.git;a=shortlog;h=work/value-...)

Adding value-pairs() support to drivers
=======================================

From the driver's point of view, it needs to do three things to support value-pairs():

* It needs to be able to understand value-pairs().

  This is fairly easy: the core grammar rules provide
  'value_pair_stmt', which can be used to parse these things. To use
  it, one should do something along these lines (in the driver's
  grammar file):

  ,----
  | driver_option
  |         : ...
  |         | value_pair_stmt { driver_set_value_pairs(driver_instance, last_value_pairs); }
  `----

  Where the _set_value_pairs() function can be as simple as this:

  ,----
  | void
  | driver_set_value_pairs (LogDriver *d, ValuePairs *vp)
  | {
  |   MyDriver *self = (MyDriver *)d;
  |
  |   /* Drop the defaults, keep the user-supplied configuration. */
  |   value_pairs_free (self->vp);
  |   self->vp = vp;
  | }
  `----

  With this, value-pairs() are recognised.

* The driver needs to set up defaults.

  This can be done by calling the various functions in value-pairs.h.
  I'm not going to describe them; they're pretty darn self-explanatory,
  especially once one has had a look at the core grammar file.

  The idea is that if no value-pairs() are defined, we'll use a sane
  default. If the user specifies value-pairs(), then the
  _set_value_pairs() function will free the defaults and replace them
  with the user-supplied configuration.

* The driver needs to make use of value-pairs().

  To actually _use_ value pairs, the driver needs to iterate over these
  pairs. The way to do that is the value_pairs_foreach() function. This
  takes a couple of parameters:

  + The ValuePairs object.
  + A driver-supplied foreach callback (more about this later)
  + An NVTable (the message payload in most cases)
  + An NVRegistry (logmsg_registry in most cases)
  + A LogMessage object
  + An extra user_data argument, which will be passed down to the
    callback.

  The interesting things here are the callback and the user_data. The
  user_data pointer can be used to pass custom data to the callback, so
  that it can actually do something with the data it receives.

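To make the callback and user_data part a bit more concrete, here is a
minimal, self-contained sketch. It does not use the real syslog-ng
types: mock_value_pairs_foreach(), Collected, collect_pair(),
dup_string() and collected_free() are all made-up names, and the mock
takes a plain array of strings instead of the ValuePairs, NVTable,
NVRegistry and LogMessage arguments. It only assumes that the callback
gets a name, a value and the user_data pointer, and that the strings
may go away once the loop is over.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PAIRS 8

/* Hypothetical callback type: key name, key value, user_data. */
typedef void (*mock_foreach_cb)(const char *name, const char *value,
                                void *user_data);

/* Hypothetical container a driver could pass in as user_data. */
typedef struct
{
  char *names[MAX_PAIRS];
  char *values[MAX_PAIRS];
  size_t count;
} Collected;

/* Small strdup() replacement; returns NULL if the allocation fails. */
char *
dup_string(const char *s)
{
  char *copy = (char *) malloc(strlen(s) + 1);

  if (copy)
    strcpy(copy, s);
  return copy;
}

/* Simplified stand-in for value_pairs_foreach(): every pair is handed to
 * the callback as a temporary string that is freed right after the call,
 * to mimic names and values not outliving the loop. */
void
mock_value_pairs_foreach(const char *pairs[][2], size_t n,
                         mock_foreach_cb func, void *user_data)
{
  for (size_t i = 0; i < n; i++)
    {
      char *name = dup_string(pairs[i][0]);
      char *value = dup_string(pairs[i][1]);

      if (name && value)
        func(name, value, user_data);

      free(name);
      free(value);
    }
}

/* Example callback: copies name and value into the Collected structure,
 * because the pointers it receives are only valid during the call. */
void
collect_pair(const char *name, const char *value, void *user_data)
{
  Collected *c = (Collected *) user_data;
  char *n, *v;

  assert(c != NULL);
  if (c->count >= MAX_PAIRS)
    return;

  n = dup_string(name);
  v = dup_string(value);
  if (!n || !v)
    {
      free(n);
      free(v);
      return;
    }

  c->names[c->count] = n;
  c->values[c->count] = v;
  c->count++;
}

/* Releases the copies made by collect_pair(). */
void
collected_free(Collected *c)
{
  for (size_t i = 0; i < c->count; i++)
    {
      free(c->names[i]);
      free(c->values[i]);
    }
  c->count = 0;
}
```

A driver would set up a zeroed Collected (or whatever structure suits
it), pass its address as user_data, and only look at the copies once
the loop is done.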
  The callback itself will be called for each and every name-value pair
  that matches the user-configured criteria, with the following
  arguments:

  + The key name
  + The key value
  + The user_data argument passed to value_pairs_foreach()

  Using these, the callback can do whatever it wants with the data. Do
  note that both the key name and the key value will likely be freed
  after the foreach loop completes, so if the driver wants to use them
  later, it needs to copy them.

An example
==========

I have updated my mongodb destination driver to use value-pairs(). The
code is available from the work/afmongodb-vp branch of my git tree, or
browsable online at
http://git.balabit.hu/?p=algernon/syslog-ng-3.3.git;a=shortlog;h=work/afmong...

Using a configuration like the following:

,----
| @version: 3.3
| @include "scl.conf"
|
| @module afmongodb
|
| source s_network {
|   tcp(port (10514) tags("tcp-tag"));
|   syslog(port (10515) tags("syslog-tag"));
| };
|
| destination d_mongo {
|   mongodb(
|     value-pairs(
|       scope(selected_macros nv_pairs)
|       exclude("R_*")
|       exclude("S_*")
|       exclude("HOST_FROM")
|       exclude("MSG")
|       pair("test" "test: ${loggen.runid}")
|     )
|     database("test")
|     collection("syslog")
|   );
| };
|
| parser p_loggen {
|   db_parser(
|     file("etc/loggen.pdb")
|   );
| };
|
| log {
|   source(s_network);
|   parser(p_loggen);
|   destination(d_mongo);
| };
`----

(for etc/loggen.pdb, see the attachment of this message)

When poking syslog-ng with a couple of standard loggen messages, mongodb
would contain documents like the following:

,----
| {
|   "DATE" : "Mar 25 16:06:37",
|   "FACILITY" : "auth",
|   "HOST" : "localhost",
|   "MESSAGE" : "seq: 0000000018, thread: 0000, runid: 1301065597, stamp: 2011-03-25T16:06:37 PADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPAD",
|   "PID" : "1234",
|   "PRIORITY" : "info",
|   "PROGRAM" : "prg00000",
|   "SDATA" : "",
|   "SEQNUM" : "",
|   "SOURCE" : "s_network",
|   "SOURCEIP" : "127.0.0.1",
|   "TAGS" : ".classifier.system,syslog-tag,.source.s_network,loggen",
|   "_id" : ObjectId("4d8caf7d8ccffe2227000013"),
|   "classifier" : {
|     "class" : "debug",
|     "rule_id" : "d7d3ada4-6907-4dad-924f-d254e8f29f92"
|   },
|   "loggen" : {
|     "runid" : "1301065597",
|     "seq" : "0000000018",
|     "stamp" : "2011-03-25T16:06:37",
|     "thread" : "0000"
|   },
|   "test" : "test: 1301065597"
| }
`----

Even though SDATA and SEQNUM are empty (there's no SDATA in the message,
and SEQNUM is not supported by value-pairs() at the moment), the
document in mongodb is neatly structured.

Implementation details
======================

The way value-pairs() works right now is very simple: during setup, we
just store a couple of things (the scopes, as an ORed-together value;
the exclude patterns; and the explicitly added keys).

The bulk of the work is done by value_pairs_foreach(), which will
construct a base set of name-value pairs based on the scope, then
iterate over them, filter out anything that is excluded, and pass the
rest to the callback. When done, it goes over the explicitly added keys,
and runs those through the callback too.

The implementation is horribly inefficient in many ways, but for a
preview, it's good enough. I'll tune it for efficiency in the near
future. However, the public API should not change anymore.

And that's about it!

TODO
====

* Add helpers that can be used by template functions that want to
  support value-pairs() (for example tfjson).
* Support SEQNUM
* Possibly filter out empty, zero-length values, since they're kinda
  useless.
* Performance tuning:
  + Pre-allocate LogTemplate structures, if possible
  + Try to reduce the number of memory allocations
  + Possibly move away from GHashTable (as used for the temporary
    base-set and for the explicit keys) to something lighter.
* Clean up the grammar

  Right now, the grammar is messy: there are a lot of global symbols we
  use, which really should be local to value-pairs(). I still need to
  get familiar with bison/flex again to understand how these things
  work.

Once these are sorted out, I'll flatten the patch set and submit a merge
request. In the meantime, if anyone feels up to it, I'd appreciate any
code reviews and comments.

-- 
|8]