No subject


Mon Feb 21 10:55:38 CET 2011


* The driver needs to be able to understand value-pairs().

This is fairly easy: the core grammar rules provide 'value_pair_stmt',
which can be used to parse these things.

To use it, one should do something along these lines (in the driver's
grammar file):

,----
| driver_option
|   : ...
|   | value_pair_stmt { driver_set_value_pairs(driver_instance, last_value_pairs); }
`----

Where the _set_value_pairs() function can be as simple as this:

,----
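| /* Replace the driver's current value-pairs() set (initially the
|  * defaults) with the one parsed from the configuration. */
| void
| driver_set_value_pairs (LogDriver *d, ValuePairs *vp)
| {
|   MyDriver *self = (MyDriver *)d;
| 
|   value_pairs_free (self->vp);  /* drop the old set */
|   self->vp = vp;                /* take ownership of the new one */
| }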
`----

With this, value-pairs() are recognised.

* The driver needs to set up defaults.

This can be done by calling the various functions in value-pairs.h. I'm
not going to describe them; they're pretty self-explanatory, especially
after one has had a look at the core grammar file.

The idea is that if no value-pairs() are defined, we'll use a sane
default. If the user specifies value-pairs(), then the
_set_value_pairs() function will free the defaults and replace them
with the user-supplied configuration.
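To make the defaults-then-replace idea concrete, here is a minimal,
self-contained sketch. It does not use the real value-pairs.h API:
ToyValuePairs, ToyDriver and every toy_* function are hypothetical
stand-ins, and the scope value is made up. It only shows the ownership
hand-over described above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for ValuePairs; the real type is opaque. */
typedef struct
{
  unsigned scopes;   /* made-up scope flags */
  int from_user;     /* 0 = driver default, 1 = set by the user */
} ToyValuePairs;

/* Hypothetical stand-in for a destination driver. */
typedef struct
{
  ToyValuePairs *vp;
} ToyDriver;

static ToyValuePairs *
toy_vp_new (unsigned scopes, int from_user)
{
  ToyValuePairs *vp = malloc (sizeof *vp);

  assert (vp != NULL);
  vp->scopes = scopes;
  vp->from_user = from_user;
  return vp;
}

/* At construction time the driver installs its own defaults. */
static void
toy_driver_init (ToyDriver *self)
{
  self->vp = toy_vp_new (0x3, 0);
}

/* Called when the user wrote value-pairs(): free the defaults and
 * take ownership of the user-supplied set. */
static void
toy_driver_set_value_pairs (ToyDriver *self, ToyValuePairs *vp)
{
  free (self->vp);
  self->vp = vp;
}

static void
toy_driver_free (ToyDriver *self)
{
  free (self->vp);
  self->vp = NULL;
}
```

If the setter is never called, the driver simply keeps running with
whatever toy_driver_init() installed.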

* The driver needs to make use of value-pairs().

To actually _use_ value pairs, the driver needs to iterate over these
pairs. The way to do that is the value_pairs_foreach() function.

This takes a couple of parameters:

  + The ValuePairs object.
  + A driver-supplied foreach callback (more about this later)
  + An NVTable (the message payload in most cases)
  + An NVRegistry (logmsg_registry in most cases)
  + A LogMessage object
  + An extra user_data argument, which will be passed down to the
    callback.

The interesting things here are the callback and the user_data. The
user_data pointer can be used to pass custom data to the callback, so
that it can actually do something with the data it receives.

The callback itself will be called for each and every name-value pair
that matches the user-configured criteria, with the following arguments:

  + The key name
  + The key value
  + The user_data argument passed to value_pairs_foreach()

Using these, the callback can do whatever it wants with the data. Do
note that both the key name and the key value will likely be freed once
the foreach loop has completed, so if the driver wants to use them
later, it needs to copy them.
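The copying rule above is easy to get wrong, so here is a small,
self-contained sketch of a callback that keeps its own copies. The
callback typedef is an assumption based on the three arguments listed
above (the real one in value-pairs.h may differ, for example in its
return type), PairList is a made-up container, and fake_foreach() is
only a stand-in that plays the role of value_pairs_foreach().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAIR_LIST_MAX 16

/* Assumed callback shape: name, value, user_data. */
typedef void (*vp_foreach_cb) (const char *name, const char *value,
                               void *user_data);

/* Hypothetical driver-side storage for pairs kept after the loop. */
typedef struct
{
  char *names[PAIR_LIST_MAX];
  char *values[PAIR_LIST_MAX];
  size_t len;
} PairList;

static char *
copy_string (const char *s)
{
  size_t n = strlen (s) + 1;
  char *copy = malloc (n);

  assert (copy != NULL);
  memcpy (copy, s, n);
  return copy;
}

/* The callback: name and value are only borrowed, so duplicate them. */
static void
collect_pair (const char *name, const char *value, void *user_data)
{
  PairList *list = (PairList *) user_data;

  if (list->len >= PAIR_LIST_MAX)
    return;                     /* sketch only: drop the overflow */
  list->names[list->len] = copy_string (name);
  list->values[list->len] = copy_string (value);
  list->len++;
}

static void
pair_list_clear (PairList *list)
{
  size_t i;

  for (i = 0; i < list->len; i++)
    {
      free (list->names[i]);
      free (list->values[i]);
    }
  list->len = 0;
}

/* Stand-in for value_pairs_foreach(): hands out two pairs from
 * temporary buffers, then wipes the buffers, mimicking the strings
 * being freed once the loop is done. */
static void
fake_foreach (vp_foreach_cb cb, void *user_data)
{
  char name[32], value[32];

  strcpy (name, "HOST");
  strcpy (value, "localhost");
  cb (name, value, user_data);

  strcpy (name, "PID");
  strcpy (value, "1234");
  cb (name, value, user_data);

  memset (name, 0, sizeof name);
  memset (value, 0, sizeof value);
}
```

A driver would pass a pointer to its PairList as user_data, and call
pair_list_clear() once it has finished with the copies.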

An example
==========

I have updated my mongodb destination driver to use value-pairs(). The
code is available from the work/afmongodb-vp branch of my git tree, or
browsable online at
http://git.balabit.hu/?p=algernon/syslog-ng-3.3.git;a=shortlog;h=work/afmongodb-vp

Using a configuration like the following:

,----
| @version: 3.3
| @include "scl.conf"
| 
| @module afmongodb
| 
| source s_network {
|         tcp(port (10514) tags("tcp-tag"));
|         syslog(port (10515) tags("syslog-tag"));
| };
| 
| destination d_mongo {
|         mongodb(
|                 value-pairs(
|                         scope(selected_macros nv_pairs)
|                         exclude("R_*")
|                         exclude("S_*")
|                         exclude("HOST_FROM")
|                         exclude("MSG")
|                         pair("test" "test: ${loggen.runid}")
|                 )
|                 database("test")
|                 collection("syslog")
|         );
| };
| 
| parser p_loggen {
|         db_parser(
|                 file("etc/loggen.pdb")
|         );
| };
| 
| log {
|         source(s_network);
|         parser(p_loggen);
|         destination(d_mongo);
| };
`----

(For etc/loggen.pdb, see the attachment to this message.)

When poking syslog-ng with a couple of standard loggen messages, mongodb
will contain documents like the following:

,----
| {
|   "DATE" : "Mar 25 16:06:37",
|   "FACILITY" : "auth",
|   "HOST" : "localhost",
|   "MESSAGE" : "seq: 0000000018, thread: 0000, runid: 1301065597, stamp: 2011-03-25T16:06:37 PADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPAD",
|   "PID" : "1234",
|   "PRIORITY" : "info",
|   "PROGRAM" : "prg00000",
|   "SDATA" : "",
|   "SEQNUM" : "",
|   "SOURCE" : "s_network",
|   "SOURCEIP" : "127.0.0.1",
|   "TAGS" : ".classifier.system,syslog-tag,.source.s_network,loggen",
|   "_id" : ObjectId("4d8caf7d8ccffe2227000013"),
|   "classifier" : {
|     "class" : "debug",
|     "rule_id" : "d7d3ada4-6907-4dad-924f-d254e8f29f92"
|   },
|   "loggen" : {
|     "runid" : "1301065597",
|     "seq" : "0000000018",
|     "stamp" : "2011-03-25T16:06:37",
|     "thread" : "0000"
|   },
|   "test" : "test: 1301065597"
| }
`----

Even though SDATA and SEQNUM are empty (there's no SDATA in the message,
and SEQNUM is not supported by value-pairs() at the moment), the
document in mongodb is neatly structured.

Implementation details
======================

The way value-pairs() works right now is very simple: during setup, we
just store a couple of things (the scopes, as a single ORed-together
value; the exclude patterns; and the explicitly added keys).

The bulk of the work is done by value_pairs_foreach(), which will
construct a base set of name-value pairs, based on the scope, then
iterate over them, filter out anything that is excluded, and pass the
rest to the callback.

When done, it goes over the explicitly added keys, and runs those
through the callback too.
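The flow just described can be sketched roughly as follows. This is not
the actual implementation: the scope flags, the Pair and
SketchValuePairs types, the toy data and all function names are made up
for illustration, POSIX fnmatch() is used as a stand-in for whatever
pattern matching the excludes really use, and the base set is filtered
while iterating instead of being collected into a hash table first.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical scope flags, stored ORed together. */
enum
{
  SCOPE_NV_PAIRS = 1 << 0,
  SCOPE_SELECTED_MACROS = 1 << 1
};

typedef struct
{
  const char *name;
  const char *value;
} Pair;

typedef struct
{
  unsigned scopes;              /* ORed scope flags */
  const char **excludes;        /* NULL-terminated glob patterns */
  const Pair *explicit_pairs;   /* pairs added with pair() */
  size_t n_explicit;
} SketchValuePairs;

typedef void (*sketch_cb) (const char *name, const char *value,
                           void *user_data);

/* Toy data standing in for what a message would provide. */
static const Pair toy_macros[] = {
  { "HOST", "localhost" }, { "R_DATE", "Mar 25 16:06:37" }, { "MSG", "hi" }
};
static const Pair toy_nv_pairs[] = { { "loggen.seq", "0000000018" } };

static int
is_excluded (const SketchValuePairs *vp, const char *name)
{
  const char **p;

  for (p = vp->excludes; p && *p; p++)
    if (fnmatch (*p, name, 0) == 0)   /* 0 means "matches" */
      return 1;
  return 0;
}

/* Pass every non-excluded pair of one set to the callback. */
static void
feed (const SketchValuePairs *vp, const Pair *set, size_t n,
      sketch_cb cb, void *user_data)
{
  size_t i;

  for (i = 0; i < n; i++)
    if (!is_excluded (vp, set[i].name))
      cb (set[i].name, set[i].value, user_data);
}

static void
sketch_foreach (const SketchValuePairs *vp, sketch_cb cb, void *user_data)
{
  size_t i;

  /* 1. Base set, selected by scope and filtered by the excludes. */
  if (vp->scopes & SCOPE_SELECTED_MACROS)
    feed (vp, toy_macros, sizeof toy_macros / sizeof toy_macros[0],
          cb, user_data);
  if (vp->scopes & SCOPE_NV_PAIRS)
    feed (vp, toy_nv_pairs, sizeof toy_nv_pairs / sizeof toy_nv_pairs[0],
          cb, user_data);

  /* 2. Explicitly added pairs go to the callback afterwards. */
  for (i = 0; i < vp->n_explicit; i++)
    cb (vp->explicit_pairs[i].name, vp->explicit_pairs[i].value, user_data);
}

/* Example callback: just counts how many pairs it was given. */
static void
count_pair (const char *name, const char *value, void *user_data)
{
  (void) name;
  (void) value;
  (*(size_t *) user_data)++;
}
```

With both scopes enabled, excludes of "R_*" and "MSG", and one explicit
pair, the callback sees HOST, loggen.seq and the explicit pair.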

The implementation is horribly inefficient in many ways, but for a
preview, it's good enough. I'll tune it for efficiency in the near
future. However, the public API should not change anymore.

And that's about it!

TODO
====

* Add helpers that can be used by template functions that want to
  support value-pairs() (for example tfjson).
* Support SEQNUM
* Possibly filter out empty, zero-length values, since they're kinda
  useless.
* Performance tuning:
  + Pre-allocate LogTemplate structures, if possible
  + Try to reduce the number of memory allocations
  + Possibly move away from GHashTable (as used for the temporary
    base-set and for the explicit keys) to something lighter.
* Clean up the grammar.
  Right now the grammar is messy: there are a lot of global symbols we
  use, which really should be local to value-pairs(). I still need to
  get familiar with bison/flex again to understand how these things
  work.
  
Once these are sorted out, I'll flatten the patch set and submit a merge
request. In the meantime, if anyone feels up to it, I'd appreciate any
code reviews and comments.

-- 
|8]


--=-=-=
Content-Type: text/plain
Content-Disposition: attachment; filename=loggen.pdb
Content-Description: patterndb rules for loggen

<patterndb version='3' pub_date='2011-03-25'>
  <ruleset name='loggen' id='d7d3ada4-6907-4dad-924f-d254e8f29f92'>
    <rules>
      <rule id='d7d3ada4-6907-4dad-924f-d254e8f29f92' class='system' provider='algernon at balabit' class='debug'>
        <description>loggen output</description>
        <patterns>
          <pattern>seq: @ESTRING:loggen.seq:,@ thread: @ESTRING:loggen.thread:,@ runid: @ESTRING:loggen.runid:,@ stamp: @ESTRING:loggen.stamp: @</pattern>
        </patterns>
        <examples>
            <example>
                <test_message program='loggen'>seq: 0000000000, thread: 0000, runid: 1301060425, stamp: 2011-03-25T14:40:25 PADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADD</test_message>
            </example>
        </examples>
	<tags>
	  <tag>loggen</tag>
	</tags>
      </rule>
    </rules>
  </ruleset>
</patterndb>

--=-=-=--

