[syslog-ng] [PATCH 0/1]: json-parser()

Fri Oct 28 16:04:07 CEST 2011

Following this mail, I will send a bigger patch that implements a
JSON parser - a fairly simple one, for now. I did not want to explain
every little detail in the commit message, so this umbrella mail comes
first, with a few details that cannot be found in the commit message
(there will be some overlap, though).

This patch (and any further developments I make in the future, until
the parser is merged) will be available on the feature/3.3/json/parser
branch of my new and shiny git tree[1], as soon as I push it out.

 [1]: git://github.com/algernon/syslog-ng.git

First of all, here's a little code snippet that shows how to use the
module:

,----[ syslog-ng.conf ]
| @module tfjson
| @module jsonparser
| 
| source s_json { tcp(port(12345) flags(no-parse)); };
| destination d_json {
|   file("/var/log/messages.json"
|        template("$(format-json --scope dot-nv-pairs)\n"));
| };
| parser p_json { json-parser(prefix(".json.")); };
| log { source(s_json); parser(p_json); destination (d_json); };
`----

With this config, we can send logs to this source as follows:

,----[ shell ]
| $ echo '{"string": "this is a string!\\n\\nmultiline!", "int": 42, "double": 3.14 }' | nc localhost 12345
`----

And lo and behold, the destination looks like this:

,----[ /var/log/messages.json ]
| {".json.string":"this is a string!\n\nmultiline!",".json.int":"42",".json.double":"3.140000"}
`----

Since syslog-ng stores every value as a string, all parsed values are
converted to strings. At some point in the future, when we can also
store types, this information will be passed through, and format-json
will preserve the types.
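
To make the conversion concrete, here is a minimal sketch in Python
(an illustration only, not the module's C code). The function name
coerce_to_string is made up for this mail, and the use of printf-style
"%f" formatting for doubles is an assumption, inferred from the
"3.140000" in the output above.

```python
def coerce_to_string(value):
    """Render a parsed JSON scalar as a string (illustrative sketch)."""
    # bool is a subclass of int in Python, so reject it explicitly
    # instead of silently printing "True" or "False".
    if isinstance(value, bool):
        raise TypeError("booleans are not covered by this sketch")
    if isinstance(value, str):
        return value
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):
        # Assumption: "%f" formatting, which always prints six decimals.
        return "%f" % value
    raise TypeError("unsupported type: %s" % type(value).__name__)


for sample in ("text", 42, 3.14):
    print(repr(coerce_to_string(sample)))
```

Only the three value types from the example are covered; anything
else raises TypeError in this sketch.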

For now, the parser has only one optional parameter: prefix(). It is
prepended to every key taken from the JSON object, so with
prefix(".json.") the key "int" is stored as ".json.int".

I think this is pretty darn cool as it is!

Now, let's get down to the gory implementation details, as I'm sure
you're all very interested in that!

At the core, we have a LogJSONParser structure, which holds the
configuration (hi, gchar *prefix!) and two GStrings that are used
over and over again during processing. These two GStrings
(serialized.key and serialized.value) are there to keep memory
allocations to a minimum: we allocate the strings once, and reuse
them over and over again.

We also have a JSON tokener here, for similar reasons.

Parsing itself is delegated to json-c, and all the parser does is
iterate over the parsed object's keys, coerce each value into a
string, prepare a key (based on prefix and the parsed key), and store
the pair in a LogMessage.
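
The loop described above can be sketched roughly like this, in Python
rather than the module's C. The names parse_json_into and message are
hypothetical, a plain dict stands in for LogMessage, and Python's json
module stands in for json-c. Nested objects and arrays are not handled
in any meaningful way here.

```python
import json


def parse_json_into(message, raw, prefix=""):
    """Flatten a JSON object into `message`, a dict standing in for LogMessage."""
    parsed = json.loads(raw)
    for key, value in parsed.items():
        # Assumption: doubles use "%f" formatting; other scalars use str().
        text = "%f" % value if isinstance(value, float) else str(value)
        # The stored name is simply the configured prefix plus the JSON key.
        message[prefix + key] = text
    return message


msg = parse_json_into({}, '{"string": "hi", "int": 42, "double": 3.14}', ".json.")
print(msg)
```

With an empty prefix, the JSON keys are stored under their own names.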

Then we're done. Yes, really. The rest is boilerplate.

Of course, life ain't that simple. For now, there's a bug: when the
JSON can't be parsed, we honour that with a tasty segmentation
fault. This will be fixed at a later time.
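
For illustration, the behaviour one would want instead of a crash
could look like the following Python sketch. This is not the planned
fix, and it says nothing about where the segfault actually comes from;
try_parse is a hypothetical helper that returns None for unparseable
input, so that the caller can leave the message untouched.

```python
import json


def try_parse(raw):
    """Return the parsed JSON object, or None if `raw` is not a JSON object."""
    try:
        parsed = json.loads(raw)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    # Valid JSON that is not an object (a list, a number) has no keys to store.
    return parsed if isinstance(parsed, dict) else None


for raw in ('{"int": 42}', 'not json at all', '[1, 2, 3]'):
    print(repr(raw), "->", try_parse(raw))
```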