[PATCH (3.4) 0/4]: json-parser updates
Following this mail, a couple of patches will come - they were written last November, but I don't think I posted them (as I didn't expect to see the json-parser merged so soon). Until now, they were sitting on my feature/3.4/json/parser branch, and that's where they sit now, too.

I rebased that branch onto 3.4 master + center.c syntax error fixes (which I posted a week or two ago), and am sending the json parser patches now.

They accomplish a few things: they make the parser handle invalid JSON input properly (by returning an error instead of crashing), and add support for parsing boolean, array and nested object types.

Booleans will get parsed into the strings "true" or "false", nested objects will be turned into dotted notation (which also means that keys with a dot in them won't be handled properly by format-json, once it is updated to support reconstructing nested objects from dotted notation), and arrays will be treated like nested objects, with the array indexes used as keys.

As an example, let's look at a JSON input:

{"is-example": true,
 "hats": ["top hat", "baseball cap", "pointy witch-hat"],
 "auth": { "method": ["looking angry at the security guy", "bribery"],
           "result": "did not work" } }

This would be parsed into the following name-value pairs:

"is-example"="true"
"hats.0"="top hat"
"hats.1"="baseball cap"
"hats.2"="pointy witch-hat"
"auth.method.0"="looking angry at the security guy"
"auth.method.1"="bribery"
"auth.result"="did not work"

The other part of this will be an enhancement to value-pairs and format-json, which will make syslog-ng able to reconstruct the original JSON (more or less... booleans probably won't be supported, except perhaps optionally with a flag). But that is something I haven't started working on yet.
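To make the flattening rule concrete, here is a minimal sketch of the same idea in plain C. It is not the code from the patches: the Node tree, the Sink buffer and the flatten() and sink_emit() helpers are hypothetical stand-ins for json-c objects and log_msg_set_value(), chosen only so the example has no dependencies.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical miniature value tree, standing in for json-c objects. */
typedef enum { NODE_STRING, NODE_BOOL, NODE_OBJECT, NODE_ARRAY } NodeType;

typedef struct Node
{
  NodeType type;
  const char *key;              /* member name; unused for array elements */
  const char *str;              /* payload for NODE_STRING */
  int boolean;                  /* payload for NODE_BOOL */
  const struct Node *children;  /* object members or array elements */
  size_t n_children;
} Node;

/* Fixed-size text sink that collects one name=value pair per line. */
typedef struct
{
  char *buf;
  size_t size;
  size_t len;
} Sink;

void
sink_emit (Sink *sink, const char *name, const char *value)
{
  size_t room = sink->size - sink->len;
  int n;

  if (room == 0)
    return;

  n = snprintf (sink->buf + sink->len, room, "%s=%s\n", name, value);
  if (n < 0)
    return;

  /* On truncation snprintf still NUL-terminates, so clamp the length. */
  sink->len += ((size_t) n < room) ? (size_t) n : room - 1;
}

/* Walk the tree; object members and array indexes both extend the name. */
void
flatten (const Node *node, const char *name, Sink *sink)
{
  char child[256];
  size_t i;

  switch (node->type)
    {
    case NODE_STRING:
      sink_emit (sink, name, node->str);
      break;
    case NODE_BOOL:
      sink_emit (sink, name, node->boolean ? "true" : "false");
      break;
    case NODE_OBJECT:
    case NODE_ARRAY:
      for (i = 0; i < node->n_children; i++)
        {
          const Node *c = &node->children[i];
          const char *sep = name[0] ? "." : "";

          if (node->type == NODE_OBJECT)
            snprintf (child, sizeof (child), "%s%s%s", name, sep, c->key);
          else
            snprintf (child, sizeof (child), "%s%s%zu", name, sep, i);

          flatten (c, child, sink);
        }
      break;
    }
}
```

Calling flatten() on the root object with an empty name collects one name=value line per leaf, so an array member named "hats" contributes names like hats.0 and hats.1, the same shape as the pairs listed above.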
When JSON parsing fails, handle it gracefully, by printing an error message and returning immediately, instead of ending up segfaulting.

Signed-off-by: Gergely Nagy <algernon@balabit.hu>
---
 modules/jsonparser/jsonparser.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/modules/jsonparser/jsonparser.c b/modules/jsonparser/jsonparser.c
index 9347a5c..74e83a8 100644
--- a/modules/jsonparser/jsonparser.c
+++ b/modules/jsonparser/jsonparser.c
@@ -57,6 +57,12 @@ log_json_parser_process (LogParser *s, LogMessage *msg, const gchar *input)
   key = scratch_buffer_acquire ();
   value = scratch_buffer_acquire ();
 
+  if (!jso)
+    {
+      msg_error ("Unparsable JSON stream encountered", NULL);
+      return FALSE;
+    }
+
   json_object_object_foreachC (jso, itr)
     {
       gboolean parsed = FALSE;
-- 
1.7.7.3
Moved the json_object iteration into its own function, so that it can be called recursively. This made us able to do just that, and easily handle nested objects.

Signed-off-by: Gergely Nagy <algernon@balabit.hu>
---
 modules/jsonparser/jsonparser.c |  112 ++++++++++++++++++++++-----------------
 1 files changed, 64 insertions(+), 48 deletions(-)

diff --git a/modules/jsonparser/jsonparser.c b/modules/jsonparser/jsonparser.c
index 74e83a8..24249f9 100644
--- a/modules/jsonparser/jsonparser.c
+++ b/modules/jsonparser/jsonparser.c
@@ -44,80 +44,96 @@ log_json_parser_set_prefix (LogParser *p, const gchar *prefix)
   self->prefix = g_strdup (prefix);
 }
 
-static gboolean
-log_json_parser_process (LogParser *s, LogMessage *msg, const gchar *input)
+static void
+log_json_parser_process_object (struct json_object *jso,
+                                const gchar *prefix,
+                                LogMessage *msg)
 {
-  LogJSONParser *self = (LogJSONParser *) s;
-  struct json_object *jso;
   struct json_object_iter itr;
   ScratchBuffer *key, *value;
 
-  jso = json_tokener_parse (input);
-
   key = scratch_buffer_acquire ();
   value = scratch_buffer_acquire ();
 
-  if (!jso)
-    {
-      msg_error ("Unparsable JSON stream encountered", NULL);
-      return FALSE;
-    }
-
   json_object_object_foreachC (jso, itr)
     {
       gboolean parsed = FALSE;
 
       switch (json_object_get_type (itr.val))
-        {
-        case json_type_boolean:
-          msg_info ("JSON parser does not support boolean types yet, skipping",
-                    evt_tag_str ("key", itr.key), NULL);
-          break;
-        case json_type_double:
-          parsed = TRUE;
+        {
+        case json_type_boolean:
+          msg_info ("JSON parser does not support boolean types yet, skipping",
+                    evt_tag_str ("key", itr.key), NULL);
+          break;
+        case json_type_double:
+          parsed = TRUE;
           g_string_printf (sb_string (value), "%f",
                            json_object_get_double (itr.val));
-          break;
-        case json_type_int:
-          parsed = TRUE;
+          break;
+        case json_type_int:
+          parsed = TRUE;
           g_string_printf (sb_string (value), "%i",
                            json_object_get_int (itr.val));
-          break;
-        case json_type_string:
-          parsed = TRUE;
+          break;
+        case json_type_string:
+          parsed = TRUE;
           g_string_assign (sb_string (value),
                            json_object_get_string (itr.val));
-          break;
-        case json_type_object:
-        case json_type_array:
-          msg_error ("JSON parser does not support objects and arrays yet, "
-                     "skipping",
-                     evt_tag_str ("key", itr.key), NULL);
-          break;
-        default:
-          msg_error ("JSON parser encountered an unknown type, skipping",
-                     evt_tag_str ("key", itr.key), NULL);
-          break;
-        }
+          break;
+        case json_type_object:
+          g_string_assign (sb_string (key), prefix);
+          g_string_append (sb_string (key), itr.key);
+          g_string_append_c (sb_string (key), '.');
+          log_json_parser_process_object (itr.val, sb_string (key)->str, msg);
+          break;
+        case json_type_array:
+          msg_error ("JSON parser does not support arrays yet, "
+                     "skipping",
+                     evt_tag_str ("key", itr.key), NULL);
+          break;
+        default:
+          msg_error ("JSON parser encountered an unknown type, skipping",
+                     evt_tag_str ("key", itr.key), NULL);
+          break;
+        }
 
       if (parsed)
-        {
-          if (self->prefix)
-            {
-              g_string_assign (sb_string (key), self->prefix);
+        {
+          if (prefix)
+            {
+              g_string_assign (sb_string (key), prefix);
               g_string_append (sb_string (key), itr.key);
-              log_msg_set_value (msg,
+              log_msg_set_value (msg,
                                  log_msg_get_value_handle (sb_string (key)->str),
                                  sb_string (value)->str, sb_string (value)->len);
-            }
-          else
-            log_msg_set_value (msg,
-                               log_msg_get_value_handle (itr.key),
-                               sb_string (value)->str, sb_string (value)->len);
-        }
+            }
+          else
+            log_msg_set_value (msg,
+                               log_msg_get_value_handle (itr.key),
+                               sb_string (value)->str, sb_string (value)->len);
+        }
     }
   scratch_buffer_release (key);
   scratch_buffer_release (value);
+}
+
+static gboolean
+log_json_parser_process (LogParser *s, LogMessage *msg, const gchar *input)
+{
+  LogJSONParser *self = (LogJSONParser *) s;
+  struct json_object *jso;
+  struct json_object_iter itr;
+
+  jso = json_tokener_parse (input);
+
+  if (!jso)
+    {
+      msg_error ("Unparsable JSON stream encountered", NULL);
+      return FALSE;
+    }
+
+  log_json_parser_process_object (jso, self->prefix, msg);
+
   json_object_put (jso);
 
   return TRUE;
-- 
1.7.7.3
Signed-off-by: Gergely Nagy <algernon@balabit.hu>
---
 modules/jsonparser/jsonparser.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/modules/jsonparser/jsonparser.c b/modules/jsonparser/jsonparser.c
index 24249f9..cbe07fa 100644
--- a/modules/jsonparser/jsonparser.c
+++ b/modules/jsonparser/jsonparser.c
@@ -62,8 +62,11 @@ log_json_parser_process_object (struct json_object *jso,
       switch (json_object_get_type (itr.val))
         {
         case json_type_boolean:
-          msg_info ("JSON parser does not support boolean types yet, skipping",
-                    evt_tag_str ("key", itr.key), NULL);
+          parsed = TRUE;
+          if (json_object_get_boolean (itr.val))
+            g_string_assign (sb_string (value), "true");
+          else
+            g_string_assign (sb_string (value), "false");
           break;
         case json_type_double:
           parsed = TRUE;
-- 
1.7.7.3
Arrays are parsed into a $fieldname.$index structure for now.

Signed-off-by: Gergely Nagy <algernon@balabit.hu>
---
 modules/jsonparser/jsonparser.c |  142 +++++++++++++++++++++++----------------
 1 files changed, 85 insertions(+), 57 deletions(-)

diff --git a/modules/jsonparser/jsonparser.c b/modules/jsonparser/jsonparser.c
index cbe07fa..fe45479 100644
--- a/modules/jsonparser/jsonparser.c
+++ b/modules/jsonparser/jsonparser.c
@@ -47,85 +47,113 @@ log_json_parser_set_prefix (LogParser *p, const gchar *prefix)
 static void
 log_json_parser_process_object (struct json_object *jso,
                                 const gchar *prefix,
+                                LogMessage *msg);
+
+static void
+log_json_parser_process_single (struct json_object *jso,
+                                const gchar *prefix,
+                                const gchar *obj_key,
                                 LogMessage *msg)
 {
-  struct json_object_iter itr;
   ScratchBuffer *key, *value;
+  gboolean parsed = FALSE;
 
   key = scratch_buffer_acquire ();
   value = scratch_buffer_acquire ();
 
-  json_object_object_foreachC (jso, itr)
+  switch (json_object_get_type (jso))
     {
-      gboolean parsed = FALSE;
+    case json_type_boolean:
+      parsed = TRUE;
+      if (json_object_get_boolean (jso))
+        g_string_assign (sb_string (value), "true");
+      else
+        g_string_assign (sb_string (value), "false");
+      break;
+    case json_type_double:
+      parsed = TRUE;
+      g_string_printf (sb_string (value), "%f",
+                       json_object_get_double (jso));
+      break;
+    case json_type_int:
+      parsed = TRUE;
+      g_string_printf (sb_string (value), "%i",
+                       json_object_get_int (jso));
+      break;
+    case json_type_string:
+      parsed = TRUE;
+      g_string_assign (sb_string (value),
+                       json_object_get_string (jso));
+      break;
+    case json_type_object:
+      g_string_assign (sb_string (key), prefix);
+      g_string_append (sb_string (key), obj_key);
+      g_string_append_c (sb_string (key), '.');
+      log_json_parser_process_object (jso, sb_string (key)->str, msg);
+      break;
+    case json_type_array:
+      {
+        gint i, plen;
+
+        g_string_assign (sb_string (key), obj_key);
+        g_string_append_c (sb_string (key), '.');
+
+        plen = sb_string (key)->len;
+
+        for (i = 0; i < json_object_array_length (jso); i++)
+          {
+            g_string_truncate (sb_string (key), plen);
+            g_string_append_printf (sb_string (key), "%d", i);
+            log_json_parser_process_single (json_object_array_get_idx (jso, i),
+                                            prefix,
+                                            sb_string (key)->str, msg);
+          }
+        break;
+      }
+    default:
+      msg_error ("JSON parser encountered an unknown type, skipping",
+                 evt_tag_str ("key", obj_key), NULL);
+      break;
+    }
 
-      switch (json_object_get_type (itr.val))
+  if (parsed)
+    {
+      if (prefix)
         {
-        case json_type_boolean:
-          parsed = TRUE;
-          if (json_object_get_boolean (itr.val))
-            g_string_assign (sb_string (value), "true");
-          else
-            g_string_assign (sb_string (value), "false");
-          break;
-        case json_type_double:
-          parsed = TRUE;
-          g_string_printf (sb_string (value), "%f",
-                           json_object_get_double (itr.val));
-          break;
-        case json_type_int:
-          parsed = TRUE;
-          g_string_printf (sb_string (value), "%i",
-                           json_object_get_int (itr.val));
-          break;
-        case json_type_string:
-          parsed = TRUE;
-          g_string_assign (sb_string (value),
-                           json_object_get_string (itr.val));
-          break;
-        case json_type_object:
           g_string_assign (sb_string (key), prefix);
-          g_string_append (sb_string (key), itr.key);
-          g_string_append_c (sb_string (key), '.');
-          log_json_parser_process_object (itr.val, sb_string (key)->str, msg);
-          break;
-        case json_type_array:
-          msg_error ("JSON parser does not support arrays yet, "
-                     "skipping",
-                     evt_tag_str ("key", itr.key), NULL);
-          break;
-        default:
-          msg_error ("JSON parser encountered an unknown type, skipping",
-                     evt_tag_str ("key", itr.key), NULL);
-          break;
-        }
-
-      if (parsed)
-        {
-          if (prefix)
-            {
-              g_string_assign (sb_string (key), prefix);
-              g_string_append (sb_string (key), itr.key);
-              log_msg_set_value (msg,
-                                 log_msg_get_value_handle (sb_string (key)->str),
-                                 sb_string (value)->str, sb_string (value)->len);
-            }
-          else
-            log_msg_set_value (msg,
-                               log_msg_get_value_handle (itr.key),
-                               sb_string (value)->str, sb_string (value)->len);
+          g_string_append (sb_string (key), obj_key);
+          log_msg_set_value (msg,
+                             log_msg_get_value_handle (sb_string (key)->str),
+                             sb_string (value)->str, sb_string (value)->len);
         }
+      else
+        log_msg_set_value (msg,
+                           log_msg_get_value_handle (obj_key),
+                           sb_string (value)->str, sb_string (value)->len);
     }
+
   scratch_buffer_release (key);
   scratch_buffer_release (value);
 }
 
+static void
+log_json_parser_process_object (struct json_object *jso,
+                                const gchar *prefix,
+                                LogMessage *msg)
+{
+  struct json_object_iter itr;
+
+  json_object_object_foreachC (jso, itr)
+    {
+      log_json_parser_process_single (itr.val, prefix, itr.key, msg);
+    }
+}
+
 static gboolean
 log_json_parser_process (LogParser *s, LogMessage *msg, const gchar *input)
 {
   LogJSONParser *self = (LogJSONParser *) s;
   struct json_object *jso;
-  struct json_object_iter itr;
 
   jso = json_tokener_parse (input);
 
-- 
1.7.7.3
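The array branch in the patch above builds "<field>." once, remembers its length, and then truncates back to that length before appending each index. A dependency-free sketch of that buffer-reuse pattern follows. The helper names start_array_key() and set_index_suffix() are hypothetical, and a plain char buffer with snprintf() stands in for the GString calls used in the patch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write "<field>." into key and return its length, or 0 if it does not
 * fit into key_size bytes (including the terminating NUL).
 */
size_t
start_array_key (char *key, size_t key_size, const char *field)
{
  int written = snprintf (key, key_size, "%s.", field);

  if (written < 0 || (size_t) written >= key_size)
    return 0;
  return (size_t) written;
}

/*
 * Replace whatever follows the first base_len bytes of key with the
 * decimal index, which has the same effect as truncating to base_len
 * and then appending. Returns 0 on success, -1 if it would not fit.
 */
int
set_index_suffix (char *key, size_t key_size, size_t base_len,
                  unsigned int index)
{
  int written;

  if (base_len >= key_size)
    return -1;

  written = snprintf (key + base_len, key_size - base_len, "%u", index);
  if (written < 0 || (size_t) written >= key_size - base_len)
    {
      key[base_len] = '\0';     /* drop the partial suffix */
      return -1;
    }
  return 0;
}
```

The point of remembering the base length is that the field name is copied only once per array, no matter how many elements follow; each element only rewrites the digits after the dot.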
On Tue, 2012-01-10 at 13:11 +0100, Gergely Nagy wrote:
Following this mail, a couple of patches will come - they were written last November, but I don't think I posted them (as I didn't expect to see the json-parser merged so soon). Until now, they were sitting on my feature/3.4/json/parser branch, and that's where they sit now, too.
I rebased that branch onto 3.4 master + center.c syntax error fixes (which I posted a week or two ago), and am sending the json parser patches now.
They accomplish a few things: they make the parser handle invalid JSON input properly (by returning an error instead of crashing), and add support for parsing boolean, array and nested object types.
Booleans will get parsed into the strings "true" or "false", nested objects will be turned into dotted notation (which also means that keys with a dot in them won't be handled properly by format-json, once it is updated to support reconstructing nested objects from dotted notation), and arrays will be treated like nested objects, with the array indexes used as keys.
As an example, let's look at a JSON input:
{"is-example": true, "hats": ["top hat", "baseball cap", "pointy witch-hat"], "auth": { "method": ["looking angry at the security guy", "bribery"], "result": "did not work" } }
This would be parsed into the following name-value pairs:
"is-example"="true" "hats.0"="top hat" "hats.1"="baseball cap" "hats.2"="pointy witch-hat" "auth.method.0"="looking angry at the security guy" "auth.method.1"="bribery" "auth.result"="did not work"
The other part of this will be an enhancement to value-pairs and format-json, which will make syslog-ng able to reconstruct the original JSON (more or less... booleans probably won't be supported, except perhaps optionally with a flag). But that is something I haven't started working on yet.
Merged all of this; however, it'd be nice if format-json could actually handle the output of this parser.

Thanks Gergely.

-- 
Bazsi
Balazs Scheidler <bazsi@balabit.hu> writes:
Merged all of this; however, it'd be nice if format-json could actually handle the output of this parser.
That is also in the pipeline, but that needs a few changes deeper in value-pairs to work efficiently. I didn't have time to do that yet, but it's definitely high on my todo list.

-- 
|8]
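Reconstructing nesting from the parser's output would presumably start by splitting each name on dots, and that is where the ambiguity mentioned in the cover letter comes from: a top-level key that already contains a dot splits exactly like a nested member. The following is only a sketch of that splitting step under those assumptions. split_dotted_name() and MAX_SEGMENT_LEN are hypothetical names, and nothing here reflects how format-json or value-pairs actually do (or will do) it.

```c
#include <string.h>

#define MAX_SEGMENT_LEN 64

/*
 * Split a dotted name such as "auth.method.0" into its path segments.
 * Returns the number of segments stored, or -1 if there are more than
 * max_segments of them or one of them is too long to store.
 */
int
split_dotted_name (const char *name, char segments[][MAX_SEGMENT_LEN],
                   int max_segments)
{
  int count = 0;
  const char *start = name;

  for (;;)
    {
      const char *dot = strchr (start, '.');
      size_t len = dot ? (size_t) (dot - start) : strlen (start);

      if (count >= max_segments || len >= MAX_SEGMENT_LEN)
        return -1;

      memcpy (segments[count], start, len);
      segments[count][len] = '\0';
      count++;

      if (!dot)
        return count;
      start = dot + 1;
    }
}
```

With this, "auth.method.0" yields the segments auth, method and 0, but so would a flat key literally named "auth.method.0", so a splitter alone cannot tell the two inputs apart.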