[PATCH (3.4)] afmongodb: Add support for datetime fields
For TTL collections primarly (but useful for other scenarios aswell), introduce a new option: datetime(FIELD_NAME). This allows one to mark a single field anywhere in the structure as a datetime field, and the driver will attempt to convert it to a proper datetime, by assuming it is a unix timestamp. If the conversion fails, or the field is not found, it will not be inserted. Usage: mongodb(datetime("some.date") value-pairs( key("MESSAGE") pair("some.date", "$UNIXTIME"))) This will result in something like: { "_id" : ObjectId("509b8ffc1a1006213a000001"), "some" : { "date" : ISODate("2012-11-08T10:56:59Z") }, "MESSAGE" : "foobar" } Signed-off-by: Gergely Nagy <algernon@balabit.hu> --- modules/afmongodb/afmongodb-grammar.ym | 2 ++ modules/afmongodb/afmongodb-parser.c | 1 + modules/afmongodb/afmongodb.c | 61 +++++++++++++++++++++++++++++--- modules/afmongodb/afmongodb.h | 1 + 4 files changed, 61 insertions(+), 4 deletions(-) diff --git a/modules/afmongodb/afmongodb-grammar.ym b/modules/afmongodb/afmongodb-grammar.ym index f108f26..40eccc0 100644 --- a/modules/afmongodb/afmongodb-grammar.ym +++ b/modules/afmongodb/afmongodb-grammar.ym @@ -53,6 +53,7 @@ extern ValuePairsTransformSet *last_vp_transset; %token KW_SERVERS %token KW_SAFE_MODE %token KW_PATH +%token KW_DATETIME %% @@ -97,6 +98,7 @@ afmongodb_option | KW_USERNAME '(' string ')' { afmongodb_dd_set_user(last_driver, $3); free($3); } | KW_PASSWORD '(' string ')' { afmongodb_dd_set_password(last_driver, $3); free($3); } | KW_SAFE_MODE '(' yesno ')' { afmongodb_dd_set_safe_mode(last_driver, $3); } + | KW_DATETIME '(' string ')' { afmongodb_dd_set_datetime(last_driver, $3); free($3); } | value_pair_option { afmongodb_dd_set_value_pairs(last_driver, $1); } | dest_driver_option ; diff --git a/modules/afmongodb/afmongodb-parser.c b/modules/afmongodb/afmongodb-parser.c index 8aa55c1..1c51154 100644 --- a/modules/afmongodb/afmongodb-parser.c +++ b/modules/afmongodb/afmongodb-parser.c @@ -39,6 +39,7 @@ static CfgLexerKeyword afmongodb_keywords[] = { { "host", KW_HOST }, { "port", KW_PORT }, { "path", KW_PATH }, + { "datetime", KW_DATETIME }, { NULL } }; diff --git a/modules/afmongodb/afmongodb.c b/modules/afmongodb/afmongodb.c index 205ab98..bb267a5 100644 --- a/modules/afmongodb/afmongodb.c +++ b/modules/afmongodb/afmongodb.c @@ -22,6 +22,7 @@ */ #include <time.h> +#include <stdlib.h> #include "afmongodb.h" #include "afmongodb-parser.h" @@ -32,6 +33,7 @@ #include "nvtable.h" #include "logqueue.h" #include "value-pairs.h" +#include "scratch-buffers.h" #include "mongo.h" @@ -55,6 +57,7 @@ typedef struct gint port; gboolean safe_mode; + gchar *datetime_field; gchar *user; gchar *password; @@ -207,6 +210,15 @@ afmongodb_dd_set_safe_mode(LogDriver *d, gboolean state) self->safe_mode = state; } +void +afmongodb_dd_set_datetime(LogDriver *d, const gchar *field) +{ + MongoDBDestDriver *self = (MongoDBDestDriver *)d; + + g_free(self->datetime_field); + self->datetime_field = g_strdup(field); +} + /* * Utilities */ @@ -343,7 +355,7 @@ afmongodb_vp_obj_end(const gchar *name, if (prev_data) root = (bson *)*prev_data; else - root = (bson *)user_data; + root = (bson *)(((gpointer *)user_data)[0]); if (prefix_data) { @@ -357,6 +369,30 @@ afmongodb_vp_obj_end(const gchar *name, } static gboolean +afmongodb_field_is_datetime(const gchar *prefix, const gchar *name, + const gchar *datetime_field) +{ + size_t plen = 0; + + if (!datetime_field) + return FALSE; + + if (prefix) + plen = strlen(prefix); + + if (strncmp(datetime_field, prefix, plen) != 0) + return FALSE; + + if (datetime_field[plen] != '.') + return FALSE; + + if (strcmp(&datetime_field[plen + 1], name) != 0) + return FALSE; + + return TRUE; +} + +static gboolean afmongodb_vp_process_value(const gchar *name, const gchar *prefix, const gchar *value, gpointer *prefix_data, gpointer user_data) @@ -366,7 +402,7 @@ afmongodb_vp_process_value(const gchar *name, const gchar *prefix, if (prefix_data) o = (bson *)*prefix_data; else - o = (bson *)user_data; + o = (bson *)(((gpointer *)user_data)[0]); if (name[0] == '.') { @@ -378,7 +414,20 @@ afmongodb_vp_process_value(const gchar *name, const gchar *prefix, bson_append_string (o, tx_name, value, -1); } else - bson_append_string (o, name, value, -1); + { + MongoDBDestDriver *self = (MongoDBDestDriver *)(((gpointer *)user_data)[1]); + + if (afmongodb_field_is_datetime(prefix, name, self->datetime_field)) + { + gchar *endptr; + gint64 datetime = (gint64)strtoul(value, &endptr, 10) * 1000; + + if (endptr[0] == '\0') + bson_append_utc_datetime(o, name, datetime); + } + else + bson_append_string (o, name, value, -1); + } return FALSE; } @@ -390,6 +439,7 @@ afmongodb_worker_insert (MongoDBDestDriver *self) guint8 *oid; LogMessage *msg; LogPathOptions path_options = LOG_PATH_OPTIONS_INIT; + gpointer user_data[] = { NULL, self }; afmongodb_dd_connect(self, TRUE); @@ -408,11 +458,13 @@ afmongodb_worker_insert (MongoDBDestDriver *self) bson_append_oid (self->bson, "_id", oid); g_free (oid); + user_data[0] = self->bson; + value_pairs_walk(self->vp, afmongodb_vp_obj_start, afmongodb_vp_process_value, afmongodb_vp_obj_end, - msg, self->seq_num, self->bson); + msg, self->seq_num, user_data); bson_finish (self->bson); if (!mongo_sync_cmd_insert_n(self->conn, self->ns, 1, @@ -650,6 +702,7 @@ afmongodb_dd_free(LogPipe *d) string_list_free(self->servers); if (self->vp) value_pairs_free(self->vp); + g_free(self->datetime_field); log_dest_driver_free(d); } diff --git a/modules/afmongodb/afmongodb.h b/modules/afmongodb/afmongodb.h index 6a61616..76f8b83 100644 --- a/modules/afmongodb/afmongodb.h +++ b/modules/afmongodb/afmongodb.h @@ -39,6 +39,7 @@ void afmongodb_dd_set_password(LogDriver *d, const gchar *password); void afmongodb_dd_set_value_pairs(LogDriver *d, ValuePairs *vp); void afmongodb_dd_set_safe_mode(LogDriver *d, gboolean state); void afmongodb_dd_set_path(LogDriver *d, const gchar *path); +void afmongodb_dd_set_datetime(LogDriver *d, const gchar *field); gboolean afmongodb_dd_check_address(LogDriver *d, gboolean local); -- 1.7.10.4
Hi, Although I'd integrate this patch in this form, I have a different idea on how this and similar stuff should be implemented. In the sql destination, we need a similar functionality: the ability to specify that the template being used is not an SQL string (and quoted accordingly), but rather an SQL literal to be inserted into the query. This could be used something like this in SQL: destination d_sql { sql(columns('id', 'now datetime', 'bar'), values("$ID", literal("now()"), "$BAR")); }; This would generalize how the handling of DEFAULT values, which looks like this: destination d_sql { sql(columns('id', 'now datetime', 'bar'), values("$ID", default, "$BAR")); }; Here, 'default' means to use the default for the column. If value-pairs() could grow such a functionality, that would be great. I'd probably call this function "casting", casting the output of the template to some destination specific type or function. This could look like this in mongodb: mongodb( value-pairs( key("MESSAGE") pair("some.date", datetime("$UNIXTIME")))) "datetime" here would only be passed around by value-pairs to mongodb, which could interpret various casting methods. Whether it'd make sense This is how I'd begin doing it: diff --git a/lib/cfg-grammar.y b/lib/cfg-grammar.y index ba6d4f1..1d3b0f9 100644 --- a/lib/cfg-grammar.y +++ b/lib/cfg-grammar.y @@ -1006,13 +1006,7 @@ dest_writer_option | KW_FLUSH_LINES '(' LL_NUMBER ')' { last_writer_options->flush_lines = $3; } | KW_FLUSH_TIMEOUT '(' LL_NUMBER ')' { last_writer_options->flush_timeout = $3; } | KW_SUPPRESS '(' LL_NUMBER ')' { last_writer_options->suppress = $3; } - | KW_TEMPLATE '(' string ')' { - GError *error = NULL; - - last_writer_options->template = cfg_tree_check_inline_template(&configuration->tr - CHECK_ERROR(last_writer_options->template != NULL, @3, "Error compiling template - free($3); - } + | template_opt | KW_TEMPLATE_ESCAPE '(' yesno ')' { log_writer_options_set_template_escape(last_writer_options, $3); } | KW_TIME_ZONE '(' string ')' { last_writer_options->template_options.time_zone[LTZ_SEND] = g_strdup($3); free($3) | KW_TS_FORMAT '(' string ')' { last_writer_options->template_options.ts_format = cfg_ts_format_value($3); free($3 @@ -1028,6 +1022,15 @@ dest_writer_option } ; +template_opt + : KW_TEMPLATE '(' template_content ')' + ; + +template_content + : string { /* simple template */ } + | LL_IDENTIFIER '(' string ')' { /* cast, store $1 together with the template to be interpreted by the caller */ + ; + dest_writer_options_flags : string dest_writer_options_flags { $$ = log_writer_options_lookup_flag($1) | $2; free($1); } | { $$ = 0; } Then hook everything that parses templates to use either template_opt or template_content, not many call sites: $ find . -name \*.y -o -name \*.ym | xargs grep log_template_compile ./lib/filter-expr-grammar.ym: success_left = log_template_compile(left, $1, &error); ./lib/filter-expr-grammar.ym: success_right = log_template_compile(right, $3, &error); ./lib/cfg-grammar.y: CHECK_ERROR(log_template_compile(last_template, $3, &error), @ Then once the cast is stored together with LogTemplate, the users could use that to insert default values, or merely supply type information to a string formatted by templates. What do you think? Do you want me to merge this patch anyway? On Thu, 2012-11-08 at 12:11 +0100, Gergely Nagy wrote:
For TTL collections primarly (but useful for other scenarios aswell), introduce a new option: datetime(FIELD_NAME). This allows one to mark a single field anywhere in the structure as a datetime field, and the driver will attempt to convert it to a proper datetime, by assuming it is a unix timestamp.
If the conversion fails, or the field is not found, it will not be inserted.
Usage:
mongodb(datetime("some.date") value-pairs( key("MESSAGE") pair("some.date", "$UNIXTIME")))
This will result in something like:
{ "_id" : ObjectId("509b8ffc1a1006213a000001"), "some" : { "date" : ISODate("2012-11-08T10:56:59Z") }, "MESSAGE" : "foobar" }
Signed-off-by: Gergely Nagy <algernon@balabit.hu> --- modules/afmongodb/afmongodb-grammar.ym | 2 ++ modules/afmongodb/afmongodb-parser.c | 1 + modules/afmongodb/afmongodb.c | 61 +++++++++++++++++++++++++++++--- modules/afmongodb/afmongodb.h | 1 + 4 files changed, 61 insertions(+), 4 deletions(-)
diff --git a/modules/afmongodb/afmongodb-grammar.ym b/modules/afmongodb/afmongodb-grammar.ym index f108f26..40eccc0 100644 --- a/modules/afmongodb/afmongodb-grammar.ym +++ b/modules/afmongodb/afmongodb-grammar.ym @@ -53,6 +53,7 @@ extern ValuePairsTransformSet *last_vp_transset; %token KW_SERVERS %token KW_SAFE_MODE %token KW_PATH +%token KW_DATETIME
%%
@@ -97,6 +98,7 @@ afmongodb_option | KW_USERNAME '(' string ')' { afmongodb_dd_set_user(last_driver, $3); free($3); } | KW_PASSWORD '(' string ')' { afmongodb_dd_set_password(last_driver, $3); free($3); } | KW_SAFE_MODE '(' yesno ')' { afmongodb_dd_set_safe_mode(last_driver, $3); } + | KW_DATETIME '(' string ')' { afmongodb_dd_set_datetime(last_driver, $3); free($3); } | value_pair_option { afmongodb_dd_set_value_pairs(last_driver, $1); } | dest_driver_option ; diff --git a/modules/afmongodb/afmongodb-parser.c b/modules/afmongodb/afmongodb-parser.c index 8aa55c1..1c51154 100644 --- a/modules/afmongodb/afmongodb-parser.c +++ b/modules/afmongodb/afmongodb-parser.c @@ -39,6 +39,7 @@ static CfgLexerKeyword afmongodb_keywords[] = { { "host", KW_HOST }, { "port", KW_PORT }, { "path", KW_PATH }, + { "datetime", KW_DATETIME }, { NULL } };
diff --git a/modules/afmongodb/afmongodb.c b/modules/afmongodb/afmongodb.c index 205ab98..bb267a5 100644 --- a/modules/afmongodb/afmongodb.c +++ b/modules/afmongodb/afmongodb.c @@ -22,6 +22,7 @@ */
#include <time.h> +#include <stdlib.h>
#include "afmongodb.h" #include "afmongodb-parser.h" @@ -32,6 +33,7 @@ #include "nvtable.h" #include "logqueue.h" #include "value-pairs.h" +#include "scratch-buffers.h"
#include "mongo.h"
@@ -55,6 +57,7 @@ typedef struct gint port;
gboolean safe_mode; + gchar *datetime_field;
gchar *user; gchar *password; @@ -207,6 +210,15 @@ afmongodb_dd_set_safe_mode(LogDriver *d, gboolean state) self->safe_mode = state; }
+void +afmongodb_dd_set_datetime(LogDriver *d, const gchar *field) +{ + MongoDBDestDriver *self = (MongoDBDestDriver *)d; + + g_free(self->datetime_field); + self->datetime_field = g_strdup(field); +} + /* * Utilities */ @@ -343,7 +355,7 @@ afmongodb_vp_obj_end(const gchar *name, if (prev_data) root = (bson *)*prev_data; else - root = (bson *)user_data; + root = (bson *)(((gpointer *)user_data)[0]);
if (prefix_data) { @@ -357,6 +369,30 @@ afmongodb_vp_obj_end(const gchar *name, }
static gboolean +afmongodb_field_is_datetime(const gchar *prefix, const gchar *name, + const gchar *datetime_field) +{ + size_t plen = 0; + + if (!datetime_field) + return FALSE; + + if (prefix) + plen = strlen(prefix); + + if (strncmp(datetime_field, prefix, plen) != 0) + return FALSE; + + if (datetime_field[plen] != '.') + return FALSE; + + if (strcmp(&datetime_field[plen + 1], name) != 0) + return FALSE; + + return TRUE; +} + +static gboolean afmongodb_vp_process_value(const gchar *name, const gchar *prefix, const gchar *value, gpointer *prefix_data, gpointer user_data) @@ -366,7 +402,7 @@ afmongodb_vp_process_value(const gchar *name, const gchar *prefix, if (prefix_data) o = (bson *)*prefix_data; else - o = (bson *)user_data; + o = (bson *)(((gpointer *)user_data)[0]);
if (name[0] == '.') { @@ -378,7 +414,20 @@ afmongodb_vp_process_value(const gchar *name, const gchar *prefix, bson_append_string (o, tx_name, value, -1); } else - bson_append_string (o, name, value, -1); + { + MongoDBDestDriver *self = (MongoDBDestDriver *)(((gpointer *)user_data)[1]); + + if (afmongodb_field_is_datetime(prefix, name, self->datetime_field)) + { + gchar *endptr; + gint64 datetime = (gint64)strtoul(value, &endptr, 10) * 1000; + + if (endptr[0] == '\0') + bson_append_utc_datetime(o, name, datetime); + } + else + bson_append_string (o, name, value, -1); + }
return FALSE; } @@ -390,6 +439,7 @@ afmongodb_worker_insert (MongoDBDestDriver *self) guint8 *oid; LogMessage *msg; LogPathOptions path_options = LOG_PATH_OPTIONS_INIT; + gpointer user_data[] = { NULL, self };
afmongodb_dd_connect(self, TRUE);
@@ -408,11 +458,13 @@ afmongodb_worker_insert (MongoDBDestDriver *self) bson_append_oid (self->bson, "_id", oid); g_free (oid);
+ user_data[0] = self->bson; + value_pairs_walk(self->vp, afmongodb_vp_obj_start, afmongodb_vp_process_value, afmongodb_vp_obj_end, - msg, self->seq_num, self->bson); + msg, self->seq_num, user_data); bson_finish (self->bson);
if (!mongo_sync_cmd_insert_n(self->conn, self->ns, 1, @@ -650,6 +702,7 @@ afmongodb_dd_free(LogPipe *d) string_list_free(self->servers); if (self->vp) value_pairs_free(self->vp); + g_free(self->datetime_field); log_dest_driver_free(d); }
diff --git a/modules/afmongodb/afmongodb.h b/modules/afmongodb/afmongodb.h index 6a61616..76f8b83 100644 --- a/modules/afmongodb/afmongodb.h +++ b/modules/afmongodb/afmongodb.h @@ -39,6 +39,7 @@ void afmongodb_dd_set_password(LogDriver *d, const gchar *password); void afmongodb_dd_set_value_pairs(LogDriver *d, ValuePairs *vp); void afmongodb_dd_set_safe_mode(LogDriver *d, gboolean state); void afmongodb_dd_set_path(LogDriver *d, const gchar *path); +void afmongodb_dd_set_datetime(LogDriver *d, const gchar *field);
gboolean afmongodb_dd_check_address(LogDriver *d, gboolean local);
-- Bazsi
Balazs Scheidler <bazsi@balabit.hu> writes:
Although I'd integrate this patch in this form, I have a different idea on how this and similar stuff should be implemented. [...] This would generalize how the handling of DEFAULT values, which looks like this:
destination d_sql {
sql(columns('id', 'now datetime', 'bar'), values("$ID", default, "$BAR")); };
Here, 'default' means to use the default for the column. If value-pairs() could grow such a functionality, that would be great.
I'm not entirely sure how and where DEFAULT would fit into value-pairs()...
I'd probably call this function "casting", casting the output of the template to some destination specific type or function.
This could look like this in mongodb:
mongodb( value-pairs( key("MESSAGE") pair("some.date", datetime("$UNIXTIME"))))
That's an option yeah, and not even too hard to make it happen, and it would even be friendlier to future enhancements (such as types stored in nvtable). This would also make the implementation simpler, the way I did the type check in this patch isn't exactly nice.
+template_content + : string { /* simple template */ } + | LL_IDENTIFIER '(' string ')' { /* cast, store $1 together with the template to be interpreted by the caller */ + ; +
Mmmmhm. Neat. One thing that this is missing though, is a way to verify that the specified type is available, and efficient storage for it: I wouldn't really want to strcmp() it against N things during a value_pairs_foreach(), nor do I wish to defer validating the cast until then. Having a global list of types would perhaps be useful, even if it's not as flexible as it could be. (We'll need something like that for storing types in nvtable later anyway, as far as I see)
Then once the cast is stored together with LogTemplate, the users could use that to insert default values, or merely supply type information to a string formatted by templates.
What do you think?
I like it!
Do you want me to merge this patch anyway?
Nope, don't, I'll remove it from merge-queue/3.4, and redo it based on the above suggestions. -- |8]
Gergely Nagy <algernon@balabit.hu> writes:
destination d_sql {
sql(columns('id', 'now datetime', 'bar'), values("$ID", default, "$BAR")); };
Here, 'default' means to use the default for the column. If value-pairs() could grow such a functionality, that would be great.
I'm not entirely sure how and where DEFAULT would fit into value-pairs()...
I was thinking on the way to work, and translating DEFAULT to value-pairs could be like: value-pairs(pair("some.date", default())) This would end up with VP doing pretty much nothing, and leaving the handling of default up to the callback. That is, default would be just another "type", and the callback would be called with an empty value, and type set to VALUE_PAIR_TYPE_DEFAULT (or something like that). Then the driver could figure out what to do based on the name.
I'd probably call this function "casting", casting the output of the template to some destination specific type or function.
This could look like this in mongodb:
mongodb( value-pairs( key("MESSAGE") pair("some.date", datetime("$UNIXTIME"))))
That's an option yeah, and not even too hard to make it happen, and it would even be friendlier to future enhancements (such as types stored in nvtable). This would also make the implementation simpler, the way I did the type check in this patch isn't exactly nice.
I already have a PoC written in my head, which boilds down to adding a new 'metadata' argument to the value-pairs callbacks. The reason I don't do the conversion within vp itself, is because drivers may interpret the various types differently (eg, datetime is different for SQL and MongoDB). Armed with this metadata (which, for now, will only contain type information), the drivers can Do The Right Thing. VP could provide typecast functions for the common types (integers, in particular). So, the mongodb value-pairs callback would become this: static gboolean afmongodb_vp_process_value(const gchar *name, const gchar *prefix, const gchar *value, const value_pairs_meta_t *meta, gpointer *prefix_data, gpointer user_data) { bson *o; if (prefix_data) o = (bson *)*prefix_data; else o = (bson *)(((gpointer *)user_data)[0]); if (name[0] == '.') { gchar tx_name[256]; tx_name[0] = '_'; strncpy(&tx_name[1], name + 1, sizeof(tx_name) - 1); tx_name[sizeof(tx_name) - 1] = 0; bson_append_string (o, tx_name, value, -1); } else { switch (meta->type) { case DATA_TYPE_INT32: { gint32 n; if (value_pairs_typecast_int32(value, &n)) bson_append_int32(o, name, n); break; } [...] case DATA_TYPE_STRING: bson_append_string(o, name, value, -1); break; } } return FALSE; } This looks reasonably nice to me, and having the type in a generic metadata argument allows us to add further meta-data to it later, if so need be. (I already have a few ideas where this would be useful, but more on that later)
+template_content + : string { /* simple template */ } + | LL_IDENTIFIER '(' string ')' { /* cast, store $1 together with the template to be interpreted by the caller */ + ; +
Mmmmhm. Neat. One thing that this is missing though, is a way to verify that the specified type is available, and efficient storage for it: I wouldn't really want to strcmp() it against N things during a value_pairs_foreach(), nor do I wish to defer validating the cast until then.
Having a global list of types would perhaps be useful, even if it's not as flexible as it could be. (We'll need something like that for storing types in nvtable later anyway, as far as I see)
One thing to note here, though, is that while templates could hold typecast data, we'll need some type stuff for nvtable too, at some point. In the long run, I want to be able to teach the JSON parser to also record the data type, and pass it along, so that tcp-source -> json-parser -> format-json would automatically preserve as much type information as possible. Or when doing syslog()->format-json, we'd have PID as a number by default. -- |8]
Balazs Scheidler <bazsi@balabit.hu> writes:
destination d_sql {
sql(columns('id', 'now datetime', 'bar'), values("$ID", default, "$BAR")); };
Here, 'default' means to use the default for the column. If value-pairs() could grow such a functionality, that would be great.
This is more or less done, although default handling is a bit awkward at the moment (it needs default("")), and I none of the destinations support it yet.
mongodb( value-pairs( key("MESSAGE") pair("some.date", datetime("$UNIXTIME"))))
"datetime" here would only be passed around by value-pairs to mongodb, which could interpret various casting methods.
And that's done, too: we recognise int32() (also as int()), int64(), string() (the default, if omitted), datetime() and boolean(). If so need be, uint32, and uint64 are easy to add, that covers most things. I already modified the mongodb destination to support these, and it works wonderfully. I'll also update the amqp destination too in the next few hours. The hard part is figuring out how it should all work for $(format-json), because that does not use the bison grammar. Most likely, I'll implement a tiny parser for that case. I'll post the patchset to the list once I'm done with everything. -- |8]
participants (2)
-
Balazs Scheidler
-
Gergely Nagy