hi,
we had a conversation with Algernon a couple of weeks ago about how mongodb should handle the value collisions that arise when both the SDATA and the .SDATA.<id>.<param> values are to be added to the document.
As a reminder, mongodb strips the initial dot, and subsequent dots are used to break down values into subdocuments. This means that at the top level, SDATA is present both as a value and as a document.
I came up with the following solution, which is a change in how mongodb currently works. I'd like to do this before releasing 3.3beta. Feedback is welcome.
* the SDATA macro is not included in rfc5424 or selected-macros, but can be explicitly specified by key() or pair(), and it's still included in all-macros
* a new "sdata" scope is introduced, which expands to the .SDATA.<id>.<param> values, but SDATA itself is not included there either.
* nv-pairs is split: nv-pairs contains values which have no leading dot, dot-nv-pairs only contains the ones that have leading dots. all-nv-pairs contains both.
This way, the SDATA macro is only included if someone really wants it, thus a collision is much less likely.
Also, I'd like to replace the initial dot in mongodb with an initial underscore, this way eliminating the possibility of collisions completely.
I have a half-baked patch, but nothing complete yet. However, I'd really like to push the beta version out ASAP, and this one is holding it back.
Thanks for any comments.
-- Bazsi
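A minimal sketch of the collision described in this message, assuming the mapping works exactly as stated (one leading dot is stripped, the next dot starts a subdocument). The functions top_level_key() and names_collide() are hypothetical helpers for illustration only; they are not part of syslog-ng or libmongo-client.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the top-level document key for a value name into buf:
 * skip one leading dot, then take everything up to the next dot.
 * Returns 1 if the name continues past that key (the key would hold a
 * subdocument), 0 if the name is a plain top-level value.
 * Over-long keys are truncated to fit buf. */
int
top_level_key(const char *name, char *buf, size_t size)
{
  const char *start;
  size_t len;
  int nested;

  assert(name != NULL && buf != NULL && size > 0);
  start = (name[0] == '.') ? name + 1 : name;
  len = strcspn(start, ".");
  nested = (start[len] == '.');
  if (len >= size)
    len = size - 1;
  memcpy(buf, start, len);
  buf[len] = '\0';
  return nested;
}

/* Two names collide when they map to the same top-level key, but one
 * wants a plain value there and the other wants a subdocument. */
int
names_collide(const char *a, const char *b)
{
  char ka[64], kb[64];
  int a_nested = top_level_key(a, ka, sizeof(ka));
  int b_nested = top_level_key(b, kb, sizeof(kb));

  return strcmp(ka, kb) == 0 && a_nested != b_nested;
}
```

Under these assumptions, "SDATA" and ".SDATA.timeQuality.isSynced" both claim the top-level key "SDATA", once as a string and once as a subdocument, which is the conflict the proposal tries to avoid.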
Balazs Scheidler <bazsi@balabit.hu> writes:
hi,
we had a conversation with Algernon a couple of weeks ago how mongodb should handle value collisions which arise when both the SDATA and the .SDATA.<id>.<param> values are to be added to the document.
As a reminder, mongodb strips the initial dot, and subsequent dots are used to break down values as subdocuments. This means that at the top level both SDATA as value and SDATA as a document are present.
I found out the following solution, which is a change in how mongodb currently works. I'd like to do this before releasing 3.3beta. Feedback is welcome.
* the SDATA macro is not included in rfc5424, selected-macros, but can be explicitly specified by key(), pair() and it's still included in all-macros
* a new "sdata" scope is introduced which expands to the .SDATA.<id>.<param> values, but the SDATA is not included either.
* nv-pairs is split: nv-pairs contains values which have no leading dot, dot-nv-pairs only contains the ones that have leading dots. all-nv-pairs contain both.
This way, SDATA macro is only included if someone really wants it, thus collision is much less likely.
Thus far, I'm all for it, and really like the proposed changes.
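The nv-pairs split quoted above comes down to a test on the first character of the name. A minimal sketch, assuming scopes can be modelled as bit flags; the enum constants and name_in_scope() are hypothetical and do not reflect how syslog-ng's value-pairs code is actually written.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scope flags mirroring the three scope names in the proposal. */
enum
{
  SCOPE_NV_PAIRS     = 1 << 0,  /* names without a leading dot */
  SCOPE_DOT_NV_PAIRS = 1 << 1,  /* names with a leading dot */
  SCOPE_ALL_NV_PAIRS = SCOPE_NV_PAIRS | SCOPE_DOT_NV_PAIRS
};

/* Returns non-zero if the name-value pair called `name` belongs to any
 * of the scopes selected in `scopes`. */
int
name_in_scope(const char *name, int scopes)
{
  int own;

  assert(name != NULL);
  own = (name[0] == '.') ? SCOPE_DOT_NV_PAIRS : SCOPE_NV_PAIRS;
  return (scopes & own) != 0;
}
```

With this model, all-nv-pairs is simply the union of the other two scopes, so no name can fall outside it.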
Also, I'd like to replace the initial dot in mongodb to initial underscore, this way eliminating the possibility of collisions completely.
However, this is something I don't particularly like... Stripping the dot is a very easy operation, that does not involve any extra allocation or copying: I just pass name+1 to bson_append_string() instead of name, and that's about it.
Replacing the dot with an underscore would be a much more costly operation.
Furthermore, when I'm storing sdata in mongodb, in a structured manner, then I explicitly want it to NOT have a leading dot or underscore: while leading-dot stuff might be classified as internal syslog-ng stuff, when I export them to a database, they're not internal anymore, and shouldn't be distinguished from any other data, in my opinion.
This does mean that collisions are a bit more likely, but personally, I can live with that. Especially since the $SDATA macro is only included in the set if explicitly added. There is a possibility that other things might conflict, but the chances are a lot lower.
In the long run, I think that key rewriting is the way forward (but that's 3.4 material), and if we do go that way, I don't see much point in introducing a conflicting idea in 3.3.
-- |8]
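The cost difference argued above can be sketched with plain C strings. This assumes the stored name must not be modified in place, which is why the underscore variant needs a copy; key_strip_dot() and key_dot_to_underscore() are hypothetical names, and no libmongo-client call is shown.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stripping: no allocation and no copying, just point one byte past
 * the leading dot (the "name+1" trick). */
const char *
key_strip_dot(const char *name)
{
  assert(name != NULL);
  return (name[0] == '.') ? name + 1 : name;
}

/* Replacing: the original name is left untouched, so the key has to be
 * copied before its first byte can be changed. Returns a malloc()ed
 * string the caller must free(), or NULL if the allocation fails. */
char *
key_dot_to_underscore(const char *name)
{
  size_t len;
  char *key;

  assert(name != NULL);
  len = strlen(name);
  key = malloc(len + 1);
  if (!key)
    return NULL;
  memcpy(key, name, len + 1);   /* copies the terminating NUL as well */
  if (key[0] == '.')
    key[0] = '_';
  return key;
}
```

The first function costs a pointer increment per key; the second costs an allocation, a copy and a later free() per key, which is the overhead the message objects to.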
On Thu, 2011-05-19 at 11:33 +0200, Gergely Nagy wrote:
In the long run, I think that key rewriting is the way forward (but that's 3.4 material), and if we do go that way, I don't see much point in introducing a conflicting idea in 3.3.
I think key rewriting doesn't conflict with my idea to use '_' as the initial character in these cases.
In my plans, '.' prefixed data will become more and more widespread as more and more formatted data is going to be received directly by syslog-ng. For instance, I'd like to create an SNMP receiver, where data will be added into something like '.snmp.XXXX'. Or with an SQL receiver, fields would become '.sql.table.fieldname', or something similar.
However, I'd also like to make it easy to transform these '.' prefixed values to "normal" values: my original idea was that whenever logic built into syslog-ng generates value names, they'll be prefixed by a dot (or have no dot at all, in the case of legacy macros).
The dotless name-value pairs are reserved for the user to define: when she defines a scheme (for instance in the patterndb project, or by CEE), those values will not have the initial dot, the key difference being that in this case the naming of name-value pairs comes from the configuration and not from code.
In this sense, whether a given key is blessed by user configuration (i.e. no dot), or is one that was simply processed by syslog-ng without giving too much thought to its structure, is a real difference, which I'd like to store even in MongoDB and all structured destinations.
Also, in case key rewriting gets added to value-pairs, it'll be very easy to remove the dot prefix.
Here's what I propose:
* let's use the underscore for now, key rewrite functions will come soon anyway.
* let's solve the performance concerns by doing the initial dot -> underscore translation in libmongo-client, which is in a better position to do that, it really doesn't make sense to use initial dots for any future libmongo-client users anyway. That way we can do it without having to copy the value name.
In the way forward, I'd like to create a rewrite plugin (in the syslog-ng sense) that would transform a log message using value-pairs syntax. This way such translations would become possible _before_ a message would hit the destinations.
-- Bazsi
Balazs Scheidler <bazsi@balabit.hu> writes:
In the long run, I think that key rewriting is the way forward (but that's 3.4 material), and if we do go that way, I don't see much point in introducing a conflicting idea in 3.3.
I think key rewriting doesn't conflict with my idea to use '_' as the initial character in these cases.
[..snip...] I'm convinced.
Here's what I propose:
* let's use the underscore for now, key rewrite functions will come soon anyway.
* let's solve the performance concerns by doing the initial dot -> underscore translation in libmongo-client, which is in a better position to do that, it really doesn't make sense to use initial dots for any future libmongo-client users anyway. That way we can do it without having to copy the value name.
We've talked about this earlier, and I spent the night brooding about this, and while it would be easy to do this in libmongo-client, I'm reluctant to do so, because it feels too much like second guessing the user. LMC is reasonably simple, and the lower level functions syslog-ng's afmongodb uses from it are like the bare metal. I really wouldn't want to add transformations like this there.
On the other hand, recent libmongo-client has a higher-level interface, which does a lot of second guessing, and the .->_ transformation might go there (afmongodb will be adapted to use these higher level functions at some point, btw). But even then, I'd rather return with an error and let the library user handle it in any way they like.
Therefore I think this should be done on the syslog-ng side. The performance loss wouldn't be much bigger than doing the same thing in libmongo-client: it's not cheaper to do it there (not with the current code, anyway).
In the way forward, I'd like to create a rewrite plugin (in the syslog-ng sense) that would transform a log message using value-pairs syntax. This way such translations would become possible _before_ a message would hit the destinations.
This sounds intriguing. -- |8]
Here's what I propose:
* let's use the underscore for now, key rewrite functions will come soon anyway.
* let's solve the performance concerns by doing the initial dot -> underscore translation in libmongo-client, which is in a better position to do that, it really doesn't make sense to use initial dots for any future libmongo-client users anyway. That way we can do it without having to copy the value name.
We've talked about this earlier, and I spent the night brooding about this, and while it would be easy to do this in libmongo-client, I'm reluctant to do so, because it feels too much like second guessing the user. LMC is reasonably simple, and the lower level functions syslog-ng's afmongodb uses from it are like the bare metal. I really wouldn't want to add transformations like this there.
Another reason not to do transformations in libmongo-client: None of the other mongodb drivers I checked do that: they either return an error (the official C driver does this), raise an exception (python, ruby), or leave it up to the database to fail (c++, and perl, as far as I saw).
Since none of the official drivers do transformation either, I wouldn't like libmongo-client to do it, either. Not by default anyway. I can be convinced to do .->_ mapping when 'safe mode' is enabled. (But that assumes that afmongodb will be updated to use the higher level API provided by LMC, which supports this safe mode flag)
-- |8]
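The division of labour argued for here (the library rejects a bad key, the caller decides what to do) might look like the sketch below. Everything in it is hypothetical: driver_check_key() stands in for whatever validation a driver performs and caller_fix_key() for the syslog-ng side; neither is a real libmongo-client or syslog-ng function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum
{
  KEY_OK = 0,
  KEY_ERR_LEADING_DOT = -1
} KeyStatus;

/* Library side: report the problem, do not transform anything. */
KeyStatus
driver_check_key(const char *key)
{
  assert(key != NULL);
  return (key[0] == '.') ? KEY_ERR_LEADING_DOT : KEY_OK;
}

/* Caller side: on rejection, apply the caller's own policy. Here the
 * policy is the dot -> underscore rewrite, done in a caller-provided
 * buffer. Returns the key to use, or NULL if the rewritten key does not
 * fit into buf. */
const char *
caller_fix_key(const char *name, char *buf, size_t size)
{
  size_t len;

  if (driver_check_key(name) == KEY_OK)
    return name;                /* nothing to rewrite, no copy made */
  len = strlen(name);
  if (len + 1 > size)
    return NULL;
  memcpy(buf, name, len + 1);   /* includes the terminating NUL */
  buf[0] = '_';
  return buf;
}
```

A different caller could just as well drop the value or strip the dot; keeping that choice out of the library is the point being made.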
On Sun, 2011-05-22 at 00:20 +0200, Gergely Nagy wrote:
Here's what I propose:
* let's use the underscore for now, key rewrite functions will come soon anyway.
* let's solve the performance concerns by doing the initial dot -> underscore translation in libmongo-client, which is in a better position to do that, it really doesn't make sense to use initial dots for any future libmongo-client users anyway. That way we can do it without having to copy the value name.
We've talked about this earlier, and I spent the night brooding about this, and while it would be easy to do this in libmongo-client, I'm reluctant to do so, because it feels too much like second guessing the user. LMC is reasonably simple, and the lower level functions syslog-ng's afmongodb uses from it are like the bare metal. I really wouldn't want to add transformations like this there.
Another reason not to do transformations in libmongo-client: None of the other mongodb drivers I checked do that: they either return an error (the official C driver does this), raise an exception (python, ruby), or leave it up to the database to fail (c++, and perl, as far as I saw).
Since none of the official drivers do transformation either, I wouldn't like libmongo-client to do it, either. Not by default anyway. I can be convinced to do .->_ mapping when 'safe mode' is enabled. (But that assumes that afmongodb will be updated to use the higher level API provided by LMC, which supports this safe mode flag)
Ok, agreed, I'll then remove my patches from libmongo-client and change the assumption in syslog-ng on how libmongo-client works, and then release 3.3beta1. -- Bazsi
Balazs Scheidler <bazsi@balabit.hu> writes:
Ok, agreed, I'll then remove my patches from libmongo-client and change the assumption in syslog-ng on how libmongo-client works, and then release 3.3beta1.
Thanks, and sorry for being a bit indecisive earlier O:)
By the way, before you release beta1: I've sent a patch earlier, which fixed mongodb's value-pairs() support. One which you supposedly applied, but I couldn't find it in your git tree. It was something like this:
commit 39aa0f7ae0dbf099034c8384291c91688b262bfa
Author: Gergely Nagy <algernon@balabit.hu>
Date:   Sun May 8 10:19:19 2011 +0200

    afmongodb: Fix a possible crash during configuration.

    During configuration, if the user specified value-pairs(), the driver
    would crash, because it tried to free a NULL structure.

    Signed-off-by: Gergely Nagy <algernon@balabit.hu>

diff --git a/modules/afmongodb/afmongodb.c b/modules/afmongodb/afmongodb.c
index 68594d7..422058e 100644
--- a/modules/afmongodb/afmongodb.c
+++ b/modules/afmongodb/afmongodb.c
@@ -148,7 +148,8 @@ afmongodb_dd_set_value_pairs(LogDriver *d, ValuePairs *vp)
 {
   MongoDBDestDriver *self = (MongoDBDestDriver *)d;
 
-  value_pairs_free (self->vp);
+  if (self->vp)
+    value_pairs_free (self->vp);
   self->vp = vp;
 }

Either this is needed, or value_pairs_free() needs to bail out early if vp is NULL.
-- |8]
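The second option mentioned above, a free function that bails out early on NULL, can be sketched with stand-in types. DemoValuePairs, DemoDriver and the demo_* functions are hypothetical and deliberately not syslog-ng's real ValuePairs or MongoDBDestDriver; the sketch only shows the pattern.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a value-pairs object; NOT syslog-ng's real structure. */
typedef struct
{
  int *items;
} DemoValuePairs;

/* Stand-in for a destination driver holding one value-pairs object. */
typedef struct
{
  DemoValuePairs *vp;
} DemoDriver;

DemoValuePairs *
demo_value_pairs_new(void)
{
  /* calloc() zeroes the struct, so items starts out as NULL */
  return calloc(1, sizeof(DemoValuePairs));
}

/* Tolerates NULL the same way free(NULL) does. */
void
demo_value_pairs_free(DemoValuePairs *vp)
{
  if (!vp)
    return;
  free(vp->items);
  free(vp);
}

/* With a NULL-tolerant free, the setter needs no check of its own. */
void
demo_driver_set_value_pairs(DemoDriver *self, DemoValuePairs *vp)
{
  assert(self != NULL);
  demo_value_pairs_free(self->vp);  /* safe even if nothing was set yet */
  self->vp = vp;
}
```

Putting the check inside the free function protects every caller at once, while the patch above fixes only the one call site; either approach avoids the crash described in the commit message.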
Gergely Nagy <algernon@balabit.hu> writes:
Balazs Scheidler <bazsi@balabit.hu> writes:
Ok, agreed, I'll then remove my patches from libmongo-client and change the assumption in syslog-ng on how libmongo-client works, and then release 3.3beta1.
Thanks, and sorry for being a bit indecisive earlier O:)
By the way, before you release beta1: I've sent a patch earlier, which fixed mongodb's value-pairs() support. One which you supposedly applied, but I couldn't find it in your git tree.
Never mind, just received your other mail, and now I see it in your git tree.
-- |8]