On Thu, 2011-05-19 at 11:33 +0200, Gergely Nagy wrote:
Balazs Scheidler <bazsi@balabit.hu> writes:
hi,
we had a conversation with Algernon a couple of weeks ago about how mongodb should handle the value collisions that arise when both the SDATA and the .SDATA.<id>.<param> values are to be added to the document.
As a reminder, mongodb strips the initial dot, and subsequent dots are used to break down values as subdocuments. This means that at the top level, SDATA is present both as a value and as a document.
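To make the collision concrete, here is a minimal Python sketch of the key handling described above: one leading dot is stripped and the remaining dots open subdocuments. It is not the actual syslog-ng or libmongo-client code; the function name `build_document`, the SDATA id `origin` and the sample values are made up for illustration.

```python
def build_document(pairs):
    """Nest flat name-value pairs into a dict, mongodb-destination style.

    Assumed behaviour, taken from the description above: strip one leading
    dot, then treat every remaining dot as a subdocument separator.
    Raises ValueError when a name is needed both as a plain value and as
    a subdocument.
    """
    doc = {}
    for name, value in pairs:
        key = name[1:] if name.startswith(".") else name
        *parents, leaf = key.split(".")
        node = doc
        for part in parents:
            # Descend into (or create) the subdocument for this path part.
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                raise ValueError(f"{part!r} is both a value and a document")
            node = child
        if isinstance(node.get(leaf), dict):
            raise ValueError(f"{leaf!r} is both a value and a document")
        node[leaf] = value
    return doc
```

Feeding it both `("SDATA", "...")` and `(".SDATA.origin.ip", "...")` raises the error regardless of the order, which is the collision in question.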
I came up with the following solution, which changes how mongodb currently works. I'd like to do this before releasing 3.3beta. Feedback is welcome.
* the SDATA macro is not included in the rfc5424 and selected-macros scopes, but it can be specified explicitly with key() or pair(), and it is still included in all-macros
* a new "sdata" scope is introduced, which expands to the .SDATA.<id>.<param> values; the SDATA macro is not included in it either
* nv-pairs is split: nv-pairs contains the values which have no leading dot, dot-nv-pairs contains only the ones that have a leading dot, and all-nv-pairs contains both
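The nv-pairs split amounts to partitioning value names on their first character. A rough Python sketch, assuming the scope semantics exactly as listed above; `split_scopes` and the sample names in the usage note are hypothetical and not part of any syslog-ng API:

```python
def split_scopes(names):
    """Partition value names the way the proposed scopes would.

    Returns a dict keyed by scope name; "all-nv-pairs" holds every name,
    in input order.
    """
    names = list(names)
    dotless = [n for n in names if not n.startswith(".")]
    dotted = [n for n in names if n.startswith(".")]
    return {
        "nv-pairs": dotless,
        "dot-nv-pairs": dotted,
        "all-nv-pairs": names,
    }
```

For example, given `secevt.verdict` and `.SDATA.origin.ip`, the first lands in nv-pairs, the second in dot-nv-pairs, and both appear in all-nv-pairs.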
This way, the SDATA macro is only included if someone really wants it, so collisions are much less likely.
Thus far, I'm all for it, and really like the proposed changes.
Also, I'd like mongodb to replace the initial dot with an underscore instead of stripping it, thereby eliminating the possibility of collisions completely.
However, this is something I don't particularly like... Stripping the dot is a very easy operation, that does not involve any extra allocation or copying: I just pass name+1 to bson_append_string() instead of name, and that's about it.
Replacing the dot with an underscore would be a much more costly operation.
Furthermore, when I'm storing sdata in mongodb, in a structured manner, then I explicitly want it to NOT have a leading dot or underscore: while leading-dot stuff might be classified as internal syslog-ng stuff, when I export them to a database, they're not internal anymore, and shouldn't be distinguished from any other data, in my opinion.
This does mean that collisions are a bit more likely, but personally, I can live with that. Especially since the $SDATA macro is only included in the set if explicitly added. There is a possibility that other things might conflict, but the chances are a lot lower.
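The trade-off between the two key mappings can be checked on a toy set of names. This is only a sketch in Python: `strip_dot`, `dot_to_underscore` and `colliding_keys` are illustrative helpers, not syslog-ng functions, and the check only looks for keys that would be needed both as a value and as a document.

```python
def strip_dot(name):
    """Current behaviour as described: drop one leading dot."""
    return name[1:] if name.startswith(".") else name


def dot_to_underscore(name):
    """Proposed alternative: turn one leading dot into an underscore."""
    return "_" + name[1:] if name.startswith(".") else name


def colliding_keys(names, mapping):
    """Return mapped keys that are needed both as a value and as a document."""
    keys = {mapping(n) for n in names}
    return sorted(
        k for k in keys
        if any(other.startswith(k + ".") for other in keys)
    )
```

With the invented names `SDATA`, `.SDATA.origin.ip` and `HOST`, `colliding_keys(names, strip_dot)` reports `SDATA`, while `colliding_keys(names, dot_to_underscore)` reports nothing. A user-defined name that itself starts with an underscore could of course still clash under the second mapping.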
In the long run, I think that key rewriting is the way forward (but that's 3.4 material), and if we do go that way, I don't see much point in introducing a conflicting idea in 3.3.
I think key rewriting doesn't conflict with my idea of using '_' as the initial character in these cases. In my plans, '.' prefixed data will become more and more widespread as more and more formatted data is received directly by syslog-ng. For instance, I'd like to create an SNMP receiver, where data would be added as something like '.snmp.XXXX'. Or with an SQL receiver, fields would become '.sql.table.fieldname', or something similar.

However, I'd also like to make it easy to transform these '.' prefixed values into "normal" values. My original idea was that whenever logic built into syslog-ng generates value names, they are prefixed by a dot (or have no dot at all, in the case of legacy macros). The dotless name-value pairs are reserved for the user to define: when she defines a scheme (for instance in the patterndb project, or by CEE), those values will not have the initial dot. The key difference is that in this case the naming of the name-value pairs comes from the configuration and not from code.

In this sense, whether a given key is blessed by user configuration (i.e. no dot) or was simply produced by syslog-ng without much thought given to the structure is a real difference, which I'd like to preserve even in MongoDB and all structured destinations. Also, once key rewriting gets added to value-pairs, it'll be very easy to remove the dot prefix.

Here's what I propose:

* let's use the underscore for now; key rewrite functions will come soon anyway.
* let's solve the performance concerns by doing the initial dot -> underscore translation in libmongo-client, which is in a better position to do that. It doesn't really make sense for any future libmongo-client users to use initial dots anyway. That way we can do it without having to copy the value name.

Going forward, I'd like to create a rewrite plugin (in the syslog-ng sense) that would transform a log message using value-pairs syntax. This way such translations would become possible _before_ a message hits the destinations.

--
Bazsi