Re: [syslog-ng] value-pairs and sdata

20 May 2011

On Thu, 2011-05-19 at 11:33 +0200, Gergely Nagy wrote:
...
Balazs Scheidler <bazsi@balabit.hu> writes:
...
hi,
we had a conversation with Algernon a couple of weeks ago about how
mongodb should handle the value collisions which arise when both the
SDATA macro and the .SDATA.<id>.<param> values are to be added to the
document.
As a reminder, mongodb strips the initial dot, and subsequent dots are
used to break down values as subdocuments. This means that at the top
level both SDATA as value and SDATA as a document are present.
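
To make the collision concrete, here is a minimal C sketch of the naming rule described above. It is not the mongodb destination's actual code: the helper names are made up for illustration, and it only models the "strip one leading dot, then split on dots" behaviour.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: skip one leading dot, as described above. */
static const char *strip_leading_dot(const char *name)
{
    return (name[0] == '.') ? name + 1 : name;
}

/* Length of the first dot-separated component of an already stripped name. */
static size_t top_level_len(const char *stripped)
{
    return strcspn(stripped, ".");
}

/* True when two value names would end up under the same top-level key. */
static bool top_level_collides(const char *a, const char *b)
{
    const char *sa = strip_leading_dot(a);
    const char *sb = strip_leading_dot(b);
    size_t la = top_level_len(sa);
    size_t lb = top_level_len(sb);

    return la == lb && strncmp(sa, sb, la) == 0;
}
```

With these helpers, top_level_collides("SDATA", ".SDATA.meta.sequenceId") is true: the first name wants "SDATA" to be a plain value, the second wants it to be a subdocument.
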
I came up with the following solution, which is a change in how mongodb
currently works. I'd like to do this before releasing 3.3beta. Feedback
is welcome.
 * the SDATA macro is not included in rfc5424 and selected-macros, but
   it can be explicitly specified by key() and pair(), and it's still
   included in all-macros
 * a new "sdata" scope is introduced which expands to the
   .SDATA.<id>.<param> values, but the SDATA macro itself is not
   included there either.
 * nv-pairs is split: nv-pairs contains the values which have no leading
   dot, dot-nv-pairs contains only the ones that have a leading dot,
   and all-nv-pairs contains both.
This way, the SDATA macro is only included if someone really wants it,
so collisions are much less likely.
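
The proposed split can be sketched as a handful of name predicates. This is only an illustration in C under my own assumptions: the function names are hypothetical, it covers name-value pairs only (macros such as SDATA are selected separately), and it is not how value-pairs is implemented.

```c
#include <stdbool.h>
#include <string.h>

/* dot-nv-pairs: only the names that have a leading dot. */
static bool in_dot_nv_pairs(const char *name)
{
    return name[0] == '.';
}

/* nv-pairs: only the names that have no leading dot. */
static bool in_nv_pairs(const char *name)
{
    return name[0] != '.';
}

/* all-nv-pairs: the union of the two scopes above. */
static bool in_all_nv_pairs(const char *name)
{
    return in_nv_pairs(name) || in_dot_nv_pairs(name);
}

/* sdata: the .SDATA.<id>.<param> values, but not the SDATA macro itself. */
static bool in_sdata_scope(const char *name)
{
    return strncmp(name, ".SDATA.", strlen(".SDATA.")) == 0;
}
```

Under this model ".SDATA.meta.sequenceId" is in both sdata and dot-nv-pairs, a user-defined name such as "username" is only in nv-pairs, and the bare name "SDATA" is never matched by the sdata scope.
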
Thus far, I'm all for it, and really like the proposed changes.
...
Also, I'd like to replace the initial dot in mongodb with an initial
underscore, thereby eliminating the possibility of collisions
completely.
However, this is something I don't particularly like... Stripping the
dot is a very easy operation, that does not involve any extra allocation
or copying: I just pass name+1 to bson_append_string() instead of name,
and that's about it.
Replacing the dot with an underscore would be a much costlier
operation.
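
The cost difference can be sketched as follows. This is a simplified illustration rather than the driver's code: both function names are hypothetical, and the key is assumed to arrive as a const string that must not be modified in place.

```c
#include <stdlib.h>
#include <string.h>

/* Stripping: returns a pointer into the caller's buffer (name + 1).
 * Nothing is allocated and nothing is copied. */
static const char *key_strip_dot(const char *name)
{
    return (name[0] == '.') ? name + 1 : name;
}

/* Replacing: the original name is const, so changing its first byte
 * needs a private copy. The caller must free() the result.
 * Returns NULL if the allocation fails. */
static char *key_dot_to_underscore(const char *name)
{
    size_t len = strlen(name);
    char *copy = malloc(len + 1);

    if (copy == NULL)
        return NULL;

    memcpy(copy, name, len + 1);   /* includes the terminating NUL */
    if (copy[0] == '.')
        copy[0] = '_';
    return copy;
}
```

The first variant is pure pointer arithmetic; the second adds an allocation, a copy and a free for every key that is appended.
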
Furthermore, when I'm storing sdata in mongodb, in a structured manner,
then I explicitly want it to NOT have a leading dot or underscore: while
leading-dot stuff might be classified as internal syslog-ng stuff, when
I export them to a database, they're not internal anymore, and shouldn't
be distinguished from any other data, in my opinion.
This does mean that collisions are a bit more likely, but personally, I
can live with that. Especially since the $SDATA macro is only included
in the set if explicitly added. There is a possibility that other things
might conflict, but the chances are a lot lower.
In the long run, I think that key rewriting is the way forward (but
that's 3.4 material), and if we do go that way, I don't see much point
in introducing a conflicting idea in 3.3.
I think key rewriting doesn't conflict with my idea to use '_' as the
initial character in these cases. In my plans, '.' prefixed data will
become more and more widespread as more and more formatted data is going
to be received directly by syslog-ng. For instance, I'd like to create
an SNMP receiver, where data will be added into something like
'.snmp.XXXX'. Or with an SQL receiver, fields would become
'.sql.table.fieldname', or something similar.

However, I'd also like to make it easy to transform these '.' prefixed
values to "normal" values: my original idea was that whenever logic is
built into syslog-ng to generate value names, those names will be
prefixed by a dot (or have no dot at all, in the case of legacy macros).

The dotless name-value pairs are reserved for the user to define: when
she defines a scheme (for instance in the patterndb project, or by CEE),
those values will not have the initial dot, the key difference being
that in this case the naming of name-value pairs comes from the
configuration and not from code.

In this sense, whether a given key is blessed by user configuration
(i.e. has no dot) or is one that was simply produced by syslog-ng
without much thought given to its structure is a real difference, and
one which I'd like to preserve even in MongoDB and all other structured
destinations.

Also, in case key rewriting gets added to value-pairs, it'll be very
easy to remove the dot prefix.

Here's what I propose:
  * let's use the underscore for now, key rewrite functions will come
soon anyway.
  * let's solve the performance concerns by doing the initial dot ->
underscore translation in libmongo-client, which is in a better position
to do that; initial dots don't really make sense for any future
libmongo-client users anyway. That way we can do it without having to
copy the value name.
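
A rough sketch of why the library is in a better position to do this, under the assumption that it copies the key into its own output buffer when an element is appended anyway. The function below is hypothetical and is not part of the libmongo-client API; it only shows that the translation can be folded into that existing copy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical serializer step: write the NUL-terminated key into the
 * output buffer, turning a leading '.' into '_' on the way.
 * Returns the number of bytes written (including the NUL), or 0 if the
 * key does not fit. No temporary copy of the name is made. */
static size_t write_key(char *out, size_t out_size, const char *name)
{
    size_t len = strlen(name) + 1;

    if (len > out_size)
        return 0;

    memcpy(out, name, len);
    if (out[0] == '.')
        out[0] = '_';
    return len;
}
```

Called with ".snmp.XXXX", this leaves "_snmp.XXXX" in the output buffer, while the caller's string stays untouched.
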

Going forward, I'd like to create a rewrite plugin (in the syslog-ng
sense) that would transform a log message using value-pairs syntax.
This way such translations would become possible _before_ a message
hits the destinations.

-- 
Bazsi