[RFC] value-pairs(), take #3
Hi!

Based on the feedback from this list, we've had a little discussion with Bazsi on how to improve value-pairs(), and we came up with something that is hopefully more consistent and easier to use than my last proposal.

The Syntax
==========

We'd have two syntaxes, one for the configuration file itself (usable by the drivers), and one for template functions (e.g., tfjson): they'll share most properties, the difference will be in how they appear. See the example below:

config file:
------------

value-pairs(
  scope(nv_pairs core syslog all_macros selected_macros everything)
  exclude("R_*")
  exclude("S_*")
  key(".SDATA.meta.sequenceId")
  pair("MSGHDR" "$PROGRAM[$PID]: ")
)

template function:
------------------

$(format-json --scope nv_pairs,core,syslog,all_macros,selected_macros,everything \
  --exclude R_* --exclude S_* --key .SDATA.meta.sequenceId \
  --pair MSGHDR="$PROGRAM[$PID]: ")

Explanation
-----------

The above examples would start with a full set of name-value pairs (due to having "everything" in the scope; we could start with selected_macros instead [see below]). The scope can only be extended by subsequent calls to scope(), but even then, the set will be built only once, at the beginning. We'll likely end up throwing a syntax error during parsing if more than one scope() statement is seen, or if it's not the first statement within value-pairs().

However, explicitly specifying a key-value pair (either via key() or pairs()) will use the full set, regardless of what scope() was selected. This, however, might change, if people find this too confusing. But changing this will complicate the code quite a lot, and remove some of the flexibility.

From this set, we exclude every pair where the key begins with "R_" or "S_", then we explicitly include .SDATA.meta.sequenceId (though, in this example, this is useless, as it's already included due to the scope, and wasn't excluded). Then we add a custom key-value pair.
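The walkthrough above can be modelled in a few lines of Python. This is only a sketch under stated assumptions, not syslog-ng code: `build_pairs` and the sample message are hypothetical, patterns are assumed to be shell-style globs (matched here with Python's `fnmatch`), and key() is assumed to add a key even if an earlier exclude() dropped it.

```python
from fnmatch import fnmatchcase

def build_pairs(scope_set, excludes, keys, pairs, all_values):
    """Hypothetical model of the walkthrough; not the real implementation."""
    # Start from the scope set and drop every key matching an exclude glob.
    result = {
        k: v for k, v in scope_set.items()
        if not any(fnmatchcase(k, pat) for pat in excludes)
    }
    # key() looks the key up in the full set, regardless of scope (assumed
    # to also override an earlier exclude).
    for k in keys:
        if k in all_values:
            result[k] = all_values[k]
    # pair() adds a custom key; the value here is already expanded.
    result.update(pairs)
    return result

# Made-up sample message, for illustration only.
message = {
    "HOST": "web1",
    "PROGRAM": "sshd",
    "PID": "4242",
    "R_DATE": "Feb  7 08:54:46",
    "S_DATE": "Feb  7 08:54:45",
    ".SDATA.meta.sequenceId": "17",
}

result = build_pairs(
    scope_set=message,  # stands in for scope(everything)
    excludes=["R_*", "S_*"],
    keys=[".SDATA.meta.sequenceId"],
    pairs={"MSGHDR": "sshd[4242]: "},
    all_values=message,
)
```

With this sample, `result` keeps HOST, PROGRAM, PID and .SDATA.meta.sequenceId, loses R_DATE and S_DATE, and gains MSGHDR.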
Syntax Details
--------------

The starting name-value pair set will be defined by the scope() statement, which can have the following values:

* nv_pairs: The name-value pair database, including some frequently used builtins (currently: HOST, HOST_FROM, MESSAGE, PROGRAM, PID, MSGID, SOURCE and LEGACY_MSGHDR)
* rfc3164, alias core, alias base: The basic pairs from RFC3164: $FACILITY, $SEVERITY (= $LEVEL), $DATE (= $S_DATE), $HOST, $PROGRAM, $PID, and $MSG.
* rfc5424, alias syslog: The pairs from rfc3164 plus $SDATA and $MSGID.
* all_macros: All macros known to syslog-ng (including all of the above, pretty much)
* selected_macros: rfc5424 + $TAGS, $SOURCEIP, $SEQNUM
* everything: all of the above, combined

Each key is added to the set only once, naturally.

scope() was introduced as a replacement for builtins(), which was unclear and inflexible. scope() does the job far better, and is - in my opinion - a lot clearer too.

Apart from scope(), we have a few more statements:

* select() / exclude(): We wanted to rename select() to include(), but syslog-ng already has an include() statement, and I ran into problems during the rename. It's undecided whether we'll remain with select() or adjust the parser to treat the two include statements differently (I'd opt for select()). The difference from the previous implementation's select()/exclude() is that in the new implementation, the first match is the one that matters. This gets rid of the confusing priority stuff, and is still flexible enough (especially with the introduction of scope()) for all cases we could come up with.
* key(): One can list macros with this statement. It does the same thing "$HOST" and friends did in the previous implementation, one just needs to use a statement this time, for clarity's sake.
* pairs(): Same thing as the previous implementation's ("key" "value") construct.
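To make the scope() values above concrete, here is a small Python sketch of how the named scopes and their aliases could be resolved into one key list. It is illustrative only: `resolve_scope`, `SCOPES` and `ALIASES` are hypothetical names, nv_pairs is reduced to the builtins listed above, and all_macros / everything are left out because their contents are not enumerated in the proposal.

```python
# Hypothetical model of scope(); contents follow the list above.
ALIASES = {"core": "rfc3164", "base": "rfc3164", "syslog": "rfc5424"}

RFC3164 = ["FACILITY", "SEVERITY", "DATE", "HOST", "PROGRAM", "PID", "MSG"]
RFC5424 = RFC3164 + ["SDATA", "MSGID"]

SCOPES = {
    # Only the builtins named in the proposal; the real database holds more.
    "nv_pairs": ["HOST", "HOST_FROM", "MESSAGE", "PROGRAM", "PID",
                 "MSGID", "SOURCE", "LEGACY_MSGHDR"],
    "rfc3164": RFC3164,
    "rfc5424": RFC5424,
    "selected_macros": RFC5424 + ["TAGS", "SOURCEIP", "SEQNUM"],
}

def resolve_scope(names):
    """Union the requested scopes, adding each key only once."""
    keys = []
    for name in names:
        for key in SCOPES[ALIASES.get(name, name)]:
            if key not in keys:
                keys.append(key)
    return keys

combined = resolve_scope(["selected_macros", "nv_pairs"])
```

Here `combined` contains HOST, PROGRAM, PID and MSGID once each, even though both scopes list them.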
Current shortcomings:

* List separation: at the moment, list values need to be space separated, and the key-value pairs (see pairs()) need a space separator too. In the long run, we'd like to allow commas as separators too.

Another example
---------------

value-pairs(
  scope(selected_macros nv_pairs)
  select(.*)
  select("usracct.*")
  select("secevt.*")
  select(".SDATA.*")
  exclude("*")
  key("SEVERITY")
  key("HOST")
  key("PROGRAM")
  key("PID")
  key("MSG")
  key("TAGS")
  pair("timestamp" "$UNIXTIME")
);

This will start with a base set of selected_macros and nv_pairs, select a few specified patterns, and exclude everything else. Then it will explicitly add a few keys (which do not need to be part of the original set!), and a custom key-value pair.

I hope this was understandable, and better than the previous proposal.

As soon as I start working on implementing this proposal, the code will be available from the work/value-pairs/base branch of my git tree:

  git://git.madhouse-project.org/syslog-ng/syslog-ng-3.3.git

(or browsable on the web at: http://git.madhouse-project.org/syslog-ng/syslog-ng-3.3/log/?h=work/value-pa...)

And as always, your feedback is most appreciated! Nothing is set in stone yet, and I'd love to hear your opinion.

-- 
|8]
value-pairs( scope(selected_macros nv_pairs) select(.*) select("usracct.*") select("secevt.*") select(".SDATA.*") exclude("*") key("SEVERITY") key("HOST") key("PROGRAM") key("PID") key("MSG") key("TAGS") pair("timestamp" "$UNIXTIME") );
I think I've realized why I have so much trouble with the meaning of this stanza.

I think that you are approaching this as a filter of the keys. When doing this the first filter that "matches" the key is the one that actually determines if the key is included or not.

I approach this as a set theory specification. In set theory, it is the last item that determines if a key is included.

Both are equally flexible and non-ambiguous. My preference for this type of task is to use set theory. I view this as building a set of keys to place into the output template.

I find the following a lot more intuitive.

value-pairs(
  scope(selected_macros nv_pairs)
  exclude("*")
  select("secevt.*")
  select("usracct.*")
  select(.*)
  key("SEVERITY")
  key("HOST")
  key("PROGRAM")
  key("PID")
  key("MSG")
  key("TAGS")
  pair("timestamp" "$UNIXTIME")
);

select(".SDATA.*") isn't needed because it matches the select(.*) anyway.

This would mean: exclude everything, then add back in the secevt.* and usracct.* and .*

This method is even more obvious when you match subgroups:

value-pairs(
  scope(selected_macros nv_pairs)
  exclude("*")
  select("secevt.*")
  select("usracct.*")
  exclude("usr.acct.*.something")
  select(.*)
  key("SEVERITY")
  key("HOST")
  key("PROGRAM")
  key("PID")
  key("MSG")
  key("TAGS")
  pair("timestamp" "$UNIXTIME")
);

If you opt for the filter approach then the docs will have to be clear in stating that the select and include are final filters. Selects or excludes following them will have no effect.

-- 
Evan
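The two readings contrasted above (first match wins versus last match wins) can be compared side by side in Python. This is a rough sketch, not syslog-ng code: `first_match` and `last_match` are hypothetical helpers, patterns are assumed to be shell-style globs, and a key that matches no rule is assumed to stay in the set.

```python
from fnmatch import fnmatchcase

def first_match(rules, key, default=True):
    """Filter reading: the first rule that matches decides."""
    for action, pattern in rules:
        if fnmatchcase(key, pattern):
            return action == "select"
    return default

def last_match(rules, key, default=True):
    """Set reading: every matching rule overrides the ones before it."""
    verdict = default
    for action, pattern in rules:
        if fnmatchcase(key, pattern):
            verdict = action == "select"
    return verdict

# Rule order from the original proposal (filter style).
filter_rules = [
    ("select", ".*"),
    ("select", "usracct.*"),
    ("select", "secevt.*"),
    ("select", ".SDATA.*"),
    ("exclude", "*"),
]

# Rule order from the set-style rewrite.
set_rules = [
    ("exclude", "*"),
    ("select", "secevt.*"),
    ("select", "usracct.*"),
    ("select", ".*"),
]

# Made-up sample keys.
keys = [".SDATA.meta.sequenceId", "usracct.username", "HOST"]
filter_view = [first_match(filter_rules, k) for k in keys]
set_view = [last_match(set_rules, k) for k in keys]
```

Both lists come out the same for these keys, so each stanza expresses the same selection under its own reading. Reading a stanza with the other semantics does change the result: `last_match(filter_rules, "usracct.username")` is false, because the trailing exclude("*") overrides everything before it.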
On Mon, Feb 07, 2011 at 08:54:46AM -0800, Evan Rempel wrote:
I think that you are approaching this as a filter of the keys. When doing this the first filter that "matches" the key is the one that actually determines if the key is included or not.
I approach this as a set theory specification. In set theory, it is the last item that determines if a key is included.
Both are equally flexible and non-ambiguous. My preference for this type of task is to use set theory. I view this as building a set of keys to place into the output template.
I think it was done that way for performance reasons.

If you are trying to process thousands of messages per second, you want to use a rulechain, and have the key rules as high as possible up the chain as you can manage.

Just like setting up ACL chains in a router.

Matthew.
Matthew Hall wrote:
On Mon, Feb 07, 2011 at 08:54:46AM -0800, Evan Rempel wrote:
I think that you are approaching this as a filter of the keys. When doing this the first filter that "matches" the key is the one that actually determines if the key is included or not.
I approach this as a set theory specification. In set theory, it is the last item that determines if a key is included.
Both are equally flexible and non-ambiguous. My preference for this type of task is to use set theory. I view this as building a set of keys to place into the output template.
I think it was done that way for performance reasons.
If you are trying to process thousands of messages per second, you want to use a rulechain, and have the key rules as high as possible up the chain as you can manage.
Just like setting up ACL chains in a router.
The performance turns out to be the same because with set theory, you just process the list in the opposite order with the first match short circuiting the search.

-- 
Evan
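The point about walking the list in the opposite order can be sketched in Python. Again this is an illustration under assumptions, not the actual implementation: both functions are hypothetical, glob matching is assumed, and an unmatched key is assumed to be kept. Each function also returns how many patterns it had to test.

```python
from fnmatch import fnmatchcase

def last_match_forward(rules, key, default=True):
    """Naive last-match: test every rule and keep the last verdict."""
    verdict, checks = default, 0
    for action, pattern in rules:
        checks += 1
        if fnmatchcase(key, pattern):
            verdict = action == "select"
    return verdict, checks

def last_match_reversed(rules, key, default=True):
    """Same verdict, found by walking backwards and stopping at the first hit."""
    checks = 0
    for action, pattern in reversed(rules):
        checks += 1
        if fnmatchcase(key, pattern):
            return action == "select", checks
    return default, checks

rules = [
    ("exclude", "*"),
    ("select", "secevt.*"),
    ("select", "usracct.*"),
    ("select", ".*"),
]

forward = last_match_forward(rules, "usracct.username")
backward = last_match_reversed(rules, "usracct.username")
```

For this made-up key both give the same verdict, but the reversed walk stops after two pattern tests while the forward walk tests all four, which is the same short-circuit behaviour a first-match rule chain has.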
On Mon, 2011-02-07 at 08:54 -0800, Evan Rempel wrote:
value-pairs( scope(selected_macros nv_pairs) select(.*) select("usracct.*") select("secevt.*") select(".SDATA.*") exclude("*") key("SEVERITY") key("HOST") key("PROGRAM") key("PID") key("MSG") key("TAGS") pair("timestamp" "$UNIXTIME") );
I think I've realized why I have so much trouble with the meaning of this stanza.
I think that you are approaching this as a filter of the keys. When doing this the first filter that "matches" the key is the one that actually determines if the key is included or not.
Yep, that is the intent.
I approach this as a set theory specification. In set theory, it is the last item that determines if a key is included.
Both are equally flexible and non-ambiguous. My preference for this type of task is to use set theory. I view this as building a set of keys to place into the output template.
I find the following a lot more intuitive.
value-pairs( scope(selected_macros nv_pairs) exclude("*") select("secevt.*") select("usracct.*") select(.*) key("SEVERITY") key("HOST") key("PROGRAM") key("PID") key("MSG") key("TAGS") pair("timestamp" "$UNIXTIME") );
select(".SDATA.*") isn't needed because it matches the select(.*) anyway.
I see your point, and it makes sense. From an implementation point of view - as you point out in a later mail - it doesn't matter much, I'd just have to start at the other end of the list. The more I think about it, the more I like your way of thinking.
If you opt for the filter approach then the docs will have to be clear in stating that the select and include are final filters. Selects or excludes following them will have no effect.
If we end up with the filter approach, it will be properly documented, yep.

On the flip side, if this is the only complaint against the current proposal, that's great news, as I can go ahead and implement it! Switching between the two approaches (from a code point of view) is trivial, and can be decided later.

-- 
|8]
participants (3)
- Evan Rempel
- Gergely Nagy
- Matthew Hall