Logging all message metadata
As I work with the classification engine, I wish there were a way for me to log *all* metadata associated with a log message. That is, I would like to record any data parsed out of the message by the parsing engine, as well as all the default metadata that syslog-ng generates about a message. Is there any way to do this other than writing my own output driver? As far as I can tell, all of the available drivers -- including the sql() driver -- require me to explicitly list which attributes I want to log.

While my ultimate target would be sending this data into a database, I would be happy if I were able to dump it all to a file in some sort of structured format that I could parse with my own tools (or maybe even feed back into syslog-ng).

Thanks,

-- Lars
This is a bit of a problem if you want to get it into a traditional RDBMS, as you have to know the columns ahead of time. There are a couple of ways you can handle this:

- use a NoSQL database like MongoDB (which is awesome)
- use a traditional RDBMS but run your output through a (you guessed it) Perl script which will format it into blobs in XML or JSON to get a SQL/NoSQL hybrid
- record the "real" column names in a separate DB table and use aliases for the patterns (this is what I do now)

For instance, I have a set table with six integer columns and six string columns, and extract the names i0-i5 and s0-s5 so they can go directly to the DB. When it comes time to query, I use the class_id to dictate what the context of "i1" means for a given row. So i1 could be an IP address for class 1 and an event ID for class 2. The point is that I don't need to worry about altering the DB schema for each class type. I have just one destination driver with one template, and it logs the macros i0-s5 whether they are present or not, which is fine, because they'll just go in as nulls in the DB.

On Mon, Oct 25, 2010 at 2:58 PM, Lars Kellogg-Stedman <lars@oddbit.com> wrote:
> As I work with the classification engine, I wish there were a way for me to log *all* metadata associated with a log message. That is, I would like to record any data parsed out of the message by the parsing engine, as well as all the default metadata that syslog-ng generates about a message. Is there any way to do this other than writing my own output driver? As far as I can tell, all of the available drivers -- including the sql() driver -- require me to explicitly list which attributes I want to log.
>
> While my ultimate target would be sending this data into a database, I would be happy if I were able to dump it all to a file in some sort of structured format that I could parse with my own tools (or maybe even feed back into syslog-ng).
>
> Thanks,
>
> -- Lars

______________________________________________________________________________
Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
FAQ: http://www.campin.net/syslog-ng/faq.html
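The aliasing scheme Martin describes in his reply (generic i0-i5 and s0-s5 columns whose meaning depends on class_id) could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the `FIELD_ALIASES` table, the class IDs, the field names, and the `label_row` helper are all hypothetical. The thread only says that a separate table records what each generic column means for each class.

```python
# Hypothetical alias table: class_id -> {generic column -> real field name}.
# The class IDs and field names below are made up for illustration.
FIELD_ALIASES = {
    1: {"i0": "src_port", "i1": "src_ip", "s0": "username"},
    2: {"i0": "severity", "i1": "event_id", "s0": "hostname"},
}

# The twelve generic columns: six integers and six strings.
GENERIC_COLUMNS = [f"i{n}" for n in range(6)] + [f"s{n}" for n in range(6)]


def label_row(row):
    """Translate a generic i0-s5 row into named fields using its class_id."""
    aliases = FIELD_ALIASES.get(row["class_id"], {})
    labelled = {}
    for column in GENERIC_COLUMNS:
        value = row.get(column)
        if value is None:
            continue  # unused generic columns are stored as NULL
        # Fall back to the generic name when no alias is recorded.
        labelled[aliases.get(column, column)] = value
    return labelled
```

With this table, `i1` is reported as `src_ip` for a class 1 row and as `event_id` for a class 2 row, so the database schema itself never changes.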
> There are a couple of ways you can handle this:

These are all useful suggestions, but I'm still stuck with the root of the problem -- I don't know how to get "all the metadata" associated with a message using any of the existing output drivers. With anything using templates, I need to explicitly define the content of the message, and the sql() driver, as you point out, also requires explicitly selecting metadata. Neither of these allows me access to any and all information generated by the parsing engine -- which may change periodically as I update the pattern database.
I don't think you understood the third option, which does do that, though only for a finite number of fields. If you use generic names for your extractions ("@NUMBER:i0:@ @NUMBER:i1:@ @ESTRING:s0:%@", etc.), then your single template works for any message:

    template("$R_UNIXTIME\t$SOURCEIP\t$PROGRAM\t${.classifier.class}\t${.classifier.rule_id}\t$MSGONLY\t${i0}\t${i1}\t${i2}\t${i3}\t${i4}\t${i5}\t${s0}\t${s1}\t${s2}\t${s3}\t${s4}\t${s5}\n");

As long as no pattern extraction uses a name other than i0-s5, you're good to go.

On Mon, Oct 25, 2010 at 10:32 PM, Lars Kellogg-Stedman <lars@oddbit.com> wrote:
>> There are a couple of ways you can handle this:
>
> These are all useful suggestions, but I'm still stuck with the root of the problem -- I don't know how to get "all the metadata" associated with a message using any of the existing output drivers. With anything using templates, I need to explicitly define the content of the message, and the sql() driver, as you point out, also requires explicitly selecting metadata.
>
> Neither of these allows me access to any and all information generated by the parsing engine -- which may change periodically as I update the pattern database.
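For post-processing, a file written with Martin's single tab-separated template could be read back along these lines. This is a minimal Python sketch, not part of syslog-ng: the field order is copied from the template in the thread, while `TEMPLATE_FIELDS` and `parse_line` are hypothetical names. It also assumes that none of the values (in particular the message text) contains a literal tab.

```python
# Field order copied from the template() string in the thread.
TEMPLATE_FIELDS = (
    ["R_UNIXTIME", "SOURCEIP", "PROGRAM",
     ".classifier.class", ".classifier.rule_id", "MSGONLY"]
    + [f"i{n}" for n in range(6)]
    + [f"s{n}" for n in range(6)]
)


def parse_line(line):
    """Split one tab-separated record into a dict; empty macros become None."""
    # Assumption: no value contains a tab, otherwise the columns shift.
    values = line.rstrip("\n").split("\t")
    if len(values) != len(TEMPLATE_FIELDS):
        raise ValueError(
            f"expected {len(TEMPLATE_FIELDS)} fields, got {len(values)}"
        )
    return {name: (value or None) for name, value in zip(TEMPLATE_FIELDS, values)}
```

Macros that were not set for a message expand to empty strings, which the sketch maps to `None`, mirroring the NULLs Martin mentions on the database side.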
> though only for a finite number of fields. If you use generic names for your extractions ("@NUMBER:i0:@ @NUMBER:i1:@ @ESTRING:s0:%@", etc.), then your single template works for any message:

I understood the suggestion.

This is the point I'm trying to make: If I'm using, for example, the community patterndb database, then the metadata includes named values (e.g., "flowevt.src_ip") that I may not be aware of in advance. Furthermore, the values associated with a given class may change as the pattern database changes over time. This will inherently break any sort of positional schema.

I am looking for a way to extract all of the metadata names and values known to syslog-ng at the time the message is logged. I'm not wedded to a database solution; if I could generate a structured output format like XML or JSON I could obviously post-process in whatever fashion best suited my needs.

I'm currently poking around the source to see if I can figure out how to do this.
Ok, I understand. I just figured I'd write a script to munge the community patterns into my format when that time arises. I hope you find a better solution.

On Mon, Oct 25, 2010 at 10:51 PM, Lars Kellogg-Stedman <lars@oddbit.com> wrote:
>> though only for a finite number of fields. If you use generic names for your extractions ("@NUMBER:i0:@ @NUMBER:i1:@ @ESTRING:s0:%@", etc.), then your single template works for any message:
>
> I understood the suggestion.
>
> This is the point I'm trying to make: If I'm using, for example, the community patterndb database, then the metadata includes named values (e.g., "flowevt.src_ip") that I may not be aware of in advance. Furthermore, the values associated with a given class may change as the pattern database changes over time. This will inherently break any sort of positional schema.
>
> I am looking for a way to extract all of the metadata names and values known to syslog-ng at the time the message is logged. I'm not wedded to a database solution; if I could generate a structured output format like XML or JSON I could obviously post-process in whatever fashion best suited my needs.
>
> I'm currently poking around the source to see if I can figure out how to do this.
On Mon, 2010-10-25 at 23:51 -0400, Lars Kellogg-Stedman wrote:
>> though only for a finite number of fields. If you use generic names for your extractions ("@NUMBER:i0:@ @NUMBER:i1:@ @ESTRING:s0:%@", etc.), then your single template works for any message:
>
> I understood the suggestion.
>
> This is the point I'm trying to make: If I'm using, for example, the community patterndb database, then the metadata includes named values (e.g., "flowevt.src_ip") that I may not be aware of in advance. Furthermore, the values associated with a given class may change as the pattern database changes over time. This will inherently break any sort of positional schema.
>
> I am looking for a way to extract all of the metadata names and values known to syslog-ng at the time the message is logged. I'm not wedded to a database solution; if I could generate a structured output format like XML or JSON I could obviously post-process in whatever fashion best suited my needs.
>
> I'm currently poking around the source to see if I can figure out how to do this.
This is exactly the way to go forward. The functionality you are looking for is "template functions". What I had in mind was to write a set of "format-xxx" functions, where xxx would denote a common logging format.

One of these logging formats I know is "WELF", which is simply:

  name1=value1 name2=value2

and so on. If "value" contains a space, it is enclosed in quotes. If it contains spaces and quotes, I'm not sure what happens; I haven't found appropriate documentation on that.

But anyway, what we could write is a format-welf template function which would be capable of writing out a set of NV pairs or the complete list, customizable by parameters. Since a template function is quite similar to a UNIX shell expansion (it uses the bash syntax for that), it is possible to use "command-line arguments" to specify what you would like to do:

syntax:

  $(format-welf [options] name[=value-expr]...)

Behaviour:

The format-welf function emits a set of name-value pairs according to the WebTrends Enhanced Log Format. To specify which name-value pairs are written, list them explicitly on the command line or use the --select command line option.

Command line options:

  --select <glob expression>
      Specify which name-value pairs are included in the result. The parameter is a shell-like glob pattern.

  --all
      Equivalent to '--select *'

  --prefix <string>
      All name-value pairs should be emitted with a prefix.

  --ltrim <string>
      Remove <string> from the beginning of the name-value pair _before_ adding the prefix.

  --skip-builtin
      Don't include built-in name-value pairs (e.g. the ones in the syslog header)

Arguments:

The list of arguments after the options can specify which name-value pairs are to be included. The expected format for each argument is name[=value-expr], where "name" specifies the WELF identifier for the field, and the optional value-expr is a quote-enclosed syslog-ng template string. In case the value part is missing, the "name" will be used as the name of the syslog-ng name-value pair.

The quotes are only included in the result if the content of the nv pair would cause the WELF format to be ambiguous.

Examples:

  $(format-welf foo bar)

Becomes:

  foo=FOO bar=BAR

assuming foo contains the value FOO, and bar contains BAR.

  $(format-welf time="$YEAR-$MONTH-$DAY $HOUR:$MIN:$SEC" src="$SOURCEIP")

Becomes:

  time="2010-10-28 21:05:53" src=1.2.3.4

  $(format-welf --select .SDATA.meta.* --prefix meta --ltrim ".SDATA.meta")

Becomes:

  meta.sequenceId=5 meta.tzKnown=1

What do you think? Would you like to implement such a functionality? I'd love to include that in the convertfuncs module in 3.2

-- Bazsi
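To make the proposed behaviour concrete, here is a rough Python model of what format-welf is described as doing. It is a sketch of the proposal only, not syslog-ng code: `format_welf`, `welf_quote`, and their parameters are hypothetical names, the name=value-expr form and --skip-builtin are left out, and two points are my own assumptions because the thread leaves them open: embedded quotes are backslash-escaped, and --select results are emitted in sorted name order.

```python
from fnmatch import fnmatchcase


def welf_quote(value):
    """Quote a value only when leaving it bare would be ambiguous."""
    if any(ch in value for ch in ' \t"'):
        # Assumption: embedded quotes (and backslashes) are backslash-escaped;
        # the thread explicitly leaves this case undecided.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'
    return value


def format_welf(pairs, select=None, prefix="", ltrim="", explicit=()):
    """Render name-value pairs as WELF text.

    pairs    -- dict of all name-value pairs known for the message
    select   -- optional shell-style glob, modelling --select
    prefix   -- string prepended to each selected name, modelling --prefix
    ltrim    -- string stripped from the start of each name before the
                prefix is added, modelling --ltrim
    explicit -- names listed directly, as in $(format-welf foo bar)
    """
    out = []
    for name in explicit:
        out.append(f"{name}={welf_quote(pairs.get(name, ''))}")
    if select is not None:
        # Assumption: sorted order, so the output is deterministic.
        for name in sorted(pairs):
            if not fnmatchcase(name, select):
                continue
            short = name[len(ltrim):] if ltrim and name.startswith(ltrim) else name
            out.append(f"{prefix}{short}={welf_quote(pairs[name])}")
    return " ".join(out)
```

Called with `select=".SDATA.meta.*"`, `prefix="meta"` and `ltrim=".SDATA.meta"`, the sketch turns `.SDATA.meta.sequenceId` into `meta.sequenceId`, which matches the third example in the proposal.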
> What do you think? Would you like to implement such a functionality? I'd love to include that in the convertfuncs module in 3.2
I'd love to give it a shot. Let me take a look at the source with template functions in mind and see where I end up.
On Thu, 2010-10-28 at 15:10 -0400, Lars Kellogg-Stedman wrote:
>> What do you think? Would you like to implement such a functionality? I'd love to include that in the convertfuncs module in 3.2
>
> I'd love to give it a shot. Let me take a look at the source with template functions in mind and see where I end up.
Great. You should possibly look at the simple ones (echo, ipv4-to-int, etc) first.

Then you'll possibly need a "prepare" callback in order to process the quite complicated argument list using GLib's option parser into a structure that can then be passed to the core of the format functionality. The point of the prepare callback is to avoid doing expensive parameter processing at runtime and instead do it once, early on, during template compilation. Currently grep/if use this functionality to compile the filter expression into its internal representation.

-- Bazsi
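The split Bazsi describes, between a one-time "prepare" step and the per-message call, can be illustrated in Python. This is only an analogy under stated assumptions: syslog-ng template functions are written in C against GLib's option parser, and Python's `argparse` stands in for it here. The `prepare` and `call` names and the reduced option set are hypothetical, and quoting of values is omitted to keep the sketch short.

```python
import argparse
from fnmatch import fnmatchcase


def prepare(argv):
    """Parse the argument list once, at template-compile time."""
    parser = argparse.ArgumentParser(prog="format-welf", add_help=False)
    parser.add_argument("--select")
    parser.add_argument("--prefix", default="")
    parser.add_argument("names", nargs="*")
    opts = parser.parse_args(argv)

    def call(pairs):
        """Per-message work: only lookups and string joins, no parsing."""
        chosen = list(opts.names)
        if opts.select:
            chosen += sorted(n for n in pairs if fnmatchcase(n, opts.select))
        return " ".join(f"{opts.prefix}{n}={pairs.get(n, '')}" for n in chosen)

    return call
```

The function returned by `prepare` can then be applied to every message without touching the argument list again, which is the saving the prepare callback is meant to provide.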
participants (3)
- Balazs Scheidler
- Lars Kellogg-Stedman
- Martin Holste