[syslog-ng] pattern matching on xxx#

Bill Anderson Bill.Anderson at bodybuilding.com
Mon Oct 18 19:48:17 CEST 2010


On Oct 15, 2010, at 10:37 PM, Balazs Scheidler wrote:

> On Fri, 2010-10-15 at 12:48 -0600, Bill Anderson wrote:
>> I have hostnames of the format xxxx# such as host1, hostb1, hostc1. I need to split that into two fields such as (host,1).
>> 
>> Unfortunately, since @@ escapes the @, and STRING and its followers ALSO match digits, I've not found the obvious
>> means to get that out. Conceptually something like @LETTER:host.name@@NUMBER:host.id@ would do it, save that 
>> LETTER doesn't exist and @@ escapes.
> 
> '@@' wouldn't escape in this situation.

Good to know. :)

> The thing I'd like to understand before recommending a solution is where
> these hostnames come in the picture? usually the hostname portion is not
> processed by db-parser. Or you have these names inside the message
> payload and you want to get it from there?
> 
> what I would propose if this is the case is to use a regexp _after_ you
> parsed the hostname, and only on the hostname field.

Ran into a few hiccups (curses for dev environments not matching production ones), but have it working (at least to a 99% level).

First hiccup was that we wound up needing to use the CNAME rather than the A record for some hosts. That was remedied through Puppet templating: using an include and a Ruby template, we set a template that includes the CNAME as part of the message format (for a tab-separated Apache access log). This is done on the clients. Then on the CLS I use the csv-parser to break that into fields, including one called APACHE.ROLEHOST. I can then rewrite that field from name1 to name-1. The idea was then to run a second csv-parser on that field to split it into clustername and hostid.

But that doesn't seem to work. When I reference the column names from the second parser, I get an empty value (or my specified default). If I instead rewrite the field to just the clustername (essentially stripping out the digits), I get most of the way to the end goal: that satisfies the criteria nicely, with the exception of losing the hostid.

Since that may not sound very clear, here are the config portions to explain it:

parser p_apache1 {
    csv-parser( columns("APACHE.NATE", "APACHE.TIMESTAMP", "APACHE.CNAME", "APACHE.ROLEHOST", "APACHE.VHOST", "APACHE.REMOTEIP", "APACHE.METHOD", "APACHE.AUTHUSER", "APACHE.STATUS", "APACHE.SIZE", "APACHE.URL", "APACHE.QUERY", "APACHE.REFERRER", "APACHE.USERAGENT") delimiters ("\t")
    );
};
parser p_clustername {
        csv-parser(
                columns("APACHE.ROLEHOST.CLUSTER,'APACHE.ROLEHOST.ID")
                delimiters("-")
                flags(escape-none)
                template("${APACHE.ROLEHOST}")
        );
};
rewrite r_rewrite_subst_apache_split{subst('(?P<cluster>[a-z]+)(?P<hostid>[0-9]+)', '$cluster-$hostid', value('APACHE.ROLEHOST') type('pcre') );};


log { source(s_remote);  
    parser(p_apache1); 
    rewrite(r_rewrite_subst_apache_split); 
    parser(p_clustername); 
    destination(df_apacheaccess2);
};
destination df_apacheaccess2 { file("/var/log/$YEAR-$MONTH-$DAY/apache-urls.log" template("${APACHE.ROLEHOST.CLUSTER:-wtf}\t${APACHE.ROLEHOST}\t${APACHE.URL}\t${APACHE.TIMESTAMP}\n") template_escape(no)); };

[note: this is a stripped down log file just to test the changes. my eyes find it easier not having a ton of fields I don't care about. ;) ]
The requirement to have to do this to either the hostname or the rolehost as set in the template does complicate things, I acknowledge.
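
To make the intent concrete outside syslog-ng, here is a minimal Python sketch of the transformation I am after: the same named-group pattern as in r_rewrite_subst_apache_split, followed by a split on "-" like the second csv-parser. The function names are hypothetical and only for illustration, and Python's re module is not PCRE, though it accepts this particular pattern. It says nothing about how syslog-ng passes values between the rewrite and the parser.

```python
import re

# Same pattern as in the subst() rewrite above; Python's re module
# accepts the (?P<name>...) named-group syntax used there.
ROLEHOST_RE = re.compile(r'(?P<cluster>[a-z]+)(?P<hostid>[0-9]+)')


def rewrite_rolehost(rolehost):
    """Mimic the rewrite step: name1 -> name-1 (first match only)."""
    return ROLEHOST_RE.sub(r'\g<cluster>-\g<hostid>', rolehost, count=1)


def split_rolehost(rolehost):
    """Mimic the second csv-parser: split the rewritten value on '-'."""
    cluster, _, hostid = rewrite_rolehost(rolehost).partition('-')
    return cluster, hostid


if __name__ == "__main__":
    for name in ("host1", "hostb1", "hostc12"):
        print(name, "->", split_rolehost(name))
```

Note that a role host that already contains a "-" would be split at the first one, so this sketch only holds for names of the letters-then-digits form described earlier.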


So this brings up a question:
I am essentially (and in part literally) replicating Examples 3.36 through 3.38 from the admin guide, with one change: I am doing a rewrite in between the parser "calls". Is it possible/expected that the second call is not getting the value change from the rewrite? I know the rewrite is working, because in the logfile I get name-id for ${APACHE.ROLEHOST}; that value is written after the parsers and the rewrite have run.

Further, is it possible to set a regex-matched value to the value of a "new" field using set? I see where I can set fields manually. I am assuming that is not possible.


Perhaps doing the rewrite then using a patterndb entry? I'll go try that.

I'd like to request @LETTER (or something as descriptive) for patterndb parsers to only match letters as that makes this case trivial with patterndb (which I'd ultimately prefer anyhow). Is there a feature request page I should visit to do so? :)


Cheers and thanks,
Bill







