On Fri, 2010-10-15 at 12:48 -0600, Bill Anderson wrote:
I have hostnames of the format xxxx# such as host1, hostb1, hostc1. I need to split that into two fields such as (host,1).
Unfortunately, since @@ escapes the @, and STRING and its followers ALSO match digits, I've not found an obvious means to get that out. Conceptually something like @LETTER:host.name@@NUMBER:host.id@ would do it, save that LETTER doesn't exist and @@ escapes.
'@@' wouldn't escape in this situation. The thing I'd like to understand before recommending a solution is where these hostnames come into the picture. Usually the hostname portion is not processed by db-parser. Or do you have these names inside the message payload and want to extract them from there?

If that is the case, what I would propose is to use a regexp _after_ you have parsed the hostname, and only on the hostname field. E.g. in patterndb you only parse the hostname and put the result in a ${hostname} name-value pair:

    parser p_pdb { db-parser(); };
    filter f_cluster_member {
        match("^([a-z]+)([0-9]+)$" value('hostname') flags(store-matches));
    };

If using pcre you could also parse groups right into name-value pairs with named groups (from man pcresyntax):

    (?<name>...)    named capturing group (Perl)
    (?'name'...)    named capturing group (Perl)
    (?P<name>...)   named capturing group (Python)

Also, it'd make sense to create a regexp parser, which doesn't currently exist: you only have that functionality with a filter, and if you don't want to filter out non-matching log messages, you'll have to use some nasty hackery, e.g.:

    parser p_pdb { db-parser(); };
    filter f_cluster_member {
        match("^([a-z]+)([0-9]+)$" value('hostname') flags(store-matches))
        or match('.');
    };
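As a rough, untested sketch of the named-group variant: this assumes the PCRE matcher is selected with type("pcre") and that the hostname is already available in a hostname name-value pair as described above. The group names cluster and member are made up for illustration.

```
filter f_cluster_member {
    # 'cluster' and 'member' are hypothetical group names; after a
    # successful match they should be available as ${cluster} and ${member}.
    match("^(?<cluster>[a-z]+)(?<member>[0-9]+)$"
          type("pcre") value("hostname"));
};
```

With a hostname of hostb1 this should give cluster=hostb and member=1.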
The end goal is as follows (pseudo-code): I need to have a destination for each (HOST). For example, all entries from hosta## go to /var/log/hosta/ and entries from hostb## go to /var/log/hostb/.
Ahh, so it seems you don't want to parse out hostnames from the message payload, but rather you'd like to use the $HOST name-value pair. Then, definitely the regexp is the way to go and you don't need db-parser() at all.
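A minimal sketch of that approach, untested and with the assumptions called out: the source s_net, the destination d_cluster and the file name "messages" are placeholders of mine, not something from your setup. It relies on the match groups stored by store-matches being usable as $1 and $2 in the destination file name.

```
# Placeholder source; replace with however the messages actually arrive.
source s_net { udp(); };

# Split $HOST into its letter part ($1) and trailing digits ($2).
filter f_cluster_member {
    match("^([a-z]+)([0-9]+)$" value("HOST") flags("store-matches"));
};

# $1 holds the letter part, so hosta12 ends up under /var/log/hosta/.
destination d_cluster {
    file("/var/log/$1/messages" create_dirs(yes));
};

log { source(s_net); filter(f_cluster_member); destination(d_cluster); };
```

Messages whose host name does not match the pattern are dropped by this log path, so they would need a separate log path if you want to keep them.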
I suppose I *might* be able to do a rewrite to add, say, a hyphen, and then use csv-parser, but we're talking about some heavy traffic and I suspect that doing rewrites on that much traffic would be a performance killer.
I'm open to suggestions (that don't involve changing server names, preferably ;) ) as to how to accomplish this.
If regexp really becomes a performance bottleneck, a parser plugin would probably be much faster, but that requires the 3.2 codebase.

-- 
Bazsi