I have hostnames of the format xxxx# such as host1, hostb1, hostc1. I need to split that into two fields such as (host,1).

Unfortunately, since @@ escapes the @, and STRING and its followers ALSO match digits, I've not found the obvious means to get that out. Conceptually something like @LETTER:host.name@@NUMBER:host.id@ would do it, save that LETTER doesn't exist and @@ escapes.

The end goal is as follows (pseudo-code): I need to have a destination for each (HOST). For example, all files from hosta## go to /var/log/hosta/ and entries for hostb## go to /var/log/hostb/.

I suppose I *might* be able to do a rewrite to add, say, a hyphen, and then use csv-parser, but we're talking some heavy traffic and I suspect that doing rewrites on that much traffic would be a performance killer.

I'm open to suggestions (that don't involve changing server names, preferably ;) ) as to how to accomplish this.

Cheers, Bill
On Fri, Oct 15, 2010 at 12:48:14PM -0600, Bill Anderson wrote:
I have hostnames of the format xxxx# such as host1, hostb1, hostc1. I need to split that into two fields such as (host,1).
Unfortunately, since @@ escapes the @, and STRING and its followers ALSO match digits, I've not found the obvious means to get that out. Conceptually something like @LETTER:host.name@@NUMBER:host.id@ would do it, save that LETTER doesn't exist and @@ escapes.
I think you can get around @@ escapes by adding extra @'s. ;-) Too bad LETTER doesn't exist yet. Man I wish it did for some of the @#$%^&* @#$% I have to parse.
The end goal is as follows (pseudo-code): I need to have a destination for each (HOST). For example all files from hosta## go to /var/log/hosta/ and entries for hostb## go to /var/log/hostb/
Goal makes sense for a big server farm. Crazy idea. Depending how your IP subnets are set up... could you break the host IPs into pieces using '.' and direct the logs where they need to go using the IP?
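To make the IP idea concrete, here is a minimal sketch in Python (not syslog-ng configuration) of splitting an address on '.' and choosing a log directory from its leading octets. The prefix table and the function name `log_dir_for_ip` are hypothetical, and the sketch assumes each cluster sits in its own /24, which may not hold for a given network.

```python
# Hypothetical mapping from the first three octets to a cluster directory.
# Real prefixes would have to come from your own subnet plan.
PREFIX_DIRS = {
    "10.1.0": "/var/log/hosta/",
    "10.2.0": "/var/log/hostb/",
}

def log_dir_for_ip(ip, default="/var/log/other/"):
    """Break the IP into pieces on '.' and route on the first three octets."""
    octets = ip.split(".")
    prefix = ".".join(octets[:3])
    return PREFIX_DIRS.get(prefix, default)
```

This only helps if the subnets line up with the clusters; otherwise the hostname still has to be split.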
I suppose I *might* be able to do a rewrite to add say, a hyphen, and then use csv-parser, but we're talking some heavy traffic and I suspect that doing rewrites on that much traffic would be a performance killer.
Can you try the rewrite on a second syslog-ng receiving a relayed copy of the traffic using AF_UNIX SOCK_DGRAM also known as unix-dgram driver? That way if it has disastrous side effects you could find out without causing outages in your primary syslog-ng. I often use this sort of approach for testing crazy ideas.
I'm open to suggestions (that don't involve changing server names, preferably ;) ) as to how to accomplish this.
Let's keep working on it until we come up with a good idea. There has to be some way to make it happen.
Cheers, Bill
Regards, Matthew Hall.
On Oct 15, 2010, at 1:01 PM, Matthew Hall wrote:
On Fri, Oct 15, 2010 at 12:48:14PM -0600, Bill Anderson wrote:
I have hostnames of the format xxxx# such as host1, hostb1, hostc1. I need to split that into two fields such as (host,1).
Unfortunately, since @@ escapes the @, and STRING and its followers ALSO match digits, I've not found the obvious means to get that out. Conceptually something like @LETTER:host.name@@NUMBER:host.id@ would do it, save that LETTER doesn't exist and @@ escapes.
I think you can get around @@ escapes by adding extra @'s. ;-) Too bad LETTER doesn't exist yet. Man I wish it did for some of the @#$%^&* @#$% I have to parse.
Hmm if @@@ worked, and LETTER existed, that *would* solve it.
The end goal is as follows (pseudo-code): I need to have a destination for each (HOST). For example all files from hosta## go to /var/log/hosta/ and entries for hostb## go to /var/log/hostb/
Goal makes sense for a big server farm. Crazy idea. Depending how your IP subnets are set up... could you break the host IPs into pieces using '.' and direct the logs where they need to go using the IP?
Hmm an interesting idea. Not sure, but will look into it.
Can you try the rewrite on a second syslog-ng receiving a relayed copy of the traffic using AF_UNIX SOCK_DGRAM also known as unix-dgram driver? That way if it has disastrous side effects you could find out without causing outages in your primary syslog-ng. I often use this sort of approach for testing crazy ideas.
Yeah I've abused the daylights out of some of my syslog-ng installs using things like this. Even to the point of having a destination be a network socket that did some conversion to binary that I then shipped back into SNG which then wrote that to files. Saved me from writing the code to manage the files (and let me store them on a different server). ;) Hmm, perhaps the rewrite would be performance-safe if done by the SNG clients as opposed to the Central Log Servers (CLS).
I'm open to suggestions (that don't involve changing server names, preferably ;) ) as to how to accomplish this.
Let's keep working on it until we come up with a good idea. There has to be some way to make it happen.
I'm sure there is a way, rest assured it *will* be found. ;) Now, if I could set variables in the conf file to be used in templates and filters ... ;) Cheers, Bill
I'll chime in here to once again recommending piping to Perl using program() if you have crazy stuff to do. In your case, you could have a very simple (one liner, really) script that does the regex hostname rewrite so that hostXX would get rewritten to just XX or something easy for syslog-ng to filter on and route to the appropriate destination. Just have a socket source available as the destination from Perl and a source in syslog-ng to complete the circuit.
______________________________________________________________________________
Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
FAQ: http://www.campin.net/syslog-ng/faq.html
On Fri, 2010-10-15 at 14:43 -0500, Martin Holste wrote:
I'll chime in here to once again recommending piping to Perl using program() if you have crazy stuff to do. In your case, you could have a very simple (one liner, really) script that does the regex hostname rewrite so that hostXX would get rewritten to just XX or something easy for syslog-ng to filter on and route to the appropriate destination. Just have a socket source available as the destination from Perl and a source in syslog-ng to complete the circuit.
syslog-ng itself is able to do regexp transformations; the capability is just hidden under "filter" currently. You don't need to pipe out to Perl and back again.

-- Bazsi
Certainly! It's not an optimal solution, but the one big benefit you get is that the regexp happens in a different PID, so syslog-ng, in its current single-threaded model, doesn't have to burn resources doing the parsing. This is, of course, assuming that the parsing would be a greater overhead than the pipe overhead, which may or may not be true. Unless you're seeing high CPU utilization on syslog-ng, I totally agree with you and recommend keeping everything in Syslog-NG if at all possible.
On Fri, 2010-10-15 at 12:48 -0600, Bill Anderson wrote:
I have hostnames of the format xxxx# such as host1, hostb1, hostc1. I need to split that into two fields such as (host,1).
Unfortunately, since @@ escapes the @, and STRING and its followers ALSO match digits, I've not found the obvious means to get that out. Conceptually something like @LETTER:host.name@@NUMBER:host.id@ would do it, save that LETTER doesn't exist and @@ escapes.
'@@' wouldn't escape in this situation.

The thing I'd like to understand before recommending a solution is where these hostnames come in the picture? Usually the hostname portion is not processed by db-parser. Or you have these names inside the message payload and you want to get it from there?

What I would propose if this is the case is to use a regexp _after_ you parsed the hostname, and only on the hostname field.

E.g. in patterndb you only parse the hostname and put the result in a ${hostname} name-value pair, e.g.

    parser p_pdb { db-parser(); };
    filter f_cluster_member {
        match("^([a-z]+)([0-9]+)$" value('hostname') flags(store-matches));
    };

If using pcre you could also parse groups right into name-value pairs with named groups (from man pcresyntax):

    (?<name>...)   named capturing group (Perl)
    (?'name'...)   named capturing group (Perl)
    (?P<name>...)  named capturing group (Python)

Also, it'd make sense to create a regexp parser which doesn't currently exist, because you only have that functionality with a filter, and if you don't want to filter out non-matching log messages, then you'll have to use some nasty hackery, e.g.:

    parser p_pdb { db-parser(); };
    filter f_cluster_member {
        match("^([a-z]+)([0-9]+)$" value('hostname') flags(store-matches))
        or match('.');
    };
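As an illustration of what the regexp above captures, here is a small Python sketch (not syslog-ng itself) that applies the same pattern, once with numbered groups and once with named groups. The group names `cluster` and `hostid` and the function names are placeholders chosen for this example. The last function mimics the "or match('.')" fallback, under the assumption that its only purpose is to let non-matching messages through.

```python
import re

# Same pattern as in the filter: letters followed by digits, anchored.
CLUSTER_RE = re.compile(r"^([a-z]+)([0-9]+)$")
# Named-group variant using the Python-style (?P<name>...) syntax.
NAMED_RE = re.compile(r"^(?P<cluster>[a-z]+)(?P<hostid>[0-9]+)$")

def split_host(host):
    """Return (cluster, id) from numbered groups, or None if no match."""
    m = CLUSTER_RE.match(host)
    return (m.group(1), m.group(2)) if m else None

def split_host_named(host):
    """Return the named groups as a dict, empty if the host does not match."""
    m = NAMED_RE.match(host)
    return m.groupdict() if m else {}

def passes_filter(host):
    """Model of 'match(...) or match(".")': any non-empty value passes."""
    return bool(CLUSTER_RE.match(host)) or bool(re.search(r".", host))
```

For a host such as hostb12 the two groups would be "hostb" and "12", while a name like gateway has no trailing digits and yields no groups but still passes the fallback.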
The end goal is as follows (pseudo-code): I need to have a destination for each (HOST). For example all files from hosta## go to /var/log/hosta/ and entries for hostb## go to /var/log/hostb/
Ahh, so it seems you don't want to parse out hostnames from the message payload, but rather you'd like to use the $HOST name-value pair. Then, definitely the regexp is the way to go and you don't need db-parser() at all.
I suppose I *might* be able to do a rewrite to add say, a hyphen, and then use csv-parser, but we're talking some heavy traffic and I suspect that doing rewrites on that much traffic would be a performance killer.
I'm open to suggestions (that don't involve changing server names, preferably ;) ) as to how to accomplish this.
If regexp really becomes a performance bottleneck, a parser plugin would probably be much faster, but that requires the 3.2 codebase.

-- Bazsi
On Oct 15, 2010, at 10:37 PM, Balazs Scheidler wrote:
On Fri, 2010-10-15 at 12:48 -0600, Bill Anderson wrote:
I have hostnames of the format xxxx# such as host1, hostb1, hostc1. I need to split that into two fields such as (host,1).
Unfortunately, since @@ escapes the @, and STRING and its followers ALSO match digits, I've not found the obvious means to get that out. Conceptually something like @LETTER:host.name@@NUMBER:host.id@ would do it, save that LETTER doesn't exist and @@ escapes.
'@@' wouldn't escape in this situation.
Good to know. :)
The thing I'd like to understand before recommending a solution is where these hostnames come in the picture? Usually the hostname portion is not processed by db-parser. Or you have these names inside the message payload and you want to get it from there?
What I would propose if this is the case is to use a regexp _after_ you parsed the hostname, and only on the hostname field.
Ran into a few hiccups (curses for dev environments not matching production ones), but have it working (at least to a 99% level).

First hiccup was that we wound up needing to use the CNAME rather than the A record for some hosts. That was remedied through Puppet templating. Using an include and a Ruby template we set a template that includes the CNAME as part of the format of the message (for a tab-separated Apache access log). This is done on the clients.

Then on the CLS I use the csv-parser to break that into fields, including one called APACHE.ROLEHOST. I can then rewrite that field from name1 to name-1. Then the idea was to do a further csv-parser on that field to split it into clustername and hostid. But that doesn't seem to work. When I reference the column names from the second parser, I get an empty value (or my specified default).

So if I rewrite to just the clustername (essentially stripping out the digits), I can get the end goal. Unfortunately, that loses the hostid portion. This satisfies the criteria nicely with the exception of losing the hostid.
Since that may not sound very clear, here are the config portions to explain it:

    parser p_apache1 {
        csv-parser(
            columns("APACHE.NATE", "APACHE.TIMESTAMP", "APACHE.CNAME", "APACHE.ROLEHOST",
                    "APACHE.VHOST", "APACHE.REMOTEIP", "APACHE.METHOD", "APACHE.AUTHUSER",
                    "APACHE.STATUS", "APACHE.SIZE", "APACHE.URL", "APACHE.QUERY",
                    "APACHE.REFERRER", "APACHE.USERAGENT")
            delimiters ("\t")
        );
    };

    parser p_clustername {
        csv-parser(
            columns("APACHE.ROLEHOST.CLUSTER,'APACHE.ROLEHOST.ID")
            delimiters("-")
            flags(escape-none)
            template("${APACHE.ROLEHOST}")
        );
    };

    rewrite r_rewrite_subst_apache_split {
        subst('(?P<cluster>[a-z]+)(?P<hostid>[0-9]+)', '$cluster-$hostid',
              value('APACHE.ROLEHOST') type('pcre'));
    };

    log {
        source(s_remote);
        parser(p_apache1);
        rewrite(r_rewrite_subst_apache_split);
        parser(p_clustername);
        destination(df_apacheaccess2);
    };

    destination df_apacheaccess2 {
        file("/var/log/$YEAR-$MONTH-$DAY/apache-urls.log"
             template("${APACHE.ROLEHOST.CLUSTER:-wtf}\t${APACHE.ROLEHOST}\t${APACHE.URL}\t${APACHE.TIMESTAMP}\n")
             template_escape(no));
    };

[note: this is a stripped down log file just to test the changes. my eyes find it easier not having a ton of fields I don't care about. ;) ]

The requirement to have to do this to either the hostname or the rolehost as set in the template does complicate things, I acknowledge.

So this brings up a question: I am essentially (and in part literally) replicating Example 3.36 through 3.38 from the admin guide, with one change: I am doing a rewrite in between parser "calls". Is it possible/expected that the second call is not getting the value change from the rewrite? I know the rewrite is working as in the logfile I get name-id for ${APACHE.ROLEHOST}; this happens after the parser and rewrites.

Further, is it possible to set a regex-matched value to the value of a "new" field using set? I see where I can set fields manually. I am assuming that is not possible.

Perhaps doing the rewrite then using a patterndb entry? I'll go try that.
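For reference, the intended two-step transformation (insert a hyphen with a named-group substitution, then split on the hyphen) can be sketched outside syslog-ng in Python. This only illustrates the string logic and says nothing about how syslog-ng passes values between a rewrite and a later parser; the function names are hypothetical. One observation about the config as posted, offered as something to check rather than a confirmed diagnosis: the columns() argument in p_clustername opens with a double quote and contains a stray single quote, so it may be read as one column name instead of two.

```python
import re

# Same expression as in the subst() rule: letters, then digits.
SPLIT_RE = re.compile(r"(?P<cluster>[a-z]+)(?P<hostid>[0-9]+)")

def rewrite_rolehost(rolehost):
    """Step 1: turn 'name1' into 'name-1' using the named groups."""
    return SPLIT_RE.sub(r"\g<cluster>-\g<hostid>", rolehost)

def split_rolehost(rewritten):
    """Step 2: split on the hyphen into (cluster, hostid)."""
    cluster, _, hostid = rewritten.partition("-")
    return cluster, hostid
```

Chaining the two, `split_rolehost(rewrite_rolehost("name1"))` gives the cluster name and the host id as separate values, which is what the second csv-parser was meant to produce.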
I'd like to request @LETTER (or something as descriptive) for patterndb parsers to only match letters as that makes this case trivial with patterndb (which I'd ultimately prefer anyhow). Is there a feature request page I should visit to do so? :) Cheers and thanks, Bill
On Oct 18, 2010, at 11:48 AM, Bill Anderson wrote:
Perhaps doing the rewrite then using a patterndb entry? I'll go try that.
Nope. Rewriting host1 to host-1 then calling the patterndb does not work. Reasoning: rewriting the APACHE.ROLEHOST has no effect on $MSG, which is what the patterndb gets. Which in hindsight, I should have known. Cheers, Bill
On Mon, Oct 18, 2010 at 12:25:50PM -0600, Bill Anderson wrote:
On Oct 18, 2010, at 11:48 AM, Bill Anderson wrote:
Perhaps doing the rewrite then using a patterndb entry? I'll go try that.
Nope. Rewriting host1 to host-1 then calling the patterndb does not work. Reasoning: rewriting the APACHE.ROLEHOST has no effect on $MSG, which is what the patterndb gets. Which in hindsight, I should have known.
Hi Bill, I did try to follow your first email but it got complicated and covered some areas of the syslog-ng product I have not used before so I am not sure if you tried this already or not. I was thinking maybe you might be able to help your situation by using APACHE.ROLEHOST in the output file naming template. Once you have added that variable to the message it should stay there despite further parsings with CSV or patterndb unless overwritten. So once you created the APACHE.ROLEHOST variable the first time using CSV parser, you could still probably reference it in your arguments to the file() driver or other output driver template.
Cheers, Bill
Good Luck, Matthew.
On Oct 18, 2010, at 12:32 PM, Matthew Hall wrote:
On Mon, Oct 18, 2010 at 12:25:50PM -0600, Bill Anderson wrote:
On Oct 18, 2010, at 11:48 AM, Bill Anderson wrote:
Perhaps doing the rewrite then using a patterndb entry? I'll go try that.
Nope. Rewriting host1 to host-1 then calling the patterndb does not work. Reasoning: rewriting the APACHE.ROLEHOST has no effect on $MSG, which is what the patterndb gets. Which in hindsight, I should have known.
Hi Bill,
I did try to follow your first email but it got complicated and covered some areas of the syslog-ng product I have not used before so I am not sure if you tried this already or not.
3.x is new to me, so much of these areas are likewise new to me. :)
I was thinking maybe you might be able to help your situation by using APACHE.ROLEHOST in the output file naming template. Once you have added that variable to the message it should stay there despite further parsings with CSV or patterndb unless overwritten.
I could, but the goal was to not use it there. Initially it would contain say host1, but in my file naming I want just "host" (a directory). And in that directory would be one access file with host1 and host2 logs written to it.
So once you created the APACHE.ROLEHOST variable the first time using CSV parser, you could still probably reference it in your arguments to the file() driver or other output driver template.
I just found a way. You CAN use the rewrite set to set a new field to a parsed field. To wit:

    rewrite r_set1 { set("${APACHE.ROLEHOST}", value("RHOST")); };

This gives me the ability, instead of rewriting APACHE.ROLEHOST, to rewrite RHOST, which of course leaves APACHE.ROLEHOST intact. :D

Thus my criteria, sans performance testing, are met. Now to perf test it. :D

Thanks to you, Martin, and Balazs.

Cheers, Bill
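The copy-then-rewrite approach can be modelled in a few lines of Python. This is a sketch of the idea only: a plain dict stands in for the message's name-value pairs, the helper names are hypothetical, and stripping the trailing digits is assumed from the earlier description of rewriting to just the cluster name.

```python
import re

def set_rhost(fields):
    """Copy APACHE.ROLEHOST into a new RHOST field (like the set() rewrite)."""
    out = dict(fields)
    out["RHOST"] = out["APACHE.ROLEHOST"]
    return out

def strip_host_id(fields):
    """Rewrite only RHOST, dropping trailing digits; the original stays intact."""
    out = dict(fields)
    out["RHOST"] = re.sub(r"[0-9]+$", "", out["RHOST"])
    return out

def log_dir(fields):
    """Build the per-cluster directory from the rewritten copy."""
    return "/var/log/{}/".format(fields["RHOST"])
```

With this shape, hosta1 and hosta2 both resolve to /var/log/hosta/, while APACHE.ROLEHOST still carries the full name for use inside the log line.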
participants (4)
- Balazs Scheidler
- Bill Anderson
- Martin Holste
- Matthew Hall