PatternDB: macros extraction from URLs
Hello all,

We are using OSE version 3.2.1 and so far we have managed to configure most of the patterns we need.

However, we have reached a point where we need some hints from the users of this list. The problem is the following: how to extract macros when their order is not known (as in a URL).

For example, we would like to extract 'user' and 'action' from a URL like the one below:

APP[9988]: WEB[0011]: http://abc.example.com/query.php?user=test1&action=login&host=prod1&device=d... HTTP 1.1

Unfortunately, "user" and "action" could be placed anywhere in the URL (as the URL is not created by our application), so we have to create something like this:

<pattern>http://abc.example.com/query.php@ESTRING::u@ser=@ESTRING:user:&@action=@ESTRING:action:&@</pattern>
<pattern>http://abc.example.com/query.php@ESTRING::u@ser=@ESTRING:user:&@ESTRING::a@ction=@ESTRING:action:&@</pattern>
<pattern>http://abc.example.com/query.php@ESTRING::a@ction=@ESTRING:action:&@user=@ESTRING:user:&@</pattern>
<pattern>http://abc.example.com/query.php@ESTRING::a@ction=@ESTRING:action:&@ESTRING::u@ser=@ESTRING:user:&@</pattern>

and so on....

Not to mention that if we need to extract the 'device' macro as well, the number of patterns grows significantly.

Does anybody have hints on how to optimize the extraction of macros when they are not in a known order?

Thank you in advance,
Ioan
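To get a rough feel for how fast the pattern list grows, here is a small Python count. It is only an estimate under stated assumptions, not something from syslog-ng: it assumes one pattern per ordering of the wanted parameters, and that each gap between two consecutive wanted parameters needs two variants (directly adjacent, or separated by other parameters), as in the four patterns above. The function name `pattern_count` is made up for this illustration.

```python
from math import factorial


def pattern_count(wanted: int) -> int:
    """Estimate the number of patterns for `wanted` parameters in unknown order.

    Assumption: one pattern per ordering (wanted!), times two variants for
    each of the (wanted - 1) gaps between consecutive wanted parameters.
    """
    if wanted < 1:
        return 0
    return factorial(wanted) * 2 ** (wanted - 1)


for wanted in (2, 3, 4):
    print(wanted, pattern_count(wanted))
```

With two parameters this gives the four patterns listed in the mail; adding 'device' as a third parameter would already give 24 under the same assumptions.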
Ioan Indreias wrote:
For example, we would like to extract 'user' and 'action' from a URL like the one below:
APP[9988]: WEB[0011]: http://abc.example.com/query.php?user=test1&action=login&host=prod1&device=d... HTTP 1.1
[...]
Does anybody have hints on how to optimize the extraction of macros when they are not in a known order?
It's probably the HTTP variables you're after, not the URLs. For example, how about:

http://abc.example.com/query.php?us%65r=test1&%61ction=login&host=prod1&devi...

Or even:

http://abc.example.com/query.php?us%65r=%61dmin&%61ction=login&host=prod1&de...

(where syslog-ng would report "user=guest", doing "nothing" on any "user=/action=" matching pattern, while the web app is happily logging you in as admin)

My first thought is: don't do this in syslog-ng, because it won't tell you the things you want to know. (I could be wrong, as I don't know what your web app is about and what you're trying to extract and why.)

Best regards,
Valentijn
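The percent-encoding point can be checked outside syslog-ng with a few lines of Python. This is only an illustration using the standard library's `urllib.parse.parse_qs`; the query string is the untruncated part of the second example URL, and how a particular web application decodes its parameters is an assumption here.

```python
from urllib.parse import parse_qs

# Untruncated part of the query string from the example URL.
raw = "us%65r=%61dmin&%61ction=login&host=prod1"

# A literal match on "user=" or "action=" finds nothing in the raw string.
literal_hits = [key for key in ("user=", "action=") if key in raw]

# parse_qs percent-decodes parameter names and values.
decoded = parse_qs(raw)

print(literal_hits)
print(decoded)
```

A pattern that matches the literal text "user=" therefore never sees this request, even though the decoded parameter name is "user" and its value is "admin".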
Hi,

I have an idea how this could be accomplished, but it is strictly speculative and untested, and it also requires OSE 3.3, because you'll have to use mongodb.

First, you'll have to separate your variables into generic macros using parsers, since you do not know their order. For example, from user=test1&action=login&host=prod1&device=device1 you could make $myvariable1 and $myvalue1, $myvariable2 and $myvalue2, and so on, where the value of $myvariable1 is user, the value of $myvalue1 is test1, etc.

Next, you put your messages into mongodb. The trick here is that you do not explicitly know the names of the fields you have for each message, but it doesn't matter because mongodb will handle that (I _think_ so, but Algernon will probably correct me if it doesn't). So when setting the field names for the mongodb driver, you'll use macros, something like:

fields("$myvariable1", "$myvariable2");
values("$myvalue1", "$myvalue2");

(or whatever the syntax of the driver is)

Does this sound reasonable?

Robert

______________________________________________________________________________
Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
FAQ: http://www.campin.net/syslog-ng/faq.html
On Thu, 2011-02-17 at 14:11 +0100, Fekete Robert wrote:
I have an idea how this could be accomplished, but it is strictly speculative and untested, and also requires OSE 3.3, because you'll have to use mongodb.
The method described by Robert _might_ work with 3.2 too, with other destinations, but your mileage may vary, and it has a few shortcomings.
First, you'll have to separate your variables into generic macros using parsers, since you do not know their order. For example, from user=test1&action=login&host=prod1&device=device1 you could make $myvariable1 and $myvalue1, $myvariable2 and $myvalue2, and so on, where the value of $myvariable1 is user, the value of $myvalue1 is test1, etc.
As far as I understand, this would mean that instead of having a pattern for each and every combination of key=value pairs in the URLs, you'd only have N patterns, where N is the maximum number of parameters you want to support. Not the best, but better than what was explained in the original mail.

In each pattern, you'd extract not "user=$value", but "$key1=$value1". This has the downside that you wouldn't be able to explicitly address the 'user' field, but if you only want to extract the field and log it (and not filter on it), then that shouldn't be a problem: one can use $key1=$value1 in templates.

Your patterns would look something like this:

<pattern>http://abc.example.com/query.php?@ESTRING:key1:=@@ESTRING:value1:&@</pattern>

And so on... My patterndb knowledge is sadly lacking, but hopefully it's understandable what I'm trying to say :)
Next, you put your messages to mongodb. The trick here is that you do not explicitly know the name of the fields you have for each message, but it doesn't matter because mongodb will handle that (i _think_ so, but Algernon will probably correct me if it doesn't). So when setting the field names for the mongodb driver, you'll use macros, something like:
fields("$myvariable1", "$myvariable2"); values("$myvalue1", "$myvalue2");
At the moment, that won't work, as the field names cannot be templates in the mongodb driver yet (I thought they could be, but now that I looked at the code, I was wrong). However, 3.3 will have a solution for that, in the form of value-pairs(). I won't go into detail about that just yet, since the code is not ready.

For the time being, if you're ok with logging the parameters, and you don't need to filter on them, then there's a possible solution, even for 3.2, for any destination, based on the suggestions Robert gave.

-- |8]
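The positional naming discussed in this message (key1/value1, key2/value2, ...) can be pictured with a short Python sketch. It is not syslog-ng code and does not use any syslog-ng API; the function name `positional_pairs` and the `max_pairs` limit are made up here, with `max_pairs` playing the role of N, the maximum number of parameters to support.

```python
def positional_pairs(query: str, max_pairs: int = 4) -> dict:
    """Map 'k=v&k=v...' to key1/value1, key2/value2, ... in order of appearance."""
    macros = {}
    pairs = [pair for pair in query.split("&") if pair]
    for index, pair in enumerate(pairs[:max_pairs], start=1):
        key, _, value = pair.partition("=")
        macros[f"key{index}"] = key
        macros[f"value{index}"] = value
    return macros


macros = positional_pairs("user=test1&action=login&host=prod1&device=device1")

# Log every pair without knowing which parameter landed in which slot.
line = " ".join(
    macros[f"key{i}"] + "=" + macros[f"value{i}"]
    for i in range(1, len(macros) // 2 + 1)
)
print(line)
```

The sketch also shows the limitation mentioned above: 'user' can be logged, but it cannot be addressed by name, because which slot it lands in depends on the order of the parameters.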
While replying to Robert, I had another idea... I'm not sure how feasible it would be, since I've never done anything similar, and my knowledge in this area is almost nonexistent. But!

URL parameters could be extracted as a single string first, and fed to a CSV-parser, that'd do the job of extracting the fields. You'd need to combine patterndb with the CSV parser, though, and I'm not quite sure how one would go about doing that, or if it's even possible. But it's worth a shot.

If that fails, another option would be to pre-process the logs via a short (say) perl script that parses the URLs, rearranges the parameters into a specific order, and adds empty values for any missing parameters, so you'll only need a single pattern later. You'd first send the logs to a program() destination where the script does its stuff, and then deliver that output back into syslog-ng, at which point it can be easily processed with patterndb.

-- |8]
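The pre-processing idea could look roughly like the sketch below. It is written in Python rather than perl, and it only covers the rewriting of one query string; the field list `FIELD_ORDER` and the function name `normalize_query` are assumptions made for this example, and the wiring to a program() destination is left out.

```python
from urllib.parse import parse_qsl

# Hypothetical fixed order; the thread only names these parameters as examples.
FIELD_ORDER = ("user", "action", "device")


def normalize_query(query: str) -> str:
    """Return the wanted parameters in FIELD_ORDER, with empty values for missing ones."""
    # parse_qsl also percent-decodes names and values.
    params = dict(parse_qsl(query, keep_blank_values=True))
    return "&".join(f"{name}={params.get(name, '')}" for name in FIELD_ORDER)


print(normalize_query("action=login&host=prod1&user=test1"))
```

Because every rewritten query string has the same parameters in the same order, a single pattern would be enough afterwards. Note that the values are emitted decoded; if they can contain "&" or "=", they would need to be re-encoded (for example with `urllib.parse.quote`) before being handed back.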
On Thu, Feb 17, 2011 at 02:39:27PM +0100, Gergely Nagy wrote:
URL parameters could be extracted as a single string first, and fed to a CSV-parser, that'd do the job of extracting the fields. You'd need to combine patterndb with the CSV parser, though, and I'm not quite sure how one would go about doing that, or if it's even possible.
It's possible. Bazsi often recommends it to those of us with more complex messages to parse. ;-)

Matthew.
Hello all,

Thanks a lot for the time and hints provided. We will try to work out how to implement each proposed solution and come back with further questions or with the final implementation.

Best regards,
Ioan
This should be parsed either with csv-parser() or with a separate url-parser() _after_ the URL has been identified by patterndb.

-- Bazsi
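The two-stage approach (identify the URL first, split the parameters afterwards) can be sketched in Python as follows. This only mirrors the idea; it is not syslog-ng configuration. The regular expression stands in for the patterndb rule, `parse_qsl` stands in for the second parser, and the sample message is the thread's example with a complete, made-up query string.

```python
import re
from urllib.parse import parse_qsl

# Stage 1 stands in for patterndb: recognise the message and capture the query string.
URL_RE = re.compile(r"http://abc\.example\.com/query\.php\?(?P<query>\S*)")


def extract(message: str, wanted=("user", "action")) -> dict:
    """Return the wanted URL parameters from a log message, regardless of their order."""
    match = URL_RE.search(message)
    if match is None:
        return {}
    # Stage 2 stands in for the second parser: split the pairs, independent of order.
    params = dict(parse_qsl(match.group("query"), keep_blank_values=True))
    return {name: params[name] for name in wanted if name in params}


msg = ("APP[9988]: WEB[0011]: "
       "http://abc.example.com/query.php?host=prod1&action=login&user=test1 HTTP 1.1")
print(extract(msg))
```

Only one rule is needed to recognise the message, and adding 'device' later means extending `wanted`, not writing more patterns.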
participants (6)
- Balazs Scheidler
- Fekete Robert
- Gergely Nagy
- Ioan Indreias
- Matthew Hall
- Valentijn Sessink