patterndb - user defined parsers
It would be useful to permit users to define parsers in the patterndb. For example, in our environment, by policy we use a special set and order of characters for the accounts our administrators use to log into hosts and administer them. It would be useful to define a parser @SYSADMIN@ that would match only our sysadmin accounts. We could then use this parser in the patterndb to take some action, such as sending a message to the administrators about the event.
Another example would be to create a parser @LOCALIP@ that matches my organization's IP space. That way a set of rules could be defined using @LOCALIP@ for some kind of alerting, and then any organization could redefine @LOCALIP@ and use all of the goodness that some third party had created for monitoring logs, like an intrusion protection system.
Current parsers can be described as:
QSTRING - match opening char, then while not closing char, keep looking
ESTRING - while not end string, keep looking
NUMBER - while digit, keep looking
So it seems that general parsers could be constructed from two styles of matching, and then concatenating them together:
1. While in set of characters [some list of characters]
2. While not in set of characters [some list of characters]
I would call these INSET, to match 1 or more of a set of characters, and OUTSET, to match 1 or more of anything except the characters. For either, if a #-# were specified, then a minimum to a maximum would be required. (Perhaps a count of + or * could be used to specify 1 or more and 0 or more, respectively.)
Then limit the count of such occurrences, so that you could build the @IPv4@ parser as
@INSET::123456789:1@@INSET::0123456789:0-2@.@INSET::123456789:1@@INSET::0123456789:0-2@.@INSET::123456789:1@@INSET::0123456789:0-2@.@INSET::123456789:1@@INSET::0123456789:0-2@
and @NUMBER@ would be @INSET::123456789:1@@INSET::0123456789@
@FLOAT@ would be @INSET::0123456789.@
Then a user could make
<parser name="THOUSAND">@INSET::,:0-1@@INSET::0123456789:3@</parser>
<parser name="MONEY">$@INSET::123456789:1-3@@THOUSAND:::*@.@INSET::0123456789:2@</parser>
This is kind of like inventing regular expressions :-(
I'm not sure how well this fits into the radix tree matching structure, but I wanted to start this discussion.
Given the MONEY example, I think it is obvious that there needs to be a way to specify repeating groups of "something".
Let the discussion begin!
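The INSET/OUTSET proposal can be made concrete with a small model of how such matchers would concatenate. The Python sketch below makes several assumptions and is not a syslog-ng API. INSET, OUTSET, Repeat and full_match are hypothetical names. Each element is greedy and nothing backtracks. Repeat stands in for the "repeating groups" that the MONEY example needs.

```python
# Hypothetical model of the proposed INSET/OUTSET parsers; not part of syslog-ng.
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class CharRun:
    """INSET (negate=False) or OUTSET (negate=True) with a min/max run length."""
    chars: str
    lo: int = 1                # default: 1 or more
    hi: Optional[int] = None   # None: no upper limit
    negate: bool = False

    def match(self, text: str, pos: int) -> Optional[int]:
        end = pos
        while end < len(text) and (self.hi is None or end - pos < self.hi):
            if (text[end] in self.chars) == self.negate:
                break  # INSET: char not in set; OUTSET: char in set
            end += 1
        return end if end - pos >= self.lo else None


def INSET(chars: str, lo: int = 1, hi: Optional[int] = None) -> CharRun:
    return CharRun(chars, lo, hi)


def OUTSET(chars: str, lo: int = 1, hi: Optional[int] = None) -> CharRun:
    return CharRun(chars, lo, hi, negate=True)


@dataclass
class Repeat:
    """Match a group of elements lo..hi times (the 'repeating groups' idea)."""
    body: List["Element"]
    lo: int = 0
    hi: Optional[int] = None

    def match(self, text: str, pos: int) -> Optional[int]:
        count = 0
        while self.hi is None or count < self.hi:
            nxt = match_seq(self.body, text, pos)
            if nxt is None or nxt == pos:
                break  # group failed, or consumed nothing (avoid looping forever)
            pos, count = nxt, count + 1
        return pos if count >= self.lo else None


Element = Union[str, CharRun, Repeat]


def match_seq(seq: List[Element], text: str, pos: int = 0) -> Optional[int]:
    """Concatenate elements left to right; plain strings are literals.
    Every element is greedy and there is no backtracking."""
    for el in seq:
        if isinstance(el, str):
            if not text.startswith(el, pos):
                return None
            pos += len(el)
        else:
            pos = el.match(text, pos)
            if pos is None:
                return None
    return pos


def full_match(seq: List[Element], text: str) -> bool:
    return match_seq(seq, text) == len(text)


DIGITS = "0123456789"
OCTET = [INSET("123456789", 1, 1), INSET(DIGITS, 0, 2)]
IPV4 = OCTET + ["."] + OCTET + ["."] + OCTET + ["."] + OCTET
THOUSAND = [INSET(",", 0, 1), INSET(DIGITS, 3, 3)]
MONEY = ["$", INSET("123456789", 1, 3), Repeat(THOUSAND), ".", INSET(DIGITS, 2, 2)]
```

Taken literally, the proposed octet requires a first digit from 1-9, so it rejects a bare 0 and 10.0.0.1 does not match. MONEY likewise rejects $0.99.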
Have you tried to solve this using the current conditionals available?
On Sun, Nov 27, 2011 at 12:27 AM, Evan Rempel <erempel@uvic.ca> wrote:
______________________________________________________________________________
Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
FAQ: http://www.balabit.com/wiki/syslog-ng-faq
I'm not sure what you mean by this. I cannot find anything on "conditionals" for the patterndb. Do you mean that patterndb parses the login ID, and then syslog-ng pattern matching is used to match that login ID against those of our system administrators?
________________________________________
From: syslog-ng-bounces@lists.balabit.hu [syslog-ng-bounces@lists.balabit.hu] On Behalf Of Martin Holste [mcholste@gmail.com]
Sent: Sunday, November 27, 2011 8:51 AM
Sorry, see Bazsi's blog post on correlation, conditions, and contexts here: http://bazsi.blogs.balabit.com/2010/10/syslog-ng-correllation-updated/
On Sun, Nov 27, 2011 at 11:38 AM, Evan Rempel <erempel@uvic.ca> wrote:
Yes, I am aware of this, and it is good for generating alerts when an event occurs. I am planning on using this powerful feature as well, but it does not address my problem.
The problem I am having is that I need to match any login event where the login ID is one of 22 known administrator accounts. I don't want to have to augment the pattern for EVERY login-type message to include the 22 different patterns. What I would like to do is make my own parser that holds the 22 known login IDs, and then use that parser in the rest of the patterndb.
This type of structure has already been applied to the syslog-ng configuration by adding the BLOCK { }; construct.
Also, when a sysadmin leaves our group, or we hire new staff, I can just update my own parser to add or remove that login ID, and my infrastructure continues to work.
Evan
________________________________________
From: syslog-ng-bounces@lists.balabit.hu [syslog-ng-bounces@lists.balabit.hu] On Behalf Of Martin Holste [mcholste@gmail.com]
Sent: Sunday, November 27, 2011 9:47 AM
To: Syslog-ng users' and developers' mailing list
Subject: Re: [syslog-ng] patterndb - user defined parsers
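Evan's request can be modelled as a registry of named parsers that patterns look up by name. The following is a minimal Python sketch under stated assumptions. The WORD and SYSADMIN parsers, the login IDs, and the simplified @TYPE:name@ syntax (no parameters, no @@ escapes) are all hypothetical. This is not how db-parser() is implemented.

```python
# Hypothetical sketch: named, user-defined parsers referenced from patterns.
from typing import Callable, Dict, Iterable, Optional, Tuple

# A parser takes (text, pos) and returns (end, value), or None on no match.
Parser = Callable[[str, int], Optional[Tuple[int, str]]]


def word(text: str, pos: int) -> Optional[Tuple[int, str]]:
    """Hypothetical WORD parser: a run of non-space characters."""
    end = pos
    while end < len(text) and not text[end].isspace():
        end += 1
    return (end, text[pos:end]) if end > pos else None


def set_parser(members: Iterable[str]) -> Parser:
    """Build a parser that accepts a word only if it is one of `members`."""
    allowed = frozenset(members)

    def parse(text: str, pos: int) -> Optional[Tuple[int, str]]:
        hit = word(text, pos)
        return hit if hit is not None and hit[1] in allowed else None

    return parse


PARSERS: Dict[str, Parser] = {
    "WORD": word,
    "SYSADMIN": set_parser(["alice", "bob"]),  # hypothetical IDs, kept in one place
}


def match_pattern(pattern: str, text: str) -> Optional[Dict[str, str]]:
    """Match literals and @TYPE:name@ fields; return the extracted fields."""
    fields: Dict[str, str] = {}
    pos = 0
    for i, piece in enumerate(pattern.split("@")):
        if i % 2 == 0:  # even pieces are literal text
            if not text.startswith(piece, pos):
                return None
            pos += len(piece)
        else:  # odd pieces are parser references
            kind, _, name = piece.partition(":")
            hit = PARSERS[kind](text, pos)
            if hit is None:
                return None
            pos, value = hit
            if name:
                fields[name] = value
    return fields if pos == len(text) else None
```

With this design, adding or removing an administrator means editing only the SYSADMIN entry. Every pattern that references @SYSADMIN@ picks up the change.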
Ok, got it. Yes, administration would be tough. We currently handle this with alerts in ELSA by parsing out the field we want to check and then matching against it with a bunch of OR'd values, like this:
+WINDOWS.eventid:538 username1 username2 username3
On Sun, Nov 27, 2011 at 11:58 AM, Evan Rempel <erempel@uvic.ca> wrote:
On Sun, 2011-11-27 at 09:58 -0800, Evan Rempel wrote:
This kind of stuff shouldn't be done by patterns; their primary role is to extract information from the log message. If you want to create an alert based on the user name, it should be performed using the filtering engine:
log {
    source(s_all);
    parser(p_dbparser);
    log { filter(f_alerts); filter(f_admins); destination(d_adminalerts); };
    log { destination(d_normal); };
};
--
Bazsi
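With this config, the pattern only extracts the user name, and the list of administrators lives in a single filter. The Python model below shows how one message flows through the log path. The thread does not show the f_alerts and f_admins definitions, so their bodies here, the "class" and "username" fields, and the login IDs are all assumptions. The model also assumes default log flags, with no final flag set.

```python
# Hypothetical model of the log path above; the filter bodies are assumptions.
from typing import Dict, List

ADMINS = {"alice", "bob"}  # hypothetical login IDs; the one place to edit


def f_alerts(msg: Dict[str, str]) -> bool:
    # Assumption: the pattern database classified the message as a login event.
    return msg.get("class") == "login"


def f_admins(msg: Dict[str, str]) -> bool:
    # Compare the user name extracted by the pattern against the admin list.
    return msg.get("username") in ADMINS


def route(msg: Dict[str, str], dests: Dict[str, List[dict]]) -> None:
    # First embedded log: both filters must pass to reach d_adminalerts.
    if f_alerts(msg) and f_admins(msg):
        dests["d_adminalerts"].append(msg)
    # Second embedded log has no filter, so every message reaches d_normal.
    dests["d_normal"].append(msg)
```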
On Sunday, November 27, 2011 07:27 CET, Evan Rempel <erempel@uvic.ca> wrote:
Hi,
it might not be exactly what you are after, but it is possible to use filters and template functions on parsed fields. Would it be possible to parse the IP with an IP parser, and then later in the log path check its value with a regex/network filter? Or with a well-placed if template function?
Robert
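Robert's suggestion keeps @LOCALIP@ out of the patterns: parse the address generically, then test it afterwards. The Python sketch below models that second step with the standard ipaddress module. The network list is a placeholder for an organization's own address space. is_local is a hypothetical helper, not a syslog-ng filter.

```python
# Hypothetical post-parse check standing in for @LOCALIP@; not a syslog-ng API.
import ipaddress
from typing import Iterable

# Assumption: placeholder for the organization's address space; edit in one place.
LOCAL_NETS = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "192.168.0.0/16")]


def is_local(ip_text: str, nets: Iterable = LOCAL_NETS) -> bool:
    """Return True if an already-parsed IP string falls inside any local network."""
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        return False  # the parsed field did not hold a valid address
    # An IPv6 address is never "in" an IPv4 network, so mixed versions return False.
    return any(ip in net for net in nets)
```

Another organization would only replace LOCAL_NETS. Any rules built on is_local stay unchanged.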
On Sat, 2011-11-26 at 22:27 -0800, Evan Rempel wrote:
Some kind of parser definition would make perfect sense. There are some technical problems to be resolved first, though. Right now, conflicts between rules are not resolved very well. If two rules conflict on a parser (e.g. their prefix is the same and then two different parsers are used at the same location), db-parser() evaluates them in order, and the first one wins. Then, if a subsequent parser doesn't match, no backtracking is done. This should be resolved before adding a lot of different, and perhaps user-defined, parsers.
Also, instead of reinventing the wheel, I'd simply add a @REGEXP@ parser, which, if hit, could of course become a performance bottleneck; but stuffing all arguments into a @@ expression is difficult to read and maintain.
--
Bazsi
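The conflict Bazsi describes can be shown with a toy model: two rules share a prefix and then use different parsers at the same position. The Python model below assumes rules are tried in order. It compares first-wins evaluation with a variant that backtracks to the next candidate. It is not db-parser()'s radix-tree code, and the rule names and parsers are hypothetical.

```python
# Toy model of first-wins vs. backtracking evaluation; not db-parser() internals.
import re
from typing import Callable, List, Optional, Tuple

NUMBER_RE = re.compile(r"\d+")
WORD_RE = re.compile(r"\w+")


def parse_number(text: str, pos: int) -> Optional[int]:
    m = NUMBER_RE.match(text, pos)
    return m.end() if m else None


def parse_word(text: str, pos: int) -> Optional[int]:
    m = WORD_RE.match(text, pos)
    return m.end() if m else None


PREFIX = "user "
# (rule id, parser used right after the shared prefix, literal rest of the rule)
RULES: List[Tuple[str, Callable[[str, int], Optional[int]], str]] = [
    ("rule-number", parse_number, " logged in"),
    ("rule-word", parse_word, " logged out"),
]


def match(msg: str, backtrack: bool) -> Optional[str]:
    if not msg.startswith(PREFIX):
        return None
    pos = len(PREFIX)
    for rule_id, parser, rest in RULES:
        end = parser(msg, pos)
        if end is None:
            continue  # this parser did not apply here; try the next rule
        if msg[end:] == rest:
            return rule_id
        if not backtrack:
            return None  # the first matching parser won; no second chance
    return None
```

With first-wins evaluation, "user 42 logged out" matches nothing. NUMBER claims "42", the rest of that rule fails, and the WORD rule is never tried. With backtracking, the WORD rule matches the same message.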
One idea you can borrow from IDS with regard to regexp: have regexps evaluated after the non-regexp matches, so that they are not invoked on every message but are instead used for clarification. For instance, I am having occasional difficulties with competing patterns due to the almost CSV-like quality of the messages. Specifically, messages sent by eventlog-to-syslog follow the pattern "eventid: source: message", so my pattern of @NUMBER:eventid:@: @ESTRING:source:@ @ANYSTRING@ basically matches anything with a leading number and two colons in the message. The program name changes, so you can't pre-filter on that. Suppose you could change @ANYSTRING@ to a @REGEXP@ that matches and extracts various parts of the message, but is only evaluated after NUMBER and ESTRING hit. Then you could have good performance and easy-to-write patterns, because you could invoke the power of (often already available) regexps.
Another idea would be to have sub-patterns, evaluated after a first pass of patterns. In the above example, after NUMBER and ESTRING extract their variables, the remaining message block would be passed to another pattern set for further evaluation, with the parts matched by NUMBER and ESTRING removed. The variables stored by the prior patterns would still be available. This would let sub-patterns behave like subclasses in programming, written without having to copy the prior field matches each time. That work would already have been completed, and the sub-patterns would "inherit" all of the prior field extractions.
If you combine the two ideas, you could have initial matching with field extractions, followed by sub-patterns containing regexps to do the fine-grained field extractions. This would provide a more versatile and programmatic way of controlling matching precedence.
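Martin's two ideas can be sketched in Python as a two-stage matcher. The first idea runs cheap fixed parsers first and applies regexps only to what remains. The second lets sub-patterns inherit the fields already extracted. Everything here is hypothetical: the "eventid: source: body" split stands in for @NUMBER@/@ESTRING@, and the MyService sub-pattern and its field names are invented for illustration.

```python
# Hypothetical two-stage matcher: cheap prefix parse, then regex sub-patterns.
import re
from typing import Dict, Optional, Tuple


def first_pass(msg: str) -> Optional[Tuple[Dict[str, str], str]]:
    """Cheap stage: split 'eventid: source: body', like a NUMBER/ESTRING prefix."""
    eventid, sep, rest = msg.partition(": ")
    if not sep or not eventid.isdigit():
        return None
    source, sep, body = rest.partition(": ")
    if not sep:
        return None
    return {"eventid": eventid, "source": source}, body


# Hypothetical sub-patterns per source; the regexes only ever see the remaining body.
SUB_PATTERNS: Dict[str, list] = {
    "MyService": [re.compile(r"user=(?P<user>\S+) action=(?P<action>\S+)")],
}


def classify(msg: str) -> Optional[Dict[str, str]]:
    first = first_pass(msg)
    if first is None:
        return None  # regexes are never evaluated for messages failing stage one
    fields, body = first
    for rx in SUB_PATTERNS.get(fields["source"], []):
        m = rx.search(body)
        if m:
            # The sub-pattern "inherits" the first-pass fields and adds its own.
            return {**fields, **m.groupdict()}
    return fields  # no sub-pattern refined it; keep the first-pass fields
```

The regex cost is paid only for messages that already passed the cheap stage. Sub-patterns are also grouped by source, so only the relevant regexps run.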
participants (4)
- Balazs Scheidler
- Evan Rempel
- Fekete Róbert
- Martin Holste