pattern usage and optimization
Hi,

I want to parse my sshd logs to store information in a remote database. I already did it using logstash, but I just discovered that syslog-ng can do such things using patterndb. I managed to set up a few <pattern> elements, but I have difficulties building generic rules. I end up with 4 or 5 rules where one or two are enough with logstash. So I expect I'm missing something with patterns :)

Here's a log example:

  Disconnected from user joe 192.168.0.5 port 50121
  Disconnected from invalid user www 192.168.0.7 port 6794 [preauth]
  Disconnected from authenticating user root 192.168.0.3 port 52591 [preauth]

So I wrote these three patterns:

  <!-- Disconnected from user joe 192.168.0.5 port 50121 -->
  <pattern>@ESTRING:EVENT: from @user @ESTRING:USERNAME: @@ESTRING:IP: @port @NUMBER:PORT:@</pattern>
  <!-- Disconnected from invalid user www 192.168.0.7 port 6794 [preauth] -->
  <pattern>@ESTRING:EVENT: from @invalid user @ESTRING:USERNAME: @@ESTRING:IP: @port @NUMBER:PORT:@@ANYSTRING:EXTRA:@</pattern>
  <!-- Disconnected from authenticating user root 192.168.0.3 port 52591 [preauth] -->
  <pattern>@ESTRING:EVENT: from @authenticating user @ESTRING:USERNAME: @@ESTRING:IP: @port @NUMBER:PORT:@@ANYSTRING:EXTRA:@</pattern>

To me, those 3 lines can be described using a single expression, this way:

  ("Disconnected from") ("user"|"invalid user"|"authenticating user") (username) (ip_host) port (ip_port)(empty|extra_stuff)

Basically, the features I couldn't find are:
- "match a defined string and assign it to a variable"
- "match one string or another and assign it to a variable"
- "match a string or EOL and assign it to a variable if not empty"

Is it possible to have a single pattern that would lead to:
- EVENT = "Disconnected from"
- METHOD = "user" | "invalid user" | "authenticating user"
- USERNAME = <parsed username>
- IP = <parsed ip address>
- PORT = <parsed port number>
- EXTRA = <empty> | <parsed extra information>

Thanks for your help.
Hi Joel,

The inner workings of patterndb and grok are very different, so you can't really use them the same way. One consequence, as you've already discovered, is that you sometimes need two patterns instead of one.

This might seem like a limitation when moving from another tool, but it has its reasons, and one of the advantages of patterndb you'll see over time is its speed: it's really fast. Also, you get unit tests (example messages), and you can embed any template function into the rules, for instance to munge or enrich the data.

Here are a few rules that apply to your example:

1. Don't use parsers at the start of a pattern, as these will mess up the radix tree:

| @ESTRING:EVENT: from @user @ESTRING:USERNAME: @@ESTRING:IP: @port

Use literals instead:

| Disconnected from user @ESTRING:USERNAME: @@ESTRING:IP: @port

2. There is no regexp-like grouping, so you can't say A or B or C. There *is* the @PCRE@ parser, but it doesn't allow extracting the matched value. You've got two options here:

a. Use multiple patterns:

| Disconnected from user @ESTRING:USERNAME: @
| Disconnected from invalid user @ESTRING:USERNAME: @
| Disconnected from authenticating user @ESTRING:USERNAME: @

b. Use one pattern and do some string stitching:

| <patterns>
|   <pattern>Disconnected from @ESTRING:METHOD:user @@ESTRING:USER: @@ESTRING:IP: @port @NUMBER:PORT@</pattern>
| </patterns>
| <values>
|   <value name='METHOD'>$(strip "${METHOD}")</value>
| </values>

Here METHOD captures whatever precedes "user ", e.g. "invalid " for the second message. The 'strip' is necessary as the pattern will catch the extra space. Admittedly method b. is probably less readable, but if you care about deduplication you might favour it over a.

3. There is unfortunately no optional parser, so if you want to match two messages that are identical except for the ending, you need two patterns if you want to extract EXTRA.

Cheers
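Putting rules 1 to 3 together, a complete ruleset following option a. might look roughly like the sketch below. It is untested and rests on a few assumptions that are not from the thread: the ruleset name, the ids, the provider and the class are placeholders (real databases normally use UUIDs for ids), and EVENT and METHOD are hard-coded per rule through <values> instead of being parsed. Each rule carries two patterns, with and without the trailing text, because there is no optional parser.

```xml
<!-- Sketch only: names, ids, provider and class are placeholders. -->
<patterndb version='4' pub_date='2018-10-01'>
  <ruleset name='sshd-disconnect' id='example-ruleset-sshd-disconnect'>
    <!-- Program name the rules below apply to -->
    <pattern>sshd</pattern>
    <rules>
      <!-- "Disconnected from user ..." -->
      <rule provider='example' id='example-rule-disconnect-user' class='system'>
        <patterns>
          <!-- Literal prefix first, parsers only afterwards (rule 1) -->
          <pattern>Disconnected from user @ESTRING:USERNAME: @@ESTRING:IP: @port @NUMBER:PORT@</pattern>
          <!-- Same message with trailing text: a second pattern (rule 3) -->
          <pattern>Disconnected from user @ESTRING:USERNAME: @@ESTRING:IP: @port @NUMBER:PORT@ @ANYSTRING:EXTRA@</pattern>
        </patterns>
        <examples>
          <example>
            <test_message program='sshd'>Disconnected from user joe 192.168.0.5 port 50121</test_message>
            <test_values>
              <test_value name='USERNAME'>joe</test_value>
              <test_value name='IP'>192.168.0.5</test_value>
              <test_value name='PORT'>50121</test_value>
            </test_values>
          </example>
        </examples>
        <values>
          <!-- Assumption: constants set per rule rather than extracted -->
          <value name='EVENT'>Disconnected from</value>
          <value name='METHOD'>user</value>
        </values>
      </rule>
      <!-- "Disconnected from invalid user ..." -->
      <rule provider='example' id='example-rule-disconnect-invalid-user' class='system'>
        <patterns>
          <pattern>Disconnected from invalid user @ESTRING:USERNAME: @@ESTRING:IP: @port @NUMBER:PORT@</pattern>
          <pattern>Disconnected from invalid user @ESTRING:USERNAME: @@ESTRING:IP: @port @NUMBER:PORT@ @ANYSTRING:EXTRA@</pattern>
        </patterns>
        <examples>
          <example>
            <test_message program='sshd'>Disconnected from invalid user www 192.168.0.7 port 6794 [preauth]</test_message>
            <test_values>
              <test_value name='USERNAME'>www</test_value>
              <test_value name='IP'>192.168.0.7</test_value>
              <test_value name='PORT'>6794</test_value>
              <test_value name='EXTRA'>[preauth]</test_value>
            </test_values>
          </example>
        </examples>
        <values>
          <value name='EVENT'>Disconnected from</value>
          <value name='METHOD'>invalid user</value>
        </values>
      </rule>
    </rules>
  </ruleset>
</patterndb>
```

A third rule for "authenticating user" would follow the same shape. The embedded examples are the unit tests mentioned above; running pdbtool test against the file should check them.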
Thank you very much for this detailed explanation. This makes it very clear now. I'll write my patterns the "syslog-ng way" :)

On 01/10/2018 10:20, Fabien Wernli wrote:
[...]
On Mon, Oct 01, 2018 at 10:50:02AM +0200, Joel Carnat wrote:
Thank you very much for this detailed explanation. This makes it very clear now. I'll write my patterns the "syslog-ng way" :)
You're welcome! If you're in a hurry, there *is* a grok parser in the syslog-ng incubator…
Participants (2): Fabien Wernli, Joel Carnat