On 02/24/2011 12:07 PM, Valentijn Sessink wrote:
Just a small remark. If you run pdbtool with "patternize" on a log file with logs from an IPv6 host, pdbtool thinks that everything after the first ":" is the log message. For example, the log message:
Feb 24 11:39:26 2a01:4f8:8a0:5141::3c2 named[31090]: lame server resolving ....
... will be patternized as if the logging host were "2a01" and as if the pattern should be: <pattern>4f8:8a0:5141::3c2 named[31090]: lame server resolving ....</pattern>
I searched around in the patternize code, but could not find the particular code where the host got cut off. However, for someone a bit at home in this code, I think the fix should be trivial.
As Bazsi has already answered, the patternize code uses syslog-ng's built-in message parsing functionality, much as if the messages were loaded from a file source. It would be possible to add a way to parse custom message formats, but patternizing is an offline operation anyway and we're only using the message part of the log lines, so I think sed, awk & co. are much better tools for this task :) (note that patternize is capable of loading the input from stdin, so you don't even need to duplicate your logs on the disk for this).

For it to work, you'd need to be able to tell patternize not to parse the lines in the text file at all and to consider the whole line as the message part. This patch I've just pushed to my repo at git://git.balabit.hu/gyp/syslog-ng-3.2.git does just that:

    commit 31cedfa84839459046a5b0acd5fb42339e1da807
    Author: Peter Gyongyosi <gyp@balabit.hu>
    Date:   Fri Feb 25 11:31:03 2011 +0100

        pdbtool patternize: added the --no-parse option

        This allows for the manual processing of the to-be-patternized log
        messages instead of requiring it to be in a parsable RFC-compliant
        log format.

After this, you can do things like

    cat logfile.log | cut -d' ' -f4- | pdbtool patternize --no-parse -f -

It's still based on 3.2, but I guess it should apply trivially on 3.3 as well. If not, let me know and I'll open my 3.3 branch and add it there, too.

(And if you're not doing it already, you should really try patternize with 3.3: as of a couple of days ago, it contains Balint Kovacs's patch, which allows you to specify word delimiters instead of using only the hardcoded space char for this purpose. This can *drastically* improve the quality of your patterns.)

greets,
Peter
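
P.S. A rough sketch of the sed variant, in case it helps. The cut example above starts at the fourth space-separated field, which for the sample line is the host, and because cut -d' ' counts every single space, a day padded with two spaces ("Feb  4") shifts the fields. The strip_header helper below is a hypothetical name of mine, not something shipped with syslog-ng, and the regex assumes a classic "Mon DD HH:MM:SS host " header; adjust it to whatever your logs actually look like.

```shell
#!/bin/sh
# strip_header (hypothetical helper): remove the leading
# "Mon DD HH:MM:SS host " part of each line read from stdin and keep the rest.
# Assumption: a BSD-syslog style header; the host is any run of non-space
# characters, so an IPv6 address with colons is handled like any other host.
strip_header() {
    sed -E 's/^[A-Z][a-z]{2} +[0-9]{1,2} [0-9:]{8} [^ ]+ //'
}

# Example with the log line from the original report:
printf '%s\n' 'Feb 24 11:39:26 2a01:4f8:8a0:5141::3c2 named[31090]: lame server resolving' \
    | strip_header
# prints: named[31090]: lame server resolving
```

With the --no-parse patch applied, the result could then be fed to patternize along the lines of: strip_header < logfile.log | pdbtool patternize --no-parse -f -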