On Thu, 2011-02-24 at 12:07 +0100, Valentijn Sessink wrote:
Hi,
Just a small remark. If you run pdbtool with "patternize" on a log file with logs from an IPv6 host, pdbtool thinks that everything after the first ":" is the log message. For example, the log message:
Feb 24 11:39:26 2a01:4f8:8a0:5141::3c2 named[31090]: lame server resolving ....
... will be patternized as if the logging host were "2a01" and as if the pattern should be: <pattern>4f8:8a0:5141::3c2 named[31090]: lame server resolving ....</pattern>
I searched around in the patternize code, but could not find the particular code where the host gets cut off. However, for someone more at home in this code, I think the fix should be trivial.
Hmm, it's not the patternize code that has the problem, but rather the RFC3164 message parsing code, which assumes that ':' terminates the hostname and marks the beginning of the log message. And this is practically impossible to change, as it would break a lot of applications out there.
Wow, I don't know how to solve this properly within the scope of RFC3164-style parsing. RFC5424 should be OK though, but I guess these messages are already sitting in the logfile in RFC3164 format.
-- Bazsi
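Why RFC5424 avoids the problem can be sketched in a few lines: its HEADER fields are separated by single spaces and none of them may contain a space, so an IPv6 HOSTNAME full of colons is still unambiguous. This is a minimal Python sketch, not syslog-ng's parser; the sample line, the `Rfc5424Header` class and `parse_rfc5424` are hypothetical, and STRUCTURED-DATA other than the nil value `-` is deliberately not handled.

```python
from dataclasses import dataclass


@dataclass
class Rfc5424Header:
    pri_version: str  # e.g. "<30>1": PRI and VERSION are adjacent, no space between them
    timestamp: str
    hostname: str
    app_name: str
    procid: str
    msgid: str
    msg: str


def parse_rfc5424(line: str) -> Rfc5424Header:
    """Split an RFC5424 line whose STRUCTURED-DATA is the nil value '-' (sketch only)."""
    # The six header fields contain no spaces, so splitting on the first six
    # spaces is safe even when HOSTNAME is an IPv6 literal with colons.
    parts = line.split(" ", 6)
    if len(parts) != 7:
        raise ValueError("too few header fields for RFC5424")
    pri_version, timestamp, hostname, app_name, procid, msgid, rest = parts
    # Only the NILVALUE ("-") for STRUCTURED-DATA is handled in this sketch.
    if rest == "-":
        msg = ""
    elif rest.startswith("- "):
        msg = rest[2:]
    else:
        raise ValueError("STRUCTURED-DATA other than '-' is not handled here")
    return Rfc5424Header(pri_version, timestamp, hostname, app_name, procid, msgid, msg)


line = ("<30>1 2011-02-24T11:39:26+01:00 2a01:4f8:8a0:5141::3c2 "
        "named 31090 - - lame server resolving ....")
hdr = parse_rfc5424(line)
print(hdr.hostname)  # 2a01:4f8:8a0:5141::3c2
print(hdr.msg)       # lame server resolving ....
```

As noted above, this only helps if the logs are produced in RFC5424 format in the first place.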
Balazs Scheidler wrote:
Hmm, it's not the patternize code that has the problem, but rather the RFC3164 message parsing code, which assumes that ':' terminates the hostname and marks the beginning of the log message. And this is practically impossible to change, as it would break a lot of applications out there.
Can't you change that to ': ' (a colon followed by a space)? That would solve the problem.
V.
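The difference between the two terminator rules can be illustrated with a simplified model. This is a hypothetical Python sketch, NOT syslog-ng's RFC3164 parser: it assumes the fixed-width RFC3164 timestamp has already been identified and only models where the hostname ends.

```python
# Simplified model only: NOT syslog-ng's RFC3164 parser. It shows why ending
# the hostname at the first ':' breaks on IPv6 literals, and what the ': '
# (colon + space) rule suggested above would do instead.

LINE = ("Feb 24 11:39:26 2a01:4f8:8a0:5141::3c2 "
        "named[31090]: lame server resolving ....")


def strip_timestamp(line: str) -> str:
    # RFC3164 timestamps are fixed width: "Mmm dd hh:mm:ss" plus one space = 16 chars.
    return line[16:]


def split_host_first_colon(rest: str) -> tuple[str, str]:
    # Hostname ends at the first ':' or ' ' (the behaviour reported in this thread).
    for i, ch in enumerate(rest):
        if ch in ": ":
            return rest[:i], rest[i + 1:].lstrip()
    return rest, ""


def split_host_colon_space(rest: str) -> tuple[str, str]:
    # Hostname ends at the first ' ' or at the first ': ' (colon + space).
    hits = [(pos, width)
            for pos, width in ((rest.find(" "), 1), (rest.find(": "), 2))
            if pos != -1]
    if not hits:
        return rest, ""
    pos, width = min(hits)
    return rest[:pos], rest[pos + width:]


rest = strip_timestamp(LINE)
print(split_host_first_colon(rest))  # host '2a01', as in the report
print(split_host_colon_space(rest))  # host '2a01:4f8:8a0:5141::3c2'
```

The model ignores lines that carry no hostname at all, which a real RFC3164 parser must also accept; whether a ': ' rule is safe for those is exactly the compatibility question raised above.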
On Fri, 2011-02-25 at 12:00 +0100, Peter Gyongyosi wrote:
As Bazsi has already answered, the patternize code uses syslog-ng's built-in message parsing functionality, much as if the messages were loaded from a file source. It would be possible to add a way to parse custom message formats, but patternizing is an offline operation anyway and we're only using the message part of the log lines, so I think sed, awk & co. are much better tools for this task :) (Note that patternize is capable of loading the input from stdin, so you don't even need to duplicate your logs on the disk for this.) For it to work, you'd need to be able to tell patternize not to parse the lines in the text file at all and to consider the whole line as the message part.
This patch I've just pushed to my repo at git://git.balabit.hu/gyp/syslog-ng-3.2.git does just that:
commit 31cedfa84839459046a5b0acd5fb42339e1da807
Author: Peter Gyongyosi <gyp@balabit.hu>
Date: Fri Feb 25 11:31:03 2011 +0100
pdbtool patternize: added the --no-parse option
This allows for the manual processing of the to-be-patternized log messages instead of requiring it to be in a parsable RFC-compliant log format.
After this, you can do things like
cat logfile.log | cut -d' ' -f4- | pdbtool patternize --no-parse -f -
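One caveat with the `cut -d' ' -f4-` step, assuming classic RFC3164 timestamps: RFC3164 pads single-digit days with a space (`Feb  4`), so on such lines cut sees an empty field and the time of day ends up in the output. A minimal Python sketch that drops the fixed-width 16-character timestamp instead; the module name `strip_ts.py` and all function names are hypothetical.

```python
import sys


def cut_f4(line: str) -> str:
    # Mimics `cut -d' ' -f4-` for lines with at least four space-separated fields:
    # every single space is a delimiter, so "Feb  4" produces an empty field.
    return " ".join(line.split(" ")[3:])


def strip_rfc3164_timestamp(line: str) -> str:
    # "Mmm dd hh:mm:ss " is always 16 characters because the day is space-padded.
    # Like the cut command, this keeps the hostname in the output.
    return line[16:]


def main() -> None:
    # Filter stdin to stdout, one log line at a time.
    sys.stdout.writelines(strip_rfc3164_timestamp(line) for line in sys.stdin)


if __name__ == "__main__":
    sample = "Feb  4 11:39:26 myhost named[1]: msg"
    print(repr(cut_f4(sample)))                   # '11:39:26 myhost named[1]: msg'
    print(repr(strip_rfc3164_timestamp(sample)))  # 'myhost named[1]: msg'
```

Saved as `strip_ts.py`, it could stand in for the cut step, e.g. `cat logfile.log | python3 -c 'import strip_ts; strip_ts.main()' | pdbtool patternize --no-parse -f -`.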
It's still based on 3.2, but I guess it should apply trivially on 3.3 as well. If not, let me know and I'll open my 3.3 branch and add it there, too. (And if you're not doing it already, you should really try patternize with 3.3, as since a couple of days ago, it contains Balint Kovacs's patch which allows you to specify word delimiters instead of using only the hardcoded space char for this purpose, which can *drastically* improve the quality of your patterns.)
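Why configurable word delimiters can improve patterns is easy to see with a token-level comparison. This is a generic Python illustration under that assumption, not the patternize algorithm; `tokenize`, `differing_positions` and the delimiter set `" []:"` are hypothetical and do not reflect pdbtool's option syntax.

```python
import re


def tokenize(message: str, delimiters: str = " ") -> list[str]:
    # Split on runs of any delimiter character and drop empty tokens.
    pattern = "[" + re.escape(delimiters) + "]+"
    return [tok for tok in re.split(pattern, message) if tok]


def differing_positions(a: list[str], b: list[str]) -> list[int]:
    # Token positions where two token lists disagree,
    # i.e. the candidates for a variable part in a pattern.
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


m1 = "named[31090]: lame server resolving"
m2 = "named[4242]: lame server resolving"

print(differing_positions(tokenize(m1), tokenize(m2)))                # [0]
print(differing_positions(tokenize(m1, " []:"), tokenize(m2, " []:")))  # [1]
```

With only the space as a delimiter, the whole `named[...]:` token differs, so the program name would be swallowed by the variable part; with `[`, `]` and `:` added, only the PID differs.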
Hi,
Can you please paste a Signed-off-by line in an email reply (or perhaps rebase the patch with the Signed-off-by line added)?
Thanks.
-- Bazsi
On 03/01/2011 07:36 PM, Balazs Scheidler wrote:
Can you please paste a Signed-off-by line in an email reply (or perhaps rebase the patch with the Signed-off-by line added)?
Hi,
I've created my 3.3 branch at git://git.balabit.hu/gyp/syslog-ng-3.3.git and added the patch there with the Signed-off-by line:
commit 8e2d2608f7a50c52f9a26315cdf639d173c69f15
Author: Peter Gyongyosi <gyp@balabit.hu>
Date: Wed Mar 2 10:38:01 2011 +0100
pdbtool patternize: added the --no-parse option
This allows for the manual processing of the to-be-patternized log messages instead of requiring it to be in a parsable RFC-compliant log format.
Signed-off-by: Peter Gyongyosi <gyp@balabit.hu>
greets,
Peter
Pulled, thanks Peter.
On Wed, 2011-03-02 at 12:38 +0100, Peter Gyongyosi wrote:
Hi,
I've created my 3.3 branch at git://git.balabit.hu/gyp/syslog-ng-3.3.git and added the patch there with the Signed-off-by line:
commit 8e2d2608f7a50c52f9a26315cdf639d173c69f15
Author: Peter Gyongyosi <gyp@balabit.hu>
Date: Wed Mar 2 10:38:01 2011 +0100
pdbtool patternize: added the --no-parse option
This allows for the manual processing of the to-be-patternized log messages instead of requiring it to be in a parsable RFC-compliant log format.
Signed-off-by: Peter Gyongyosi <gyp@balabit.hu>
-- Bazsi
participants (3)
- Balazs Scheidler
- Peter Gyongyosi
- Valentijn Sessink