Can TAG field be terminated by space?
Hi,
I'm using syslog-ng-2.0.5 and trying to store syslog messages in a DB table with a TAG column.
But when the TAG field is terminated by a space character (' ') in a syslog message, the syslog-ng parser returns as the TAG not only the TAG itself but also the text that follows it in the CONTENT field, up to the next colon (':').
For example, if the MSG part is "program abc: message...", syslog-ng returns "program abc" as the TAG.
According to section 4.1.3 of RFC 3164, a space character can also terminate the TAG field:
The TAG is a string of ABNF alphanumeric characters that MUST NOT exceed 32 characters. Any non-alphanumeric character will terminate the TAG field and will be assumed to be the starting character of the CONTENT field. Most commonly, the first character of the CONTENT field that signifies the conclusion of the TAG field has been seen to be the left square bracket character ("["), a colon character (":"), or a space character.
Is it possible to change this behavior of syslog-ng?
Thanks in advance.
-- Tsurusawa Takeshi <tsuru@grid.nii.ac.jp>
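For illustration, here is a minimal Python sketch of the RFC 3164 TAG rule quoted above, read literally: the TAG is the leading run of alphanumeric characters (at most 32), and the first other character starts CONTENT. The function name `split_tag_rfc3164` is hypothetical and this is not syslog-ng's parser code. It also assumes that a TAG longer than 32 characters is simply cut at 32, which the RFC does not specify.

```python
import string

# RFC 3164, section 4.1.3: the TAG is alphanumeric and at most 32 characters;
# the first non-alphanumeric character is taken as the start of CONTENT.
ALNUM = set(string.ascii_letters + string.digits)
MAX_TAG_LEN = 32


def split_tag_rfc3164(msg: str) -> tuple[str, str]:
    """Split the MSG part into (TAG, CONTENT) following the RFC text literally."""
    end = 0
    # Assumption: stop after 32 characters; the rest is treated as CONTENT.
    while end < len(msg) and end < MAX_TAG_LEN and msg[end] in ALNUM:
        end += 1
    return msg[:end], msg[end:]


if __name__ == "__main__":
    print(split_tag_rfc3164("program abc: message..."))
    # ('program', ' abc: message...')
```

Applied to the example message, the literal rule yields `program` as the TAG, which is what the question asks for. Note that the same rule would also cut a name such as `postfix/smtpd` down to `postfix`, since '/' is not alphanumeric.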
On Mon, 2007-10-22 at 18:26 +0900, Tsurusawa Takeshi wrote:
I'm usually reluctant to make such changes, as it is very easy to introduce regressions when changing the log parsing code. However, there were two similar cases in the message parsing code that used different TAG terminator characters (one of them used space as a separator), so I unified the two.
Here's the patch:
http://git.balabit.hu/?p=bazsi/syslog-ng-2.0.git;a=commit;h=4a84d904fe0fc5b3...
Tomorrow's snapshot should also contain the change.
-- Bazsi
On Tue, 2007-10-23 at 18:50 +0200, Balazs Scheidler wrote:
As I somewhat feared, this change caused a regression for programs that intentionally use "/" in their program name; postfix is one such example.
I've changed the patch so that a program name is terminated by any of the following characters: space, '[' and ':'. This still achieves what you originally wanted, but it still does not match the RFC, because following the RFC strictly causes trouble.
The two code paths that used different characters were also unified to use these three characters.
-- Bazsi
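A minimal Python sketch of the terminator rule described in this reply (a program name ends at the first space, '[' or ':'). It assumes that a string containing none of these characters is taken whole as the program name. `split_program` is a hypothetical name; this only models the described behavior and is not the code from the patch.

```python
# Characters that end the program name, as described in the reply above.
PROGRAM_TERMINATORS = (" ", "[", ":")


def split_program(msg: str) -> tuple[str, str]:
    """Return (program, rest), cutting at the first space, '[' or ':'."""
    for i, ch in enumerate(msg):
        if ch in PROGRAM_TERMINATORS:
            return msg[:i], msg[i:]
    # Assumption: with no terminator at all, the whole string is the program name.
    return msg, ""


if __name__ == "__main__":
    # '/' is not a terminator, so postfix's "postfix/smtpd" stays intact.
    print(split_program("postfix/smtpd[1234]: connect from example"))
    # ('postfix/smtpd', '[1234]: connect from example')

    # A space now ends the name, as requested in the original question.
    print(split_program("program abc: message..."))
    # ('program', ' abc: message...')
```

Compared with a literal reading of the RFC, the only difference that matters here is that '/' no longer ends the name, which is what keeps multi-part program names like postfix's working.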
On Tue, 2007-10-23 at 18:50 +0200, Balazs Scheidler wrote:
Wow, this is a can of worms.
If I use a template of
$PRI $DATE $HOST $FACILITY.$PRIORITY $PROGRAM: $MSGONLY
it should recreate the entire syslog message, but it will not. The information inside the [xxx] after the program name will be dropped (or will it be part of MSGONLY?).
If you want to exclude the [xxx] from the PROGRAM macro, then I think a new macro is required that will contain the [xxx] component, perhaps INSTANCE, IDENTIFIER or UNIQUE.
I don't have the RFC in front of me, but using the terminology that the RFC uses would be good. Then I can write a template of
$PRI $DATE $HOST $FACILITY.$PRIORITY $PROGRAM[$INSTANCE]: $MSGONLY
to recreate the syslog record.
And what if INSTANCE is not present in the record?
Perhaps we need a conditional template? Currently, two destinations with the same endpoint that use different templates and different filters can be used to accomplish this, but it gets convoluted very quickly.
Evan.
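To make the template problem concrete, here is a small Python sketch that expands `$NAME` macros from a dictionary. It covers only the program/pid/message part of the templates above and uses a PID entry in place of the proposed INSTANCE macro. The `expand` and `rebuild` helpers and the sample values are hypothetical, and this is not how syslog-ng evaluates templates. `rebuild` picks one of two templates depending on whether a PID is present, which is roughly what the two-destinations-with-filters workaround achieves.

```python
import re

WITH_PID = "$PROGRAM[$PID]: $MSGONLY"
WITHOUT_PID = "$PROGRAM: $MSGONLY"


def expand(template: str, macros: dict[str, str]) -> str:
    """Replace each $NAME with macros[NAME]; unknown or unset macros become ''."""
    return re.sub(r"\$([A-Z]+)", lambda m: macros.get(m.group(1), ""), template)


def rebuild(macros: dict[str, str]) -> str:
    """Choose the template by whether a PID was parsed (a 'conditional template')."""
    template = WITH_PID if macros.get("PID") else WITHOUT_PID
    return expand(template, macros)


# Hypothetical parsed messages: one with a PID, one without.
sshd = {"PROGRAM": "sshd", "PID": "1234", "MSGONLY": "session opened"}
cron = {"PROGRAM": "cron", "MSGONLY": "job started"}

if __name__ == "__main__":
    print(expand(WITHOUT_PID, sshd))  # sshd: session opened  ([1234] is lost)
    print(expand(WITH_PID, cron))     # cron[]: job started  (empty brackets)
    print(rebuild(sshd))              # sshd[1234]: session opened
    print(rebuild(cron))              # cron: job started
```

A single fixed template either loses the bracketed part or produces empty brackets; only the per-message choice reproduces both inputs.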
On Sat, 2007-12-29 at 10:00 -0800, Evan Rempel wrote:
Wow, this is a can of worms.
If I use a template of
$PRI $DATE $HOST $FACILITY.$PRIORITY $PROGRAM: $MSGONLY
it should recreate the entire syslog message, but it will not. The information inside the [xxx] after the program name will be dropped (or will it be part of MSGONLY?).
No, but if you used:
$PRI $DATE $HOST $FACILITY.$PRIORITY $MSG
This will contain the program/pid and the message in their original formatting.
If you want to exclude the [xxx] from the PROGRAM macro, then I think a new macro is required that will contain the [xxx] component, perhaps INSTANCE, IDENTIFIER or UNIQUE.
There's a macro called $PID, but it is not always set, as the pid part is optional.
I don't have the RFC in front of me, but using the terminology that the RFC uses would be good. Then I can write a template of
$PRI $DATE $HOST $FACILITY.$PRIORITY $PROGRAM[$INSTANCE]: $MSGONLY
to recreate the syslog record.
And what if INSTANCE is not present in the record?
$MSG ?
Perhaps we need a conditional template? Currently, two destinations with the same endpoint that use different templates and different filters can be used to accomplish this, but it gets convoluted very quickly.
I don't want to complicate templates() even further. $MSG does the trick IMHO. -- Bazsi
Balazs Scheidler wrote:
MSG is not sufficient because it forces the message, program and PID to be controlled as one piece. My example of recreating the original syslog record was overly simplistic and can be accomplished, as you indicate, with the MSG expansion.
I had forgotten about PID, which seems appropriate, PROVIDED it is not required to be numeric. We have a few applications that use the text between the [] as an instance name made up of letters and numbers.
All of the more complicated examples I can think of are for data mining purposes and as such go through an external program that places the syslog data into a storage engine (database). In all of these cases, external parsing of the program[pid] part can be done. IMHO it would be cleaner to parse the message in syslog-ng to create an output stream that has all of the message pieces broken apart:
DATE HOST FACILITY PRIORITY PROGRAM PID MSGONLY
This seems to have been addressed by PID, with the one caveat that it must accept non-numeric data.
Thanks for jogging my memory.
Evan.
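As a sketch of the external parsing mentioned here, assuming messages of the form `program[pid]: message` or `program: message`, the following Python splits out PROGRAM, PID and MSGONLY and lets the PID be any text without ']', so instance names such as the hypothetical `inst01` are accepted. `split_msg` and the regular expression are illustrative assumptions, not syslog-ng code; text without a recognizable header is kept whole as the message.

```python
import re

# program: everything up to the first space, '[' or ':';
# pid: optional, any text except ']' (so non-numeric instance names are allowed);
# then ':', an optional space, and the rest of the message.
HEADER = re.compile(r"(?P<program>[^ \[:]+)(?:\[(?P<pid>[^\]]*)\])?: ?(?P<msgonly>.*)", re.S)


def split_msg(msg: str) -> dict[str, str]:
    """Split 'program[pid]: text' into its parts; missing parts become ''."""
    m = HEADER.fullmatch(msg)
    if m is None:
        # No recognizable header: keep the whole text as the message.
        return {"program": "", "pid": "", "msgonly": msg}
    # An absent [pid] group is None; normalize it to an empty string.
    return {key: value or "" for key, value in m.groupdict().items()}


if __name__ == "__main__":
    print(split_msg("myapp[inst01]: cache warmed"))
    # {'program': 'myapp', 'pid': 'inst01', 'msgonly': 'cache warmed'}
    print(split_msg("cron: job started"))
    # {'program': 'cron', 'pid': '', 'msgonly': 'job started'}
```

Each returned field maps onto one column of the broken-apart stream above; the date, host, facility and priority would come from the syslog header, which this sketch does not parse.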
On Sun, 2007-12-30 at 08:22 -0800, Evan Rempel wrote:
and this seems to have been addressed by the PID, with the one caveat that it must accept non-numeric data.
It does. -- Bazsi
participants (3)
- Balazs Scheidler
- Evan Rempel
- Tsurusawa Takeshi