syslog-ng OSE 3.4 doc updates
Hi,

I am currently working on updating the 3.4 adminguide, and releasing updated drafts every few days. The current version already has two short sections about junctions and channels (and a bunch of smaller updates as well).

http://www.balabit.com/sites/default/files/documents/syslog-ng-ose-3.4-guide...
http://www.balabit.com/sites/default/files/documents/syslog-ng-ose-3.4-guide...

Any feedback, comment, clarification request, or real-life use-case is most welcome.

I hope to release more stuff on Friday, and also to write a list of the new features that are already documented.

Kind Regards,
Robert
----- Original message -----
Hi,
I am currently working on updating the 3.4 adminguide, and releasing updated drafts every few days. The current version already has two short sections about junctions and channels (and a bunch of smaller updates as well).
yay, this is great news. glad you could make it. some notes below:
http://www.balabit.com/sites/default/files/documents/syslog-ng-ose-3.4-guide...
* junctions can use flags(final) in order to avoid processing the rest of the channels if one matches
* it might make sense to emphasize that if multiple channels process the message, multiple outputs will be generated.
* another use case: with junctions, channels and inline drivers, you can associate processing with the source driver, so it automatically emits well-formed messages.
http://www.balabit.com/sites/default/files/documents/syslog-ng-ose-3.4-guide...
ah, I can see that this discusses the use case I was talking about. A cross-reference between the two chapters might help.

a more step-by-step tutorial would be useful, junctions are a complex concept.
Any feedback, comment, clarification request, or real-life use-case is most welcome.
I hope to release more stuff on Friday, and also to write a list of the new features that are already documented.
Kind Regards,
Robert
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq
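The notes above on flags(final) and on multiple outputs can be illustrated with a small model. This is only a sketch in Python, not syslog-ng code: the `Channel` class, the `run_junction` function and the example filters are hypothetical names invented for illustration, and the model assumes that channels are tried in the order they are listed.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Channel:
    """Hypothetical stand-in for one channel of a junction."""
    name: str
    matches: Callable[[str], bool]  # plays the role of the channel's filter
    final: bool = False             # models flags(final)


def run_junction(channels: List[Channel], message: str) -> List[str]:
    """Return one output per channel that processed the message."""
    outputs = []
    for channel in channels:
        if not channel.matches(message):
            continue
        outputs.append(f"{channel.name}: {message}")
        if channel.final:
            # A matching channel with flags(final) stops further processing.
            break
    return outputs


without_final = [
    Channel("apache", lambda m: "apache" in m),
    Channel("catch-all", lambda m: True),
]
with_final = [
    Channel("apache", lambda m: "apache" in m, final=True),
    Channel("catch-all", lambda m: True),
]

# Both channels match, so the message is emitted twice.
print(run_junction(without_final, "apache: GET /"))
# The first channel matches and is final, so only one output is produced.
print(run_junction(with_final, "apache: GET /"))
```

In this model a message that matches no final channel still reaches the catch-all channel, which is the usual reason for ordering the most specific channels first.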
Thanks for the feedback. Yesterday I released an updated version based on what you told me IRL. Unfortunately, I didn't get to write the list of changes yet.

Robi

On Thursday, January 24, 2013 12:14 CET, Balazs Scheidler <bazsi77@gmail.com> wrote:
[...]
I published another release today, with minor updates, including the lost of changes that are already documented. You can find it at

https://www.balabit.com/sites/default/files/documents/syslog-ng-ose-3.4-guid...

I still have about 20 items on my todo list for the OSE 3.4 docs; I hope to cover them by the end of next week.

As always, any feedback or comment is welcome.

Regards,
Robert

On Saturday, January 26, 2013 16:59 CET, Fekete Róbert <frobert@balabit.hu> wrote:
[...]
----- Original message -----
I published another release today, with minor updates, including the lost of changes that are already documented.
I was wondering what 'lost' meant in this context. Then I realized it was 'list of changes'.

You can find it at
https://www.balabit.com/sites/default/files/documents/syslog-ng-ose-3.4-guid...
wow, that's a really long list. thanks for the updates.
I have a very simple rewrite rule, which just figures out the short hostname and populates a macro SHORTHOST with the short host name.

# --- to produce a short host macro SHOST
filter f_short_host_at { match('^[^@]+@([^.]+)\.' value("HOST") type(pcre) flags("store-matches" "nobackref")); };
filter f_short_host { match('^([^.@]+)\.' value("HOST") type(pcre) flags("store-matches" "nobackref")); };

rewrite r_short_host {
    set("$1", value("SHORTHOST") condition(filter(f_short_host_at) or filter(f_short_host)));
};

I have two different config files (they are complicated, but the rewrite portion is not).

log {
    source(unix_network_tcp);
    source(unix_network_udp);
    rewrite(r_short_host);
    log { destination(d_archive); flags(flow-control); };
};

In one config everything works as expected (-Fdv output)

Syslog connection accepted; fd='20', client='AF_INET(142.104.141.3:34573)', local='AF_INET(142.104.141.3:514)'
Incoming log entry; line='<134>2013-02-04T15:28:46-08:00 pangolin.comp.uvic.ca/pangolin.comp.uvic.ca action-handler[24020]: starting'
Filter node evaluation result; result='not-match'
Filter node evaluation result; result='not-match', type='filter(f_short_host_at)'
Filter node evaluation result; result='match'
Filter node evaluation result; result='match', type='filter(f_short_host)'
Filter node evaluation result; result='match', type='OR'
Rewrite expression evaluation result; value='SHORTHOST', new_value='pangolin', rule='r_short_host', location='/usr/local/etc/syslog-ng/syslog-ng.server.conf:173:2'

On the other config (same host and all versions of software)

Syslog connection accepted; fd='19', client='AF_INET(142.104.141.3:46021)', local='AF_INET(142.104.141.3:514)'
Incoming log entry; line='<134>2013-02-04T15:28:46-08:00 pangolin.comp.uvic.ca/pangolin.comp.uvic.ca action-handler[24020]: starting'
Filter node evaluation result; result='not-match'
Filter node evaluation result; result='not-match', type='filter(f_short_host_at)'
**
ERROR:logmsg.c:535:log_msg_set_value_indirect: assertion failed: (!log_msg_is_write_protected(self))

and syslog-ng dies.

Can anyone shed any light on this?

Under what conditions does the log_msg become write protected?

Evan.
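The two patterns can be checked outside syslog-ng. The following is only a sketch: it uses Python's `re` module as a stand-in for PCRE (an assumption that holds for expressions this simple), `short_host` is a hypothetical helper name, and the second sample value is made up to exercise the `{source}@{hostname}` form.

```python
import re
from typing import Optional

# The two patterns from the filters above, unchanged.
SHORT_HOST_AT = re.compile(r'^[^@]+@([^.]+)\.')
SHORT_HOST = re.compile(r'^([^.@]+)\.')


def short_host(host: str) -> Optional[str]:
    """Mimic condition(filter(f_short_host_at) or filter(f_short_host))."""
    for pattern in (SHORT_HOST_AT, SHORT_HOST):
        match = pattern.search(host)
        if match:
            return match.group(1)
    return None  # neither filter matched, so the rewrite would not apply


# HOST value taken from the debug output above.
print(short_host("pangolin.comp.uvic.ca/pangolin.comp.uvic.ca"))  # pangolin
# Hypothetical value in the {source}@{hostname} form.
print(short_host("local@pangolin.comp.uvic.ca"))                  # pangolin
```

The first sample fails the `@` pattern and matches the second one, which is the same sequence of not-match and match results shown in the working debug output.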
hi,

write protection is an internal property of log messages, and certainly the assertion should not fail.

as you probably know, whenever a message is delivered on multiple paths (several log statements, for instance), changes on one of the paths shouldn't be visible on the other. however, syslog-ng also tries to minimize the performance impact of these branches, and only copies the message at the branching point if necessary. if such copying doesn't happen, the message becomes write protected. this ensures that programming errors do not cause the message model to be altered.

it seems that processing the rewrite condition() doesn't handle this properly, hence the error.

this is only speculation for now, I have only read what you posted; my phone (where I'm posting from) is not good enough for hacking syslog-ng :)

----- Original message -----
[...]
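The copy-or-protect behaviour described in the reply above can be pictured with a toy model. This is a conceptual sketch in Python only, not how syslog-ng implements it: `LogMessage`, `enter_branch` and `WriteProtectedError` are hypothetical names, and the decision whether a branch modifies the message is simply passed in as a flag.

```python
class WriteProtectedError(Exception):
    """Raised when a shared (write-protected) message is modified."""


class LogMessage:
    def __init__(self, values):
        self.values = dict(values)
        self.write_protected = False

    def set_value(self, name, value):
        if self.write_protected:
            raise WriteProtectedError(f"cannot set {name}: message is write protected")
        self.values[name] = value

    def clone(self):
        # A clone is a private, writable copy.
        return LogMessage(self.values)


def enter_branch(msg, branch_modifies):
    """Hand a message to one branch of the log path."""
    if branch_modifies:
        return msg.clone()       # copy, so changes stay local to this branch
    msg.write_protected = True   # share without copying, but forbid writes
    return msg


original = LogMessage({"HOST": "pangolin.comp.uvic.ca"})

writable = enter_branch(original, branch_modifies=True)
writable.set_value("SHORTHOST", "pangolin")
print("SHORTHOST" in original.values)   # False: the other path is unaffected

shared = enter_branch(original, branch_modifies=False)
try:
    shared.set_value("SHORTHOST", "pangolin")
except WriteProtectedError as error:
    print("rejected:", error)
```

In the model, the failing case from the report corresponds to a branch that was judged read-only, so no copy was made, and then tried to write anyway.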
Back in December there was discussion of the payload_reallocs statistic. https://lists.balabit.hu/pipermail/syslog-ng/2012-December/019842.html
*global;payload_reallocs;;a;processed;760*
this counts the number of reallocs of the message payload. syslog-ng sizes the allocated buffer with a simple heuristics in the hope that parsing, rewrite rules will not cause it to grow. in your case syslog-ng had to do a realloc for 760 messages. if this happens to be close to all messages you processed, it's the cause for performance degradation.
if it's a minority then you probably don't have to care.
if the first one is true, I'd like to know about it.
right now the allocated size is twice the length of the incoming message.
Well, you wanted to know if nearly all of the messages required a realloc.

Syslog-ng OSE 3.3.7

The d_archive destination receives all of our messages;

global;payload_reallocs;;a;processed;61142004
destination;d_archive;;a;processed;31650382

about 15 seconds later

global;payload_reallocs;;a;processed;61197495
destination;d_archive;;a;processed;31680143

This means that for

# messages = 29761
# reallocs = 55491

or approximately 2 reallocs for each message.

We make heavy use of patternDB to apply meta data to messages,
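The figures above can be reproduced from the two statistics snapshots. A minimal sketch in Python; the dictionary keys are just labels chosen here, and the counter values are copied from the message.

```python
# Counter values copied from the two statistics snapshots above.
first = {"payload_reallocs": 61142004, "d_archive": 31650382}
second = {"payload_reallocs": 61197495, "d_archive": 31680143}

# Both counters only ever increase, so the difference is the activity
# between the two snapshots.
messages = second["d_archive"] - first["d_archive"]
reallocs = second["payload_reallocs"] - first["payload_reallocs"]

print(f"messages = {messages}")                              # 29761
print(f"reallocs = {reallocs}")                              # 55491
print(f"reallocs per message = {reallocs / messages:.2f}")   # 1.86
```

The ratio comes out a little under 2, which matches the "approximately 2 reallocs for each message" estimate.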
hi,

in that case it might make sense to add a parameter to control initial allocation size or create a heuristic to automatically adjust that.

did you notice performance issues? how many messages are you processing? did the cpu usage of syslog-ng increase dramatically at one point as you were adding more and more name-value pairs?

thanks for the info.

----- Original message -----
[...]
No, we didn't really notice a performance issue. We process approx 3,000 msg/sec. We keep detailed statistics on our hosts, and the CPU change could not really be noticed. Of course, we are running on an 8 core system, so the CPU may have increased for the one core, but it would only have 1/8 of an effect on the total CPU % used, so it would be difficult to notice.

We are seeing a different problem though. (happens on both 3.3.7 and 3.4.1 threaded and not threaded)

2013-02-06T23:59:05-08:00 kern.info kernel: syslog-ng[10913]: segfault at 7f819c000168 ip 00007f819c000168 sp 00007f81b33f5a48 error 15
2013-02-06T23:59:05-08:00 syslog.notice syslog-ng[7627]: Syslog connection closed; fd='13', client='AF_INET(142.104.47.145:51679)', local='AF_INET(127.0.0.1:1514)'
2013-02-06T23:59:05-08:00 daemon.crit supervise/syslog-ng[18771]: Daemon exited due to a deadlock/signal/failure, restarting; exitcode='11'

This happens approx once every 4-5 days.

We take ps snapshots every 15 minutes, so the process looked like;

Date     Time   USER PID   PPID  PRI CPU VSZ    ELAPSED  TIME     COMMAND
------------------------------------------------------------------------------------------
20130206 083101 root 18772 18771 19  -   686668 00:03:28 00:00:24 /usr/local/sbin/syslog-ng
...
20130206 234601 root 18772 18771 19  -   684056 15:18:28 01:52:08 /usr/local/sbin/syslog-ng
...
then the restarted process
20130207 000101 root 11019 18771 19  -   687628 00:01:56 00:00:09 /usr/local/sbin/syslog-ng

So there does not seem to be a memory leak, but obviously something goes wrong to get a segfault.

I can't trace this for 4-5 days, so how do we troubleshoot this?

Evan.

On 02/06/2013 09:47 PM, Balazs Scheidler wrote:
[...]
-- Evan Rempel erempel@uvic.ca Senior Systems Administrator 250.721.7691 Data Centre Services, University Systems, University of Victoria
Forgot to ask ... what is the algorithm used to determine the resized size?

On 02/07/2013 11:35 AM, Evan Rempel wrote:
[...]
Evan Rempel <erempel@uvic.ca> writes:
We are seeing a different problem though. (happens on both 3.3.7 and 3.4.1 threaded and not threaded)
2013-02-06T23:59:05-08:00 kern.info kernel: syslog-ng[10913]: segfault at 7f819c000168 ip 00007f819c000168 sp 00007f81b33f5a48 error 15
2013-02-06T23:59:05-08:00 syslog.notice syslog-ng[7627]: Syslog connection closed; fd='13', client='AF_INET(142.104.47.145:51679)', local='AF_INET(127.0.0.1:1514)'
2013-02-06T23:59:05-08:00 daemon.crit supervise/syslog-ng[18771]: Daemon exited due to a deadlock/signal/failure, restarting; exitcode='11'

[...]

So there does not seem to be a memory leak, but obviously something goes wrong to get a segfault.
I can't trace this for 4-5 days, so how do we trouble shoot this?
If you could enable core dumps, and get a backtrace, that would help a lot to narrow down the issue.

-- |8]
Normally when a syslog line is produced, the host has the format of

{source}@{hostname}

so when the log reaches my central server it looks like

2013-02-08T11:15:01-08:00 local@gpfs10.westgrid.uvic.ca/chrysaor.westgrid.ca cron.info CROND[20315]: ...

but on this same host, I have a file source (different source definition), its messages go to the same destination using a separate log statement, but when they reach the central syslog server it looks like

2013-02-08T11:11:35-08:00 gpfs10.westgrid.uvic.ca/gpfs10.westgrid.uvic.ca/chrysaor.westgrid.ca local2.info mmfs: ...

So it seems that the file source is populating the host with {hostname}/{hostname}

Was this intentional?

source mmfs {
    file("/var/adm/ras/mmfs.log.latest"
        log_fetch_limit(100)
        program_override(mmfs)
        default-facility(local2)
        default-priority(info)
        flags(no-parse)
    );
};
hi,

the default hostname, if otherwise unspecified, uses this format if chain_hostnames() is enabled. this mimics the behaviour of chain_hostnames() when receiving the message locally. (the part before the slash is the host as it claimed itself to be, the part after the slash is the host as it was resolved)

I consider the chain_hostnames() functionality to be deprecated. it's not always logical how it behaves, but this is how it has worked for the past decade.

----- Original message -----
[...]
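To make the chained format easier to read, the HOST values from the two examples can be taken apart at the slashes. This is only an illustrative sketch in Python: `split_chain` is a hypothetical helper, and treating everything before an `@` as the source name is an assumption based on the `{source}@{hostname}` form quoted in the question.

```python
def split_chain(host: str):
    """Split a chained HOST value into its '/'-separated parts.

    A part may carry a '{source}@' prefix, which is returned separately.
    """
    parts = []
    for part in host.split("/"):
        source, sep, name = part.partition("@")
        if sep:
            parts.append({"source": source, "host": name})
        else:
            parts.append({"source": None, "host": part})
    return parts


for value in (
    "local@gpfs10.westgrid.uvic.ca/chrysaor.westgrid.ca",
    "gpfs10.westgrid.uvic.ca/gpfs10.westgrid.uvic.ca/chrysaor.westgrid.ca",
):
    print([p["host"] for p in split_chain(value)])
```

The second value yields three parts, with the local hostname appearing twice, which is the `{hostname}/{hostname}` doubling the question is about.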
Just so I understand, you are saying that file sources are treated as if they were another syslog instance on the same host, sending data to the running instance. Correct?

If you consider chain_hostnames() deprecated, what do you recommend now?

If it isn't logical how it behaves, perhaps it should be fixed so that it is logical. :-)

Evan.

________________________________________
From: Balazs Scheidler [bazsi77@gmail.com]
Sent: Saturday, February 09, 2013 10:19 PM
To: Syslog-ng users' and developers' mailing list; Evan Rempel
Subject: Re: [syslog-ng] 3.3.7 oddity with file source

hi,

the default hostname if otherwise unspecified is using this format if chain_hostnames() is enabled. this mimics the behaviour of chain_hostnames() when receiving the message locally. (the part before the slash is the host as it claimed itself to be, the part after the slash as it was resolved)

I consider the chain_hostnames() functionality to be deprecated, it's not always logical how it behaves, but this is how it worked for the past decade.

----- Original message -----
[...]
On 02/08/2013 03:52 AM, Gergely Nagy wrote:
[...]
If you could enable core dumps, and get a backtrace, that would help a lot to narrow down the issue.
OK, I was able to capture a core dump and the backtrace looks like,

% sudo gdb /usr/local/sbin/syslog-ng core.11481
Reading symbols from /usr/local/sbin/syslog-ng...(no debugging symbols found)...done.
[New Thread 11954]
[New Thread 12168]
[New Thread 12185]
[New Thread 12184]
[New Thread 12060]
[New Thread 11481]
...
Core was generated by `/usr/local/sbin/syslog-ng --cfgfile=/usr/local/etc/syslog-ng/syslog-ng.server.c'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f40080008b0 in ?? ()
(gdb) backtrace
#0 0x00007f40080008b0 in ?? ()
#1 0x00007f402a10b1a5 in log_msg_refcache_stop () at logmsg.c:1495
#2 0x00007f402a11760c in log_writer_flush (self=0x9b3970, flush_mode=LW_FLUSH_BUFFER) at logwriter.c:1043
#3 0x00007f402a1177ed in log_writer_work_perform (s=0x9b3970) at logwriter.c:129
#4 0x00007f402a117edb in main_loop_io_worker_job_start (self=0x9b3b60) at mainloop.c:371
#5 0x00007f402a1341ca in iv_work_thread_do_work (_thr=0x70aa30) at iv_work.c:118
#6 0x00007f402a13352a in iv_run_tasks (st=0x7f40180dd200) at iv_task.c:48
#7 0x00007f402a13574c in iv_main () at iv_main_posix.c:106
#8 0x00007f402a133fe1 in iv_work_thread (_thr=0x70aa30) at iv_work.c:200
#9 0x00007f402a1361b8 in iv_thread_handler (_thr=0x940ae0) at iv_thread_posix.c:142
#10 0x00007f4028d82851 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f4028ad011d in clone () from /lib64/libc.so.6
(gdb)

-- Evan Rempel erempel@uvic.ca Senior Systems Administrator 250.721.7691 Data Centre Services, University Systems, University of Victoria
Evan Rempel <erempel@uvic.ca> writes:
[...]
Thanks, this narrows things down a little. I'll see if I can figure out anything from the code.

Thanks for the backtrace!

-- |8]
If it helps, sometimes the flush_mode is LW_FLUSH_NORMAL and other times LW_FLUSH_BUFFER

#0 0x00007f4020000528 in ?? ()
#1 0x00007f402a10b1a5 in log_msg_refcache_stop () at logmsg.c:1495
#2 0x00007f402a11760c in log_writer_flush (self=0xa0d460, flush_mode=LW_FLUSH_NORMAL) at logwriter.c:1043
#3 0x00007f402a1177ed in log_writer_work_perform (s=0xa0d460) at logwriter.c:129
#4 0x00007f402a117edb in main_loop_io_worker_job_start (self=0xa0d650) at mainloop.c:371
#5 0x00007f402a1341ca in iv_work_thread_do_work (_thr=0xabdd20) at iv_work.c:118
#6 0x00007f402a13352a in iv_run_tasks (st=0x7f400c40a130) at iv_task.c:48
#7 0x00007f402a13574c in iv_main () at iv_main_posix.c:106
#8 0x00007f402a133fe1 in iv_work_thread (_thr=0xabdd20) at iv_work.c:200
#9 0x00007f402a1361b8 in iv_thread_handler (_thr=0xabddf0) at iv_thread_posix.c:142
#10 0x00007f4028d82851 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f4028ad011d in clone () from /lib64/libc.so.6

On 02/22/2013 08:18 AM, Evan Rempel wrote:
[...]
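When several core dumps are collected, it can help to compare only the function names in each backtrace, ignoring addresses and arguments. The following is a rough sketch in Python, assuming gdb's usual `#N 0xADDR in function (...)` frame layout as seen in the backtraces above; `frame_functions` is a hypothetical helper, and only the top three frames of each backtrace are reproduced.

```python
import re

# Frame number, address, then the function name up to the opening parenthesis.
FRAME = re.compile(r'#(\d+)\s+0x[0-9a-f]+ in (\S+) \(')


def frame_functions(backtrace: str):
    """Return the function names of a gdb backtrace, innermost frame first."""
    return [name for _, name in FRAME.findall(backtrace)]


first = """\
#0 0x00007f40080008b0 in ?? ()
#1 0x00007f402a10b1a5 in log_msg_refcache_stop () at logmsg.c:1495
#2 0x00007f402a11760c in log_writer_flush (self=0x9b3970, flush_mode=LW_FLUSH_BUFFER) at logwriter.c:1043
"""
second = """\
#0 0x00007f4020000528 in ?? ()
#1 0x00007f402a10b1a5 in log_msg_refcache_stop () at logmsg.c:1495
#2 0x00007f402a11760c in log_writer_flush (self=0xa0d460, flush_mode=LW_FLUSH_NORMAL) at logwriter.c:1043
"""

print(frame_functions(first))
print(frame_functions(first) == frame_functions(second))  # True
```

For these two crashes the call path is the same and only the flush_mode argument and the addresses differ.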
participants (5)
- Balazs Scheidler
- Evan Rempel
- Fekete Robert
- Fekete Róbert
- Gergely Nagy