Hi, We are running into a issue where syslog-ng clients who are configured to forward to a central syslog-ng destination server that are reasonably chatty are getting there messages delayed at the source (ie. 'tcpdump'ing the client shows it is sending messages from N minutes ago, where N is anything from 5 to 60 minutes on average sometimes more). We have also seen a small percentage of loss at the destination. On these 'delayed' clients we also see the resident memory footprint on the growing. Clients are running 1.6.12 (have tried 2.0.8 also - same symptons). The central syslog-ng server is running 2.0.8. relevant config entries are: syslog-ng client: options { sync (1); time_reopen (10); log_fifo_size (2097152); long_hostnames (off); use_dns (no); use_fqdn (no); create_dirs (no); keep_hostname (yes); }; destination d_central { tcp("log2.xyz.com" port(5140)); }; filter f_local5 { facility(local5); }; log { source(s_sys); filter(f_local5); destination(d_central); flags(final); }; syslog-ng central server: options { time_reopen (10); log_fifo_size (2097152); long_hostnames (off); use_dns (yes); dns_cache (yes); dns_cache_expire (28800); use_fqdn (no); create_dirs (yes); dir_perm (0755); perm (0644); keep_hostname (no); use_time_recvd (no); stats_freq(600); flush_timeout(16000); flush_lines(256); time_sleep(80); }; destination d_server { file("/logs/server.$LEVEL.log"); }; filter f_server { facility(local5) and program("server"); }; log { source(s_sys); filter(f_server); destination(d_server); flags(final); }; to give a rough idea of message volume the server is receiving approximately 50 million log lines per day on average any help/information would be appreciated thanks -------------------------------------- Easy + Joy + Powerful = Yahoo! Bookmarks x Toolbar http://pr.mail.yahoo.co.jp/toolbar/
mark peters wrote:
Hi,
We are running into a issue where syslog-ng clients who are configured to forward to a central syslog-ng destination server that are reasonably chatty are getting there messages delayed at the source (ie. 'tcpdump'ing the client shows it is sending messages from N minutes ago, where N is anything from 5 to 60 minutes on average sometimes more). We have also seen a small percentage of loss at the destination.
Some applications syslog with an incorrect time. We are currently working with sendmail logging some messages exactly 1 hour old. They log it at the correct time, but with an hour old timestamp. We saw open ssh log with a time of 7 hours in the future (I think it logged UTC native time rather than our time zone). IF you can, try to associate the "wrong time" messages with a process ID or something that you can confirm is logging the messages at the correct time. By ignoring the time on the syslog messages, you can see that the cronlogy of the messages is correct, then you can deduce that the application is logging the wrong time. Evan Rempel
On Thu, 2008-03-27 at 09:08 -0700, Evan Rempel wrote:
mark peters wrote:
Hi,
We are running into a issue where syslog-ng clients who are configured to forward to a central syslog-ng destination server that are reasonably chatty are getting there messages delayed at the source (ie. 'tcpdump'ing the client shows it is sending messages from N minutes ago, where N is anything from 5 to 60 minutes on average sometimes more). We have also seen a small percentage of loss at the destination.
Some applications syslog with an incorrect time. We are currently working with sendmail logging some messages exactly 1 hour old. They log it at the correct time, but with an hour old timestamp.
We saw open ssh log with a time of 7 hours in the future (I think it logged UTC native time rather than our time zone).
IF you can, try to associate the "wrong time" messages with a process ID or something that you can confirm is logging the messages at the correct time. By ignoring the time on the syslog messages, you can see that the cronlogy of the messages is correct, then you can deduce that the application is logging the wrong time.
If this is indeed true, you could perhaps use the received timestamp instead of the one in the message. This way you don't have to trust the time generated by applications. -- Bazsi
We have been running a syslog-ng 2.0.x syslog server for a long time now, collecting messages from a few hundred systems. We have been very happy that the CPU for that process was running at less than 5%. We are getting a version 3.0.5 system ready for deployment, and as a test, have configured our 2.0.x system to copy all of its logs to the 3.0.5 server. We have basically used the same configuration file from out 2.0.x server with minimal changes to make it a "@version: 3.0" config file. What I see as the major differences in this setup are: 1. New server is 64 bit Redhat 5 linux and old was 32bit Redhat 4. 2. The new servers sees one TCP connection with the full syslog stream where the old server has approx 300 TCP connections with only a few messages on each stream. Now for the bad stuff. On the Syslog-ng 3.0.5 system the process causes 85% CPU load (during light load periods) and most (80% of the 85%) of that is system time, not user time. Has anyone seen this type of behavior? Can anyone give me some pointers to where this issue might be? I have a few ideas, but I didn't want to "lead the witness" :-) Thanks for your time. Evan.
On Mon, 2010-03-15 at 08:48 -0700, Evan Rempel wrote:
We have been running a syslog-ng 2.0.x syslog server for a long time now, collecting messages from a few hundred systems. We have been very happy that the CPU for that process was running at less than 5%.
We are getting a version 3.0.5 system ready for deployment, and as a test, have configured our 2.0.x system to copy all of its logs to the 3.0.5 server. We have basically used the same configuration file from out 2.0.x server with minimal changes to make it a "@version: 3.0" config file.
What I see as the major differences in this setup are:
1. New server is 64 bit Redhat 5 linux and old was 32bit Redhat 4.
2. The new servers sees one TCP connection with the full syslog stream where the old server has approx 300 TCP connections with only a few messages on each stream.
Now for the bad stuff.
On the Syslog-ng 3.0.5 system the process causes 85% CPU load (during light load periods) and most (80% of the 85%) of that is system time, not user time.
Has anyone seen this type of behavior?
Can anyone give me some pointers to where this issue might be?
I have a few ideas, but I didn't want to "lead the witness" :-)
I know about an issue not yet fixed in 3.0.5 and introduced somewhere between 3.0.2 and 3.0.4. Can you check if you have pipe() source or destination which refers to a non-named-pipe file? e.g. it is typical to see something like pipe('/dev/console') but /dev/console is a character device. The solution is to use the file() driver which is meant to be used in these cases. The patch is really simple: commit 81f27c22e3f6b0f8f6148bdc6e98d1491b80d128 Author: Balazs Scheidler <bazsi@balabit.hu> Date: Fri Mar 19 13:27:39 2010 +0100 affile: fixed a possible infinite loop if pipe driver is used for a non-pipe device In this case the file descriptor representing the pipe is not initialized, but success is returned. This causes the main poll() call to return with a failure condition repeatedly, which in turn tops up the CPU at 100% diff --git a/src/affile.c b/src/affile.c index 15dbee0..41c71f3 100644 --- a/src/affile.c +++ b/src/affile.c @@ -64,6 +64,7 @@ affile_open_file(gchar *name, gint flags, g_process_cap_modify(CAP_DAC_READ_SEARCH, TRUE); g_process_cap_modify(CAP_SYS_ADMIN, TRUE); } + *fd = -1; if (stat(name, &st) >= 0) { if (is_pipe && !S_ISFIFO(st.st_mode)) -- Bazsi
participants (3)
-
Balazs Scheidler
-
Evan Rempel
-
mark peters