I am currently experiencing an odd performance issue with syslog-ng 1.6.5. I use several load-balanced relays which receive syslog traffic via both udp and tcp. They forward all traffic to the back-end collection hosts exclusively via tcp. Overall, this environment collects around 60GB of log data per day. All of the relays have identical Solaris 8 builds, as well as identical syslog-ng configurations. Here is the pertinent snippet from a configuration file: options { stats(300); sync(0); time_reopen(10); log_fifo_size(3000000); gc_busy_threshold(200000); check_hostname(no); keep_hostname(yes); }; As you can see, my fifo queue is enormous. Despite that fact, syslog-ng on any given relay will begin dropping messages at an alarming rate (600-700 per second) when combined inbound and outbound tcp traffic increases beyond a certain threshold. Said threshold seems to be a bit low to me, though. We're talking somewhere in the neighborhood of 700 tcp segments per second. The corresponding inbound UDP dgrams are usually between 200 and 400 per second. It should be noted that the IP stack is handling the load just fine. Also, CPU utilization rarely exceeds 60 or 70 percent. The collection hosts are fairly beefy systems that are barely breathing even during peak load. I've watched both the IP stack and syslog-ng on these systems, and they're keeping up just fine. I know the FIFO queue size probably seems unreasonable here, but the lower I make it, the worse the drops. Of course, the higher I make it, the more memory it eats up. And in some cases, I've seen syslog-ng eventually eat it all up and the scanner kicks in. But what's really most peculiar in this scenario is the fact that the numbers simply don't add up. Why does syslog-ng appear (on the surface) to be dropping a very large percentage of the messages that it receives? I realize that it's not, but the numbers tell a different story. And how could it possibly drop so many messages when the FIFO queue is configured to buffer three million lines? How preposterous! Am I way off base here? --------------------------------- Yahoo! FareChase - Search multiple travel sites in one click.
On Fri, Oct 21, 2005 at 06:54:04AM -0700, Scott C wrote:
But what's really most peculiar in this scenario is the fact that the numbers simply don't add up. Why does syslog-ng appear (on the surface) to be dropping a very large percentage of the messages that it receives? I realize that it's not, but the numbers tell a different story. And how could it possibly drop so many messages when the FIFO queue is configured to buffer three million lines? How preposterous!
So you think you really have all the logs but you see STATS messages reporting dropped messages? How would you know if you really have them all? It's possible that under heavy load you have some program or pipe destination (or maybe even file if you have slow disks) that just can't keep up. That's not syslog-ng's fault, it just lets you know that the buffer filled up. Right now all anyone can do is shoot off wild guesses like mine above, since there's no hard data in your post, just your conclusions. If you want to post your syslog-ng.conf, output of system commands like "netstat -i", prstat, "iostat -mnPxz 10" and vmstat during peak loads, and whatever else you used to reach your conclusions then we'd be in a better position to help. OBTW there are performance tips in the FAQ that give clues as to causes: http://www.campin.net/syslog-ng/faq.html#perf Possible culprits: DNS, regexps (though you say CPU is ok, so maybe not), logging to a tty or the console. -- Nate "I must've seen it in a USENET posting; that's sort of like hearsay evidence from Richard Nixon..." - Houghton, Blair
The log data we receive from most systems is fairly consistent from day to day. That's how I can tell I'm missing some data. But it's not to the extent that syslog-ng STATS would seem to indicate. As for the system-level performance stuff, I don't mean to sound pretentious, but I've effectively ruled all of that out. I've analyzed all the statistics, and I even opened a ticket with Sun to get their two cents worth. They came to the same conclusion that I did--the application is simply not keeping up with the IP stack. Also, although it may not be entirely clear in my original post, these relays are just relays. The log messages come in via udp or tcp and go right back out via tcp. Any and all processing (including dns and regexp) is performed at the back end. At the end of the day, I guess my real question is, how truly accurate is the STATS number that I'm seeing? Here is my syslog-ng.conf: options { stats(300); sync(0); time_reopen(10); log_fifo_size(3000000); gc_busy_threshold(200000); check_hostname(no); keep_hostname(yes); }; # Grab syslog-ng specific messages. source ng_local { internal(); }; # Grab local messages (OS specific messages). source local { sun-streams("/dev/log" door("/etc/.syslog_door")); }; source remote { udp(ip("1.1.1.1") port(514)); tcp(ip("1.1.1.1") port(5514) max-connections(50)); }; # Log location for syslog-ng specific messages. destination ng_messages { file("/var/log/syslog-ng" owner(root) group(staff) perm(0740) create_dirs(yes) dir_perm(0755)); }; # Log location for OS specific messages. destination local_messages { file("/var/adm/messages" owner(root) group(staff) create_dirs(yes) dir_perm(0755)); }; # Forward messages to the cluster destination remote_messages { tcp("1.1.1.50" port(5514)); }; # Log all syslog-ng specific messages. log { source(ng_local); destination(ng_messages); }; # Log all OS specific messages. log { source(local); destination(local_messages); }; # Log all remote system messages. log { source(remote); destination(remote_messages); }; Nate Campi <nate@campin.net> wrote: On Fri, Oct 21, 2005 at 06:54:04AM -0700, Scott C wrote:
But what's really most peculiar in this scenario is the fact that the numbers simply don't add up. Why does syslog-ng appear (on the surface) to be dropping a very large percentage of the messages that it receives? I realize that it's not, but the numbers tell a different story. And how could it possibly drop so many messages when the FIFO queue is configured to buffer three million lines? How preposterous!
So you think you really have all the logs but you see STATS messages reporting dropped messages? How would you know if you really have them all? It's possible that under heavy load you have some program or pipe destination (or maybe even file if you have slow disks) that just can't keep up. That's not syslog-ng's fault, it just lets you know that the buffer filled up. Right now all anyone can do is shoot off wild guesses like mine above, since there's no hard data in your post, just your conclusions. If you want to post your syslog-ng.conf, output of system commands like "netstat -i", prstat, "iostat -mnPxz 10" and vmstat during peak loads, and whatever else you used to reach your conclusions then we'd be in a better position to help. OBTW there are performance tips in the FAQ that give clues as to causes: http://www.campin.net/syslog-ng/faq.html#perf Possible culprits: DNS, regexps (though you say CPU is ok, so maybe not), logging to a tty or the console. -- Nate "I must've seen it in a USENET posting; that's sort of like hearsay evidence from Richard Nixon..." - Houghton, Blair _______________________________________________ syslog-ng maillist - syslog-ng@lists.balabit.hu https://lists.balabit.hu/mailman/listinfo/syslog-ng Frequently asked questions at http://www.campin.net/syslog-ng/faq.html --------------------------------- Yahoo! FareChase - Search multiple travel sites in one click.
On Fri, 2005-10-21 at 08:47 -0700, Scott C wrote:
The log data we receive from most systems is fairly consistent from day to day. That's how I can tell I'm missing some data. But it's not to the extent that syslog-ng STATS would seem to indicate.
As for the system-level performance stuff, I don't mean to sound pretentious, but I've effectively ruled all of that out. I've analyzed all the statistics, and I even opened a ticket with Sun to get their two cents worth. They came to the same conclusion that I did--the application is simply not keeping up with the IP stack.
Also, although it may not be entirely clear in my original post, these relays are just relays. The log messages come in via udp or tcp and go right back out via tcp. Any and all processing (including dns and regexp) is performed at the back end.
At the end of the day, I guess my real question is, how truly accurate is the STATS number that I'm seeing?
Judging your configuration files seem to indicate that the tcp destination cannot keep up, your files should definitely be ok. The number of DROPPED entries is STATS is a sum of all destinations, if you have three destinations then it is equal to three messages. I always wanted to do per-destination statistics. That would now be possible with syslog-ng 1.9.x, but that's not yet ready for production. -- Bazsi
On Fri, 2005-10-21 at 17:51 +0200, Balazs Scheidler wrote:
On Fri, 2005-10-21 at 08:47 -0700, Scott C wrote:
I always wanted to do per-destination statistics. That would now be possible with syslog-ng 1.9.x, but that's not yet ready for production.
I mean, it would be possible to implement with the 1.9.x codebase, not that it is currently implemented. -- Bazsi
On Fri, 21 Oct 2005, Balazs Scheidler wrote:
On Fri, 2005-10-21 at 08:47 -0700, Scott C wrote:
At the end of the day, I guess my real question is, how truly accurate is the STATS number that I'm seeing?
Judging your configuration files seem to indicate that the tcp destination cannot keep up, your files should definitely be ok.
The number of DROPPED entries is STATS is a sum of all destinations, if you have three destinations then it is equal to three messages.
I always wanted to do per-destination statistics. That would now be possible with syslog-ng 1.9.x, but that's not yet ready for production.
It would be very nice to have per-destination RECEIVED stats as well in 1.9.x.
-- Bazsi
_______________________________________________ syslog-ng maillist - syslog-ng@lists.balabit.hu https://lists.balabit.hu/mailman/listinfo/syslog-ng Frequently asked questions at http://www.campin.net/syslog-ng/faq.html
On Fri, Oct 21, 2005 at 12:34:12PM -0400, Mike wrote:
It would be very nice to have per-destination RECEIVED stats as well in 1.9.x.
Right, source statistics. BIND always had ways to dump statistics (8 was much more complete than 9). It would be nice to do the same thing and be able to send a signal to syslog-ng and have it dump the information on events sent, received and dropped, a summary for all sources and dests and also broken down for each one. More and more large scale syslog deployments are happening all the time and syslog-ng arguably has the most mindshare when people think of deploying syslog. Having advanced statistics in the 1.9/2.0 branch would steer people to the new branch sooner and get it better tested, IMHO. -- Nate "Whenever you find you are on the side of the majority, it is time to pause and reflect." - Samuel Clemens
On Fri, 21 Oct 2005, Nate Campi wrote:
On Fri, Oct 21, 2005 at 12:34:12PM -0400, Mike wrote:
It would be very nice to have per-destination RECEIVED stats as well in 1.9.x.
Right, source statistics. BIND always had ways to dump statistics (8 was much more complete than 9). It would be nice to do the same thing and be able to send a signal to syslog-ng and have it dump the information on events sent, received and dropped, a summary for all sources and dests and also broken down for each one.
I defintely like the idea of being able to kick syslog-ng and have stats drop out of, escpially if you can get soure and dest stats. but it is very nice to have those stats automatically written on a regular basis so you can pull that into a spreadsheet and generate some graphs based on that data. (yea you could script something to kick syslog-ng on a regular basis, but that is extra work ;)
More and more large scale syslog deployments are happening all the time and syslog-ng arguably has the most mindshare when people think of deploying syslog. Having advanced statistics in the 1.9/2.0 branch would steer people to the new branch sooner and get it better tested, IMHO. -- Nate
"Whenever you find you are on the side of the majority, it is time to pause and reflect." - Samuel Clemens
_______________________________________________ syslog-ng maillist - syslog-ng@lists.balabit.hu https://lists.balabit.hu/mailman/listinfo/syslog-ng Frequently asked questions at http://www.campin.net/syslog-ng/faq.html
On Fri, Oct 21, 2005 at 01:54:45PM -0400, Mike wrote:
I defintely like the idea of being able to kick syslog-ng and have stats drop out of, escpially if you can get soure and dest stats. but it is very nice to have those stats automatically written on a regular basis so you can pull that into a spreadsheet and generate some graphs based on that data. (yea you could script something to kick syslog-ng on a regular basis, but that is extra work ;)
Agreed. I assumed that would stay, since that's the way it's already done. -- Nate There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult. -C.A.R. Hoare
On Fri, 2005-10-21 at 10:34 -0700, Nate Campi wrote:
On Fri, Oct 21, 2005 at 12:34:12PM -0400, Mike wrote:
It would be very nice to have per-destination RECEIVED stats as well in 1.9.x.
More and more large scale syslog deployments are happening all the time and syslog-ng arguably has the most mindshare when people think of deploying syslog. Having advanced statistics in the 1.9/2.0 branch would steer people to the new branch sooner and get it better tested, IMHO.
I liked the idea, and thus went ahead and implemented it in the development branch. Syslog-ng generates a log message every once in a while as specified in stats_freq. The format of the log message is: <45>2005-10-22T23:38:14.567570+02:00 s_udp@bzorp syslog-ng[6871]: Log statistics; dropped='program(/bin/cat)=0', dropped='tcp(AF_INET(127.0.0.1:2001))=0' The patch has been commited to my private branch and the snapshot to be generated in half an hour will contain it :) -- Bazsi
On Sat, Oct 22, 2005 at 11:39:16PM +0200, Balazs Scheidler wrote:
I liked the idea, and thus went ahead and implemented it in the development branch. Syslog-ng generates a log message every once in a while as specified in stats_freq. The format of the log message is:
<45>2005-10-22T23:38:14.567570+02:00 s_udp@bzorp syslog-ng[6871]: Log statistics; dropped='program(/bin/cat)=0', dropped='tcp(AF_INET(127.0.0.1:2001))=0'
The patch has been commited to my private branch and the snapshot to be generated in half an hour will contain it :)
I can't get 1.9.6 to build on my Linux boxen: checking for pkg-config... /usr/bin/pkg-config checking pkg-config is at least version 0.9.0... yes checking for GLIB_CFLAGS... -I/usr/include/glib-1.2 -I/usr/lib/glib/include checking for GLIB_LIBS... -lglib checking for EVTLOG_CFLAGS... -I/usr/local/include/eventlog checking for EVTLOG_LIBS... -L/usr/local/lib -levtlog checking for static GLib libraries... yes configure: creating ./config.status config.status: creating dist.conf config.status: creating Makefile config.status: creating src/Makefile config.status: creating doc/Makefile config.status: creating doc/docvars.xml config.status: creating tests/Makefile config.status: creating tests/unit/Makefile config.status: creating tests/functional/Makefile config.status: creating config.h config.status: config.h is unchanged config.status: executing depfiles commands make all-recursive make[1]: Entering directory `/usr/src/syslog-ng-1.9.6' Making all in src make[2]: Entering directory `/usr/src/syslog-ng-1.9.6/src' if gcc -DHAVE_CONFIG_H -I. -I. -I.. -I/usr/include/glib-1.2 -I/usr/lib/glib/include -I/usr/local/include/eventlog -D_GNU_SOURCE -g -O2 -Wall -g -MT misc.o -MD -MP -MF ".deps/misc.Tpo" \ -c -o misc.o `test -f 'misc.c' || echo './'`misc.c; \ then mv -f ".deps/misc.Tpo" ".deps/misc.Po"; \ else rm -f ".deps/misc.Tpo"; exit 1; \ fi In file included from misc.h:28, from misc.c:24: gsockaddr.h:51: error: syntax error before 'GIOStatus' gsockaddr.h:51: warning: no semicolon at end of struct or union gsockaddr.h:52: error: syntax error before '*' token gsockaddr.h:52: warning: type defaults to 'int' in declaration of 'GIOStatus' gsockaddr.h:52: error: 'GIOStatus' declared as function returning a function gsockaddr.h:52: warning: data definition has no type or storage class gsockaddr.h:57: error: syntax error before '}' token gsockaddr.h:77: error: syntax error before 'g_bind' gsockaddr.h:77: warning: type defaults to 'int' in declaration of 'g_bind' gsockaddr.h:77: warning: data definition has no type or storage class gsockaddr.h:78: error: syntax error before 'g_accept' gsockaddr.h:78: warning: type defaults to 'int' in declaration of 'g_accept' gsockaddr.h:78: warning: data definition has no type or storage class gsockaddr.h:79: error: syntax error before 'g_connect' gsockaddr.h:79: warning: type defaults to 'int' in declaration of 'g_connect' gsockaddr.h:79: warning: data definition has no type or storage class misc.c: In function 'g_string_assign_len': misc.c:45: warning: implicit declaration of function 'g_string_append_len' make[2]: *** [misc.o] Error 1 make[2]: Leaving directory `/usr/src/syslog-ng-1.9.6/src' make[1]: *** [all-recursive] Error 1 make[1]: Leaving directory `/usr/src/syslog-ng-1.9.6' make: *** [all] Error 2 My main box is running Debian Sid so it has gcc4, but I even tried it on a Debian Woody box with gcc 2.9.5 and the same thing happened. What is going wrong here? Thanks. -- Nate During the million-dollar BIND 9 rewrite, Paul Vixie characterized the original BIND code as 'sleazeware produced in a drunken fury by a bunch of U C Berkeley grad students.' -- D.J. Bernstein http://cr.yp.to/djbdns/blurb/unbind.html
On Sun, Oct 23, 2005 at 07:56:55PM -0700, Nate Campi wrote:
I can't get 1.9.6 to build on my Linux boxen:
<snip>
My main box is running Debian Sid so it has gcc4, but I even tried it on a Debian Woody box with gcc 2.9.5 and the same thing happened. What is going wrong here?
My mistake, I had the older glib dev package installed and no glib2 dev package. All better now... OBTW, syslog-ng.h is missing from last night's snapshot. -- Nate "General Failure's Fault. Not Yours." -Anon. "Hit any user to continue." -Anon. "Scandisk is now checking your hard disk. You can start praying." -Anon. "Smash forehead on keyboard to continue." -Anon.
On Sun, 2005-10-23 at 21:31 -0700, Nate Campi wrote:
On Sun, Oct 23, 2005 at 07:56:55PM -0700, Nate Campi wrote:
I can't get 1.9.6 to build on my Linux boxen:
<snip>
My main box is running Debian Sid so it has gcc4, but I even tried it on a Debian Woody box with gcc 2.9.5 and the same thing happened. What is going wrong here?
My mistake, I had the older glib dev package installed and no glib2 dev package. All better now...
OBTW, syslog-ng.h is missing from last night's snapshot.
thanks. commited a fix. -- Bazsi
On Fri, 2005-10-21 at 08:01 -0700, Nate Campi wrote:
On Fri, Oct 21, 2005 at 06:54:04AM -0700, Scott C wrote:
But what's really most peculiar in this scenario is the fact that the numbers simply don't add up. Why does syslog-ng appear (on the surface) to be dropping a very large percentage of the messages that it receives? I realize that it's not, but the numbers tell a different story. And how could it possibly drop so many messages when the FIFO queue is configured to buffer three million lines? How preposterous!
So you think you really have all the logs but you see STATS messages reporting dropped messages? How would you know if you really have them all? It's possible that under heavy load you have some program or pipe destination (or maybe even file if you have slow disks) that just can't keep up. That's not syslog-ng's fault, it just lets you know that the buffer filled up.
One minor note: files are _not_ flow-controlled. If a file destination cannot keep up, then the incoming side of syslog-ng (e.g. UDP receive buffers) will drop messages. It might be possible that syslog-ng can write all incoming messages to a file for instance, but fail to do so for a pipe or tcp destination. The reason is that pipe and tcp are flow controlled, they will not sink all messages that syslog-ng would send to them. -- Bazsi
participants (4)
-
Balazs Scheidler
-
Mike
-
Nate Campi
-
Scott C