dropped udp packets and help with config
Something is really wrong with syslog-ng or my config. I'm dropping way too many packets. I will admit that my configuration is probably a large part of the problem, and I would appreciate it if someone could take a look at it and offer some suggestions. There is another thread going about a similar problem on a similar platform.

We recently upgraded to Solaris 10 from Solaris 9 and I don't recall us dropping that many packets before. We also upgraded from a much older syslog-ng version to 3.1.2. I am basing the dropped packets on the UDP stats, not the syslog-ng stats. The syslog-ng stats show NO dropped packets.

UDP
    udpInDatagrams  - 4599313
    udpInErrors     - 0
    udpOutDatagrams - 3421
    udpOutErrors    - 0
    tcpInErrs       - 0
    udpNoPorts      - 2587612
    udpInCksumErrs  - 0
    udpInOverflows  - 95806254

The above is a 3 hour sample, and it is from our syslog server that does not get that much traffic.

____________________________

Here is the current version info:

Solaris 10, syslog-ng 3.1.2
Installer-Version: 3.1.2
Revision: ssh+git://bazsi@git.balabit //var/scm/git/syslog-ng/syslog-ng-ose--mainline--3.1#master#8bf13c304b6ab5fc1a372b49d55c78370efe14ca
Compile-Date: Oct 25 2010 23:56:18
Enable-Threads: off
Enable-Debug: off
Enable-GProf: off
Enable-Memtrace: off
Enable-Sun-STREAMS: on
Enable-Sun-Door: on
Enable-IPv6: on
Enable-Spoof-Source: on
Enable-TCP-Wrapper: off
Enable-SSL: on
Enable-SQL: off
Enable-Linux-Caps: off
Enable-Pcre: on

_____________________________

Below is a very small sampling of our syslog-ng.conf. We are filtering on about 1400 devices, most of which are firewalls and routers. The IPs in the following sample have been made up.

One of my questions is "Does the number of devices we are filtering on make a difference? (1400)" We have several sites and use just one version of the syslog-ng.conf file. It is a lot easier to maintain one copy:))

Also notice the format: ("^10\.123\.10\.133$") for the filters. All 1400 are in that format. I was hoping this would help a little but don't really know for sure:))

The source statement below, "...external_Future_tcp", has not yet been implemented. Since we are dropping so many packets, I was going to try configuring the devices to log over TCP instead of UDP.

@version: 3.0
# Created: 01 March 2011

#----------------[ GLOBAL OPTIONS ]-------------------------
options {
    create_dirs(yes);
    use_dns(no);
    time_reopen(10);
    time_reap(360);
    keep_timestamp(yes);
};

#---------------------[ SOURCES ]---------------------------
source s_local { sun-stream("/dev/log" door("/etc/.syslog_door")); internal(); };
source s_external { udp(); };
source s_external_tcp { tcp(max-connections(50) port(514)); };
source s_external_Future_tcp { tcp(max-connections(1400) port(1470)); };

#---------------------[ DESTINATION ]---------------------------
destination d_local { file("/var/adm/messages" perm(0655) dir_perm(0655)); };
destination d_network_file { file("/logs/$YEAR/$MONTH/$DAY/network.log" perm(0655) dir_perm(0655)); };
destination d_bacsit { udp("10.11.13.114" port(2514) spoof-source(yes)); };
destination d_network_syslogd { udp("10.11.13.116" port(1514) spoof-source(yes)); };
destination d_firewall_file { file("/logs/$YEAR/$MONTH/$DAY/firewall/$HOST.log" perm(0655) dir_perm(0655)); };
destination d_mrv_file { file("/logs/$YEAR/$MONTH/$DAY/mrv.log" perm(0655) dir_perm(0655)); };
destination d_mail_file { file("/logs/$YEAR/$MONTH/$DAY/mail/$HOST.log" perm(0655) dir_perm(0655)); };
destination d_f567_file { file("/logs/$YEAR/$MONTH/$DAY/f5s/$HOST.log" perm(0655) dir_perm(0655)); };

#---------------------[ FILTERS ]---------------------------
filter f_f567 {
    host("^10\.123\.10\.133$") or   # Host B
    host("^10\.100\.10\.200$") or   # Host A
    host("^10\.115\.10\.246$") or   # Host C
    host("^10\.121\.10\.102$") or   # Host D
    host("^10\.117\.10\.99$");      # Host F
};
filter f_mrv {
    host("^10\.68\.69\.100$") or    #
    host("^10\.100\.166\.10$") or   #
};
. . .
and so on

#---------------------[ LOGS ]---------------------------
log { source(s_local); destination(d_local); };
log { source(s_external); filter(f_f567); destination(d_f5_file); };
log { source(s_external); source(s_external_tcp); filter(f_firewall); destination(d_bacsit); };
log { source(s_external); filter(f_network); destination(d_bacsit); };
log { source(s_external); source(s_external_tcp); filter(f_firewall); destination(d_combo_file); };
log { source(s_external); filter(f_mail); destination(d_mail_file); };
....and so on

I'm grateful for all help and suggestions.

Thanks!!
Hello,

On Fri, May 13, 2011 at 7:43 PM, Zeek Anow <zeekstern@gmail.com> wrote:
Something really wrong with syslog-ng or my config. I'm dropping way too many packets. I will admit that my configuration is probably really a large part of the problem and would appreciate it if someone could take a look at it and offer some suggestions. There is another thread going about a similar problem on a similar platform.
We recently upgraded to Solaris 10 from Solaris 9 and I don't recall us dropping that many packets before. We also upgraded from a much older syslog-ng version to 3.1.2. I am basing the dropped packets on the UDP stats, not the syslog-ng stats. The syslog-ng stats show NO dropped packets.
This implies that syslog-ng couldn't read the messages from the UDP socket so the kernel dropped incoming logs to the floor.
UDP
    udpInDatagrams  - 4599313
    udpInErrors     - 0
    udpOutDatagrams - 3421
    udpOutErrors    - 0
    tcpInErrs       - 0
    udpNoPorts      - 2587612
    udpInCksumErrs  - 0
    udpInOverflows  - 95806254
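To put the counters above in proportion, here is a small back-of-the-envelope calculation in Python. It is only a sketch: it assumes that udpInDatagrams counts datagrams actually delivered to a socket and that udpInOverflows counts datagrams discarded because the socket receive buffer was full, so the two counters do not overlap. The variable names are made up for the example.

```python
# Counters copied from the 3-hour sample quoted above.
udp_in_datagrams = 4_599_313     # assumed: datagrams delivered to a socket
udp_in_overflows = 95_806_254    # assumed: datagrams dropped, receive buffer full
sample_seconds = 3 * 60 * 60     # "a 3 hour sample"

# Assumption: the two counters are disjoint, so their sum is everything
# that arrived for an open UDP port.
arrived = udp_in_datagrams + udp_in_overflows
drop_fraction = udp_in_overflows / arrived
drops_per_second = udp_in_overflows / sample_seconds

print(f"dropped: {drop_fraction:.1%} of datagrams sent to an open port")
print(f"average drop rate: {drops_per_second:,.0f} datagrams/s")
```

Under those assumptions roughly 95% of the datagrams addressed to a listening socket never reached the application, which fits the explanation that the reader cannot drain the socket fast enough.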
...
Below is a very small sampling of our syslog-ng.conf. We are filtering on about 1400 devices most of which are Firewalls and routers. The IPs in the following sample have been made up.
1400 *distinct* log sources like TCP streams? A single-threaded syslog-ng instance might not scale to that many log sources. One option would be to run multiple syslog-ng instances listening on different ports and to configure the clients to use different ports, sharing the load between the syslog-ng instances.
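One way to decide which client should log to which instance is a fixed mapping from the client address to a port. The Python sketch below is only an illustration of that idea: the port numbers and the function name are hypothetical, and nothing in syslog-ng does this for you; the result would be used when configuring each device.

```python
import ipaddress

# Hypothetical listening ports, one per syslog-ng instance.
INSTANCE_PORTS = [5140, 5141, 5142, 5143]


def port_for(client_ip: str) -> int:
    """Map a client address to one instance port, always the same one."""
    as_int = int(ipaddress.ip_address(client_ip))
    return INSTANCE_PORTS[as_int % len(INSTANCE_PORTS)]


if __name__ == "__main__":
    for ip in ("10.123.10.133", "10.100.10.200", "10.115.10.246"):
        print(ip, "->", port_for(ip))
```

A modulo on the address spreads consecutive addresses evenly, but it ignores how chatty each device is; busy firewalls may need to be placed by hand.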
One of my questions is "Does the number of devices we are filtering on make a difference? (1400)" We have several sites and use just one version of the syslogng.conf file. It is a lot easier to maintain one copy:))
Also notice the format: ("^10\.123\.10\.133$") for the filters. All 1400 are in that format. I was hoping this would help a little but don't really know for sure:))
Could you make your filters hierarchical? Using the netmask() filter and nested log statements you could reduce the number of filter evaluations. For example

filter f_tenten { netmask("10.10.0.0/16"); };
filter f_tentenone { netmask("10.10.1.0/24"); };
filter f_tententwo { netmask("10.10.2.0/24"); };
filter f_teneleven { netmask("10.11.0.0/16"); };
...

and later

log {
    # first big network
    source(...);
    filter(f_tenten);
    log { filter(f_tentenone); destination(...); };
    log { filter(f_tententwo); destination(...); };
    flags(final);
};

log {
    # second big network
    source(...);
    filter(f_teneleven);
    ...
};

and so on, organizing the filters into a tree / forest hierarchy instead of evaluating all filters one by one for all messages.

This way logs coming from 10.10/16 will get processed in the first log block. Logs originating from other networks will trigger the evaluation of the f_tenten filter, and as it will give a false result, none of the embedded log statements (including their filters) will process the log. So the logs will get processed in the next log section (the one with the "second big network" comment).

Using flags(final) is very important here: it tells syslog-ng not to process any more log{} sections for the given log message. Otherwise all further log sections will process the message as well. If you want to log a message to multiple places, just add the additional destinations to the proper log{} block.

Regards,
Sandor
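The saving from the nested layout can be estimated with a small simulation. The Python sketch below is not syslog-ng code; it only counts filter evaluations under a simplified model in which a flat configuration checks every leaf filter for every message, while the nested one checks the /16 filters in order, checks the leaves only inside the matching block, and then stops because of flags(final). The network layout (four /16 networks with ten /24 subnets each) is invented for the example.

```python
import ipaddress

# Hypothetical layout: 4 big networks, each with 10 subnets (40 leaf filters).
TREE = {
    ipaddress.ip_network(f"10.{a}.0.0/16"): [
        ipaddress.ip_network(f"10.{a}.{b}.0/24") for b in range(10)
    ]
    for a in range(10, 14)
}
FLAT = [leaf for leaves in TREE.values() for leaf in leaves]


def flat_evaluations(addr):
    """Flat config: every leaf filter is evaluated for every message."""
    ip = ipaddress.ip_address(addr)
    matched, evaluations = None, 0
    for leaf in FLAT:
        evaluations += 1
        if ip in leaf:
            matched = leaf
    return matched, evaluations


def nested_evaluations(addr):
    """Nested config: outer /16 filters first, leaves only on a match."""
    ip = ipaddress.ip_address(addr)
    matched, evaluations = None, 0
    for parent, leaves in TREE.items():
        evaluations += 1              # outer netmask() filter
        if ip not in parent:
            continue                  # embedded log blocks are skipped
        for leaf in leaves:
            evaluations += 1          # embedded filters of this block only
            if ip in leaf:
                matched = leaf
        break                         # flags(final): no further log blocks
    return matched, evaluations


if __name__ == "__main__":
    print(flat_evaluations("10.12.3.7"))
    print(nested_evaluations("10.12.3.7"))
```

For a message from 10.12.3.7 this model counts 40 evaluations in the flat layout and 13 in the nested one; with 1400 per-host filters the gap would be correspondingly larger.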
Thanks for the reply, Sandor. Much appreciated! I moved your comments up here to make it a little easier.

Syslog-ng becomes unable to read the messages from the UDP socket very quickly, probably within a minute or less of turning syslog-ng on. I have the only 2 Solaris parameters that I can set (recv_high_water and udp_max_buf) set to the highest limit possible. I was thinking that something on the syslog-ng side might help. log_fetch_limit, log_iw_size, log_fifo, flush_lines, etc. might help, but I don't have a clue where to start or what to set them to. I could spend months pulling numbers out of the air and plugging them in to see what happens:))

With regards to your "1400 *distinct* log sources like TCP streams?": up until a few weeks ago, we did have about 8 devices that were distinct log sources like TCP streams. They accounted for about 60% of our traffic. What we found was that a few devices were about 6 days behind in the log files. We changed their method of logging so that it does not use syslog-ng anymore.

I really like your idea of making the filters hierarchical and appreciate your example. That makes sense to me. At one time we did something similar, but used a lot of regex to accomplish it. That is when we changed and went the ("^10\.123\.10\.133$") route for each IP address, thinking it would be more efficient. I will check into this asap!!

Regards,
Zeek

On Fri, May 13, 2011 at 2:44 PM, Sandor Geller <Sandor.Geller@morganstanley.com> wrote:
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.campin.net/syslog-ng/faq.html
On Fri, 2011-05-13 at 13:43 -0400, Zeek Anow wrote:
Something really wrong with syslog-ng or my config. I'm dropping way too many packets. I will admit that my configuration is probably really a large part of the problem and would appreciate it if someone could take a look at it and offer some suggestions. There is another thread going about a similar problem on a similar platform.
We recently upgraded to Solaris 10 from Solaris 9 and I don't recall us dropping that many packets before. We also upgraded from a much older syslog-ng version to 3.1.2. I am basing the dropped packets on the UDP stats, not the syslog-ng stats. The syslog-ng stats show NO dropped packets.
In another thread someone was complaining about similar issues on Solaris. It seems that the way syslog-ng writes log files (each line is an individual write system call) has an enormous overhead on Solaris, much more than on Linux.

syslog-ng OSE 3.3 now contains a change to batch writes using writev(), which should really improve performance on Solaris. However, I'm just releasing a beta now, so it may not be ready for prime time yet.

It'd be nice to know the root cause of the bad performance on Solaris, but so far no one in the community has nailed it down completely.

I'd appreciate it if you could give syslog-ng OSE 3.3 a test drive to see whether it really improves the situation. Alternatively, since the buffering change is already available in the Premium Edition, it might be easier to try that first: evals are free, and there you have a binary package to start with. With the OSE, you need to compile it yourself, which may or may not be that easy, depending on your experience with compiling packages. Of course we're here to help you in case you'd want to start compiling yourself.

-- Bazsi
We recently upgraded to Solaris 10 from Solaris 9 and I don't recall us dropping that many packets before. We also upgraded from a much older syslog-ng version to 3.1.2. I am basing the dropped packets on the UDP stats, not the syslog-ng stats. The syslog-ng stats show NO dropped packets.
In another thread someone was complaining about similar issues on Solaris. It seems that the way syslog-ng writes log files (each line is an individual write system call) has an enormous overhead on Solaris, much more than on Linux.
Attached is a program that tries to measure the speed difference between write() and writev(). It first writes N messages using write(), one by one, then it writes the same N messages using writev(), IOV_MAX messages at a time.

On both Linux and Solaris (OpenIndiana, actually, but it shouldn't be much different on real Solaris), I get similar results: about 4 seconds for the write()s to finish, and below 1 second for writev().

While the performance difference between write() and writev() is noticeable, the difference between Solaris and Linux seems to be so tiny that it is pretty much negligible.

(Unless, of course, I screwed up the test program, which is entirely possible)

-- |8]
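The attachment is not part of this archive. The following Python sketch shows the same kind of measurement; it is not the original test program. The message count, message size and function name are arbitrary choices for the example, the fallback value of 16 for IOV_MAX is an assumption for systems that do not report it, and short writes are not handled because the target is a regular temporary file.

```python
import os
import tempfile
import time

# Ask the system for IOV_MAX; fall back to 16 if it is not reported.
try:
    IOV_MAX = os.sysconf("SC_IOV_MAX")
except (ValueError, OSError):
    IOV_MAX = 16
if IOV_MAX <= 0:
    IOV_MAX = 16


def time_writes(messages, batch):
    """Write messages to a temp file, `batch` per system call.

    batch == 1 uses os.write() once per message; larger values use
    os.writev(). Returns (elapsed seconds, resulting file size).
    """
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        if batch == 1:
            for message in messages:
                os.write(fd, message)
        else:
            for i in range(0, len(messages), batch):
                os.writev(fd, messages[i:i + batch])
        elapsed = time.perf_counter() - start
        size = os.fstat(fd).st_size
    finally:
        os.close(fd)
        os.unlink(path)
    return elapsed, size


if __name__ == "__main__":
    msgs = [b"May 23 12:04:00 host app: test message\n"] * 100_000
    t_write, _ = time_writes(msgs, 1)
    t_writev, _ = time_writes(msgs, IOV_MAX)
    print(f"write():  {t_write:.3f} s")
    print(f"writev(): {t_writev:.3f} s (IOV_MAX = {IOV_MAX})")
```

The absolute times depend on the filesystem and on Python's own call overhead, so only the ratio between the two runs on the same machine is meaningful.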
Thanks Gergely and Bazsi!!

My problem is that I don't have a test lab. Everything I try is in production. I am finally getting a lab set up, but we are looking at several weeks out. I will try to test OSE 3.3 as Bazsi suggested, see what happens, and let you know how it turns out.

Gergely - a 4 second difference seems HUGE to me. I haven't looked at your test program yet, but I certainly will, and I'll give it a shot too if I can.

Thanks for the help. I really appreciate it.

On Mon, May 23, 2011 at 12:04 PM, Gergely Nagy <algernon@balabit.hu> wrote:
participants (5)
- Balazs Scheidler
- Fred Connolly
- Gergely Nagy
- Sandor Geller
- Zeek Anow