UDP errors and lost UDP messages
Greetings list!

Using syslog-ng 3.1 with Debian Squeeze, 2.6.32-5-amd64. The system has 8GB of RAM.

I'm losing some UDP logs. I know not to use UDP (we use TLS for our Debian systems), but our Cisco gear leaves us with few options.

According to netstat, the rate is anywhere from 600 to 3000 UDP errors per second. A tcpdump filter of "dst port 514" shows about the same rate of UDP traffic coming to the system.

I've bumped the buffer sizes according to various docs:

$ head -n -0 /proc/sys/net/core/[rw]mem_*
==> /proc/sys/net/core/rmem_default <==
16777216

==> /proc/sys/net/core/rmem_max <==
16777216

==> /proc/sys/net/core/wmem_default <==
16777216

==> /proc/sys/net/core/wmem_max <==
16777216

And the UDP-specific memory limits:

$ head -n -0 /proc/sys/net/ipv4/*udp*
==> /proc/sys/net/ipv4/udp_mem <==
768384 1024512 1536768

==> /proc/sys/net/ipv4/udp_rmem_min <==
16777216

==> /proc/sys/net/ipv4/udp_wmem_min <==
16777216

My UDP source for syslog-ng is also using a larger buffer:

$ grep -A4 -B1 'udp(' /etc/syslog-ng/syslog-ng.conf
source s_udp {
    udp(
        keep_hostname(yes)
        so_rcvbuf(16777216)
    );
};

According to syslog-ng-ctl stats, the system is processing ~270 UDP messages per second. This hasn't really changed since I made the kernel variable tweaks, nor after changing the so_rcvbuf size.

Any ideas of what to look for next?

Thanks!

-m
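For anyone wanting to track the error rate described above over time, here is a minimal Python sketch. It assumes the netstat figure corresponds to the InErrors counter on the Udp: lines of /proc/net/snmp, and that RcvbufErrors is present on the running kernel. The function names and the 5-second interval are illustrative choices, not something from the original post.

```python
import time


def parse_udp_counters(snmp_text):
    """Return the Udp counters from /proc/net/snmp-style text as a dict."""
    # "Udp:" does not match "UdpLite:" lines, so only the plain UDP rows are kept.
    udp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Udp:")]
    if len(udp_lines) < 2:
        raise ValueError("expected a Udp: header line and a Udp: value line")
    names, values = udp_lines[0][1:], udp_lines[1][1:]
    return dict(zip(names, (int(v) for v in values)))


def per_second(before, after, seconds):
    """Per-second increase of every counter present in both samples."""
    return {name: (after[name] - before[name]) / seconds
            for name in before if name in after}


def watch(interval=5.0, path="/proc/net/snmp"):
    """Take two samples `interval` seconds apart and print the UDP rates."""
    with open(path) as handle:
        first = parse_udp_counters(handle.read())
    time.sleep(interval)
    with open(path) as handle:
        second = parse_udp_counters(handle.read())
    rates = per_second(first, second, interval)
    for name in ("InDatagrams", "InErrors", "RcvbufErrors"):
        if name in rates:
            print(f"{name}: {rates[name]:.1f}/s")
```

Calling watch() on the log server prints one line per counter. If RcvbufErrors grows at about the same rate as InErrors, the drops are most likely happening because the socket receive buffer is full, which would point at the reader rather than at the buffer size.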
Hi,

270 messages per second is not a lot, unless there's some kind of bottleneck on the syslog-ng side. DNS is often the culprit, which is why syslog-ng has a DNS cache that should address the problem. Do you have any related settings in your configuration?

Also, 3.1 is pretty old. Can you perhaps upgrade to something more recent? I think squeeze is supported by the madhouse.org packages.
-- Bazsi
On Tue, Apr 7, 2015 at 3:36 AM, Balazs Scheidler <bazsi77@gmail.com> wrote:
> Hi,
>
> 270 is not a lot unless there's some kind of bottleneck in the syslog-ng side. DNS is often a culprit, that's why syslog-ng has a DNS cache which should address the problem. Do you have any kind of related settings in your configuration.
Hi Bazsi!

I do use DNS, but per recommendations I use the cache. Here is my complete config:

@version: 3.1

options {
    long_hostnames(off);
    flush_lines(0);
    use_fqdn(no);
    owner("root");
    group("adm");
    perm(0640);
    stats_freq(0);
    bad_hostname("^gconfd$");

    create_dirs(yes);
    dir_perm(0755);
    chain_hostnames(0);
    time_reopen(10);
    time_reap(360);

    time_sleep(20);
    use_dns(yes);
    dns_cache(2000);
    dns_cache_expire(87600);

    log_fetch_limit(10);
    log_fifo_size(200000); # 10 polls of (10 fetch limit * 2000 connections)
    log_iw_size(20000);    # 10 fetch limit * 2000 connections (default 100)
};

########################
# Sources
########################
# This is the default behavior of sysklogd package
# Logs may come from unix stream, but not from another machine.
#
source s_src {
    unix-dgram("/dev/log");
    internal();
    file("/proc/kmsg" program_override("kernel"));
};

source s_tls {
    syslog(
        port(6514)
        transport("tls")
        tls(
            peer-verify(required-trusted)
            ca_dir('/etc/syslog-ng/ssl/ca.d')
            key_file('/etc/syslog-ng/ssl/server.key')
            cert_file('/etc/syslog-ng/ssl/server.crt')
        )
        max_connections(2000)
        keep_hostname(yes)
        so_rcvbuf(16777216)
    );
};

source s_udp {
    udp(
        keep_hostname(yes)
        so_rcvbuf(16777216)
    );
};

########################
# Destinations
########################

# The root's console.
#
destination d_console { usertty("root"); };

# Virtual console.
#
destination d_console_all { file("/dev/tty10"); };

destination df_filter_by_facility {
    file(
        "/var/log/$FACILITY.log"
        owner(root)
        group(root)
        perm(0644)
        dir_perm(0755)
        create_dirs(yes)
    );
};

destination d_remote_clients {
    file(
        "/var/log/syslog-ng/remote_clients/$HOST_FROM/$YEAR/$MONTH/$DAY/$FACILITY"
        owner(root)
        group(root)
        perm(0644)
        dir_perm(0755)
        create_dirs(yes)
    );
};

destination d_remote_clients_udp {
    file(
        "/var/log/syslog-ng/remote_clients/.udp/$HOST_FROM/$YEAR/$MONTH/$DAY/$FACILITY"
        owner(root)
        group(root)
        perm(0644)
        dir_perm(0755)
        create_dirs(yes)
    );
};

########################
# Filters
########################

filter f_crit { level(crit .. emerg); };
filter f_console { level(warn .. emerg); };

########################
# Log paths
########################

log { source(s_src); filter(f_console); destination(d_console_all); };
log { source(s_src); filter(f_crit); destination(d_console); };

log { source(s_src); destination(df_filter_by_facility); };

log { source(s_tls); source(s_udp); destination(d_remote_clients); flags(flow-control); };
> Also, 3.1 is pretty old, can you perhaps upgrade that to something more recent? I think squeeze is supported by the madhouse.org packages.

Sure. I'll look at upgrading or standing up a newer Debian system with a more recent syslog-ng.

Any other pointers in the meantime?

-m
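One thing that may be worth ruling out for the so_rcvbuf(16777216) setting in the udp() source is whether the kernel actually grants a buffer of that size. The sketch below is only an illustration: it opens its own throwaway UDP socket rather than inspecting syslog-ng's socket, and the function name is made up for this example. On Linux the kernel caps the request at net.core.rmem_max and reports back double the value it stored, so a result far below the request suggests the cap is in the way.

```python
import socket


def effective_rcvbuf(requested):
    """Request a UDP receive buffer of `requested` bytes; return what the kernel reports."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        # On Linux this is twice the stored value, after capping at rmem_max.
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


requested = 16777216  # the so_rcvbuf() value from the posted config
granted = effective_rcvbuf(requested)
print(f"requested {requested} bytes, kernel reports {granted} bytes")
```

With rmem_max already raised to 16777216 as shown in the first message, the request should be honoured, which is consistent with the buffer changes making no difference to the message rate.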
Version 3.1 is really old, but my recollection is that since UDP sources do not have a "connection", the fetch limit does not have the same meaning. With your time_sleep of 20 milliseconds and your fetch limit of 10, you could only process 1000ms/20ms * 10 = 500 UDP messages per second.

You would need to remove your time_sleep option, or set your fetch limit much higher.

If I recall correctly, we used a fetch limit of 5000 when running the 3.1 series of syslog-ng.

Since you use a destination template based on the source host name, you could probably use a relatively small log_fifo_size because it is a per-destination setting. If you used a 5000 fetch limit, then a log_fifo_size of 500000 would probably be sufficient.

Your log_iw_size will be much more critical. I would have to read the manual again to know whether this is a per-source setting (be careful with connectionless UDP) or a global setting. With flow-control enabled, syslog-ng stops reading the source, which is fine for TCP sources, but UDP messages still arrive, and the OS UDP buffers will start dropping them.

The later releases of syslog-ng, 3.5 and 3.6, have *huge* performance gains. If you must stay with 3.1, then it might be useful to run two instances: one for TCP sources configured similarly to what I have described above, and one for UDP sources that does NOT use flow-control. After all, flow control will just make the OS drop the messages anyway.

Hope that helps.

Evan.
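Evan's ceiling of 1000ms / time_sleep * fetch limit can be written out as a small Python sketch. This is a simplified model that assumes a single UDP source read once per poll cycle and ignores the time spent processing messages; the function name is hypothetical.

```python
def max_messages_per_second(time_sleep_ms, fetch_limit):
    """Upper bound when each poll cycle sleeps time_sleep_ms and reads at most fetch_limit messages."""
    if time_sleep_ms <= 0:
        raise ValueError("time_sleep_ms must be positive for this estimate")
    polls_per_second = 1000.0 / time_sleep_ms
    return polls_per_second * fetch_limit


print(max_messages_per_second(20, 10))    # posted config: prints 500.0
print(max_messages_per_second(20, 5000))  # fetch limit of 5000: prints 250000.0
```

Under this model the posted settings cap the UDP source at 500 messages per second, well below the 600 to 3000 datagrams per second seen arriving, while a fetch limit of 5000 lifts the ceiling far above that range.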
Hey Evan,

Thanks for the time you took to write up the detailed message. I appreciate it.

I was still hitting issues with the flow-control omitted. But I'll review the rest of your email and report back.

Cheers!

-m