I did a lot of research last week on network tuning for CentOS 5.3, the OS on which syslog-ng is showing missed events. I also have a Red Hat Enterprise Linux 3 (RHE3) system that doesn't show event loss. The RHE3 system forwards events to the CentOS system that is dropping them, and I compare the logs on the RHE3 system with the logs on the CentOS system to determine whether loss is occurring. Here's what I did:

==== Mods to syslog-ng.conf ====

Enable the DNS cache options in syslog-ng.conf:

use_dns (yes);
dns_cache (yes);
dns_cache_size (2000);
dns_cache_expire (86400);

Set the so_rcvbuf() option in syslog-ng.conf:

udp(so_rcvbuf(1024000));

The resulting config:

@version: 3.0
#Default configuration file for syslog-ng.
#
# For a description of syslog-ng configuration file directives, please read
# the syslog-ng Administrator's guide at:
#
# http://www.balabit.com/dl/html/syslog-ng-admin-guide_en.html/bk01-toc.html
#

options {
    time_reopen (10);
    log_fifo_size (100000);
    long_hostnames (off);
    use_dns (yes);
    dns_cache (yes);
    dns_cache_size (2000);
    dns_cache_expire (86400);
    use_fqdn (no);
    create_dirs (yes);
    keep_hostname (yes);
};

######
# sources
source s_local {
    # message generated by Syslog-NG
    internal();
    # standard Linux log source (this is the default place for the syslog()
    # function to send logs to)
    unix-stream("/dev/log");
    # messages from the kernel
    file("/proc/kmsg" program_override("kernel: "));
};

# Set the UDP receive buffer to 1M.
source s_net {
    udp(so_rcvbuf(1024000));
    tcp();
    #syslog();
};

######
# destinations
destination d_messages { file("/var/log/messages"); };

######
# Log local and network events
log { source(s_local); source(s_net); destination(d_messages); };

==== Mods to sysctl.conf ====

Modify the kernel buffer parameters in /etc/sysctl.conf:

# Increase the maximum total buffer space allocatable
# (tcp_mem and udp_mem are measured in pages, not bytes)
net.ipv4.tcp_mem = 8388608 12582912 16777216
net.ipv4.udp_mem = 8388608 12582912 16777216

# Increase the maximum read-buffer space allocatable
net.ipv4.tcp_rmem = 8192 87380 16777216
net.ipv4.udp_rmem_min = 16384

# Increase the maximum write-buffer space allocatable
net.ipv4.tcp_wmem = 8192 65536 16777216
net.ipv4.udp_wmem_min = 16384

# Increase the maximum and default receive socket buffer size
net.core.rmem_max=16777216
net.core.rmem_default=87380

# Set the maximum number of packets queued on the INPUT side when the
# interface receives packets faster than the kernel can process them.
# Default was 1000.
net.core.netdev_max_backlog = 2500

# Increase the maximum listen() backlog (queued incoming connections).
# Default was 128. I think this is only for TCP, and one web site
# recommended increasing it for heavily used web servers.
net.core.somaxconn=1024

# Maximum ancillary buffer size allowed per socket. Default was 20480.
net.core.optmem_max = 1024000

# Reverse-path filtering: drop packets that look like they are spoofed.
# Default = 1.
# 0 = don't check;
# 1 = strict: drop packets that arrive on an interface other than the one
#     the kernel would use to route back to their source address;
# 2 = loose (on kernels that support it): drop packets whose source
#     address is not reachable via any interface.
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.eth0.rp_filter = 0

Then reload the settings:

sysctl -p

==== How to check packet loss on Linux ====

[Look for the section titled Udp in the netstat output. I like to precede the netstat command with a date command and save both to a file as a record for later comparisons.]

# date
Wed Feb 3 16:33:55 EST 2010
# netstat -us
...
Udp:
    24264777 packets received
    2831 packets to unknown port received.
    14262 packet receive errors
    24538489 packets sent
...

# date
Wed Jan 27 11:21:54 EST 2010
# netstat -us
...
Udp:
    24227388 packets received
    470 packets to unknown port received.
    10644 packet receive errors
    24503312 packets sent
...

Note the increase in UDP receive errors between the two snapshots: from 10644 on Jan 27 to 14262 on Feb 3, i.e. 3618 new errors against 37389 additional packets received. The packet rate is pretty low on these systems, peaking at a few packets per second and averaging well below one packet per second.

==== Other thoughts ====

The RHE3 system had been forwarding log messages using the address spoofing feature. I turned it off yesterday afternoon and the logs have shown no packet loss since then. Unfortunately, I did not run netstat at that time, so I can't tell whether the kernel has recorded any loss.

I also did not have eth0/rp_filter = 0 until today, figuring that the default/rp_filter = 0 parameter would cover it, but decided that it wouldn't hurt to have both set. Interestingly, the RHE3 system has both values set to 1 and it does not drop any events (that I can detect; it certainly is getting more events than the CentOS system).

I'm about to load RHE5 on a different server, forward events to it, and see if it drops events.

-tcs
--
Terry Slattery
CCIE# 1026
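The before/after comparison of the saved date and netstat -us snapshots can be scripted. What follows is a minimal Python sketch, not part of the original setup: it assumes the Linux netstat -us layout shown in the snapshots (an unindented "Udp:" header followed by indented "<count> <description>" lines), the function names are hypothetical, and the sample data is copied from the two snapshots. It subtracts the earlier counters from the later ones and relates them to the elapsed time.

```python
import re
from datetime import datetime

# Sample snapshots copied from the two "netstat -us" runs above
# (only the Udp section is kept; the other sections were elided there too).
JAN27_OUTPUT = """\
Udp:
    24227388 packets received
    470 packets to unknown port received.
    10644 packet receive errors
    24503312 packets sent
"""

FEB03_OUTPUT = """\
Udp:
    24264777 packets received
    2831 packets to unknown port received.
    14262 packet receive errors
    24538489 packets sent
"""

# Matches an indented counter line such as "    14262 packet receive errors",
# capturing the number and the description without any trailing period.
COUNTER_RE = re.compile(r"^\s+(\d+)\s+(.*?)\.?\s*$")


def parse_udp_counters(netstat_output):
    """Return {description: count} for the Udp section of 'netstat -us' output."""
    counters = {}
    in_udp = False
    for line in netstat_output.splitlines():
        if line and not line[0].isspace():
            # An unindented line ("Ip:", "Udp:", "...") ends the previous section.
            in_udp = line.strip() == "Udp:"
            continue
        match = COUNTER_RE.match(line)
        if in_udp and match:
            counters[match.group(2)] = int(match.group(1))
    return counters


def udp_counter_deltas(before_output, after_output):
    """Return how much each Udp counter grew between two snapshots."""
    before = parse_udp_counters(before_output)
    after = parse_udp_counters(after_output)
    return {name: after[name] - before[name] for name in after if name in before}


if __name__ == "__main__":
    # Timestamps from the "date" runs; both are EST, so naive datetimes suffice.
    start = datetime(2010, 1, 27, 11, 21, 54)
    end = datetime(2010, 2, 3, 16, 33, 55)
    elapsed = (end - start).total_seconds()

    deltas = udp_counter_deltas(JAN27_OUTPUT, FEB03_OUTPUT)
    for name, delta in deltas.items():
        print(f"{name}: +{delta}")
    received = deltas["packets received"]
    errors = deltas["packet receive errors"]
    print(f"average rate: {received / elapsed:.3f} packets/s over {elapsed:.0f} s")
    print(f"receive errors per packet received: {errors / received:.2%}")
```

On a live system the two strings would instead be read from the files saved by the date/netstat routine; only counter differences are meaningful, since the absolute values accumulate from boot.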