[syslog-ng] syslog-ng dropping events - update

3 Feb 2010

I did a lot of research last week on network tuning for CentOS 5.3, the OS on
which syslog-ng is missing events.  I also have a Red Hat Enterprise Linux 3
(RHE3) system that doesn't show event loss.  The RHE3 system forwards events to
the CentOS system that is dropping them.  I compare the logs on the RHE3 system
with the logs on the CentOS system to determine whether loss is occurring.

Here's what I did:

==== Mods to syslog-ng.conf ====
Enable the DNS cache options in syslog-ng.conf:
           use_dns (yes);
           dns_cache (yes);
           dns_cache_size (2000);
           dns_cache_expire (86400);
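With use_dns(yes), an incoming message can trigger a reverse DNS lookup of the sender's address, and a slow lookup can delay processing. The toy Python model below is only meant to illustrate what the two cache knobs bound: dns_cache_size caps how many entries are kept, and dns_cache_expire is how many seconds a successful lookup is reused. The class name, the injected resolver, and the least-recently-used eviction are assumptions for illustration, not syslog-ng's implementation.

```python
import time
from collections import OrderedDict


class ReverseDnsCache:
    """Toy model of a size- and age-bounded reverse-DNS cache (hypothetical).

    max_entries plays the role of dns_cache_size and expire_seconds the role
    of dns_cache_expire. LRU eviction is an assumption made for illustration.
    """

    def __init__(self, resolver, max_entries=2000, expire_seconds=86400,
                 clock=time.monotonic):
        self._resolve = resolver        # callable: IP string -> hostname
        self._max = max_entries
        self._ttl = expire_seconds
        self._clock = clock
        self._entries = OrderedDict()   # ip -> (hostname, time stored)
        self.lookups = 0                # number of real (slow) resolver calls

    def hostname(self, ip):
        now = self._clock()
        hit = self._entries.get(ip)
        if hit is not None and now - hit[1] < self._ttl:
            self._entries.move_to_end(ip)       # mark as recently used
            return hit[0]
        # Miss or expired entry: do the slow lookup and (re)store the result.
        self.lookups += 1
        name = self._resolve(ip)
        self._entries[ip] = (name, now)
        self._entries.move_to_end(ip)
        if len(self._entries) > self._max:
            self._entries.popitem(last=False)   # evict least recently used
        return name


if __name__ == "__main__":
    now = [0.0]  # fake clock so the example is deterministic
    cache = ReverseDnsCache(lambda ip: "host-" + ip.replace(".", "-"),
                            max_entries=2, expire_seconds=60,
                            clock=lambda: now[0])
    cache.hostname("10.0.0.1")
    cache.hostname("10.0.0.1")   # served from the cache
    now[0] = 61.0
    cache.hostname("10.0.0.1")   # entry expired, resolved again
    print(cache.lookups)         # 2
```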

Set the so_rcvbuf() option on the UDP source in syslog-ng.conf:
  udp(so_rcvbuf(1024000));
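On Linux, a SO_RCVBUF request is capped at net.core.rmem_max and then doubled by the kernel to allow for bookkeeping overhead (see socket(7)), so with a small rmem_max the 1 MB request above is quietly truncated. The Python sketch below makes the same request on a throwaway UDP socket and prints what the kernel actually granted; it assumes syslog-ng passes so_rcvbuf() through to setsockopt(SO_RCVBUF) unchanged, and it is not syslog-ng itself.

```python
import socket


def effective_rcvbuf(requested_bytes):
    """Request a UDP receive buffer and return the size the kernel reports back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    requested = 1024000  # same value as udp(so_rcvbuf(1024000))
    granted = effective_rcvbuf(requested)
    # On Linux this is roughly 2 * min(requested, net.core.rmem_max).
    print(f"requested {requested} bytes, kernel reports {granted} bytes")
```

Running it before and after raising net.core.rmem_max shows whether the limit was the bottleneck.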

The resulting config:

@version: 3.0
#Default configuration file for syslog-ng.
#
# For a description of syslog-ng configuration file directives, please read
# the syslog-ng Administrator's guide at:
#
# http://www.balabit.com/dl/html/syslog-ng-admin-guide_en.html/bk01-toc.html
#

options {
           time_reopen (10);
           log_fifo_size (100000);
           long_hostnames (off);
           use_dns (yes);
           dns_cache (yes);
           dns_cache_size (2000);
           dns_cache_expire (86400);
           use_fqdn (no);
           create_dirs (yes);
           keep_hostname (yes);
         };

######
# sources
source s_local {
   # message generated by Syslog-NG
   internal();
   # standard Linux log source (this is the default place for the syslog()
   # function to send logs to)
   unix-stream("/dev/log");
   # messages from the kernel
   file("/proc/kmsg" program_override("kernel: "));
};

# Set the UDP receive buffer to about 1 MB (1024000 bytes).
source s_net {
  udp(so_rcvbuf(1024000));
  tcp();
  #syslog();
};

######
# destinations
destination d_messages { file("/var/log/messages"); };

######
# Log local and network events
log { source(s_local); source(s_net); destination(d_messages); };

==== Mods to sysctl.conf ====
Modify kernel buffer parameters in /etc/sysctl.conf:
# Increase the total buffer space allocatable (values are in pages, not bytes)
net.ipv4.tcp_mem = 8388608 12582912 16777216
net.ipv4.udp_mem = 8388608 12582912 16777216

# Increase the maximum read-buffer space allocatable
net.ipv4.tcp_rmem = 8192 87380 16777216
net.ipv4.udp_rmem_min = 16384

# Increase the maximum write-buffer space allocatable
net.ipv4.tcp_wmem = 8192 65536 16777216
net.ipv4.udp_wmem_min = 16384

# Increase the maximum and default receive socket buffer size
net.core.rmem_max=16777216
net.core.rmem_default=87380

# Set the maximum number of packets queued on the input side when the interface
# receives packets faster than the kernel can process them.  Default was 1000.
net.core.netdev_max_backlog = 2500

# Increase the maximum listen() backlog (pending connections).  Default was 128.
# I think this only applies to TCP; one web site recommended increasing
# it for heavily used web servers.
net.core.somaxconn=1024

# Maximum ancillary (option) buffer size per socket.  Default was 20480.
net.core.optmem_max = 1024000

# Reverse-path filtering: drop packets that look like they are spoofed.
# Default = 1.
# 0 = don't check;
# 1 = drop packets whose source address would be routed back out a
#     different interface than the one they arrived on;
# 2 = drop packets whose source address isn't reachable through any interface.
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.eth0.rp_filter = 0
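Unlike the rmem/wmem settings, which are in bytes, net.ipv4.tcp_mem and net.ipv4.udp_mem are counted in memory pages (per the kernel's ip-sysctl documentation). The small Python sketch below converts the three values used above, assuming the 4 KiB pages typical on x86; they come to tens of gigabytes, so on a typical server they effectively remove the global cap rather than set a specific one.

```python
import resource

GIB = 2 ** 30


def pages_to_bytes(pages, page_size=None):
    """tcp_mem and udp_mem are expressed in memory pages; convert to bytes."""
    if page_size is None:
        page_size = resource.getpagesize()  # page size of the machine running this
    return pages * page_size


if __name__ == "__main__":
    # min / pressure / max values from the sysctl.conf above, assuming 4 KiB pages
    for pages in (8388608, 12582912, 16777216):
        print(pages, "pages =", pages_to_bytes(pages, page_size=4096) // GIB, "GiB")
    # prints 32, 48 and 64 GiB respectively
```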

Then reload the settings:
sysctl -p
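sysctl -p echoes each setting as it applies it, but it is also handy to confirm the running values later, for example after a reboot. From the shell, `sysctl -n net.core.rmem_max` prints a single value; the Python sketch below does the same for a batch of keys by reading the files under /proc/sys. The dictionary of expected values is illustrative; some keys may not exist on every kernel and are reported as None.

```python
from pathlib import Path


def sysctl_file(key, proc_root="/proc/sys"):
    """Map a dotted key such as net.core.rmem_max to its file under /proc/sys.

    Interface names containing dots (e.g. VLANs like eth0.100) are not handled.
    """
    return Path(proc_root, *key.split("."))


def check_sysctl(expected, proc_root="/proc/sys"):
    """Return {key: running value} for every key that differs from `expected`.

    A value of None means the key does not exist on this kernel.
    """
    mismatches = {}
    for key, wanted in expected.items():
        try:
            running = sysctl_file(key, proc_root).read_text().split()
        except OSError:
            running = None
        # Compare field by field so tabs vs. spaces don't matter.
        if running != wanted.split():
            mismatches[key] = None if running is None else " ".join(running)
    return mismatches


if __name__ == "__main__":
    # A few of the values set above; extend as needed.
    wanted = {
        "net.core.rmem_max": "16777216",
        "net.core.rmem_default": "87380",
        "net.ipv4.tcp_rmem": "8192 87380 16777216",
        "net.core.netdev_max_backlog": "2500",
        "net.ipv4.conf.eth0.rp_filter": "0",
    }
    for key, running in check_sysctl(wanted).items():
        print(f"{key}: expected {wanted[key]!r}, running {running!r}")
```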

==== How to check packet loss on Linux ====
[Look for the section titled Udp: in the netstat output.  I like to precede the
netstat command with a date command and save both to a file as a record for
later comparisons.]

# date
Wed Feb  3 16:33:55 EST 2010
# netstat -us
...
Udp:
     24264777 packets received
     2831 packets to unknown port received.
     14262 packet receive errors
     24538489 packets sent
...

# date
Wed Jan 27 11:21:54 EST 2010
# netstat -us
...
Udp:
     24227388 packets received
     470 packets to unknown port received.
     10644 packet receive errors
     24503312 packets sent
...

Note the increase in UDP receive errors: from 10644 to 14262 (3618 more)
between Jan 27 and Feb 3, while 37389 packets were received.  The packet rate
is pretty low on these systems, peaking at a few packets per second and
averaging well below one packet per second.
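Comparing two snapshots by hand works, but the subtraction is easy to script. The Python sketch below parses saved `date` and `netstat -us` output, assuming the layout shown above (counters indented under an unindented `Udp:` header) and that both timestamps are in the same time zone, and reports how much each UDP counter grew and over how long. The sample strings are the two snapshots from this post.

```python
import re
from datetime import datetime


def parse_date_output(text):
    """Parse `date` output such as 'Wed Feb  3 16:33:55 EST 2010'.

    The time-zone name is dropped, so both snapshots must be in the same zone.
    """
    fields = text.split()
    del fields[4]  # zone name, e.g. "EST"
    return datetime.strptime(" ".join(fields), "%a %b %d %H:%M:%S %Y")


def parse_udp_counters(netstat_text):
    """Return {description: count} for the 'Udp:' section of `netstat -us`."""
    counters = {}
    in_udp = False
    for line in netstat_text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Unindented lines are section headers ("Udp:", "Tcp:", ...)
            in_udp = line.strip() == "Udp:"
            continue
        match = re.match(r"\s*(\d+)\s+(.*?)\.?\s*$", line)
        if in_udp and match:
            counters[match.group(2)] = int(match.group(1))
    return counters


def udp_delta(old_date, old_netstat, new_date, new_netstat):
    """Seconds elapsed and per-counter increase between two snapshots."""
    elapsed = (parse_date_output(new_date) - parse_date_output(old_date)).total_seconds()
    old, new = parse_udp_counters(old_netstat), parse_udp_counters(new_netstat)
    return elapsed, {name: new[name] - old[name] for name in new if name in old}


OLD_DATE = "Wed Jan 27 11:21:54 EST 2010"
OLD_NETSTAT = """Udp:
     24227388 packets received
     470 packets to unknown port received.
     10644 packet receive errors
     24503312 packets sent
"""
NEW_DATE = "Wed Feb  3 16:33:55 EST 2010"
NEW_NETSTAT = """Udp:
     24264777 packets received
     2831 packets to unknown port received.
     14262 packet receive errors
     24538489 packets sent
"""

if __name__ == "__main__":
    elapsed, deltas = udp_delta(OLD_DATE, OLD_NETSTAT, NEW_DATE, NEW_NETSTAT)
    print(f"{elapsed / 86400:.1f} days, {deltas['packets received']} received, "
          f"{deltas['packet receive errors']} receive errors")
    # prints: 7.2 days, 37389 received, 3618 receive errors
```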

==== Other thoughts ====
The RHE3 system has been forwarding log messages using the address spoofing
feature.  I turned it off yesterday afternoon and the logs have shown no packet
loss since then.  Unfortunately, I did not run netstat at that time, so I can't
tell whether the kernel knows of any loss.  I also did not have eth0/rp_filter = 0
until today, figuring that the default/rp_filter = 0 parameter would cover it, but
decided that it wouldn't hurt to have both set.  Interestingly, the RHE3 system
has both values set to 1 and it does not drop any events (that I can detect; it
certainly is getting more events than the CentOS system).
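Address spoofing means the relay sends each forwarded datagram with the original device's IP address as its source. Strict reverse-path filtering (rp_filter = 1) discards a packet when the route back to its source address would leave through a different interface than the one it arrived on, so spoofed sources are one plausible way for messages to be discarded before they reach syslog-ng. The Python sketch below is a simplified model of that strict check (longest-prefix match over a static table); the routes, interfaces, and addresses are hypothetical, the real kernel lookup also involves policy routing and other details, and nothing here shows this was the cause of the loss described above.

```python
import ipaddress

# Hypothetical routing table of the receiving (CentOS) host: (network, interface)
ROUTES = [
    (ipaddress.ip_network("192.168.1.0/24"), "eth0"),  # LAN where the relay lives
    (ipaddress.ip_network("10.0.0.0/8"), "eth1"),       # devices that originate logs
    (ipaddress.ip_network("0.0.0.0/0"), "eth1"),        # default route
]


def route_interface(routes, address):
    """Longest-prefix match: interface of the most specific route to `address`."""
    addr = ipaddress.ip_address(address)
    candidates = [(net, dev) for net, dev in routes if addr in net]
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0].prefixlen)[1]


def strict_rp_filter_accepts(routes, source_ip, ingress_dev):
    """Simplified rp_filter = 1: accept only if the reverse route uses the
    interface the packet arrived on."""
    return route_interface(routes, source_ip) == ingress_dev


if __name__ == "__main__":
    # The relay (192.168.1.10) reaches this host via eth0.
    print(strict_rp_filter_accepts(ROUTES, "192.168.1.10", "eth0"))  # True: relay's own address
    print(strict_rp_filter_accepts(ROUTES, "10.20.30.40", "eth0"))   # False: spoofed device address
```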

I'm about to load RHE5 on a different server, forward events to it and see if it 
drops events.

	-tcs

-- 
Terry Slattery    CCIE# 1026