Hi, 

I have syslog-ng listening on udp/514 on 8 machines, and at a given point in time each of the 8 locations receives about 40,000 events per second (EPS). All 8 boxes use virtual IPs.

We are running syslog-ng version 3.5.6.

Questions

  • Besides a packet capture, is there anything in RHEL (built-in preferred) that counts UDP packets attempted, not just those received (see my comments on the netstat -su output below)?
  • Is it possible to receive all packets sent yet still get UDP errors?
  • What is a reasonable syslog EPS rate to expect from this hardware?
  • Are there any other kernel or syslog-ng tweaks we should make?

Environment:

  • RHEL 7.3
  • 16 cores
  • 64GB Memory
  • 5TB SAS 15k disk

Errors (netstat -su output):

Udp:
    1540062710 packets received
    257 packets to unknown port received.
    306945112 packet receive errors
    1378189992 packets sent
    306945112 receive buffer errors
    101 send buffer errors

  • "packets received" is the number of datagrams accepted by the listening application, NOT the number that were attempted.
  • "packets to unknown port" is what arrived for the application while it was down or not listening (e.g. during a restart).
  • "packet receive errors" clearly indicates a problem; HOWEVER, it is not one error per packet. You can see many more errors than packets sent, which suggests a single packet can generate multiple errors.
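The counters netstat -su prints come from the "Udp:" lines of /proc/net/snmp ("packets received" = InDatagrams, "packets to unknown port" = NoPorts, "packet receive errors" = InErrors, "receive buffer errors" = RcvbufErrors). A minimal sketch, assuming Linux and Python 3.6+, that samples them twice and turns the difference into per-second rates, which is easier to compare against the 40,000 EPS figure than since-boot totals. It still only sees datagrams that reached the host's UDP layer; as far as I know, drops earlier in the path are counted elsewhere (for example the second column of /proc/net/softnet_stat when netdev_max_backlog overflows, or the NIC's own statistics via ethtool -S). In the output above, packet receive errors equals receive buffer errors, so by this approximation roughly 16.6% of the datagrams that reached a socket since boot were dropped because its buffer was full.

```python
# Sketch: sample the kernel's UDP counters (the source netstat -su reads)
# and print per-second rates. Linux-only; field names come from the file's
# own header line, so kernels with extra fields still parse.
import sys
import time


def parse_udp(text: str) -> dict:
    """Return the 'Udp:' counters from /proc/net/snmp content as a dict."""
    rows = [line.split() for line in text.splitlines() if line.startswith("Udp:")]
    # Two 'Udp:' lines: field names first, values second ('UdpLite:' is skipped).
    names, values = rows[0][1:], rows[1][1:]
    return dict(zip(names, map(int, values)))


def read_udp(path: str = "/proc/net/snmp") -> dict:
    with open(path) as f:
        return parse_udp(f.read())


def rates(before: dict, after: dict, seconds: float) -> dict:
    """Per-second change of every counter present in both samples."""
    return {k: (after[k] - before[k]) / seconds for k in after if k in before}


def socket_drop_ratio(c: dict) -> float:
    """Approximate share of datagrams dropped because a socket buffer was full.

    InDatagrams counts datagrams the application actually read, so
    InDatagrams + RcvbufErrors is used as a stand-in for "reached a socket".
    """
    dropped = c.get("RcvbufErrors", 0)
    reached = c.get("InDatagrams", 0) + dropped
    return dropped / reached if reached else 0.0


if __name__ == "__main__":
    interval = float(sys.argv[1]) if len(sys.argv) > 1 else 1.0
    first = read_udp()
    time.sleep(interval)
    second = read_udp()
    per_sec = rates(first, second, interval)
    for key in ("InDatagrams", "InErrors", "RcvbufErrors", "NoPorts"):
        print(f"{key:>12}: {per_sec.get(key, 0.0):12.1f}/s")
    print(f"socket-buffer drops, last {interval:g}s: {socket_drop_ratio(per_sec):.1%}")
    print(f"socket-buffer drops since boot: {socket_drop_ratio(second):.1%}")
```

Run it as, for example, python3 udp_rates.py 10 (hypothetical file name) to average over 10 seconds. The counters are host-wide, so they include every UDP socket on the box, not only syslog-ng.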
Tuning Steps
We tried turning off the udp/514 forwarding to the other applications, but did not see a noticeable drop in errors.
The settings below are similar to what was done; however, we tried multiple values, so the numbers below are not exactly what is in production now:
kernel
net.ipv4.udp_rmem_min = 131072
net.ipv4.udp_wmem_min = 131072
net.core.netdev_max_backlog = 2000
net.core.rmem_max = 67108864
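One point worth checking about the kernel settings above: per the Linux socket(7) man page, net.core.rmem_max is only the ceiling for what a program may request with SO_RCVBUF; a socket that requests nothing starts at net.core.rmem_default. The sketch below (Python 3, Linux-only, an illustration rather than a syslog-ng tool) prints those sysctls and the buffer a UDP socket gets with and without asking for 64 MiB. If syslog-ng is not requesting a larger buffer for the udp/514 source (its network sources document an so_rcvbuf() option; check the 3.5 documentation for the exact syntax), raising rmem_max alone may not change the buffer it actually uses.

```python
# Sketch: show what UDP receive buffer a socket actually gets on Linux.
# It never binds to udp/514 (that needs root); it only inspects SO_RCVBUF.
import socket
from typing import Optional


def sysctl_path(name: str) -> str:
    """Map a dotted sysctl name to its /proc/sys file."""
    return "/proc/sys/" + name.replace(".", "/")


def read_sysctl(name: str) -> Optional[int]:
    """Return the sysctl's first integer value, or None if it is not readable here."""
    try:
        with open(sysctl_path(name)) as f:
            return int(f.read().split()[0])
    except OSError:
        return None


def default_rcvbuf() -> int:
    """SO_RCVBUF of a fresh UDP socket that never asked for a size."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


def granted_rcvbuf(requested: int) -> int:
    """Request `requested` bytes; Linux caps it at net.core.rmem_max, then
    doubles it for bookkeeping overhead, and getsockopt reports that value."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    for name in ("net.core.rmem_default", "net.core.rmem_max",
                 "net.core.netdev_max_backlog", "net.ipv4.udp_rmem_min"):
        print(f"{name} = {read_sysctl(name)}")
    print("SO_RCVBUF, no request:      ", default_rcvbuf())
    print("SO_RCVBUF, 64 MiB requested:", granted_rcvbuf(64 * 1024 * 1024))
```

If the "no request" value matches rmem_default rather than anything near 64 MiB, that is the size a receiver gets unless it explicitly asks for more.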
syslog-ng
options {
        sync (5000);
        time_reopen (10);
        time_reap(5);
        long_hostnames (off);
        use_dns (no);
        use_fqdn (no);
        create_dirs (no);
        keep_hostname (yes);
        log_fifo_size (536870912);
        stats_freq(60);
        flush_lines(500);
        flush_timeout(10000);
};
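The syslog-ng documentation describes log_fifo_size() as the number of messages an output queue can hold (a message count, not bytes), and as far as I know the global option applies to each destination's queue. The arithmetic below checks 536870912 against the numbers in this post: how long a stalled destination would take to fill its queue at 40,000 EPS, and how much memory a full queue would need at an assumed 512 bytes per message. The 512-byte figure is a hypothetical placeholder, not a syslog-ng constant.

```python
# Back-of-the-envelope check of log_fifo_size() against the numbers in this post.
# ASSUMPTION: 512 bytes of memory per queued message is a placeholder, not a
# syslog-ng figure; measure your own average message size before relying on it.

LOG_FIFO_SIZE = 536_870_912        # log_fifo_size(), counted in messages
PEAK_EPS = 40_000                  # events per second per host (from the post)
ASSUMED_BYTES_PER_MSG = 512        # hypothetical placeholder
HOST_RAM_BYTES = 64 * 1024**3      # "64GB Memory", read as GiB


def seconds_to_fill(fifo_msgs: int, eps: int) -> float:
    """How long a destination that stops draining takes to fill its queue."""
    return fifo_msgs / eps


def bytes_when_full(fifo_msgs: int, bytes_per_msg: int) -> int:
    """Memory a completely full queue would need at the assumed message size."""
    return fifo_msgs * bytes_per_msg


if __name__ == "__main__":
    secs = seconds_to_fill(LOG_FIFO_SIZE, PEAK_EPS)
    need = bytes_when_full(LOG_FIFO_SIZE, ASSUMED_BYTES_PER_MSG)
    print(f"one full queue: {secs:,.0f} s (~{secs / 3600:.1f} h) at {PEAK_EPS:,} EPS")
    print(f"memory if full: {need / 1024**3:,.0f} GiB vs {HOST_RAM_BYTES / 1024**3:.0f} GiB RAM")
```

With these inputs it works out to about 13,422 s (roughly 3.7 hours) and 256 GiB, several times the 64 GB of RAM, so the configured queue size may not be reachable in practice; the per-message size is the number to replace with a measured one.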