We've been logging centrally with syslog-ng for about 5 years now. Over that time the number of sources has grown significantly (a quick survey of 3 million syslog packets yielded 420 unique hosts currently sending), and at some point we crossed a line where packets started being dropped.
After much research and experimentation, we've reached the point where there are essentially no drops during the day. This was achieved by installing the latest syslog-ng (not the RedHat packaged one) and creating a UDP source for each CPU.
Occasionally, though, we still have periods of drops, and I'm trying to eliminate these last few.
Here are some relevant configuration items:
2 RedHat 7.8 VMs (load balanced via an F5) with 16GB memory and 4 CPUs each running syslog-ng-3.24.1-1.el7.x86_64.
net.core.rmem_default = 212992
net.core.rmem_max = 268435456
log_fifo_size(268435456);
source s_network {
network(ip("0.0.0.0") port(514) transport("udp") so_rcvbuf(441326592) so-reuseport(1) persist-name("udp1"));
network(ip("0.0.0.0") port(514) transport("udp") so_rcvbuf(441326592) so-reuseport(1) persist-name("udp2"));
network(ip("0.0.0.0") port(514) transport("udp") so_rcvbuf(441326592) so-reuseport(1) persist-name("udp3"));
network(ip("0.0.0.0") port(514) transport("udp") so_rcvbuf(441326592) so-reuseport(1) persist-name("udp4"));
network(ip("0.0.0.0") port(514) transport("tcp") max_connections(200) keep_alive(yes) so_rcvbuf(67108864));
};
We are limited to UDP, unfortunately, because we do not have control over the devices/networks/etc. that are sending to us, but we have changed as many of the internal senders and destinations to TCP as we can.
I wrote a script that shows the IP and UDP packet counters (including UDP receive errors) along with the receive queue (RECVQ) of each UDP socket. Its output illustrates the issue.
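For anyone who wants to reproduce this kind of view, here is a minimal sketch of how such a monitor could be written in Python. It is not the script that produced the output in this post: the function names are made up for illustration, and it assumes the standard Linux /proc/net/snmp and /proc/net/udp layouts (counters as a header line plus a value line, and the receive queue as the hex value after the colon in the tx_queue:rx_queue column).

```python
# Hypothetical monitor sketch, not the original script from this post.
# Assumes the standard Linux layouts of /proc/net/snmp and /proc/net/udp.

def parse_udp_counters(snmp_text):
    """Return the Udp counters from /proc/net/snmp as a name -> int dict."""
    rows = [line.split() for line in snmp_text.splitlines()
            if line.startswith("Udp:")]
    header, values = rows[0], rows[1]
    return dict(zip(header[1:], (int(v) for v in values[1:])))

def parse_udp_queues(udp_text, port=514):
    """Return the rx_queue size in bytes of every UDP socket bound to port."""
    queues = []
    for line in udp_text.splitlines()[1:]:      # skip the header line
        fields = line.split()
        if len(fields) < 5:
            continue
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port != port:
            continue
        # Column 4 is "tx_queue:rx_queue", both in hexadecimal.
        queues.append(int(fields[4].split(":")[1], 16))
    return queues

def sample_once(port=514):
    """Read both proc files once and return (counters, queues)."""
    with open("/proc/net/snmp") as f:
        counters = parse_udp_counters(f.read())
    with open("/proc/net/udp") as f:
        queues = parse_udp_queues(f.read(), port)
    return counters, queues
```

Calling sample_once() in a loop and printing the difference in InDatagrams and InErrors between samples, plus the queue list, gives per-interval numbers comparable to the ones shown in this post.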
Here's what things look like normally:
Thu May 7 10:48:15 EDT 2020, 27003 IP pkts rcvd,26980 IP pkts sent,24951 UDP pkts rcvd, 28075 UDP pkts sent,0 UDP pkt rcv err
RECVQ-1=2176
RECVQ-2=0
RECVQ-3=0
RECVQ-4=0
Thu May 7 10:48:16 EDT 2020, 28453 IP pkts rcvd,28426 IP pkts sent,26185 UDP pkts rcvd, 29180 UDP pkts sent,0 UDP pkt rcv err
RECVQ-1=0
RECVQ-2=0
RECVQ-3=4352
RECVQ-4=0
Thu May 7 10:48:17 EDT 2020, 28294 IP pkts rcvd,28276 IP pkts sent,26277 UDP pkts rcvd, 28709 UDP pkts sent,0 UDP pkt rcv err
RECVQ-1=2176
RECVQ-2=0
RECVQ-3=0
RECVQ-4=0
The RECVQs are sparsely used, and there are no errors.
Around 9pm every night, the packet counts go up significantly (probably due to backup-related logs):
Wed May 6 21:00:08 EDT 2020, 66382 IP pkts rcvd,66366 IP pkts sent,39405 UDP pkts rcvd, 67592 UDP pkts sent,0 UDP pkt rcv err
RECVQ-1=1595008
RECVQ-2=106217088
RECVQ-3=53694976
RECVQ-4=31858816
Wed May 6 21:00:09 EDT 2020, 69317 IP pkts rcvd,69338 IP pkts sent,44446 UDP pkts rcvd, 75958 UDP pkts sent,0 UDP pkt rcv err
RECVQ-1=13056
RECVQ-2=126397312
RECVQ-3=75568128
RECVQ-4=41626880
Wed May 6 21:00:10 EDT 2020, 71205 IP pkts rcvd,71227 IP pkts sent,43657 UDP pkts rcvd, 74603 UDP pkts sent,0 UDP pkt rcv err
RECVQ-1=920448
RECVQ-2=146122752
RECVQ-3=100951168
RECVQ-4=52622208
Wed May 6 21:00:12 EDT 2020, 69578 IP pkts rcvd,69454 IP pkts sent,124465 UDP pkts rcvd, 163367 UDP pkts sent,0 UDP pkt rcv err
RECVQ-1=13140864
RECVQ-2=44494848
RECVQ-3=125579136
RECVQ-4=0
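Even at this rate it keeps up with no errors. At some point, though, a threshold is reached and errors start piling up: the UDP received count drops to 0, the receive error count jumps to around 40,000 per sample, and RECVQ-1 stays pinned at 536871424: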
Wed May 6 21:00:20 EDT 2020, 63177 IP pkts rcvd,63291 IP pkts sent,0 UDP pkts rcvd, 0 UDP pkts sent,38011 UDP pkt rcv err
RECVQ-1=536871424
RECVQ-2=200357376
RECVQ-3=292948352
RECVQ-4=28890752
Wed May 6 21:00:21 EDT 2020, 69501 IP pkts rcvd,69464 IP pkts sent,0 UDP pkts rcvd, 1 UDP pkts sent,42158 UDP pkt rcv err
RECVQ-1=536871424
RECVQ-2=223551360
RECVQ-3=314995584
RECVQ-4=41735680
Wed May 6 21:00:23 EDT 2020, 69962 IP pkts rcvd,69978 IP pkts sent,0 UDP pkts rcvd, 2 UDP pkts sent,43775 UDP pkt rcv err
RECVQ-1=536871424
RECVQ-2=244732544
RECVQ-3=338239616
RECVQ-4=53858176
Wed May 6 21:00:24 EDT 2020, 68266 IP pkts rcvd,68216 IP pkts sent,0 UDP pkts rcvd, 0 UDP pkts sent,43118 UDP pkt rcv err
RECVQ-1=536871424
RECVQ-2=265258752
RECVQ-3=360643712
RECVQ-4=65362688
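One observation about the stuck RECVQ-1 value, offered as a sketch to be verified rather than a diagnosis: according to socket(7), Linux caps a requested SO_RCVBUF at net.core.rmem_max and then doubles it for bookkeeping overhead. The helper name below is made up for illustration; the inputs are the so_rcvbuf and rmem_max values from the configuration above.

```python
# Illustrative arithmetic only; effective_rcvbuf is a hypothetical helper.
# Assumption (per socket(7)): the kernel caps the requested SO_RCVBUF at
# net.core.rmem_max and doubles the result.

def effective_rcvbuf(requested, rmem_max):
    """Receive buffer size the kernel would actually apply, in bytes."""
    return 2 * min(requested, rmem_max)

# so_rcvbuf(441326592) and net.core.rmem_max = 268435456 from the config above
cap = effective_rcvbuf(441326592, 268435456)
print(cap)                 # 536870912
print(536871424 - cap)     # 512
```

If that assumption holds, the requested so_rcvbuf of 441326592 is larger than rmem_max allows, and the observed RECVQ-1 of 536871424 is within 512 bytes of the resulting per-socket limit, which would be consistent with that one socket's buffer being completely full while the errors accumulate.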