Ok, maybe even easier than running tcpdump for DNS would be to just swap $FULLHOST_FROM with $SOURCEIP and see if that improves things.
That didn't seem to impact the rate at all.
Two more things to look at: what is the CPU % when it's running, and
Running with 2000 messages/second, the CPU is usually at least 98% idle. The CPU display from top looks approximately like the following throughout the duration of the test:

    Cpu(s):  1.1%us,  0.3%sy,  0.0%ni, 98.5%id,  0.0%wa,  0.0%hi,  0.1%si,  0.0%st

So there's not much waiting for I/O, either.
if you strace it what syscalls does it seem to be doing the most? Gettimeofday should be in there quite a bit, but sometimes calls you didn't expect jump out and show what's blocking.
Running 'strace -c' against the syslog-ng process yields:

    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     47.43    0.008792           0     42984           write
     28.85    0.005347           2      2535           poll
     12.56    0.002328           0     42984           lseek
     11.16    0.002068           0     21975       484 recvfrom
      0.00    0.000000           0         1           close
      0.00    0.000000           0        11           stat
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.018535                110490       484 total

The errors are all from recvfrom() reporting EAGAIN. I don't think anything here is terribly surprising.
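One way to read the strace summary is to normalize each call count by the number of recvfrom() calls that actually returned data. The sketch below does that with the counts copied from the table. It assumes each successful recvfrom() delivers exactly one message, which the trace itself does not confirm, and the function name is made up for illustration.

```python
# Call counts copied from the strace -c summary in this thread.
calls = {"write": 42984, "poll": 2535, "lseek": 42984, "recvfrom": 21975}
recvfrom_eagain = 484  # recvfrom() calls that failed with EAGAIN


def per_message_ratios(call_counts, failed_recv):
    """Return syscalls per successful recvfrom().

    Assumption (not confirmed by the trace): one successful
    recvfrom() corresponds to one received message.
    """
    received = call_counts["recvfrom"] - failed_recv
    return {
        name: count / received
        for name, count in call_counts.items()
        if name != "recvfrom"
    }


ratios = per_message_ratios(calls, recvfrom_eagain)
for name, ratio in sorted(ratios.items()):
    print(f"{name:6s} {ratio:.2f} per received message")
```

Under that assumption the counts work out to roughly two write() and two lseek() calls per received message, and about one poll() for every eight or nine messages. The trace alone does not say why there are two writes per message. The total traced syscall time of 0.018535 seconds is small, which fits with the mostly idle CPU reported earlier.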
You really shouldn't have to post-process with syslog-ng; there's got to be something wrong. I also find your raw socket numbers to be lower than I'd expect.
Me too! :) Things are better (but still not perfect) with a 2GB buffer, but we're only throwing a 4Mbps stream of data at the system, and I would expect it to handle this without a problem. I remain puzzled.
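For a sense of scale, here is a back-of-the-envelope sketch of what those figures imply. It makes three assumptions the thread does not state: that "4Mbps" means 4 * 10**6 bits per second, that "2GB" means 2 GiB, and that the 4 Mbps stream is the same 2000 messages/second test described earlier.

```python
# All three constants are interpretations of figures quoted in the thread.
STREAM_BITS_PER_SEC = 4_000_000   # "4Mbps", taken as 4 * 10**6 bits/s
MESSAGES_PER_SEC = 2000           # assumed to be the same test run
BUFFER_BYTES = 2 * 1024**3        # "2GB", taken as 2 GiB

bytes_per_sec = STREAM_BITS_PER_SEC / 8

# Average message size if the stream and the message rate describe the same run.
avg_message_bytes = bytes_per_sec / MESSAGES_PER_SEC

# Time for the stream to fill the buffer if nothing were drained from it at all.
seconds_to_fill = BUFFER_BYTES / bytes_per_sec

print(f"average message size: {avg_message_bytes:.0f} bytes")
print(f"time to fill buffer with no draining: {seconds_to_fill / 60:.1f} minutes")
```

On those assumptions the messages average about 250 bytes, and the stream would need over an hour to fill the buffer even if the reader never drained it, which is consistent with expecting the system to keep up.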