[syslog-ng] Performance tuning questions

Tue Aug 22 15:04:38 CEST 2006

I am currently using the stock syslog daemon from RedHat, but it appears
unable to keep up, so I am looking at syslog-ng to improve things. The
data below provides a baseline of what I am currently seeing and what I
have attempted so far. If anyone could tell me whether syslog-ng would
improve performance, and what measures I can take to achieve that, it
would be great.

Logs have to be rotated each hour due to the amount of traffic. On
average I am successfully logging 25,888 events per minute. That goes
higher during the early morning login times.
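
For a sense of scale, the per-minute average converts to per-second and
per-hour figures as in the sketch below. The variable names are
illustrative, and this covers only the average; the early-morning peak is
higher and is not quantified here.

```python
# Average logged rate reported above (events per minute).
events_per_minute = 25_888

# Convert to a per-second rate and to one hourly rotation's worth of events.
events_per_second = events_per_minute / 60
events_per_hour = events_per_minute * 60

print(f"{events_per_second:.1f} events/s")
print(f"{events_per_hour:,} events per hourly log file")
```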

I have set the following sysctl params:

net.core.rmem_max = 33554432
net.core.wmem_max = 33554432
net.core.rmem_default = 65536
net.core.wmem_default = 65536
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432
net.ipv4.tcp_mem = 33554432 33554432 33554432
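
One caveat on these values: net.core.rmem_max only raises the ceiling a
program may request with setsockopt(SO_RCVBUF). A UDP socket that never
asks gets net.core.rmem_default (64 KB here), and the net.ipv4.tcp_*
settings do not apply to UDP syslog traffic. Whether the stock syslogd
requests a larger buffer is not verified here; the sketch below assumes it
does not, and only shows what a receiver would have to do to benefit from
the larger maximum. The helper name and the 8 MB request are made up for
illustration.

```python
import socket

def open_udp_receiver(port: int, rcvbuf_bytes: int) -> tuple[socket.socket, int]:
    """Bind a UDP socket and request a larger kernel receive buffer."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Ask for a bigger buffer; the kernel caps the request at net.core.rmem_max.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    sock.bind(("127.0.0.1", port))
    # Read back what was granted (Linux reports double the value it was given).
    granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return sock, granted

# Port 0 lets the OS pick a free port; a real syslog receiver would use 514.
sock, granted = open_udp_receiver(0, 8 * 1024 * 1024)
print("receive buffer granted:", granted, "bytes")
sock.close()
```

If the printed value is far below the request, the limit in force is
rmem_max rather than the application's setting.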

I have also set the ring parameters on the Intel 1000 NIC (using Intel's
latest driver, e1000-7.1.9, not the default kernel one) to 512
descriptors.

Pre-set maximums:
RX:             4096
RX Mini:        0
RX Jumbo:       0
TX:             4096
Current hardware settings:
RX:             512
RX Mini:        0
RX Jumbo:       0
TX:             512

Based on the NIC statistics, the bottleneck is not at the NIC:

NIC statistics:
     rx_packets: 148314357
     tx_packets: 24906469
     rx_bytes: 3023662070
     tx_bytes: 3216438764
     rx_errors: 0
     tx_errors: 0
     tx_dropped: 0
     multicast: 8531
     collisions: 0
     rx_length_errors: 0
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_errors: 0
     rx_no_buffer_count: 0
     rx_missed_errors: 0
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_window_errors: 0
     tx_abort_late_coll: 0
     tx_deferred_ok: 0
     tx_single_coll_ok: 0
     tx_multi_coll_ok: 0
     tx_timeout_count: 0
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     rx_align_errors: 0
     tx_tcp_seg_good: 10699059
     tx_tcp_seg_failed: 0
     rx_flow_control_xon: 0
     rx_flow_control_xoff: 0
     tx_flow_control_xon: 0
     tx_flow_control_xoff: 0
     rx_long_byte_count: 76038106102
     rx_csum_offload_good: 148139283
     rx_csum_offload_errors: 0
     rx_header_split: 0
     alloc_rx_buff_failed: 0

The sar -B output shows a lot of paging going on, which is probably
causing some loss:

06:10:01 AM  pgpgin/s pgpgout/s   fault/s  majflt/s
06:20:01 AM      0.00    160.46      3.02      0.00
06:30:01 AM      0.00    162.02      3.04      0.00
06:40:01 AM      0.00    164.49      4.14      0.00
06:50:01 AM      0.02    161.51      3.03      0.00
07:00:01 AM      0.00    174.21      3.04      0.00
07:10:01 AM      0.11    198.15      6.02      0.00
07:20:01 AM      0.01    193.53      3.03      0.00
07:30:01 AM   1278.66   1593.51     29.30      0.04
Average:        67.76    189.06     19.12      0.02

Strangely, though (or maybe I am misreading this), sar -W shows all zeros
for pswpin/s and pswpout/s, which says there is no swapping going on.

Sar -N shows no errors either, but the udpsck value is locked at 5. I
suspect this is hardcoded somewhere or compiled into the syslog SRPMs, as
I cannot find a syslog setting for it. This is one reason for wanting to
use syslog-ng: from what I have read, I can change the number of
allocated sockets via the configuration file.

Sar -P shows the CPU averaging 95% idle, so I see no issue here.

I have re-niced syslog to -10 to increase its priority.

Netstat -su shows what might be data loss:
Udp:
    131725715 packets received
    16642 packets to unknown port received.
    4859684 packet receive errors
    31571 packets sent
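
To put the error counter in proportion, the figures above can be turned
into a rough loss fraction. This sketch assumes the counters are
cumulative since boot, that errored datagrams are not included in
"packets received", and that all UDP traffic (not just syslog) is
counted. The parsing helper is illustrative only.

```python
import re

NETSTAT_SU = """\
Udp:
    131725715 packets received
    16642 packets to unknown port received.
    4859684 packet receive errors
    31571 packets sent
"""

def udp_counter(text: str, label: str) -> int:
    """Return the integer that precedes `label` in netstat -su style output."""
    match = re.search(r"(\d+)\s+" + re.escape(label), text)
    if match is None:
        raise ValueError(f"counter not found: {label}")
    return int(match.group(1))

received = udp_counter(NETSTAT_SU, "packets received")
errors = udp_counter(NETSTAT_SU, "packet receive errors")

# Fraction of arriving datagrams that were counted as receive errors.
loss_fraction = errors / (received + errors)
print(f"{loss_fraction:.2%} of UDP datagrams hit a receive error")
```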

I have no way to tell whether the UDP errors are related to syslog data,
and with so much syslog data arriving, setting up tcpdump to see what the
errors relate to is going to be rather difficult.
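
Short of a full tcpdump capture, one cheaper check is to sample the
kernel's UDP counters twice and see whether the error count climbs in step
with the syslog load (for example, during the morning login peak). A
minimal sketch, assuming a Linux /proc/net/snmp with the usual pair of
"Udp:" lines (header, then values). The sample text and function names are
invented for illustration, and this still cannot attribute errors to a
particular port.

```python
def parse_udp_counters(snmp_text: str) -> dict[str, int]:
    """Parse the two 'Udp:' lines of /proc/net/snmp into a name -> value dict."""
    udp_lines = [line.split()[1:] for line in snmp_text.splitlines()
                 if line.startswith("Udp:")]
    names, values = udp_lines[0], udp_lines[1]
    return dict(zip(names, map(int, values)))

def error_rate(before: dict[str, int], after: dict[str, int], seconds: float) -> float:
    """Receive errors per second between two samples."""
    return (after["InErrors"] - before["InErrors"]) / seconds

# Made-up samples taken 10 seconds apart, in the /proc/net/snmp layout.
sample_a = "Udp: InDatagrams NoPorts InErrors OutDatagrams\nUdp: 1000 5 40 10\n"
sample_b = "Udp: InDatagrams NoPorts InErrors OutDatagrams\nUdp: 5300 5 190 12\n"

rate = error_rate(parse_udp_counters(sample_a), parse_udp_counters(sample_b), 10.0)
print(f"{rate:.1f} receive errors/s")

# On a live system the text would come from: open("/proc/net/snmp").read()
```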

What I think is going on is that the stock syslog daemon is simply unable
to buffer enough to keep up with the syslog stream. I therefore want to
try syslog-ng to see if the error rate drops. I was also thinking of
using TCP for this, but at the data rate I am seeing, I suspect it could
cause a denial of service on both the syslog transmitters and the
receiving syslog server.

Any thoughts, ideas?

Thanks

Greg