Connections not closing on syslog-ng 2.1.4 server
Sent: Wed May 09 2012 20:45:58 GMT-0400 (EDT)
From: Robert Nickel <sng@forevernickel.com>
To: syslog-ng@lists.balabit.hu
Subject: [syslog-ng] Connections not closing on syslog-ng 2.1.4 server
I have a busy syslog-ng server that is collecting a large volume of logs. This problem is causing me grief and I am wondering if it is related to having flow_control enabled on the host. Any help would be greatly appreciated.
The connection limit on the inbound TCP source is slowly being exhausted: multiple connections from each client build up until the server stops working.
The source is configured as such:
source s_remote { tcp(ip(0.0.0.0) port(5140) log_iw_size(204800) max-connections(2048)); };
The bulk of the logs are being sent to other syslog-ng servers running on the same host. e.g.:
destination child1 { tcp("127.0.0.1" port (5148)); };
All of the log lines have the same setup:
log { source (s_remote); filter(f_q1); destination(d_q1); flags(flow-control,final); };
This setup is working very well with throughput in excess of 35000 messages per second but the whole thing blows up every couple of days due to running out of connections on the source.
Before it dies, for a couple of days, I see a number of these errors in the logs:
Number of allowed concurrent connections exceeded; num='2048', max='2048'
Usually, the actual number of connections is a couple of hundred above the 2048 listed here (netstat -an | fgrep -v ESTABLISHED | fgrep :5140).
The actual server count hovers around 1300, of which ~1000 are actively logging.
Options in use are:
    create_dirs(yes);
    dir_perm(0755);
    dns_cache_expire(28800);
    dns_cache(yes);
    flush_lines(10);
    flush_timeout(2048);
    frac_digits(3);
    keep_hostname(no);
    log_fetch_limit(100);
    log_fifo_size(2048);
    log_iw_size(100);
    long_hostnames(off);
    perm(0644);
    stats_freq(300);
    time_reopen(10);
    time_sleep(10);
    ts_format("iso");
    use_dns(yes);
    use_fqdn(no);
    use_time_recvd(no);
Thank you!
--Robert

Firstly, I think your netstat command is incorrect. If you're trying to count the number of open connections, the command you gave does the opposite: it counts inactive connections :-)
    netstat -an | fgrep -v ESTABLISHED | fgrep :5140
should be
    netstat -an | fgrep ESTABLISHED | fgrep :5140
So with this, I'd check that you don't indeed have 2048 open connections from all your clients.
-Patrick
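One way to run this check is to break the connections on the source port down by TCP state, so that established sessions and lingering ones (CLOSE_WAIT, TIME_WAIT and so on) show up separately. This is only a minimal sketch: it assumes Linux-style `netstat -an` output, where the local address is column 4 and the state is column 6, and the function name `count_states` is made up for illustration.

```shell
# Count TCP connections on a local port, grouped by state.
# Reads `netstat -an`-style lines on stdin. Assumes column 4 is the
# local address (ending in :PORT) and column 6 is the TCP state.
count_states() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $4 ~ (":" port "$") { n[$6]++ }
        END { for (s in n) print n[s], s }
    ' | sort -k1,1nr -k2,2
}

# Usage: netstat -an | count_states 5140
```

The listening socket itself shows up as a single LISTEN entry, so it is easy to tell apart from client connections.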
On 2012.05.09 21:05:14 -0400, Patrick Hemmer wrote:
> Sent: Wed May 09 2012 20:45:58 GMT-0400 (EDT)
> [...]
> Usually, the actual number of connections is a couple of hundred above the 2048 number listed here. (netstat -an | fgrep -v ESTABLISHED | fgrep :5140).
> [...]
> Firstly, I think your netstat command is incorrect. If you're trying to count the number of open connections, that command you gave does the opposite, it counts inactive connections :-)
>     netstat -an | fgrep -v ESTABLISHED | fgrep :5140
> should be
>     netstat -an | fgrep ESTABLISHED | fgrep :5140
Yeah. Oops. Good catch. That is not what I am using to gather the connections and I inserted a typo. Sorry. FWIW, the actual command that I'm using is this:
    netstat -an | \
      awk '$4 ~ /:5140$/ && /ESTABLISHED/ { gsub(":.*$","",$5); print $5; }' | \
      tee /var/tmp/connections.txt | wc -l
The fgrep bit was a (failed) attempt to keep it simple. *sigh*
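Since the command above writes one peer address per established connection to /var/tmp/connections.txt, the same file can show which clients are holding more than one connection. A rough sketch, assuming one IP address per line; `top_clients` is a hypothetical helper name, not part of any tool mentioned in this thread.

```shell
# List peers holding more than MIN connections, busiest first.
# Reads one peer IP address per line on stdin; MIN defaults to 1.
top_clients() {
    sort | uniq -c |
        awk -v min="${1:-1}" '$1 > min { print $1, $2 }' |
        sort -k1,1nr -k2,2
}

# Usage: top_clients 1 < /var/tmp/connections.txt
```

Each output line is a connection count followed by the peer address, so clients that keep stacking up connections appear at the top.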
> So with this, I'd check that you don't indeed have 2048 open connections from all your clients.
The number of connections was in excess of the maximum defined in the source.
Thank you,
--Robert
Participants (2): Patrick Hemmer, Robert Nickel