This was related to dns resolution. This particular logging service has clients that are not on our own network and as such the DNS resolution is "far away" as far as networks are concerned. By turning off dns resolution in syslog-ng the queuing became nearly 0. No longer was the queue contents cycling from 0 through to 200,000 messages and then flushing. Then I added nscd using hostname caching only, and enable the syslog-ng dns resolution again. The great performance remained with a near zero queue size. Now I have the same functionality with dns names in my log files and I do not have any slow downs for name resolution inside of syslog-ng. Only time will tell if this is related to the allowed connections issue but they seem to be strongly linked at this time. Evan. On 2022-02-18 09:45, Evan Rempel wrote:
Some more anecdotal details. I know this is not really a systematic approach to tyrouble shooting this. I guess I'll have to turn on the verbose stats. We collect them regularly so that might get overwhelmed.
The issue started again.
Number of allowed concurrent connections reached, rejecting connection; client='AF_INET(XXXX:50773)', local='AF_INET(YYYY:6514)', group_name='client_network_tcp', location='/etc/syslog-ng/syslog-ng.server.conf:61:9', max='15000'
The established connection count was 2747
$ netstat -an | grep ESTABLISHED | grep -c 6514 2747
The output destination stats were
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;1945402 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;105619 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;memory_usage;148574728 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;written;1839783 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;truncated_count;0 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;truncated_bytes;0
repeating this statistics report a few times with 2-3 seconds between then it shows
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;1945406 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;105619
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;1945406 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;105619
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;1945410 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;105619
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;1945410 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;105619
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;1960812 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;121005
reload syslong-ng
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;2084011 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;97926
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;2084013 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;97938
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;2103861 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;19842
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;2107477 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;23463
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;2112825 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;28810
By the time I was this far composingthis message, things seemed to have stalled again
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;2909715 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;54738
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;2909718 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;54738
eventually it started cycling through the large queueing and flushing
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;3177968 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;157790
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;3187677 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;2150
with periods of getting stuck
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;3418242 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;62294
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;3418246 dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;62294
and judging by the cpu usage, not all of the log messages are being processed.
Verbose stats ... here I come.
On 2022-02-18 04:08, Laszlo Varady (lvarady) wrote:
Hi,
A macroed file destination is unlikely to cause such an issue if the location is writable.
Just a tip that might help ruling out the case I mentioned: Setting the stats-level() to 4 results in an extremely verbose stats output, where a counter called "free_window" can be found for each network connection. This can be used to check whether a connection is suspended or not. Note that this is a momentary value, which oscillates between 0 and full_window, so a momentary 0 does not mean anything bad, but we're looking for fixed 0 "free_window" values.
-- László Várady ------------------------------------------------------------------------ *From:* syslog-ng <syslog-ng-bounces@lists.balabit.hu> on behalf of Evan Rempel <erempel@uvic.ca> *Sent:* Friday, February 18, 2022 3:22 *To:* syslog-ng@lists.balabit.hu <syslog-ng@lists.balabit.hu> *Subject:* Re: [syslog-ng] allowed concurrent connections - bug? CAUTION: This email originated from outside of the organization. Do not follow guidance, click links, or open attachments unless you recognize the sender and know the content is safe.
There is only 1 destination, although it is a file named with macros of date and hour.
destination workstation.log { file("/var/syslog/workstation.log.$R_YEAR$R_MONTH$R_DAY.${R_HOUR}0000" ); };
And that volume has never become full.
Also, with a file based destination I can't actually turn flow-control off since files have soft flow-control.
Because I have 3500 real connections that are all active (total of 10,000 messages per second) syslog-ng cycles through reading 100 messages from each source for 35000 messages and then writing those to disk. it is very common for the queued messages to fluctuate from a few hundred to 200,000 messages.
I may have to add some new metrics to our statistics gathering to understand more about what is happening.
Evan Rempel.
On 2022-02-17 13:13, Laszlo Varady (lvarady) wrote:
Hi,
Do you have flags(flow-control) specified in your log paths? If so, a dead destination in such log paths might cause the mentioned issue.
When flow-control is activated, the corresponding sources will be suspended. This suspended state does not even allow syslog-ng to truly release connections that have been closed by the clients. This is actually more of expected behavior as we don't want to allow new connections in situations where logs could not be delivered anyway.
Please check the queued statistic counters of "syslog-ng-ctl stats" to see whether this is the case.
In case of anything else, I would suspect a bug.
-- László Várady ------------------------------------------------------------------------ *From:* syslog-ng <syslog-ng-bounces@lists.balabit.hu> <mailto:syslog-ng-bounces@lists.balabit.hu> on behalf of Evan Rempel <erempel@uvic.ca> <mailto:erempel@uvic.ca> *Sent:* Thursday, February 17, 2022 19:01 *To:* syslog-ng@lists.balabit.hu <mailto:syslog-ng@lists.balabit.hu> <syslog-ng@lists.balabit.hu> <mailto:syslog-ng@lists.balabit.hu> *Subject:* [syslog-ng] allowed concurrent connections - bug?
I am having an issue that is a little difficult to reproduce so I wanted some input from others.
I have a syslog-ng 3.35.1 that has a TLS source defined with max-connections(10000)
After some time the server starts logging a lot of messages
syslog-ng[12802]: Number of allowed concurrent connections reached, rejecting connection; client='AF_INET(XXXX:61062)', local='AF_INET(YYYY:6514)', group_name='client_network_tcp', location='/etc/syslog-ng/syslog-ng.server.conf:61:9', max='10000'
To the best of my ability I can only find about 2500 actual connections.
Both lsof and netstat report around the 2500 connections.
I had to restart syslog-ng to stop this situation.
Has anyone seen this behavior before?
I get a lot of TLS connections without a certificate.
Error reading RFC6587 style framed data
Pperhaps the counters are not decremented for those timed out connections?
-- Evan Rempel