[syslog-ng] Resolved - Re: allowed concurrent connections - bug?
Evan Rempel
erempel at uvic.ca
Fri Feb 18 19:20:07 UTC 2022
This was related to dns resolution. This particular logging service has
clients that are not on our own network and as such the DNS resolution
is "far away" as far as networks are concerned.
By turning off dns resolution in syslog-ng the queuing became nearly 0.
No longer was the queue contents cycling from 0 through to 200,000
messages and then flushing.
Then I added nscd using hostname caching only, and enable the syslog-ng
dns resolution again. The great performance remained with a near zero
queue size.
Now I have the same functionality with dns names in my log files and I
do not have any slow downs for name resolution inside of syslog-ng.
Only time will tell if this is related to the allowed connections issue
but they seem to be strongly linked at this time.
Evan.
On 2022-02-18 09:45, Evan Rempel wrote:
> Some more anecdotal details. I know this is not really a systematic
> approach to tyrouble shooting this.
> I guess I'll have to turn on the verbose stats. We collect them
> regularly so that might get overwhelmed.
>
> The issue started again.
>
> Number of allowed concurrent connections reached, rejecting
> connection; client='AF_INET(XXXX:50773)', local='AF_INET(YYYY:6514)',
> group_name='client_network_tcp',
> location='/etc/syslog-ng/syslog-ng.server.conf:61:9', max='15000'
>
>
> The established connection count was 2747
>
> $ netstat -an | grep ESTABLISHED | grep -c 6514
> 2747
>
> The output destination stats were
>
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;1945402
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;105619
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;memory_usage;148574728
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;written;1839783
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;truncated_count;0
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;truncated_bytes;0
>
> repeating this statistics report a few times with 2-3 seconds between
> then it shows
>
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;1945406
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;105619
>
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;1945406
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;105619
>
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;1945410
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;105619
>
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;1945410
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;105619
>
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;1960812
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;121005
>
> reload syslong-ng
>
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;2084011
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;97926
>
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;2084013
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;97938
>
>
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;2103861
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;19842
>
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;2107477
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;23463
>
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;2112825
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;28810
>
> By the time I was this far composingthis message, things seemed to
> have stalled again
>
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;2909715
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;54738
>
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;2909718
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;54738
>
> eventually it started cycling through the large queueing and flushing
>
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;3177968
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;157790
>
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;3187677
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;2150
>
> with periods of getting stuck
>
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;3418242
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;62294
>
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;3418246
> dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;62294
>
> and judging by the cpu usage, not all of the log messages are being
> processed.
>
> Verbose stats ... here I come.
>
>
> On 2022-02-18 04:08, Laszlo Varady (lvarady) wrote:
>>
>> Hi,
>>
>> A macroed file destination is unlikely to cause such an issue if the
>> location is writable.
>>
>> Just a tip that might help ruling out the case I mentioned:
>> Setting the stats-level() to 4 results in an extremely verbose stats
>> output, where a counter called "free_window" can be found for each
>> network connection.
>> This can be used to check whether a connection is suspended or not.
>> Note that this is a momentary value, which oscillates between 0 and
>> full_window, so a momentary 0 does not mean anything bad,
>> but we're looking for fixed 0 "free_window" values.
>>
>> --
>> László Várady
>> ------------------------------------------------------------------------
>> *From:* syslog-ng <syslog-ng-bounces at lists.balabit.hu> on behalf of
>> Evan Rempel <erempel at uvic.ca>
>> *Sent:* Friday, February 18, 2022 3:22
>> *To:* syslog-ng at lists.balabit.hu <syslog-ng at lists.balabit.hu>
>> *Subject:* Re: [syslog-ng] allowed concurrent connections - bug?
>> CAUTION: This email originated from outside of the organization. Do
>> not follow guidance, click links, or open attachments unless you
>> recognize the sender and know the content is safe.
>>
>> There is only 1 destination, although it is a file named with macros
>> of date and hour.
>>
>> destination workstation.log {
>> file("/var/syslog/workstation.log.$R_YEAR$R_MONTH$R_DAY.${R_HOUR}0000"
>> ); };
>>
>> And that volume has never become full.
>>
>> Also, with a file based destination I can't actually turn
>> flow-control off since files have soft flow-control.
>>
>> Because I have 3500 real connections that are all active (total of
>> 10,000 messages per second) syslog-ng cycles through reading 100
>> messages from each source for 35000 messages and then writing those
>> to disk. it is very common for the queued messages to fluctuate from
>> a few hundred to 200,000 messages.
>>
>> I may have to add some new metrics to our statistics gathering to
>> understand more about what is happening.
>>
>> Evan Rempel.
>>
>>
>> On 2022-02-17 13:13, Laszlo Varady (lvarady) wrote:
>>>
>>> Hi,
>>>
>>> Do you have flags(flow-control) specified in your log paths?
>>> If so, a dead destination in such log paths might cause the
>>> mentioned issue.
>>>
>>> When flow-control is activated, the corresponding sources will be
>>> suspended. This suspended state does not even allow syslog-ng to
>>> truly release connections that have been closed by the clients.
>>> This is actually more of expected behavior as we don't want to allow
>>> new connections in situations where logs could not be delivered anyway.
>>>
>>> Please check the queued statistic counters of "syslog-ng-ctl stats"
>>> to see whether this is the case.
>>>
>>> In case of anything else, I would suspect a bug.
>>>
>>> --
>>> László Várady
>>> ------------------------------------------------------------------------
>>> *From:* syslog-ng <syslog-ng-bounces at lists.balabit.hu>
>>> <mailto:syslog-ng-bounces at lists.balabit.hu> on behalf of Evan Rempel
>>> <erempel at uvic.ca> <mailto:erempel at uvic.ca>
>>> *Sent:* Thursday, February 17, 2022 19:01
>>> *To:* syslog-ng at lists.balabit.hu <mailto:syslog-ng at lists.balabit.hu>
>>> <syslog-ng at lists.balabit.hu> <mailto:syslog-ng at lists.balabit.hu>
>>> *Subject:* [syslog-ng] allowed concurrent connections - bug?
>>>
>>> I am having an issue that is a little difficult to reproduce so I wanted
>>> some input from others.
>>>
>>> I have a syslog-ng 3.35.1 that has a TLS source defined with
>>> max-connections(10000)
>>>
>>> After some time the server starts logging a lot of messages
>>>
>>> syslog-ng[12802]: Number of allowed concurrent connections reached,
>>> rejecting connection; client='AF_INET(XXXX:61062)',
>>> local='AF_INET(YYYY:6514)', group_name='client_network_tcp',
>>> location='/etc/syslog-ng/syslog-ng.server.conf:61:9', max='10000'
>>>
>>> To the best of my ability I can only find about 2500 actual connections.
>>>
>>> Both lsof and netstat report around the 2500 connections.
>>>
>>> I had to restart syslog-ng to stop this situation.
>>>
>>> Has anyone seen this behavior before?
>>>
>>> I get a lot of TLS connections without a certificate.
>>>
>>> Error reading RFC6587 style framed data
>>>
>>> Pperhaps the counters are not decremented for those timed out
>>> connections?
>>>
>>> --
>>> Evan Rempel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.balabit.hu/pipermail/syslog-ng/attachments/20220218/e692de4c/attachment-0001.htm>
More information about the syslog-ng
mailing list