Some more anecdotal details. I know
this is not really a systematic approach to tyrouble shooting
this.
I guess I'll have to turn on the
verbose stats. We collect them regularly so that might get
overwhelmed.
The issue started again.
Number of allowed concurrent
connections reached, rejecting connection;
client='AF_INET(XXXX:50773)', local='AF_INET(YYYY:6514)',
group_name='client_network_tcp',
location='/etc/syslog-ng/syslog-ng.server.conf:61:9', max='15000'
The established connection count was
2747
$ netstat -an | grep ESTABLISHED | grep
-c 6514
2747
The output destination stats were
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;1945402
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;105619
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;memory_usage;148574728
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;written;1839783
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;truncated_count;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;truncated_bytes;0
repeating this statistics report a few
times with 2-3 seconds between then it shows
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;1945406
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;105619
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;1945406
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;105619
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;1945410
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;105619
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;1945410
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;105619
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;1960812
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;121005
reload syslong-ng
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;2084011
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;97926
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;2084013
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;97938
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;2103861
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;19842
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;2107477
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;23463
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;2112825
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;28810
By the time I was this far
composingthis message, things seemed to have stalled again
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;2909715
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;54738
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;2909718
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;54738
eventually it started cycling through
the large queueing and flushing
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;3177968
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;157790
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;3187677
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;2150
with periods of getting stuck
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;3418242
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;62294
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;3418246
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;62294
and judging by the cpu usage, not all
of the log messages are being processed.
Verbose stats ... here I come.
On 2022-02-18 04:08, Laszlo Varady
(lvarady) wrote:
Hi,
A macroed file destination is unlikely to cause such an issue
if the location is writable.
Just a tip that might help ruling out the case I mentioned:
Setting the stats-level() to 4 results in an extremely verbose
stats output, where a counter called "free_window" can be
found for each network connection.
This can be used to check whether a connection is suspended or
not. Note that this is a momentary value, which oscillates
between 0 and full_window, so a momentary 0 does not mean
anything bad,
but we're looking for fixed 0 "free_window" values.
--
László Várady
CAUTION:
This email originated from outside of the organization. Do
not follow guidance, click links, or open attachments unless
you recognize the sender and know the content is safe.
There is only 1 destination,
although it is a file named with macros of date and hour.
destination workstation.log {
file("/var/syslog/workstation.log.$R_YEAR$R_MONTH$R_DAY.${R_HOUR}0000"
); };
And that volume has never
become full.
Also, with a file based
destination I can't actually turn flow-control off since
files have soft flow-control.
Because I have 3500 real
connections that are all active (total of 10,000 messages
per second) syslog-ng cycles through reading 100 messages
from each source for 35000 messages and then writing those
to disk. it is very common for the queued messages to
fluctuate from a few hundred to 200,000 messages.
I may have to add some new
metrics to our statistics gathering to understand more
about what is happening.
Evan Rempel.
On 2022-02-17 13:13, Laszlo
Varady (lvarady) wrote:
Hi,
Do you have flags(flow-control) specified in your log
paths?
If so, a dead destination in such log paths might
cause the mentioned issue.
When flow-control is activated, the corresponding
sources will be suspended. This suspended state does
not even allow syslog-ng to truly release connections
that have been closed by the clients.
This is actually more of expected behavior as we don't
want to allow new connections in situations where logs
could not be delivered anyway.
Please check the queued statistic counters of
"syslog-ng-ctl stats" to see whether this is the case.
In case of anything else, I would
suspect a bug.
--
László Várady
I am having an issue that
is a little difficult to reproduce so I wanted
some input from others.
I have a syslog-ng 3.35.1 that has a TLS source
defined with
max-connections(10000)
After some time the server starts logging a lot
of messages
syslog-ng[12802]: Number of allowed concurrent
connections reached,
rejecting connection;
client='AF_INET(XXXX:61062)',
local='AF_INET(YYYY:6514)',
group_name='client_network_tcp',
location='/etc/syslog-ng/syslog-ng.server.conf:61:9', max='10000'
To the best of my ability I can only find about
2500 actual connections.
Both lsof and netstat report around the 2500
connections.
I had to restart syslog-ng to stop this
situation.
Has anyone seen this behavior before?
I get a lot of TLS connections without a
certificate.
Error reading RFC6587 style framed data
Pperhaps the counters are not decremented for
those timed out connections?
--
Evan Rempel
--
Evan Rempel 250.721.7691
Senior Systems Administrator erempel@uvic.ca
Data Centre Services, University Systems, University of Victoria