Some more anecdotal details. I know this is not really a systematic approach to troubleshooting this.
I guess I'll have to turn on the verbose stats. We collect stats regularly, so that collection might get overwhelmed.
The issue started again.
Number of allowed concurrent
connections reached, rejecting connection;
client='AF_INET(XXXX:50773)', local='AF_INET(YYYY:6514)',
group_name='client_network_tcp',
location='/etc/syslog-ng/syslog-ng.server.conf:61:9',
max='15000'
The established connection count
was 2747
$ netstat -an | grep ESTABLISHED
| grep -c 6514
2747
The output destination stats were:
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;1945402
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;105619
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;memory_usage;148574728
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;written;1839783
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;truncated_count;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;truncated_bytes;0
Repeating this statistics report a few times, with 2-3 seconds between samples, it shows:
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;1945406
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;105619
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;1945406
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;105619
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;1945410
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;105619
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;1945410
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;105619
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;1960812
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;121005
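To make runs like this easier to compare, two stats snapshots can be diffed; a rough sketch, assuming the semicolon-separated field layout shown in the lines above (the temp file names are arbitrary):

```shell
# Take two samples of the counters 3 seconds apart, then print every
# counter whose value changed between the samples.
# Fields: component;id;instance;state;type;value
syslog-ng-ctl stats > /tmp/stats.1
sleep 3
syslog-ng-ctl stats > /tmp/stats.2
awk -F';' 'NR==FNR { prev[$1 FS $2 FS $3 FS $5] = $6; next }
           { key = $1 FS $2 FS $3 FS $5
             if (key in prev && prev[key] != $6)
               printf "%s: %s -> %s\n", key, prev[key], $6 }' \
    /tmp/stats.1 /tmp/stats.2
```

A counter like queued staying flat across samples while processed barely moves then stands out immediately.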
After reloading syslog-ng:
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;2084011
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;97926
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;2084013
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;97938
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;2103861
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;19842
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;2107477
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;23463
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;2112825
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;28810
By the time I had gotten this far composing this message, things seemed to have stalled again:
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;2909715
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;54738
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;2909718
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;54738
Eventually it started cycling through the large queueing and flushing:
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;3177968
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;157790
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;3187677
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;2150
with periods of getting stuck
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;3418242
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;62294
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;dropped;0
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;processed;3418246
dst.file;workstation.log#0;/var/syslog/workstation.log.20220218.090000;a;queued;62294
and judging by the CPU usage, not all of the log messages are being processed.
Verbose stats ... here I come.
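For reference, turning the verbose stats on is a config change plus a reload; a sketch, assuming the global options block is the right place for it in this config:

```
# Extremely verbose; per-connection counters (e.g. free_window) appear.
options {
    stats-level(4);
};
# then: syslog-ng-ctl reload
```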
On 2022-02-18 04:08, Laszlo
Varady (lvarady) wrote:
Hi,
A macroed file destination is unlikely to cause such an
issue if the location is writable.
Just a tip that might help rule out the case I mentioned:
Setting the stats-level() to 4 results in an extremely
verbose stats output, where a counter called
"free_window" can be found for each network connection.
This can be used to check whether a connection is
suspended or not. Note that this is a momentary value,
which oscillates between 0 and full_window, so a
momentary 0 does not mean anything bad,
but we're looking for fixed 0 "free_window" values.
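One way to act on that tip once stats-level(4) is active; a sketch, with the field layout assumed to match the stats lines elsewhere in this thread:

```shell
# List counters named free_window whose momentary value is 0.
# Run this several times; only connections that stay at 0 across
# repeated samples are actually suspended.
syslog-ng-ctl stats | awk -F';' '$5 == "free_window" && $6 == 0'
```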
--
László Várady
There is only one destination, although it is a file named with date and hour macros.

destination workstation.log {
    file("/var/syslog/workstation.log.$R_YEAR$R_MONTH$R_DAY.${R_HOUR}0000");
};
And that volume has
never become full.
Also, with a file-based destination I can't actually turn flow-control off, since files have soft flow-control.
Because I have 3500 real connections that are all active (a total of 10,000 messages per second), syslog-ng cycles through reading 100 messages from each source for 35000 messages and then writes those to disk. It is very common for the queued messages to fluctuate from a few hundred to 200,000 messages.
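The per-connection read batch and the flow-control window that drive this cycling are configurable on the source; a hedged sketch with illustrative values, not the actual config from this server (TLS options omitted):

```
source client_network_tcp {
    network(
        transport("tls")
        port(6514)
        max-connections(10000)
        log-fetch-limit(100)    # messages read from one connection per poll
        log-iw-size(1000000)    # total initial window, shared across connections
    );
};
```

With flow-control on, log-iw-size is effectively divided across the active connections, which is what the per-connection free_window counter reports against.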
I may have to add some
new metrics to our statistics gathering to
understand more about what is happening.
Evan Rempel.
On 2022-02-17 13:13,
Laszlo Varady (lvarady) wrote:
Hi,
Do you have flags(flow-control) specified in
your log paths?
If so, a dead destination in such log paths
might cause the mentioned issue.
When flow-control is activated, the
corresponding sources will be suspended. This
suspended state does not even allow syslog-ng to
truly release connections that have been closed
by the clients.
This is actually expected behavior, as we don't want to allow new connections in situations where logs could not be delivered anyway.
Please check the queued statistic counters of
"syslog-ng-ctl stats" to see whether this is the
case.
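One way to run that check; a sketch assuming the standard semicolon-separated stats format:

```shell
# Show queued counters, largest first, to spot a destination
# that is accumulating messages.
syslog-ng-ctl stats | awk -F';' '$5 == "queued"' | sort -t';' -k6 -rn
```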
In case of
anything else, I would suspect a bug.
--
László Várady
I am having an issue that is a little difficult to reproduce, so I wanted some input from others.
I have a syslog-ng 3.35.1 server that has a TLS source defined with max-connections(10000).
After some time, the server starts logging a lot of messages:
syslog-ng[12802]: Number of allowed
concurrent connections reached,
rejecting connection;
client='AF_INET(XXXX:61062)',
local='AF_INET(YYYY:6514)',
group_name='client_network_tcp',
location='/etc/syslog-ng/syslog-ng.server.conf:61:9', max='10000'
To the best of my ability, I can only find about 2500 actual connections. Both lsof and netstat report around 2500 connections.
I had to restart syslog-ng to stop this situation.
Has anyone seen this behavior before?
I get a lot of TLS connections without a
certificate.
Error reading RFC6587 style framed data
Perhaps the counters are not decremented for those timed-out connections?
--
Evan Rempel