[syslog-ng] Local sources seem not to be working

Gabor Nagy (gnagy) Gabor.Nagy at oneidentity.com
Thu Apr 28 08:34:17 UTC 2022


Hi,

Sorry for not replying earlier.
Thanks for the stats output, I've checked it.
It looks like messages were dropped, almost 100k messages.
dst.unix-stream;d_mgmt_vrf_socket#0;unix-stream,localhost.afunix:/dev/uds_log;a;dropped;95102
dst.unix-stream;d_mgmt_vrf_socket#0;unix-stream,localhost.afunix:/dev/uds_log;a;processed;3868284
dst.unix-stream;d_mgmt_vrf_socket#0;unix-stream,localhost.afunix:/dev/uds_log;a;queued;0
dst.unix-stream;d_mgmt_vrf_socket#0;unix-stream,localhost.afunix:/dev/uds_log;a;written;3777905

I think what's even more troubling is that the other syslog-ng instance, the one sending to Grafana, dropped 2.2M messages:
dst.syslog;d_grafana_tcp#0;tcp,10.100.71.73:1514;a;dropped;2219335
dst.syslog;d_grafana_tcp#0;tcp,10.100.71.73:1514;a;processed;4090485
dst.syslog;d_grafana_tcp#0;tcp,10.100.71.73:1514;a;queued;3753
dst.syslog;d_grafana_tcp#0;tcp,10.100.71.73:1514;a;suppressed;24682
dst.syslog;d_grafana_tcp#0;tcp,10.100.71.73:1514;a;written;1874813
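
To put these counters in proportion, here is a minimal Python sketch that parses the stats lines quoted above and computes dropped messages as a share of processed ones. The helper names (parse_stats, drop_ratio) are made up for illustration, and treating dropped/processed as the loss rate is an assumption about how the counters relate, not something the stats output itself states.

```python
# Counter lines copied from the stats output quoted above.
# Assumed field order: SourceName;SourceId;SourceInstance;State;Type;Number
STATS = """\
dst.unix-stream;d_mgmt_vrf_socket#0;unix-stream,localhost.afunix:/dev/uds_log;a;dropped;95102
dst.unix-stream;d_mgmt_vrf_socket#0;unix-stream,localhost.afunix:/dev/uds_log;a;processed;3868284
dst.syslog;d_grafana_tcp#0;tcp,10.100.71.73:1514;a;dropped;2219335
dst.syslog;d_grafana_tcp#0;tcp,10.100.71.73:1514;a;processed;4090485
"""


def parse_stats(text):
    """Return {source_id: {counter_type: value}} for well-formed lines."""
    counters = {}
    for line in text.splitlines():
        fields = line.strip().split(";")
        if len(fields) != 6 or not fields[5].isdigit():
            continue  # skip the header line and anything malformed
        source_id, counter_type, value = fields[1], fields[4], int(fields[5])
        counters.setdefault(source_id, {})[counter_type] = value
    return counters


def drop_ratio(counter):
    """Dropped messages as a fraction of processed ones (illustrative metric)."""
    return counter["dropped"] / counter["processed"]


if __name__ == "__main__":
    for source_id, counter in parse_stats(STATS).items():
        print(f"{source_id}: {drop_ratio(counter):.1%} dropped")
```

With the numbers above this comes to roughly 2.5% for d_mgmt_vrf_socket and about 54% for d_grafana_tcp.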

I've tried to reproduce the issue by placing the disk-buffer files on a tmpfs filesystem (with limited size, so I could test what happens when the disk becomes full).
I was unable to reproduce it. For me, both the system() and internal() sources continued to read new messages, but since the disk-buffer was full, the messages were dropped.
I've asked the team for help, as I have other priorities now, so someone else will be looking into this more deeply.

I think we could narrow down the issue once we have reproduction steps, but I know this is not easy.

The biggest mystery for me is still how the syslog() UDP source could forward messages when system() and internal() don't.
The second is how messages could go out from d_mgmt_vrf_socket if the destination is full. Maybe this isn't what is happening, so let's clarify what really happens when "local sources don't work".
From an earlier mail:
> You're saying that the syslog() source in s_src is receiving message, while the internal() and system() doesn't?
> [Alexandre Santos] Yes, I think that it is what is happening. Logs from syslog() source are being written to the /var/logs/..., while journald logs are not.

The issue is that there are no messages from system() and internal() in the /var/log/... output files, right?
So I can assume that no messages from syslog() source go out on the d_mgmt_vrf_socket() destination when the issue happens?


Suggestions:

  1.  An important thing to emphasize: your config shows that the disk-buffer is extremely small, especially given that flow-control is not used.
I strongly recommend increasing its size from 4MB to at least 1GB.
(It's a separate question that a disk-buffer stored on a tmpfs filesystem (which is RAM based) is not really persistent, but that can be okay as long as the system runs; it's a step better than using only the in-memory queue in syslog-ng.)

  2.  You could lower the value of the reconnect timeout from the default 60 seconds to 10 seconds (just a guess), so there will be a shorter window of time during which messages have to be queued.
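
As a back-of-the-envelope check of both suggestions, the Python sketch below estimates how long a buffer of a given size can absorb traffic while the destination is unreachable. The message rate (500 messages/s) and average size (300 bytes) are made-up numbers, not measurements from this system, and the calculation ignores any per-message overhead of the on-disk format.

```python
def seconds_until_full(buffer_bytes, messages_per_second, avg_message_bytes):
    """Time a buffer can absorb traffic before it fills up and drops start."""
    return buffer_bytes / (messages_per_second * avg_message_bytes)


MIB = 1024 ** 2
GIB = 1024 ** 3

# Hypothetical load; replace with values measured on the real system.
RATE = 500        # messages per second
AVG_SIZE = 300    # bytes per message

if __name__ == "__main__":
    for label, size in (("4 MiB", 4 * MIB), ("1 GiB", 1 * GIB)):
        secs = seconds_until_full(size, RATE, AVG_SIZE)
        print(f"{label}: full after about {secs:.0f} s")
```

Under these assumed numbers a 4 MiB buffer fills in under half a minute, that is, before a 60-second reconnect attempt even happens, while 1 GiB lasts roughly two hours.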

Regards,
Gabor


________________________________
From: Alexandre Santos <ASantos at infinera.com>
Sent: Thursday, April 21, 2022 15:22
To: Gabor Nagy (gnagy) <Gabor.Nagy at oneidentity.com>; Syslog-ng users' and developers' mailing list <syslog-ng at lists.balabit.hu>
Subject: RE: Local sources seem not to be working

CAUTION: This email originated from outside of the organization. Do not follow guidance, click links, or open attachments unless you recognize the sender and know the content is safe.


Hi Gabor,



The problem was reproduced again.

I was able to get the stats in the error situation: 54146.stats.error.txt



I also took the stats after reloading the configuration (which fixes the problem): 54146.stats.after.txt



Regarding your question: Just a random question: is /tmp a tmpfs filesystem?

Yes it is.



Let me know what you found out.



Thanks and regards,

Alex



From: Gabor Nagy (gnagy) <Gabor.Nagy at oneidentity.com>
Sent: March 31, 2022 11:14
To: Alexandre Santos <ASantos at infinera.com>; Syslog-ng users' and developers' mailing list <syslog-ng at lists.balabit.hu>
Subject: Re: Local sources seem not to be working



Thanks for the config! I'll continue experimenting with my ideas.

You could either point syslog-ng-ctl stats at a given syslog-ng instance with the --control option set to its control socket (e.g. /var/lib/syslog-ng/mgmt-syslog-ng.ctl, as above),

OR use the syslog-ng-ctl instance under the 2nd syslog-ng installation path.
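
If the stats are collected from a script, the first option could be sketched in Python as below. The socket path is the --control value of the second instance; the helper names are hypothetical, and the sketch assumes syslog-ng-ctl is on PATH and that the caller is allowed to access the control socket.

```python
import subprocess

# Control socket of the second instance (from its --control argument).
MGMT_CONTROL_SOCKET = "/var/lib/syslog-ng/mgmt-syslog-ng.ctl"


def stats_command(control_socket=None):
    """Build the syslog-ng-ctl command line; None means the default socket."""
    command = ["syslog-ng-ctl", "stats"]
    if control_socket is not None:
        command.append(f"--control={control_socket}")
    return command


def fetch_stats(control_socket=None):
    """Run syslog-ng-ctl and return its output as text."""
    result = subprocess.run(
        stats_command(control_socket),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Only print the command here; call fetch_stats() on the real machine.
    print(" ".join(stats_command(MGMT_CONTROL_SOCKET)))
```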

Just a random question: is /tmp a tmpfs filesystem?



Gabor

________________________________

From: Alexandre Santos <ASantos at infinera.com>
Sent: Wednesday, March 30, 2022 17:07
To: Gabor Nagy (gnagy) <Gabor.Nagy at oneidentity.com>; Syslog-ng users' and developers' mailing list <syslog-ng at lists.balabit.hu>
Subject: RE: Local sources seem not to be working






Hi Gabor,



Thank you for your feedback.



Can you share the config, when the issue cannot be seen?

I am sending the configuration in attachment.



I would still like to see 2 "syslog-ng-ctl stats" outputs when the issue happens.

The issue is hard to reproduce; the next time the error is seen, I will try to run it.

But how can I run syslog-ng-ctl stats for the 2nd syslog-ng instance?

root at machine:/~# ps -ewfH | grep syslog-ng

root      2582     1  0 09:06 ?        00:00:34   /usr/sbin/syslog-ng -F --caps cap_net_bind_service,cap_net_broadcast,cap_net_raw,cap_dac_read_search,cap_chown,cap_fowner=p cap_dac_override,cap_syslog=ep

root      4018     1  0 09:07 ?        00:00:00   /usr/sbin/syslog-ng -F --cfgfile=/etc/syslog-ng/mgmt-syslog-ng.conf --pidfile=/var/lib/syslog-ng/mgmt-syslog-ng.pid --persist-file=/var/lib/syslog-ng/mgmt-syslog-ng.persist --control=/var/lib/syslog-ng/mgmt-syslog-ng.ctl

syslog-ng-ctl seems to only show stats for process 2582.



Regards, Alex





From: Gabor Nagy (gnagy) <Gabor.Nagy at oneidentity.com>
Sent: March 29, 2022 12:48
To: Alexandre Santos <ASantos at infinera.com>; Syslog-ng users' and developers' mailing list <syslog-ng at lists.balabit.hu>
Subject: Re: Local sources seem not to be working



Hi Alex,



Using a regular disk-buffer vs. a reliable disk-buffer shouldn't cause symptoms like that. The symptoms sound as if reliable(yes) turned on a flow-control-like behaviour, which it doesn't.

(And as you said it only affects local sources).
The main difference between the two kinds of disk-buffer is that a reliable disk-buffer writes every message to the disk-buffer, while a normal disk-buffer also has memory-only buffers for performance reasons (and flow-control reasons too).

You can still lose logs with a reliable disk-buffer if no flow-control is used: when the disk-buffer has reached its maximum size and new messages keep arriving, syslog-ng drops those messages.



We have more detailed documentation about disk-buffers in the admin guide, where you can see the structure of disk-buffers:
https://www.syslog-ng.com/technical-documents/doc/syslog-ng-open-source-edition/3.36/administration-guide/61#TOPIC-1768724



Can you share the config, when the issue cannot be seen?

I would still like to see 2 "syslog-ng-ctl stats" outputs when the issue happens.
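
Two snapshots are useful because most of these counters are cumulative: comparing them shows which sources and destinations are still moving. A minimal Python sketch of that comparison follows; the sample lines are invented for illustration (they are not taken from any real stats output), and the field order is assumed to be SourceName;SourceId;SourceInstance;State;Type;Number.

```python
def parse_stats(text):
    """Map (name, id, instance, state, type) -> integer counter value."""
    counters = {}
    for line in text.splitlines():
        fields = line.strip().split(";")
        if len(fields) != 6 or not fields[5].isdigit():
            continue  # skip the header line and anything malformed
        counters[tuple(fields[:5])] = int(fields[5])
    return counters


def changed_counters(before, after):
    """Return {key: delta} for counters whose value differs between snapshots."""
    first, second = parse_stats(before), parse_stats(after)
    return {
        key: second[key] - first.get(key, 0)
        for key in second
        if second[key] != first.get(key, 0)
    }


# Invented sample snapshots, taken a few seconds apart.
BEFORE = """\
src.internal;s_src#0;;a;processed;120
dst.unix-stream;d_mgmt_vrf_socket#0;unix-stream,localhost.afunix:/dev/uds_log;a;dropped;95102
"""
AFTER = """\
src.internal;s_src#0;;a;processed;120
dst.unix-stream;d_mgmt_vrf_socket#0;unix-stream,localhost.afunix:/dev/uds_log;a;dropped;95150
"""

if __name__ == "__main__":
    for key, delta in changed_counters(BEFORE, AFTER).items():
        print(";".join(key), f"{delta:+d}")
```

A source whose processed counter does not change between the two snapshots is not delivering new messages, which is the symptom being investigated here.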

Regards,

Gabor

________________________________

From: Alexandre Santos <ASantos at infinera.com>
Sent: Monday, March 28, 2022 13:45
To: Gabor Nagy (gnagy) <Gabor.Nagy at oneidentity.com>; Syslog-ng users' and developers' mailing list <syslog-ng at lists.balabit.hu>
Subject: RE: Local sources seem not to be working






Hi Gabor,



“This is strange: the d_localfile destinations (as well as the vrf-socket destination "d_mgmt_vrf_socket") receive messages from the syslog() source, but not from the internal() or system() sources?”

Yes.



And the issue vanishes when "d_mgmt_vrf_socket" destination is removed?

Yes.



I could not test the 2 last suggestions that you made.



We did, however, run another test: we removed the reliable option from d_mgmt_vrf_socket, and it seems the problem no longer occurs.



Apart from what is written in the manual, in which other cases/conditions can syslog-ng lose logs?



reliable()

  Type: yes|no

  Default: no

  Description: If set to yes, syslog-ng OSE cannot lose logs in case of reload/restart, unreachable destination or syslog-ng OSE crash. This solution provides a slower, but reliable disk-buffer option. It is created and initialized at startup and gradually grows as new messages arrive. If set to no, the normal disk-buffer will be used. This provides a faster, but less reliable disk-buffer option.



Thanks in advance,

Alex



From: Gabor Nagy (gnagy) <Gabor.Nagy at oneidentity.com>
Sent: March 25, 2022 14:44
To: Alexandre Santos <ASantos at infinera.com>; Syslog-ng users' and developers' mailing list <syslog-ng at lists.balabit.hu>
Subject: Re: Local sources seem not to be working






Hi Alex,

Sorry I haven't answered yet. I have a few ideas I would like to try out next week.

This is strange: the d_localfile destinations (as well as the vrf-socket destination "d_mgmt_vrf_socket") receive messages from the syslog() source, but not from the internal() or system() sources?

And the issue vanishes when "d_mgmt_vrf_socket" destination is removed?

If it were soft flow-control, then the syslog() source would be suspended too.

Just a suggestion: could you replace the unix-dgram() destination with a syslog() destination, please? Maybe that's not possible with the VRF in place...

In the stats output, do you see an increased number of dropped messages?



I would still suggest increasing the 4MB disk-buffer. You should estimate how long the mgmt syslog-ng could be down (i.e. not receiving from the unix-dgram), what the average incoming EPS is, and what the average message size is; together these give a hint about the required disk-buffer size.
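
The estimate described above is a simple product. A Python sketch follows, with entirely hypothetical inputs (10 minutes of downtime, 500 EPS, 300-byte messages); the safety factor is an extra assumption added here to leave headroom for bursts and on-disk overhead, not a value from the syslog-ng documentation.

```python
import math


def required_disk_buffer_bytes(downtime_seconds, events_per_second,
                               avg_message_bytes, safety_factor=1.5):
    """Bytes needed to hold everything that arrives while the peer is down."""
    raw = downtime_seconds * events_per_second * avg_message_bytes
    return math.ceil(raw * safety_factor)


if __name__ == "__main__":
    # Hypothetical inputs; measure the real values before sizing anything.
    size = required_disk_buffer_bytes(
        downtime_seconds=600, events_per_second=500, avg_message_bytes=300)
    print(f"{size} bytes (~{size / 1024 ** 2:.0f} MiB)")
```

Even with these modest made-up inputs the result is far above 4MB, which is why the current size looks too small.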

Regards,

Gabor