Local sources seem not to be working
Hi,

I have syslog-ng 3.32.1 running on Debian GNU/Linux 10 (buster) with the configuration in the attachment. After running for some time, syslog-ng seems to be unable to read from the system() and internal() sources. Log messages from

syslog(ip(10.20.30.40) transport("udp") port(514) keep-alive(no));

are seen in the output folders, and journald logs are also working fine. After a configuration reload in which the only change is this line:

rewrite r_host { set("MACHINE-${HOST}", value("HOST")); };

logging resumes.

Here is the time gap in the logs:

<43>1 2022-03-11T11:55:23.802+00:00 xmm4-1-1 syslog-ng 8283 - [meta sequenceId="767"] Last message 'Destination reliable' repeated 8933 times, suppressed by syslog-ng on xmm4-1-1
<46>1 2022-03-14T07:19:01.817+00:00 xmm4-1-1 syslog-ng 8283 - [meta sequenceId="1"] Module loaded and initialized successfully; module='syslogformat'

Do you know why this is happening?

Thanks & Regards,
Alex
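To put a number on how long the internal() log stayed silent, here is a small Python sketch that subtracts the two RFC 5424 timestamps quoted in the message above. Only the timestamps come from the thread; the constant and function names are made up for this example.

```python
from datetime import datetime, timedelta

# Timestamps copied from the two log lines quoted above.
LAST_BEFORE_GAP = "2022-03-11T11:55:23.802+00:00"
FIRST_AFTER_GAP = "2022-03-14T07:19:01.817+00:00"


def log_gap(before: str, after: str) -> timedelta:
    """Return the time elapsed between two ISO 8601 timestamps that carry a UTC offset."""
    return datetime.fromisoformat(after) - datetime.fromisoformat(before)


if __name__ == "__main__":
    print(log_gap(LAST_BEFORE_GAP, FIRST_AFTER_GAP))  # 2 days, 19:23:38.015000
```

Both timestamps carry an explicit +00:00 offset, so the subtraction is between timezone-aware datetimes and needs no extra normalization.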
Hi Alex!

I've checked the attached config and logs, and it looks like syslog-ng cannot send logs to the "/dev/uds_log" destination, and you have flow-control enabled in the config. Once the disk-buffer (a 4MiB reliable disk-buffer) fills up, flow-control kicks in and syslog-ng stops reading more messages from the sources that are connected to this destination.

Example log:

Destination reliable queue full, dropping message; filename='/tmp/syslog-ng-00016.rqf', queue_len='6063', mem_buf_size='2097152', disk_buf_size='4194304', persist_name='afsocket_dd_qfile(stream,localhost.afunix:/dev/uds_log)'

As a first step, I would suggest increasing the disk-buffer size.

Regards,
Gabor

________________________________
From: syslog-ng <syslog-ng-bounces@lists.balabit.hu> on behalf of Alexandre Santos <ASantos@infinera.com>
Sent: Tuesday, March 15, 2022 16:04
To: syslog-ng@lists.balabit.hu <syslog-ng@lists.balabit.hu>
Subject: [syslog-ng] Local sources seem not to be working
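The difference between a full buffer with and without flow-control can be illustrated with a toy model: a bounded buffer in front of a destination that has stopped accepting messages. This is a simplified sketch, not syslog-ng's implementation; the class, its capacity and the message names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyDestinationBuffer:
    """Hypothetical bounded buffer in front of a destination that never drains."""
    capacity: int
    flow_control: bool
    queued: List[str] = field(default_factory=list)
    dropped: int = 0
    unread: int = 0
    source_suspended: bool = False

    def offer(self, msg: str) -> None:
        """Hand one message from the source to the buffer."""
        if len(self.queued) < self.capacity:
            self.queued.append(msg)
        elif self.flow_control:
            # Full buffer with flow-control: the source is paused, so the
            # message stays unread on the source side instead of being lost.
            self.source_suspended = True
            self.unread += 1
        else:
            # Full buffer without flow-control: the message is dropped.
            self.dropped += 1


def simulate(flow_control: bool, capacity: int = 3, messages: int = 5) -> ToyDestinationBuffer:
    """Push `messages` messages into a buffer whose destination is unreachable."""
    buf = ToyDestinationBuffer(capacity=capacity, flow_control=flow_control)
    for i in range(messages):
        buf.offer(f"msg-{i}")
    return buf
```

With the defaults, simulate(True) ends with 3 queued, 2 unread and the source suspended, while simulate(False) ends with 3 queued and 2 dropped and the source still running. The second case matches what the "Destination reliable queue full, dropping message" log describes.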
Hi Gabor,

Thanks for the feedback. But flags(flow-control); is not set for the destination d_mgmt_vrf_socket, only for the other destinations (d_localfile_<filename>).

That also does not explain why log messages from

syslog(ip(10.20.30.40) transport("udp") port(514) keep-alive(no));

are still being written to d_localfile_<filename>.

Any other ideas?

Thanks in advance,
Alex

From: Gabor Nagy (gnagy) <Gabor.Nagy@oneidentity.com>
Sent: 16 March 2022 15:09
To: Syslog-ng users' and developers' mailing list <syslog-ng@lists.balabit.hu>; Alexandre Santos <ASantos@infinera.com>
Subject: Re: Local sources seem not to be working
You are right, there is no flow-control on the log path where the d_mgmt_vrf_socket destination is; I'm sorry. Still, the internal log messages reporting that the disk-buffer of d_mgmt_vrf_socket is full are correct, but the source is not suspended.

I have some trouble understanding the problem; can you explain it, please? You're saying that the syslog() source in s_src is receiving messages, while internal() and system() are not? You've also stated that journald logs are working fine. Does that mean that you can see new logs in the journal, but not in syslog-ng?

When the issue happens, can you check that internal() is working, e.g. by turning verbose logging on and off with "syslog-ng-ctl verbose --set on" and then "syslog-ng-ctl verbose --set off", please? This would generate an internal message with info level. Also, can you check the system() source with the "logger" command, e.g. "logger --rfc3164 test syslog-ng", please? Could you give us a syslog-ng-ctl stats output too, please?

Maybe I have found something, but I have to double-check: it looks like the internal() source's messages are suppressed because the destination d_mgmt_vrf_socket is unreachable:

<44>1 2022-03-11T11:52:45.313+00:00 xmm4-1-1 syslog-ng 8283 - [meta sequenceId="4"] internal() messages are looping back, preventing loop by suppressing all internal messages until the current message is processed; trigger-msg='', first-suppressed-msg='Suppressing duplicate message; host=\'xmm4-1-1\', msg=\'Destination reliable queue full, dropping message; filename=\\'/tmp/syslog-ng-00016.rqf\\', queue_len=\\'6063\\', mem_buf_size=\\'2097152\\', disk_buf_size=\\'4194304\\', persist_name=\\'afsocket_dd_qfile(stream,localhost.afunix:/dev/uds_log)\\'\''

This means that there are no internal() logs until the destination becomes reachable again.

Regards,
Gabor

________________________________
From: Alexandre Santos <ASantos@infinera.com>
Sent: Wednesday, March 16, 2022 16:53
To: Gabor Nagy (gnagy) <Gabor.Nagy@oneidentity.com>; Syslog-ng users' and developers' mailing list <syslog-ng@lists.balabit.hu>
Subject: RE: Local sources seem not to be working
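The checks requested above can be bundled into a small helper that runs them in order while the problem is occurring. This is a sketch: the command lines are copied from the message, while PROBE_COMMANDS, run_probes and its dry_run switch are names made up here. It assumes syslog-ng-ctl and logger are on PATH and that it runs with enough privileges to reach the default control socket.

```python
import shlex
import subprocess
from typing import List

# Command lines copied from the reply above, in the order they were requested.
PROBE_COMMANDS = [
    "syslog-ng-ctl verbose --set on",    # toggling verbose mode should emit an
    "syslog-ng-ctl verbose --set off",   # info-level message via internal()
    "logger --rfc3164 test syslog-ng",   # exercises the system() source
    "syslog-ng-ctl stats",               # counter snapshot to send back
]


def run_probes(dry_run: bool = False) -> List[str]:
    """Run each probe and return a transcript; with dry_run=True only list the commands."""
    transcript: List[str] = []
    for cmd in PROBE_COMMANDS:
        transcript.append(f"$ {cmd}")
        if dry_run:
            continue
        result = subprocess.run(shlex.split(cmd), capture_output=True, text=True)
        if result.stdout:
            transcript.append(result.stdout.rstrip())
        if result.returncode != 0:
            transcript.append(f"(exit {result.returncode}) {result.stderr.rstrip()}")
    return transcript
```

On the affected host, print("\n".join(run_probes())) runs all four checks. Afterwards, the syslog-ng output files can be searched for the info-level internal() message and for the "test syslog-ng" line.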
Hi Gabor,

Thanks for the follow-up; please see my answers inline below. Here are some more details about the setup and another test we did.

The system is running two syslog-ng instances, one in the default VRF and the other in an outer VRF:

syslog-ng -------------- uds socket ------------------> mgmt-syslog-ng -------- UDP ---------> [Log Server]

The syslog-ng in the default VRF sends logs to the syslog-ng running in the outer VRF via a Unix domain socket (destination d_mgmt_vrf_socket). The mgmt-syslog-ng runs in the outer VRF and sends logs to the outside world. Only the syslog-ng in the default VRF reads the internal and system sources.

We tested without the remote logging (destination d_mgmt_vrf_socket) in the syslog-ng, and the problem did not appear.

Hopefully this sheds some light on the problem.

Thanks & Regards,
Alex

From: Gabor Nagy (gnagy) <Gabor.Nagy@oneidentity.com>
Sent: 17 March 2022 20:09
To: Alexandre Santos <ASantos@infinera.com>; Syslog-ng users' and developers' mailing list <syslog-ng@lists.balabit.hu>
Subject: Re: Local sources seem not to be working

I have some trouble understanding the problem; can you explain it, please? You're saying that the syslog() source in s_src is receiving messages, while internal() and system() are not?
[Alexandre Santos] Yes, I think that is what is happening. Logs from the syslog() source are being written to /var/logs/..., while journald logs are not.

You've also stated that journald logs are working fine. Does that mean that you can see new logs in the journal, but not in syslog-ng?
[Alexandre Santos] Yes.

When the issue happens, can you check that internal() is working, e.g. by turning verbose logging on and off with "syslog-ng-ctl verbose --set on" and then "syslog-ng-ctl verbose --set off", please? This would generate an internal message with info level.
[Alexandre Santos] I saw no logs when I did this while the error was occurring, so I assume internal() is not working either.

Also, can you check the system() source with the "logger" command, e.g. "logger --rfc3164 test syslog-ng", please? Could you give us a syslog-ng-ctl stats output too, please?
[Alexandre Santos] I will have to do this in the next test iteration.
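To see what "unreachable" looks like to the sender on the Unix domain socket hop described above, here is a minimal Python sketch of a datagram sender whose receiver goes away. It assumes a datagram transport, since Gabor calls the destination unix-dgram() later in the thread (the persist_name in the quoted log mentions stream, so the real transport may differ). It also uses a temporary path instead of /dev/uds_log. try_send and demo are hypothetical names.

```python
import errno
import os
import socket
import tempfile
from typing import List


def try_send(path: str, payload: bytes) -> str:
    """Send one datagram to a Unix socket path; return "sent" or the errno name."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sender:
        try:
            sender.sendto(payload, path)
            return "sent"
        except OSError as exc:
            return errno.errorcode.get(exc.errno, str(exc.errno))


def demo() -> List[str]:
    outcomes: List[str] = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "uds_log")  # stand-in for /dev/uds_log
        receiver = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        receiver.bind(path)
        outcomes.append(try_send(path, b"<13>hello"))  # receiver is listening
        outcomes.append("received" if receiver.recv(1024) == b"<13>hello" else "lost")
        receiver.close()  # receiver goes away, but the socket file stays behind
        outcomes.append(try_send(path, b"<13>hello"))
        os.unlink(path)   # now the path itself is gone
        outcomes.append(try_send(path, b"<13>hello"))
    return outcomes


if __name__ == "__main__":
    print(demo())  # on Linux: ['sent', 'received', 'ECONNREFUSED', 'ENOENT']
```

The sketch only shows the socket-level symptom of a stopped mgmt-syslog-ng. It does not model how syslog-ng reacts to these errors.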
Hi Alex,

Sorry I haven't answered yet. I have a few ideas I would like to try out next week.

This is strange: the d_localfile destinations (as well as the vrf-socket destination "d_mgmt_vrf_socket") receive messages from the syslog() source, but not from the internal() or system() sources? And the issue vanishes when the "d_mgmt_vrf_socket" destination is removed? If this were soft flow-control, the syslog() source would be suspended too.

Just a tip: could you switch the unix-dgram() destination to a syslog() destination, please? Maybe that's not possible with the VRF in place...

In the stats output, do you see an increased number of dropped messages?

I would still suggest increasing the 4MB disk-buffer. You should estimate how long the mgmt syslog-ng could be down (i.e. not receiving from the unix-dgram socket), the average incoming EPS, and the average message size; together these give a hint about the required disk-buffer size.

Regards,
Gabor
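The sizing rule of thumb above (outage duration × average EPS × average message size) can be written down as a short calculation. This is only an estimate: the function names and the headroom factor are assumptions of this sketch, not syslog-ng parameters. Only the 4194304-byte figure comes from the quoted log.

```python
import math

# disk_buf_size from the quoted "queue full" log: the current 4 MiB buffer.
CURRENT_DISK_BUF_SIZE = 4194304


def estimate_disk_buffer_bytes(outage_seconds: float, avg_eps: float,
                               avg_msg_bytes: float, headroom: float = 1.5) -> int:
    """Rough buffer size needed to hold all traffic for the given outage.

    `headroom` is a hypothetical safety factor for bursts and per-message
    overhead; it is not a syslog-ng option.
    """
    if min(outage_seconds, avg_eps, avg_msg_bytes) < 0 or headroom < 1:
        raise ValueError("inputs must be non-negative and headroom >= 1")
    return math.ceil(outage_seconds * avg_eps * avg_msg_bytes * headroom)


def outage_covered_seconds(disk_buf_bytes: int, avg_eps: float, avg_msg_bytes: float) -> float:
    """How long a buffer of the given size lasts at the given rate (no headroom)."""
    return disk_buf_bytes / (avg_eps * avg_msg_bytes)
```

Here is an example with placeholder inputs, not measurements from this system. At 50 EPS with 300-byte messages, the current 4 MiB buffer fills in about 280 seconds (outage_covered_seconds(CURRENT_DISK_BUF_SIZE, 50, 300)). Riding out a one-hour outage would take about 81 MB with the 1.5 headroom (estimate_disk_buffer_bytes(3600, 50, 300)).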
Hi Gabor,

"This is strange: the d_localfile destinations (as well as the vrf-socket destination "d_mgmt_vrf_socket") receive messages from the syslog() source, but not from the internal() or system() sources?"
Yes.

"And the issue vanishes when "d_mgmt_vrf_socket" destination is removed?"
Yes.

I could not test the last two suggestions you made. However, we did another test: we removed the reliable option from d_mgmt_vrf_socket, and the problem does not seem to occur anymore.

Besides what is written in the manual, in which other cases/conditions can syslog-ng lose logs?

reliable()
Type: yes|no
Default: no
Description: If set to yes, syslog-ng OSE cannot lose logs in case of reload/restart, unreachable destination or syslog-ng OSE crash. This solution provides a slower, but reliable disk-buffer option. It is created and initialized at startup and gradually grows as new messages arrive. If set to no, the normal disk-buffer will be used. This provides a faster, but less reliable disk-buffer option.

Thanks in advance,
Alex

From: Gabor Nagy (gnagy) <Gabor.Nagy@oneidentity.com>
Sent: 25 March 2022 14:44
To: Alexandre Santos <ASantos@infinera.com>; Syslog-ng users' and developers' mailing list <syslog-ng@lists.balabit.hu>
Subject: Re: Local sources seem not to be working
Hi Alex,

Using a regular disk-buffer vs. a reliable disk-buffer shouldn't cause symptoms like that. It sounds as if reliable(yes) turned on a flow-control-like behaviour, which it doesn't (and, as you said, it only affects local sources).

The main difference between the two kinds of disk-buffers is that while a reliable disk-buffer writes every message to disk, a normal disk-buffer has memory-only buffers for performance reasons (and for flow-control reasons too).

You can still lose logs with a reliable disk-buffer if no flow-control is used: when the disk-buffer has reached its maximum size and new messages keep arriving, syslog-ng drops those messages.

We have more detailed documentation about disk-buffers in the admin guide, where you can see the structure of disk-buffers: https://www.syslog-ng.com/technical-documents/doc/syslog-ng-open-source-edition/3.36/administration-guide/61#TOPIC-1768724

Can you share the config with which the issue cannot be seen? I would still like to see two "syslog-ng-ctl stats" outputs taken when the issue happens.

Regards,
Gabor

________________________________
From: Alexandre Santos <ASantos@infinera.com>
Sent: Monday, March 28, 2022 13:45
To: Gabor Nagy (gnagy) <Gabor.Nagy@oneidentity.com>; Syslog-ng users' and developers' mailing list <syslog-ng@lists.balabit.hu>
Subject: RE: Local sources seem not to be working
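The comparison above, together with Alex's question about when logs can be lost, can be turned into a toy loss model. This is a deliberately simplified sketch, not how syslog-ng stores messages. It assumes a normal buffer keeps up to mem_slots messages in memory only, so a crash loses them, and a reliable buffer writes every message to disk. Without flow-control, both drop new messages once they are full. ToyDiskBuffer and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ToyDiskBuffer:
    """Hypothetical model of a disk-buffer while its destination is down."""
    reliable: bool
    disk_capacity: int   # messages the on-disk part can hold
    mem_slots: int = 0   # memory-only slots, used only by the normal buffer
    in_memory: int = 0
    on_disk: int = 0
    dropped: int = 0

    def push(self) -> None:
        """Accept one message with no flow-control in place."""
        if not self.reliable and self.in_memory < self.mem_slots:
            self.in_memory += 1      # normal buffer: memory-only copy
        elif self.on_disk < self.disk_capacity:
            self.on_disk += 1        # written to the disk-buffer file
        else:
            self.dropped += 1        # buffer full: the message is lost

    def crash(self) -> int:
        """Simulate a crash; return how many queued messages are lost."""
        lost, self.in_memory = self.in_memory, 0
        return lost
```

Pushing 10 messages into ToyDiskBuffer(reliable=False, disk_capacity=5, mem_slots=2) leaves 2 in memory, 5 on disk and 3 dropped, and a crash then loses the 2 in-memory messages. The reliable variant with the same sizes ends with 5 on disk, 5 dropped and nothing lost on crash. In both cases, overflow without flow-control still loses messages, as the reply says.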
Hi Gabor,

Thank you for your feedback.

"Can you share the config with which the issue cannot be seen?"
I am sending the configuration as an attachment.

"I would still like to see two "syslog-ng-ctl stats" outputs taken when the issue happens."
The issue is hard to reproduce; the next time the error is seen, I will try to run it. But how can I run syslog-ng-ctl stats for the second syslog-ng instance?

root@machine:/~# ps -ewfH | grep syslog-ng
root 2582 1 0 09:06 ? 00:00:34 /usr/sbin/syslog-ng -F --caps cap_net_bind_service,cap_net_broadcast,cap_net_raw,cap_dac_read_search,cap_chown,cap_fowner=p cap_dac_override,cap_syslog=ep
root 4018 1 0 09:07 ? 00:00:00 /usr/sbin/syslog-ng -F --cfgfile=/etc/syslog-ng/mgmt-syslog-ng.conf --pidfile=/var/lib/syslog-ng/mgmt-syslog-ng.pid --persist-file=/var/lib/syslog-ng/mgmt-syslog-ng.persist --control=/var/lib/syslog-ng/mgmt-syslog-ng.ctl

syslog-ng-ctl seems to show stats only for process 2582.

Regards,
Alex

From: Gabor Nagy (gnagy) <Gabor.Nagy@oneidentity.com>
Sent: 29 March 2022 12:48
To: Alexandre Santos <ASantos@infinera.com>; Syslog-ng users' and developers' mailing list <syslog-ng@lists.balabit.hu>
Subject: Re: Local sources seem not to be working
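Alex's ps listing already shows which control socket each instance uses. The second instance was started with --control=/var/lib/syslog-ng/mgmt-syslog-ng.ctl, while the first has no --control argument. The Python sketch below pulls that out of ps -ewf style output. It assumes the UID PID PPID C STIME TTY TIME CMD column layout. control_sockets is a hypothetical helper, and None means "no --control on the command line": the built-in default path is deliberately not guessed.

```python
import os
from typing import Dict, Optional


def control_sockets(ps_output: str) -> Dict[int, Optional[str]]:
    """Map each syslog-ng daemon PID to the --control path on its command line."""
    result: Dict[int, Optional[str]] = {}
    for line in ps_output.splitlines():
        fields = line.split(None, 7)      # UID PID PPID C STIME TTY TIME CMD
        if len(fields) < 8 or not fields[1].isdigit():
            continue                      # skips the shell prompt and blank lines
        argv = fields[7].split()
        if not argv or os.path.basename(argv[0]) != "syslog-ng":
            continue                      # skips grep, syslog-ng-ctl, etc.
        control: Optional[str] = None
        for i, arg in enumerate(argv):
            if arg.startswith("--control="):
                control = arg.split("=", 1)[1]
            elif arg == "--control" and i + 1 < len(argv):
                control = argv[i + 1]
        result[int(fields[1])] = control
    return result
```

Fed the two process lines quoted above, the helper returns {2582: None, 4018: '/var/lib/syslog-ng/mgmt-syslog-ng.ctl'}. Each non-None path is the value to hand to syslog-ng-ctl's --control option.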
Thanks for the config! I'll continue experimenting with my ideas.

You can either point syslog-ng-ctl stats at a given syslog-ng instance with the --control option set to that instance's control socket (e.g. /var/lib/syslog-ng/mgmt-syslog-ng.ctl, as above), OR use the syslog-ng-ctl binary under the second syslog-ng installation path.

Just a random question: is /tmp a tmpfs filesystem?

Gabor

________________________________
From: Alexandre Santos <ASantos@infinera.com>
Sent: Wednesday, March 30, 2022 17:07
To: Gabor Nagy (gnagy) <Gabor.Nagy@oneidentity.com>; Syslog-ng users' and developers' mailing list <syslog-ng@lists.balabit.hu>
Subject: RE: Local sources seem not to be working
Regards,
Alex

From: Gabor Nagy (gnagy) <Gabor.Nagy@oneidentity.com>
Sent: 29 March 2022 12:48
To: Alexandre Santos <ASantos@infinera.com>; Syslog-ng users' and developers' mailing list <syslog-ng@lists.balabit.hu>
Subject: Re: Local sources seem not to be working

Hi Alex,

Using a regular disk-buffer instead of a reliable disk-buffer shouldn't cause symptoms like that. Otherwise reliable(yes) would be turning on a flow-control-like behaviour, which it doesn't (and, as you said, only the local sources are affected).

The main difference between the two kinds of disk-buffer is this: a reliable disk-buffer writes every message to disk, while a normal disk-buffer has memory-only buffers for performance reasons (and flow-control reasons too).

You can still lose logs with a reliable disk-buffer if no flow-control is used. When the disk-buffer has reached its maximum size and new messages keep arriving, syslog-ng drops those messages.

The admin guide has more detailed documentation about disk-buffers, including their structure: https://www.syslog-ng.com/technical-documents/doc/syslog-ng-open-source-edition/3.36/administration-guide/61#TOPIC-1768724

Can you share the config in which the issue cannot be seen? I would still like to see 2 "syslog-ng-ctl stats" outputs when the issue happens.
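Gabor says that a reliable disk-buffer without flow-control still loses messages once it is full. A toy model can illustrate this. The sketch below is not syslog-ng's implementation, and its class and function names are hypothetical. It assumes a destination that accepts nothing while it is down. It compares a bounded queue that drops on overflow with one that pauses the source instead.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Tuple


@dataclass
class ToyDestinationQueue:
    """Toy size-limited destination queue; the destination is down, so nothing drains."""
    capacity: int
    flow_control: bool
    queue: deque = field(default_factory=deque)
    dropped: int = 0

    def offer(self, msg: str) -> bool:
        """Return True if the source may keep reading, False if it must pause."""
        if len(self.queue) < self.capacity:
            self.queue.append(msg)
            return True
        if self.flow_control:
            return False  # source is suspended; the message stays unread at the source
        self.dropped += 1  # no flow-control: message was read, then discarded
        return True


def run(capacity: int, incoming: int, flow_control: bool) -> Tuple[int, int, int]:
    """Feed `incoming` messages; return (read by source, queued, dropped)."""
    q = ToyDestinationQueue(capacity, flow_control)
    read = 0
    for i in range(incoming):
        if not q.offer(f"msg-{i}"):
            break  # with flow-control the source stops reading here
        read += 1
    return read, len(q.queue), q.dropped
```

With a capacity of 3 and 10 incoming messages, `run(3, 10, False)` reads all 10, keeps 3 and drops 7. `run(3, 10, True)` stops reading after 3 and drops nothing. The first pattern is what appears as a growing `dropped` counter in `syslog-ng-ctl stats`.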
Regards,
Gabor

________________________________
From: Alexandre Santos <ASantos@infinera.com>
Sent: Monday, March 28, 2022 13:45
To: Gabor Nagy (gnagy) <Gabor.Nagy@oneidentity.com>; Syslog-ng users' and developers' mailing list <syslog-ng@lists.balabit.hu>
Subject: RE: Local sources seem not to be working

Hi Gabor,

> This is strange: the d_localfile destinations (as well as the vrf-socket destination "d_mgmt_vrf_socket") receive messages from the syslog() source, but not from the internal() or system() sources?

Yes.

> And the issue vanishes when the "d_mgmt_vrf_socket" destination is removed?

Yes.

I could not test the last 2 suggestions that you made. We did, however, run another test: we removed the reliable option from d_mgmt_vrf_socket, and the problem seems not to occur anymore.

Apart from what is written in the manual, in which other cases/conditions can syslog-ng lose logs?

reliable()
Type: yes|no
Default: no
Description: If set to yes, syslog-ng OSE cannot lose logs in case of reload/restart, unreachable destination or syslog-ng OSE crash. This solution provides a slower, but reliable disk-buffer option. It is created and initialized at startup and gradually grows as new messages arrive. If set to no, the normal disk-buffer will be used. This provides a faster, but less reliable disk-buffer option.
Thanks in advance,
Alex

From: Gabor Nagy (gnagy) <Gabor.Nagy@oneidentity.com>
Sent: 25 March 2022 14:44
To: Alexandre Santos <ASantos@infinera.com>; Syslog-ng users' and developers' mailing list <syslog-ng@lists.balabit.hu>
Subject: Re: Local sources seem not to be working

Hi Alex,

Sorry I haven't answered yet. I have a few ideas I would like to try out next week.

This is strange: the d_localfile destinations (as well as the vrf-socket destination "d_mgmt_vrf_socket") receive messages from the syslog() source, but not from the internal() or system() sources? And the issue vanishes when the "d_mgmt_vrf_socket" destination is removed? If it were soft flow-control, the syslog() source would be suspended too.

Just a tip: could you switch the unix-dgram() destination to a syslog() destination, please? Maybe that's not possible with the VRF in place...

In the stats output, do you see an increased number of dropped messages?

I would still suggest increasing the 4MB disk-buffer. You should estimate how long the mgmt syslog-ng could be down (i.e. not receiving from the unix-dgram socket), the average incoming EPS, and the average message size; together these give a hint about the required disk-buffer size.

Regards,
Gabor
Hi Gabor,

The problem was reproduced again. I was able to get the stats in the error situation: 54146.stats.error.txt

I also took the stats after reloading the configuration (which fixes the problem): 54146.stats.after.txt

Regarding your question:

> Just a random question: is /tmp a tmpfs filesystem?

Yes, it is.

Let me know what you find out.

Thanks and regards,
Alex
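Two `syslog-ng-ctl stats` snapshots can be diffed to see which counters are actually moving while the issue is ongoing. For example, the diff shows whether the system() and internal() source counters still increase, and whether d_mgmt_vrf_socket is still writing. Below is a minimal sketch. It assumes the semicolon-separated format seen in this thread (SourceName;SourceId;SourceInstance;State;Type;Number, with an optional header row). The function names and the `stats_diff.py` file name are hypothetical.

```python
import sys
from typing import Dict, Tuple

Key = Tuple[str, str, str, str, str]  # (name, id, instance, state, type)


def parse_stats(text: str) -> Dict[Key, int]:
    """Parse 'syslog-ng-ctl stats' output into {counter key: value}."""
    counters: Dict[Key, int] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("SourceName;"):
            continue  # blank line or header row
        fields = line.split(";")
        if len(fields) < 6:
            continue  # not a counter line
        state, ctype, number = fields[-3:]
        instance = ";".join(fields[2:-3])  # keep any ';' inside the instance column
        try:
            counters[(fields[0], fields[1], instance, state, ctype)] = int(number)
        except ValueError:
            continue  # non-numeric value, skip it
    return counters


def counter_deltas(before: str, after: str) -> Dict[Key, int]:
    """Return after - before for every counter whose value changed."""
    old, new = parse_stats(before), parse_stats(after)
    return {key: new.get(key, 0) - old.get(key, 0)
            for key in old.keys() | new.keys()
            if new.get(key, 0) != old.get(key, 0)}


if __name__ == "__main__":
    # Usage: python stats_diff.py first.txt second.txt
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
            for key, delta in sorted(counter_deltas(f1.read(), f2.read()).items()):
                print(";".join(key), f"{delta:+d}")
```

The delta is most meaningful for two snapshots taken a few minutes apart while the issue is ongoing. A source counter that does not change between them suggests that source is no longer reading.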
Hi,

Sorry for not replying earlier. Thanks for the stats output; I've checked it. It looks like messages were dropped, almost 100k of them:

dst.unix-stream;d_mgmt_vrf_socket#0;unix-stream,localhost.afunix:/dev/uds_log;a;dropped;95102
dst.unix-stream;d_mgmt_vrf_socket#0;unix-stream,localhost.afunix:/dev/uds_log;a;processed;3868284
dst.unix-stream;d_mgmt_vrf_socket#0;unix-stream,localhost.afunix:/dev/uds_log;a;queued;0
dst.unix-stream;d_mgmt_vrf_socket#0;unix-stream,localhost.afunix:/dev/uds_log;a;written;3777905

I think what's even more troubling is that the other syslog-ng instance, the one sending to Grafana, dropped 2.2M messages:

dst.syslog;d_grafana_tcp#0;tcp,10.100.71.73:1514;a;dropped;2219335
dst.syslog;d_grafana_tcp#0;tcp,10.100.71.73:1514;a;processed;4090485
dst.syslog;d_grafana_tcp#0;tcp,10.100.71.73:1514;a;queued;3753
dst.syslog;d_grafana_tcp#0;tcp,10.100.71.73:1514;a;suppressed;24682
dst.syslog;d_grafana_tcp#0;tcp,10.100.71.73:1514;a;written;1874813

I've tried to reproduce the issue by placing the disk-buffer files on a tmpfs filesystem with a limited size, so I could test what happens when the disk becomes full. I was unable to reproduce it. For me, both the system() and internal() sources kept reading new messages, but since the disk-buffer was full, those messages were dropped.

I've asked the team for help, as I have other priorities now, so someone else will look into this more deeply. I think we could narrow down the issue once we have reproduction steps, but I know this is not easy.

The biggest mystery for me is still how the syslog() UDP (!) source could forward messages when system() and internal() don't. The second is how messages could go out on d_mgmt_vrf_socket if the destination is full. Maybe this isn't what is happening, so let's clarify what really happens when "local sources don't work".
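The counters quoted above can be turned into a rough drop ratio per destination. The sketch below assumes only the field layout visible in this mail: the destination id is in the 2nd field, and the counter type and value are in the last two fields. It treats the counters as cumulative totals, so the ratio covers the whole uptime rather than just the incident. The function name is hypothetical.

```python
from typing import Dict, Iterable

# The counter lines quoted in the mail above.
STATS = """\
dst.unix-stream;d_mgmt_vrf_socket#0;unix-stream,localhost.afunix:/dev/uds_log;a;dropped;95102
dst.unix-stream;d_mgmt_vrf_socket#0;unix-stream,localhost.afunix:/dev/uds_log;a;processed;3868284
dst.unix-stream;d_mgmt_vrf_socket#0;unix-stream,localhost.afunix:/dev/uds_log;a;queued;0
dst.unix-stream;d_mgmt_vrf_socket#0;unix-stream,localhost.afunix:/dev/uds_log;a;written;3777905
dst.syslog;d_grafana_tcp#0;tcp,10.100.71.73:1514;a;dropped;2219335
dst.syslog;d_grafana_tcp#0;tcp,10.100.71.73:1514;a;processed;4090485
dst.syslog;d_grafana_tcp#0;tcp,10.100.71.73:1514;a;queued;3753
dst.syslog;d_grafana_tcp#0;tcp,10.100.71.73:1514;a;suppressed;24682
dst.syslog;d_grafana_tcp#0;tcp,10.100.71.73:1514;a;written;1874813
"""


def drop_ratios(lines: Iterable[str]) -> Dict[str, float]:
    """Map each destination id to dropped / processed."""
    per_dest: Dict[str, Dict[str, int]] = {}
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 6 or not fields[-1].isdigit():
            continue  # skip headers and malformed lines
        dest_id, ctype = fields[1], fields[-2]
        per_dest.setdefault(dest_id, {})[ctype] = int(fields[-1])
    return {dest: c["dropped"] / c["processed"]
            for dest, c in per_dest.items()
            if c.get("processed") and "dropped" in c}


ratios = drop_ratios(STATS.splitlines())
for dest, ratio in sorted(ratios.items()):
    print(f"{dest}: {ratio:.1%} of processed messages dropped")
```

For these numbers it prints roughly 54.3% for d_grafana_tcp#0 and 2.5% for d_mgmt_vrf_socket#0.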
From an earlier mail:

> You're saying that the syslog() source in s_src is receiving messages, while internal() and system() are not?
[Alexandre Santos] Yes, I think that is what is happening. Logs from the syslog() source are being written to /var/logs/..., while journald logs are not.
The issue is that there are no messages from system() and internal() in the /var/log/... output files, right? So can I assume that no messages from the syslog() source go out on the d_mgmt_vrf_socket() destination when the issue happens?

Suggestions:

1. Your config clearly shows that the disk-buffer is extremely small, especially without flow-control! I strongly recommend increasing its size from 4MB to at least 1GB. (Storing a disk-buffer on a tmpfs filesystem, which is RAM-based, is a separate question. It is not really persistent, but that can be okay as long as the system runs, and it's a step better than using only the in-memory queue in syslog-ng.)
2. You could lower the reconnect timeout from the default 60 seconds to 10 seconds (just a guess), so that the window during which messages have to be queued is shorter.

Regards,
Gabor
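Gabor's first suggestion is a disk-buffer of at least 1GB, and Alex confirmed that /tmp is a tmpfs. A tmpfs is RAM-backed and usually size-limited, so it is worth checking that the larger buffer actually fits there. Below is a Linux-only sketch using `shutil.disk_usage` and `/proc/mounts`. The 10% headroom and the helper names are assumptions.

```python
import os
import shutil
from typing import Optional

GIB = 1024 ** 3


def fs_type(path: str, mounts_text: Optional[str] = None) -> str:
    """Filesystem type of the mount containing `path`, read from /proc/mounts (Linux)."""
    if mounts_text is None:
        try:
            with open("/proc/mounts") as f:
                mounts_text = f.read()
        except OSError:
            return "unknown"
    path = os.path.realpath(path)
    best_mnt, best_type = "", "unknown"
    for line in mounts_text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        mnt, typ = parts[1], parts[2]
        # The longest mount point that is a prefix of `path` wins; later entries
        # win ties (over-mounts). Mount points escaped as \040 are not handled.
        under = path == mnt or path.startswith(mnt.rstrip("/") + "/")
        if under and len(mnt) >= len(best_mnt):
            best_mnt, best_type = mnt, typ
    return best_type


def buffer_fits(path: str, wanted_bytes: int, headroom: float = 0.1) -> bool:
    """True if `wanted_bytes` fits into the free space at `path`, keeping some headroom."""
    free = shutil.disk_usage(path).free
    return wanted_bytes <= free * (1 - headroom)


if __name__ == "__main__":
    target = "/tmp"  # directory holding /tmp/syslog-ng-00016.rqf in the thread
    if os.path.isdir(target):
        print(f"{target}: {fs_type(target)}, 1 GiB buffer fits: {buffer_fits(target, GIB)}")
```

Free space on a tmpfs is shared with everything else stored in RAM-backed mounts, so a positive result is only a snapshot.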
Hi Gabor,

Thanks for your support on this.

> I think what's even more troubling is that the other syslog-ng instance that's sending to Grafana dropped 2.2M messages:

The server (Grafana) is not working: it is pingable, but not receiving logs. This is one of the test conditions under which the issue happens.
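Being pingable and not receiving logs are two separate facts. Ping only proves ICMP reachability, while the d_grafana_tcp destination needs TCP port 1514 to accept connections (the port comes from the stats key tcp,10.100.71.73:1514). Below is a minimal Python check with a hypothetical helper name. Even a successful connect does not prove that the receiver is processing messages.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port can be established within `timeout` s."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out (socket.timeout is an OSError)
        return False


if __name__ == "__main__":
    # Destination taken from the d_grafana_tcp stats key (tcp,10.100.71.73:1514).
    print("10.100.71.73:1514 accepts TCP:", tcp_port_open("10.100.71.73", 1514, timeout=2.0))
```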
> The issue is that there are no messages from system() and internal() in the /var/log/... output files, right?

Right.

> So can I assume that no messages from the syslog() source go out on the d_mgmt_vrf_socket() destination when the issue happens?

Hmm, as a matter of fact I never checked that; I was more focused on the /var/log/... output files.
We will take in consideration your suggestions, to increase the disk buffer as much as possible and reduce the reconnect time. Please let us know if you find something else. Thanks & Regards, Alex From: Gabor Nagy (gnagy) <Gabor.Nagy@oneidentity.com> Sent: 28 de abril de 2022 09:34 To: Alexandre Santos <ASantos@infinera.com>; Syslog-ng users' and developers' mailing list <syslog-ng@lists.balabit.hu> Subject: Re: Local sources seem not to be working Hi, Sorry for not replying earlier. Thanks for the stats output, I've checked it. It looks like messages were dropped, almost 100k messages. dst.unix-stream;d_mgmt_vrf_socket#0;unix-stream,localhost.afunix:/dev/uds_log;a;dropped;95102 dst.unix-stream;d_mgmt_vrf_socket#0;unix-stream,localhost.afunix:/dev/uds_log;a;processed;3868284 dst.unix-stream;d_mgmt_vrf_socket#0;unix-stream,localhost.afunix:/dev/uds_log;a;queued;0 dst.unix-stream;d_mgmt_vrf_socket#0;unix-stream,localhost.afunix:/dev/uds_log;a;written;3777905 I think what's even more troubling is that the other syslog-ng instance that's sending to Grafana dropped 2,2M messages: dst.syslog;d_grafana_tcp#0;tcp,10.100.71.73:1514;a;dropped;2219335 dst.syslog;d_grafana_tcp#0;tcp,10.100.71.73:1514;a;processed;4090485 dst.syslog;d_grafana_tcp#0;tcp,10.100.71.73:1514;a;queued;3753 dst.syslog;d_grafana_tcp#0;tcp,10.100.71.73:1514;a;suppressed;24682 dst.syslog;d_grafana_tcp#0;tcp,10.100.71.73:1514;a;written;1874813 I've tried reproducing the issue, I've placed the disk-buffer files on a tmpfs filesystem (with limited size, so I can test what happens if the disk becomes full). I was unable to reproduce the issue. For me, both the system() and internal() sources were continuing to read new messages, but since the disk-buffer is full, the messages were dropped. I've asked for some help from the team, as I have got other priorities now, so someone else will be looking into this deeper. 
I think we could narrow down the issue once we have reproduction steps, but I know this is not easy. The biggest mistery for me is still, how could the syslog() UDP!!! source forward messages () when system() and internal() doesn't. The second is how could messages go out from d_mgmg_vrf_socket if the destination is full? Maybe this isn't what is happening, so let's clear out what is really happening when "local sources doesn't work". From an earlier mail:
You're saying that the syslog() source in s_src is receiving message, while the internal() and system() doesn't? [Alexandre Santos] Yes, I think that it is what is happening. Logs from syslog() source are being written to the /var/logs/..., while journald logs are not.
The issue is that there are no messages from system() and internal() in the /var/log/... output files, right? So I can assume that no messages from syslog() source go out on the d_mgmt_vrf_socket() destination when the issue happens? Suggestions: 1. An important thing to emphasize is that your config clearly shows that the disk-buffer is extremely small, especially without flow-control!!!!! I strongly recommend adjusting it's size from 4MB to t least a 1GB size!!!! (It's a different question that storing a disk-buffer on a tmpfs filesystem (which is RAM based) is not really persistent, but that can be okay as long as the system runs; it's a step better than using only in-memory queue in syslog-ng) 2. You could lower the value of the reconnect timeout from the default 60 seconds to 10 seconds (just a guess), so it will be a lower window of time when messages have to be queued. Regards, Gabor Gabor ________________________________ From: Alexandre Santos <ASantos@infinera.com<mailto:ASantos@infinera.com>> Sent: Thursday, April 21, 2022 15:22 To: Gabor Nagy (gnagy) <Gabor.Nagy@oneidentity.com<mailto:Gabor.Nagy@oneidentity.com>>; Syslog-ng users' and developers' mailing list <syslog-ng@lists.balabit.hu<mailto:syslog-ng@lists.balabit.hu>> Subject: RE: Local sources seem not to be working CAUTION: This email originated from outside of the organization. Do not follow guidance, click links, or open attachments unless you recognize the sender and know the content is safe. Hi Gabor, The problem was reproduced again. I was able to get the stats in the error situation: 54146.stats.error.txt I also took the stats after reloading the configuration (which fixes the problem): 54146.stats.after.txt Regarding your question: Just a random question: is /tmp a tmpfs filesystem? Yes it is. Let me know what you found out. 
Thanks and regards, Alex From: Gabor Nagy (gnagy) <Gabor.Nagy@oneidentity.com<mailto:Gabor.Nagy@oneidentity.com>> Sent: 31 de março de 2022 11:14 To: Alexandre Santos <ASantos@infinera.com<mailto:ASantos@infinera.com>>; Syslog-ng users' and developers' mailing list <syslog-ng@lists.balabit.hu<mailto:syslog-ng@lists.balabit.hu>> Subject: Re: Local sources seem not to be working Thanks for the config! I'll continue experimenting on my ideas You could either configure syslog-ng-ctl stats to talk to a given syslog-ng instance with the --control option pointing to the control-socket e.g. as above /var/lib/syslog-ng/mgmt-syslog-ng.ctl, OR use the syslog-ng-ctl instance under the 2nd syslog-ng installation path. Just a random question: is /tmp a tmpfs filesystem? Gabor ________________________________ From: Alexandre Santos <ASantos@infinera.com<mailto:ASantos@infinera.com>> Sent: Wednesday, March 30, 2022 17:07 To: Gabor Nagy (gnagy) <Gabor.Nagy@oneidentity.com<mailto:Gabor.Nagy@oneidentity.com>>; Syslog-ng users' and developers' mailing list <syslog-ng@lists.balabit.hu<mailto:syslog-ng@lists.balabit.hu>> Subject: RE: Local sources seem not to be working CAUTION: This email originated from outside of the organization. Do not follow guidance, click links, or open attachments unless you recognize the sender and know the content is safe. Hi Gabor, Thank you for you feedback. Can you share the config, when the issue cannot be seen? I am sending the configuration in attachment. I would still like to see 2 "syslog-ng-ctl stats" outputs when the issue happens. The issue is hard to reproduce, the next time the error is seen, I try to run it. But how can I run syslog-ng-ctl stats for the 2nd syslog-ng instance? root@machine:/~# ps -ewfH | grep syslog-ng root 2582 1 0 09:06 ? 00:00:34 /usr/sbin/syslog-ng -F --caps cap_net_bind_service,cap_net_broadcast,cap_net_raw,cap_dac_read_search,cap_chown,cap_fowner=p cap_dac_override,cap_syslog=ep root 4018 1 0 09:07 ? 
00:00:00 /usr/sbin/syslog-ng -F --cfgfile=/etc/syslog-ng/mgmt-syslog-ng.conf --pidfile=/var/lib/syslog-ng/mgmt-syslog-ng.pid --persist-file=/var/lib/syslog-ng/mgmt-syslog-ng.persist --control=/var/lib/syslog-ng/mgmt-syslog-ng.ctl syslog-ng-ctl, seems to only show stats for job 2582. Regards, Alex From: Gabor Nagy (gnagy) <Gabor.Nagy@oneidentity.com<mailto:Gabor.Nagy@oneidentity.com>> Sent: 29 de março de 2022 12:48 To: Alexandre Santos <ASantos@infinera.com<mailto:ASantos@infinera.com>>; Syslog-ng users' and developers' mailing list <syslog-ng@lists.balabit.hu<mailto:syslog-ng@lists.balabit.hu>> Subject: Re: Local sources seem not to be working Hi Alex, Using regular disk-buffer vs. using reliable disk-buffer shouldn't cause symptoms like that. It sounds like reliable(yes) would turn on a flow-control-like behaviour, which it doesn't. (And as you said it only affects local sources). The main difference between the two kinds of disk-buffers is, that while reliable disk-buffer write every message to the disk-buffer, a normal disk-buffer has memory-only buffers for performance reasons (and flow-control reasons too). You can still lose logs with a reliable disk-buffer if no flow-control is used: when the disk-buffer has reached it's maximum size and new messages keep arriving, then syslog-ng drops those messages. 
We have more detailed documentation about disk-buffers in the admin guide, where you can see their structure: https://www.syslog-ng.com/technical-documents/doc/syslog-ng-open-source-edition/3.36/administration-guide/61#TOPIC-1768724

Can you share the config with which the issue cannot be seen?
I would still like to see 2 "syslog-ng-ctl stats" outputs when the issue happens.

Regards,
Gabor
________________________________
From: Alexandre Santos <ASantos@infinera.com>
Sent: Monday, March 28, 2022 13:45
To: Gabor Nagy (gnagy) <Gabor.Nagy@oneidentity.com>; Syslog-ng users' and developers' mailing list <syslog-ng@lists.balabit.hu>
Subject: RE: Local sources seem not to be working

Hi Gabor,

“This is strange: the d_localfile destinations (as well as the vrf-socket destination "d_mgmt_vrf_socket") receive messages from the syslog() source, but not from the internal() or system() sources?”
Yes.

“And the issue vanishes when "d_mgmt_vrf_socket" destination is removed?”
Yes.

I could not test the last 2 suggestions you made. However, we did another test, removing the reliable option from d_mgmt_vrf_socket, and the problem seems not to occur again.
Besides what is written in the manual, in which other cases/conditions can syslog-ng lose logs?

reliable()
Type: yes|no
Default: no
Description: If set to yes, syslog-ng OSE cannot lose logs in case of reload/restart, unreachable destination or syslog-ng OSE crash. This solution provides a slower, but reliable disk-buffer option. It is created and initialized at startup and gradually grows as new messages arrive. If set to no, the normal disk-buffer will be used. This provides a faster, but less reliable disk-buffer option.

Thanks in advance,
Alex

From: Gabor Nagy (gnagy) <Gabor.Nagy@oneidentity.com>
Sent: 25 March 2022 14:44
To: Alexandre Santos <ASantos@infinera.com>; Syslog-ng users' and developers' mailing list <syslog-ng@lists.balabit.hu>
Subject: Re: Local sources seem not to be working

Hi Alex,

Sorry I haven't answered yet. I have a few ideas I would like to try out next week.

This is strange: the d_localfile destinations (as well as the vrf-socket destination "d_mgmt_vrf_socket") receive messages from the syslog() source, but not from the internal() or system() sources? And the issue vanishes when the "d_mgmt_vrf_socket" destination is removed? If it were soft flow-control, then the syslog() source would be suspended too.

Just a tip: would you switch the unix-dgram() destination to a syslog() destination, please? Maybe that's not possible with the VRF in place...

In the stats output, do you see an increased number of dropped messages?

I would still suggest increasing the 4MB disk-buffer.
You should estimate how long the mgmt syslog-ng could be down (i.e. not receiving from the unix-dgram socket), the average incoming EPS, and the average message size; together these give a hint about the required disk-buffer size.

Regards,
Gabor
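The estimate suggested above is simple arithmetic: buffer size ≈ outage duration × EPS × average message size. A minimal Python sketch follows; the function names, the safety factor, and the example load (1000 EPS, 300-byte messages) are assumptions for illustration, not figures from this thread. The real on-disk size per message is likely somewhat larger than the raw message because of serialization overhead, which is one reason to keep a margin.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def required_buffer_bytes(outage_seconds, eps, avg_msg_bytes, safety_factor=2.0):
    """Bytes needed to hold everything that arrives while the destination is unreachable.

    safety_factor is an arbitrary margin for bursts and per-message overhead.
    """
    return int(outage_seconds * eps * avg_msg_bytes * safety_factor)


def seconds_until_full(buffer_bytes, eps, avg_msg_bytes):
    """How long a buffer of the given size lasts at a steady incoming rate."""
    return buffer_bytes / (eps * avg_msg_bytes)


eps, msg = 1000, 300  # hypothetical load: 1000 messages/s, 300 bytes each
print(required_buffer_bytes(60, eps, msg))                    # 60 s outage, default margin
print(round(seconds_until_full(4 * MIB, eps, msg), 1))        # the current 4 MiB buffer, seconds
print(round(seconds_until_full(1 * GIB, eps, msg) / 60, 1))   # a 1 GiB buffer, minutes
```

Under these assumed figures a 4 MiB buffer fills in about 14 seconds, well inside the 60-second default reconnect interval mentioned in this thread, while 1 GiB lasts roughly an hour.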
Hi Gabor,

New developments on this. After another test, this time with no remote destinations configured, the issue happened again. As attachments I am sending:
* The stats in the error condition: 42116.inerror.stats.txt, where ‘src.journald;s_src#0;journal;a;stamp;1651142246’
* The stats in the error condition after some time: 42116.inerror.15m.stats.txt, where ‘src.journald;s_src#0;journal;a;stamp;1651142246’
* The stats after recovering the system with a reload: 42116.after.stats.txt
* The syslog-ng configuration: 42116.no.remote.dest.syslog-ng.conf

It seems that ‘src.journald;s_src#0;journal;a;stamp;1651142246’ did not change. Does this mean that the timestamp of the last message read from the journal did not change? Logs from the UDP source are still being written to the /var/log/… files.

Please let me know if you find something else.

Thanks & Regards,
Alex

From: syslog-ng <syslog-ng-bounces@lists.balabit.hu> On Behalf Of Alexandre Santos
Sent: 28 April 2022 11:18
To: Gabor Nagy (gnagy) <Gabor.Nagy@oneidentity.com>; Syslog-ng users' and developers' mailing list <syslog-ng@lists.balabit.hu>
Subject: Re: [syslog-ng] Local sources seem not to be working

Hi Gabor,

Thanks for your support on this.

I think what's even more troubling is that the other syslog-ng instance that's sending to Grafana dropped 2.2M messages:
The server (Grafana) is not working. It is pingable, but not receiving logs. This is one of the test conditions under which the issue happens.
The issue is that there are no messages from system() and internal() in the /var/log/... output files, right?
Right.

So I can assume that no messages from the syslog() source go out on the d_mgmt_vrf_socket() destination when the issue happens?
Hmm, as a matter of fact I never checked that; I was more focused on the /var/log/... output files.
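On the question in the later mail above, whether an unchanged ‘src.journald;s_src#0;journal;a;stamp;1651142246’ means the journal source stopped reading, one way to check is to diff two syslog-ng-ctl stats snapshots and decode the stamp. The sketch below assumes the semicolon-separated SourceName;SourceId;SourceInstance;State;Type;Number layout seen in this thread, and assumes a source's stamp counter holds the UNIX time of the last message it received; the helper names and every counter except the journald stamp are hypothetical.

```python
from datetime import datetime, timezone


def parse_stats(text):
    """Map 'name;id;instance;state;type' to its integer value.

    Assumes the 6-field, semicolon-separated `syslog-ng-ctl stats` layout;
    the header line and anything malformed are skipped.
    """
    counters = {}
    for line in text.splitlines():
        fields = line.strip().split(";")
        if len(fields) == 6 and fields[5].isdigit():
            counters[";".join(fields[:5])] = int(fields[5])
    return counters


def unchanged(before, after):
    """Counters present in both snapshots whose value did not move."""
    return sorted(k for k in before.keys() & after.keys() if before[k] == after[k])


def stamp_to_utc(stamp):
    """Convert a UNIX-timestamp counter value to an aware UTC datetime."""
    return datetime.fromtimestamp(stamp, tz=timezone.utc)


# Illustrative snapshots: only the journald stamp value comes from this thread.
first = """SourceName;SourceId;SourceInstance;State;Type;Number
src.journald;s_src#0;journal;a;stamp;1651142246
src.journald;s_src#0;journal;a;processed;1000
src.syslog;s_src#2;udp;a;processed;500"""
later = first.replace("processed;500", "processed;750")

for key in unchanged(parse_stats(first), parse_stats(later)):
    print("not moving:", key)
print(stamp_to_utc(1651142246).isoformat())
```

With these inputs both journald counters are reported as not moving while the hypothetical UDP counter advanced, and the stamp decodes to 2022-04-28 10:37:26 UTC. If the stamp really is the last receive time, an unchanged value across the 15-minute gap would be consistent with the journal source not delivering new messages.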
We will take into consideration your suggestions to increase the disk-buffer as much as possible and to reduce the reconnect time.

Please let us know if you find something else.

Thanks & Regards,
Alex

From: Gabor Nagy (gnagy) <Gabor.Nagy@oneidentity.com>
Sent: 28 April 2022 09:34
To: Alexandre Santos <ASantos@infinera.com>; Syslog-ng users' and developers' mailing list <syslog-ng@lists.balabit.hu>
Subject: Re: Local sources seem not to be working

Hi,

Sorry for not replying earlier. Thanks for the stats output, I've checked it. It looks like messages were dropped, almost 100k of them:
dst.unix-stream;d_mgmt_vrf_socket#0;unix-stream,localhost.afunix:/dev/uds_log;a;dropped;95102
dst.unix-stream;d_mgmt_vrf_socket#0;unix-stream,localhost.afunix:/dev/uds_log;a;processed;3868284
dst.unix-stream;d_mgmt_vrf_socket#0;unix-stream,localhost.afunix:/dev/uds_log;a;queued;0
dst.unix-stream;d_mgmt_vrf_socket#0;unix-stream,localhost.afunix:/dev/uds_log;a;written;3777905

I think what's even more troubling is that the other syslog-ng instance that's sending to Grafana dropped 2.2M messages:
dst.syslog;d_grafana_tcp#0;tcp,10.100.71.73:1514;a;dropped;2219335
dst.syslog;d_grafana_tcp#0;tcp,10.100.71.73:1514;a;processed;4090485
dst.syslog;d_grafana_tcp#0;tcp,10.100.71.73:1514;a;queued;3753
dst.syslog;d_grafana_tcp#0;tcp,10.100.71.73:1514;a;suppressed;24682
dst.syslog;d_grafana_tcp#0;tcp,10.100.71.73:1514;a;written;1874813

I've tried reproducing the issue: I placed the disk-buffer files on a tmpfs filesystem (with a limited size, so I could test what happens when the disk becomes full). I was unable to reproduce the issue. For me, both the system() and internal() sources kept reading new messages, but since the disk-buffer was full, the messages were dropped.
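The dropped and processed counters above can be turned into a drop ratio per destination, and, following the --control advice earlier in the thread, the same output can be fetched from either syslog-ng instance. A minimal Python sketch, assuming syslog-ng-ctl is on PATH and accepts `stats --control <socket>` as described in the thread; the function names are hypothetical, and only the counter lines are taken from this mail.

```python
import subprocess


def stats_command(control_socket=None):
    """argv for `syslog-ng-ctl stats`, optionally aimed at a specific instance."""
    cmd = ["syslog-ng-ctl", "stats"]
    if control_socket:
        cmd += ["--control", control_socket]
    return cmd


def fetch_stats(control_socket=None):
    """Run syslog-ng-ctl and return its CSV output (raises if the command fails)."""
    result = subprocess.run(stats_command(control_socket),
                            capture_output=True, text=True, check=True)
    return result.stdout


def drop_ratios(text):
    """dropped / processed for every 'name;id;instance' that reports both counters."""
    processed, dropped = {}, {}
    for line in text.splitlines():
        fields = line.strip().split(";")
        if len(fields) != 6 or not fields[5].isdigit():
            continue
        key, kind, value = ";".join(fields[:3]), fields[4], int(fields[5])
        if kind == "processed":
            processed[key] = value
        elif kind == "dropped":
            dropped[key] = value
    return {k: dropped[k] / processed[k] for k in dropped if processed.get(k)}


sample = """\
dst.unix-stream;d_mgmt_vrf_socket#0;unix-stream,localhost.afunix:/dev/uds_log;a;dropped;95102
dst.unix-stream;d_mgmt_vrf_socket#0;unix-stream,localhost.afunix:/dev/uds_log;a;processed;3868284
dst.syslog;d_grafana_tcp#0;tcp,10.100.71.73:1514;a;dropped;2219335
dst.syslog;d_grafana_tcp#0;tcp,10.100.71.73:1514;a;processed;4090485"""

for dest, ratio in drop_ratios(sample).items():
    print(f"{dest}: {ratio:.1%} dropped")

# On the machine itself, e.g. for the mgmt instance from the ps output:
# print(drop_ratios(fetch_stats("/var/lib/syslog-ng/mgmt-syslog-ng.ctl")))
```

With the counters from this mail that works out to roughly 2.5% dropped on d_mgmt_vrf_socket and about 54% on d_grafana_tcp.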
I've asked the team for some help, as I have other priorities now, so someone else will look into this more deeply. I think we could narrow down the issue once we have reproduction steps, but I know this is not easy.

The biggest mystery for me is still how the syslog() UDP!!! source could forward messages when system() and internal() don't. The second is how messages could go out from d_mgmt_vrf_socket if the destination is full. Maybe this isn't what is happening, so let's clear up what is really happening when "local sources don't work". From an earlier mail:
You're saying that the syslog() source in s_src is receiving messages, while internal() and system() aren't?
[Alexandre Santos] Yes, I think that is what is happening. Logs from the syslog() source are being written to /var/log/..., while journald logs are not.
The issue is that there are no messages from system() and internal() in the /var/log/... output files, right?
So I can assume that no messages from the syslog() source go out on the d_mgmt_vrf_socket() destination when the issue happens?

Suggestions:
1. An important thing to emphasize is that your config clearly shows that the disk-buffer is extremely small, especially without flow-control!!!!! I strongly recommend increasing its size from 4MB to at least 1GB!!!! (Storing a disk-buffer on a tmpfs filesystem, which is RAM-based, is not really persistent; that is a different question, and it can be okay as long as the system runs. It's a step better than using only an in-memory queue in syslog-ng.)
2. You could lower the reconnect timeout from the default 60 seconds to, say, 10 seconds (just a guess), which shortens the window of time during which messages have to be queued.

Regards,
Gabor
________________________________
From: Alexandre Santos <ASantos@infinera.com>
Sent: Thursday, April 21, 2022 15:22
To: Gabor Nagy (gnagy) <Gabor.Nagy@oneidentity.com>; Syslog-ng users' and developers' mailing list <syslog-ng@lists.balabit.hu>
Subject: RE: Local sources seem not to be working

Hi Gabor,

The problem was reproduced again. I was able to get the stats in the error situation: 54146.stats.error.txt
I also took the stats after reloading the configuration (which fixes the problem): 54146.stats.after.txt

Regarding your question:
Just a random question: is /tmp a tmpfs filesystem?
Yes, it is.

Let me know what you find out.
participants (2)
- Alexandre Santos
- Gabor Nagy (gnagy)