disk-buffer in elasticsearch2 destination loses messages if docker container is killed

newer
Undesirable behavior from Cisco...

Jose Angel Santiago

13 Sep 2018 13 Sep '18

3:50 p.m.

Hi, I'm running syslog-ng (with an elasticsearch2 destination configured) within a docker container, and I'm trying to avoid loss of messages if I kill the docker container and I start it again. This is my scenary: - A service which produces 20 lines of log per second - A sislog-ng instance reading from a wildcard-file source (but actually it only reads logs from the above service, let's call it syslog-agent), which sends all logs to another syslog-ng instance (the one running in a docker container, let's call it syslog-relay) though a network destination. - The syslog-relay sends messages to an elasticsearch instance, with following configuration: options { chain-hostnames(no); use-dns(no); keep-hostname(yes); owner("syslog-ng"); group("stratio"); perm(0640); time-reap(30); mark-freq(10); stats-freq(0); bad-hostname("^gconfd$"); flush-lines(100); log-fifo-size(1000); }; *destination d_elastic_default_0 { elasticsearch2( cluster("myelastic") cluster-url("https://myelastic.logs:9200 <https://myelastic.logs:9200>") client_mode("https") index("default") type("log") flush-limit(1) disk-buffer( mem-buf-size(16M) disk-buf-size(16M) reliable(yes) dir("/syslog-ng/log") ) http-auth-type("clientcert") java-keystore-filepath("/etc/syslog-ng/certificates/syslog-relay.jks") java-keystore-password("XXXXXX") java-truststore-filepath("/etc/syslog-ng/certificates/ca-bundle.jks") java-truststore-password("XXXXXXXXXX") );};* - The dir "/syslog-ng/log" is mapped to a path "/tmp/buffer" from the host where the docker container is running, so when I kill the docker container, the buffer file is not lost. - I've set flush-limit to 1 because I thought that I may lost 1 message only as much. This architecture is working fine (flush-limit=1 makes very slow, but for this test is ok), but if I kill the syslog-relay docker container, wait 5 to 10 seconds and start it again from scratch, I can see that several hundreds of logs are missing in elasticsearch. I check it by stopping the logger service and letting syslog-ng agent & relay to finish the process enqueued messages. I can see in the syslog-agent stats that all logs messages have been processed, so it seems the problem is on the syslog-relay. Is this behaviour expected? If so, how can I protect against loss of messages in case of a syslog-relay docker container unexpected kill? Thanks in advance. -- | Jose Angel Santiago [image: Logo_signature2.png] <http://www.stratio.com/> Vía de las dos Castillas, 33, Ática 4, 3ª Planta 28224 Pozuelo de Alarcón, Madrid, Spain +34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

Attachments:

attachment.html (text/html — 7.0 KB)

Show replies by date

Péter, Kókai

13 Sep 13 Sep

4:11 p.m.

Hello, My initial guess would be that you do not store outside of the container the persist file. There is a so called persist file, which is important to keep as well That file you can find under $PREFIX/var/syslog-ng.persist -- Kokan On Thu, Sep 13, 2018 at 3:50 PM Jose Angel Santiago <jasantiago@stratio.com> wrote:

...

Hi,

I'm running syslog-ng (with an elasticsearch2 destination configured) within a docker container, and I'm trying to avoid loss of messages if I kill the docker container and I start it again.

This is my scenary:

- A service which produces 20 lines of log per second - A sislog-ng instance reading from a wildcard-file source (but actually it only reads logs from the above service, let's call it syslog-agent), which sends all logs to another syslog-ng instance (the one running in a docker container, let's call it syslog-relay) though a network destination. - The syslog-relay sends messages to an elasticsearch instance, with following configuration:

options { chain-hostnames(no); use-dns(no); keep-hostname(yes); owner("syslog-ng"); group("stratio"); perm(0640); time-reap(30); mark-freq(10); stats-freq(0); bad-hostname("^gconfd$"); flush-lines(100); log-fifo-size(1000); };

*destination d_elastic_default_0 { elasticsearch2( cluster("myelastic") cluster-url("https://myelastic.logs:9200 <https://myelastic.logs:9200>") client_mode("https") index("default") type("log") flush-limit(1) disk-buffer( mem-buf-size(16M) disk-buf-size(16M) reliable(yes) dir("/syslog-ng/log") ) http-auth-type("clientcert") java-keystore-filepath("/etc/syslog-ng/certificates/syslog-relay.jks") java-keystore-password("XXXXXX") java-truststore-filepath("/etc/syslog-ng/certificates/ca-bundle.jks") java-truststore-password("XXXXXXXXXX") );};*

- The dir "/syslog-ng/log" is mapped to a path "/tmp/buffer" from the host where the docker container is running, so when I kill the docker container, the buffer file is not lost. - I've set flush-limit to 1 because I thought that I may lost 1 message only as much.

This architecture is working fine (flush-limit=1 makes very slow, but for this test is ok), but if I kill the syslog-relay docker container, wait 5 to 10 seconds and start it again from scratch, I can see that several hundreds of logs are missing in elasticsearch. I check it by stopping the logger service and letting syslog-ng agent & relay to finish the process enqueued messages.

I can see in the syslog-agent stats that all logs messages have been processed, so it seems the problem is on the syslog-relay.

Is this behaviour expected? If so, how can I protect against loss of messages in case of a syslog-relay docker container unexpected kill?

Thanks in advance.

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 <+34%20918%2028%2064%2073> | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

Jose Angel Santiago

4:15 p.m.

I'm afraid that's not the problem. I use the same dir (/syslog-ng/log) to save the persist file. This is the content of the /tmp/buffer dir in the host, you can see the buffer file along with the persist file. -rw-------. 1 root root 556996 sep 13 12:09 syslog-ng-00000.rqf srwxr-xr-x. 1 root root 0 sep 13 12:07 syslog-ng.ctl -rw-------. 1 root root 16384 sep 13 12:07 syslog-ng.persist -rw-r--r--. 1 root root 4 sep 13 12:07 syslog-ng.pid Regards. 2018-09-13 16:11 GMT+02:00 Péter, Kókai <peter.kokai@oneidentity.com>:

...

Hello,

My initial guess would be that you do not store outside of the container the persist file. There is a so called persist file, which is important to keep as well That file you can find under $PREFIX/var/syslog-ng.persist

-- Kokan

On Thu, Sep 13, 2018 at 3:50 PM Jose Angel Santiago < jasantiago@stratio.com> wrote:

...
Hi,

I'm running syslog-ng (with an elasticsearch2 destination configured) within a docker container, and I'm trying to avoid loss of messages if I kill the docker container and I start it again.

This is my scenary:

- A service which produces 20 lines of log per second - A sislog-ng instance reading from a wildcard-file source (but actually it only reads logs from the above service, let's call it syslog-agent), which sends all logs to another syslog-ng instance (the one running in a docker container, let's call it syslog-relay) though a network destination. - The syslog-relay sends messages to an elasticsearch instance, with following configuration:

options { chain-hostnames(no); use-dns(no); keep-hostname(yes); owner("syslog-ng"); group("stratio"); perm(0640); time-reap(30); mark-freq(10); stats-freq(0); bad-hostname("^gconfd$"); flush-lines(100); log-fifo-size(1000); };

*destination d_elastic_default_0 { elasticsearch2( cluster("myelastic") cluster-url("https://myelastic.logs:9200 <https://myelastic.logs:9200>") client_mode("https") index("default") type("log") flush-limit(1) disk-buffer( mem-buf-size(16M) disk-buf-size(16M) reliable(yes) dir("/syslog-ng/log") ) http-auth-type("clientcert") java-keystore-filepath("/etc/syslog-ng/certificates/syslog-relay.jks") java-keystore-password("XXXXXX") java-truststore-filepath("/etc/syslog-ng/certificates/ca-bundle.jks") java-truststore-password("XXXXXXXXXX") );};*

- The dir "/syslog-ng/log" is mapped to a path "/tmp/buffer" from the host where the docker container is running, so when I kill the docker container, the buffer file is not lost. - I've set flush-limit to 1 because I thought that I may lost 1 message only as much.

This architecture is working fine (flush-limit=1 makes very slow, but for this test is ok), but if I kill the syslog-relay docker container, wait 5 to 10 seconds and start it again from scratch, I can see that several hundreds of logs are missing in elasticsearch. I check it by stopping the logger service and letting syslog-ng agent & relay to finish the process enqueued messages.

I can see in the syslog-agent stats that all logs messages have been processed, so it seems the problem is on the syslog-relay.

Is this behaviour expected? If so, how can I protect against loss of messages in case of a syslog-relay docker container unexpected kill?

Thanks in advance.

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 <+34%20918%2028%2064%2073> | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD> ____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/? product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/? product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

-- | Jose Angel Santiago [image: Logo_signature2.png] <http://www.stratio.com/> Vía de las dos Castillas, 33, Ática 4, 3ª Planta 28224 Pozuelo de Alarcón, Madrid, Spain +34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

Budai, László

4:23 p.m.

Hi, one problem could be if the flush-limit would be greater than 1... in that case syslog-ng would use a HttpBulkMessageProcessor. In this case syslog-ng pass the message to the HttpBulkMessageProcessor and sends back a positive ACK to the LogSource (so the message is removed from the diskbuffer), and if the dockerimage is killed, all the messages stored in the HttpBulkMessageProcessor are lost. But in your case syslog-ng should use the HttpSingleMessageProcessor... which means that the messages are sent one-by-one... Could you check the diskbuffer with the dqtool? L. On Thu, Sep 13, 2018 at 3:50 PM, Jose Angel Santiago <jasantiago@stratio.com

...

wrote:

...

Hi,

I'm running syslog-ng (with an elasticsearch2 destination configured) within a docker container, and I'm trying to avoid loss of messages if I kill the docker container and I start it again.

This is my scenary:

- A service which produces 20 lines of log per second - A sislog-ng instance reading from a wildcard-file source (but actually it only reads logs from the above service, let's call it syslog-agent), which sends all logs to another syslog-ng instance (the one running in a docker container, let's call it syslog-relay) though a network destination. - The syslog-relay sends messages to an elasticsearch instance, with following configuration:

options { chain-hostnames(no); use-dns(no); keep-hostname(yes); owner("syslog-ng"); group("stratio"); perm(0640); time-reap(30); mark-freq(10); stats-freq(0); bad-hostname("^gconfd$"); flush-lines(100); log-fifo-size(1000); };

*destination d_elastic_default_0 { elasticsearch2( cluster("myelastic") cluster-url("https://myelastic.logs:9200 <https://myelastic.logs:9200>") client_mode("https") index("default") type("log") flush-limit(1) disk-buffer( mem-buf-size(16M) disk-buf-size(16M) reliable(yes) dir("/syslog-ng/log") ) http-auth-type("clientcert") java-keystore-filepath("/etc/syslog-ng/certificates/syslog-relay.jks") java-keystore-password("XXXXXX") java-truststore-filepath("/etc/syslog-ng/certificates/ca-bundle.jks") java-truststore-password("XXXXXXXXXX") );};*

- The dir "/syslog-ng/log" is mapped to a path "/tmp/buffer" from the host where the docker container is running, so when I kill the docker container, the buffer file is not lost. - I've set flush-limit to 1 because I thought that I may lost 1 message only as much.

This architecture is working fine (flush-limit=1 makes very slow, but for this test is ok), but if I kill the syslog-relay docker container, wait 5 to 10 seconds and start it again from scratch, I can see that several hundreds of logs are missing in elasticsearch. I check it by stopping the logger service and letting syslog-ng agent & relay to finish the process enqueued messages.

I can see in the syslog-agent stats that all logs messages have been processed, so it seems the problem is on the syslog-relay.

Is this behaviour expected? If so, how can I protect against loss of messages in case of a syslog-relay docker container unexpected kill?

Thanks in advance.

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product= syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

Jose Angel Santiago

4:43 p.m.

Hi, I guess I know what is happening, when I start from scratch the docker container, even I provide a persist file and a buffer file within the mapped volume, syslog-ng recreates them so all messages in buffer file which were not processed by the relay are lost. Is there any way to tell syslog-ng to use an already existing .persist file so it doesn't recreate the .rqf file? Regards. 2018-09-13 16:23 GMT+02:00 Budai, László <laszlo.budai@oneidentity.com>:

...

Hi,

one problem could be if the flush-limit would be greater than 1... in that case syslog-ng would use a HttpBulkMessageProcessor. In this case syslog-ng pass the message to the HttpBulkMessageProcessor and sends back a positive ACK to the LogSource (so the message is removed from the diskbuffer), and if the dockerimage is killed, all the messages stored in the HttpBulkMessageProcessor are lost. But in your case syslog-ng should use the HttpSingleMessageProcessor... which means that the messages are sent one-by-one... Could you check the diskbuffer with the dqtool?

L.

On Thu, Sep 13, 2018 at 3:50 PM, Jose Angel Santiago < jasantiago@stratio.com> wrote:

...
Hi,

I'm running syslog-ng (with an elasticsearch2 destination configured) within a docker container, and I'm trying to avoid loss of messages if I kill the docker container and I start it again.

This is my scenary:

- A service which produces 20 lines of log per second - A sislog-ng instance reading from a wildcard-file source (but actually it only reads logs from the above service, let's call it syslog-agent), which sends all logs to another syslog-ng instance (the one running in a docker container, let's call it syslog-relay) though a network destination. - The syslog-relay sends messages to an elasticsearch instance, with following configuration:

options { chain-hostnames(no); use-dns(no); keep-hostname(yes); owner("syslog-ng"); group("stratio"); perm(0640); time-reap(30); mark-freq(10); stats-freq(0); bad-hostname("^gconfd$"); flush-lines(100); log-fifo-size(1000); };

*destination d_elastic_default_0 { elasticsearch2( cluster("myelastic") cluster-url("https://myelastic.logs:9200 <https://myelastic.logs:9200>") client_mode("https") index("default") type("log") flush-limit(1) disk-buffer( mem-buf-size(16M) disk-buf-size(16M) reliable(yes) dir("/syslog-ng/log") ) http-auth-type("clientcert") java-keystore-filepath("/etc/syslog-ng/certificates/syslog-relay.jks") java-keystore-password("XXXXXX") java-truststore-filepath("/etc/syslog-ng/certificates/ca-bundle.jks") java-truststore-password("XXXXXXXXXX") );};*

- The dir "/syslog-ng/log" is mapped to a path "/tmp/buffer" from the host where the docker container is running, so when I kill the docker container, the buffer file is not lost. - I've set flush-limit to 1 because I thought that I may lost 1 message only as much.

This architecture is working fine (flush-limit=1 makes very slow, but for this test is ok), but if I kill the syslog-relay docker container, wait 5 to 10 seconds and start it again from scratch, I can see that several hundreds of logs are missing in elasticsearch. I check it by stopping the logger service and letting syslog-ng agent & relay to finish the process enqueued messages.

I can see in the syslog-agent stats that all logs messages have been processed, so it seems the problem is on the syslog-relay.

Is this behaviour expected? If so, how can I protect against loss of messages in case of a syslog-relay docker container unexpected kill?

Thanks in advance.

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support /documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/? product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

Péter, Kókai

6:07 p.m.

Hello, It would not make sense to replace the persist file after restart, so it is not something that syslog-ng does. Only if that file is corrupted, in that case at startup there should be a log about it, have you checked the syslog-ng logs ? (it would be better to enable debug and/or verbose logs, and if possible share it with us.) Could you reproduce the same behavior without docker (if possible) ? Best Regards, Peter Kokai On Thu, Sep 13, 2018 at 4:44 PM Jose Angel Santiago <jasantiago@stratio.com> wrote:

...

Hi,

I guess I know what is happening, when I start from scratch the docker container, even I provide a persist file and a buffer file within the mapped volume, syslog-ng recreates them so all messages in buffer file which were not processed by the relay are lost.

Is there any way to tell syslog-ng to use an already existing .persist file so it doesn't recreate the .rqf file?

Regards.

2018-09-13 16:23 GMT+02:00 Budai, László <laszlo.budai@oneidentity.com>:

...
Hi,

one problem could be if the flush-limit would be greater than 1... in that case syslog-ng would use a HttpBulkMessageProcessor. In this case syslog-ng pass the message to the HttpBulkMessageProcessor and sends back a positive ACK to the LogSource (so the message is removed from the diskbuffer), and if the dockerimage is killed, all the messages stored in the HttpBulkMessageProcessor are lost. But in your case syslog-ng should use the HttpSingleMessageProcessor... which means that the messages are sent one-by-one... Could you check the diskbuffer with the dqtool?

L.

On Thu, Sep 13, 2018 at 3:50 PM, Jose Angel Santiago < jasantiago@stratio.com> wrote:

...
Hi,

I'm running syslog-ng (with an elasticsearch2 destination configured) within a docker container, and I'm trying to avoid loss of messages if I kill the docker container and I start it again.

This is my scenary:

- A service which produces 20 lines of log per second - A sislog-ng instance reading from a wildcard-file source (but actually it only reads logs from the above service, let's call it syslog-agent), which sends all logs to another syslog-ng instance (the one running in a docker container, let's call it syslog-relay) though a network destination. - The syslog-relay sends messages to an elasticsearch instance, with following configuration:

options { chain-hostnames(no); use-dns(no); keep-hostname(yes); owner("syslog-ng"); group("stratio"); perm(0640); time-reap(30); mark-freq(10); stats-freq(0); bad-hostname("^gconfd$"); flush-lines(100); log-fifo-size(1000); };

*destination d_elastic_default_0 { elasticsearch2( cluster("myelastic") cluster-url("https://myelastic.logs:9200 <https://myelastic.logs:9200>") client_mode("https") index("default") type("log") flush-limit(1) disk-buffer( mem-buf-size(16M) disk-buf-size(16M) reliable(yes) dir("/syslog-ng/log") ) http-auth-type("clientcert") java-keystore-filepath("/etc/syslog-ng/certificates/syslog-relay.jks") java-keystore-password("XXXXXX") java-truststore-filepath("/etc/syslog-ng/certificates/ca-bundle.jks") java-truststore-password("XXXXXXXXXX") );};*

- The dir "/syslog-ng/log" is mapped to a path "/tmp/buffer" from the host where the docker container is running, so when I kill the docker container, the buffer file is not lost. - I've set flush-limit to 1 because I thought that I may lost 1 message only as much.

This architecture is working fine (flush-limit=1 makes very slow, but for this test is ok), but if I kill the syslog-relay docker container, wait 5 to 10 seconds and start it again from scratch, I can see that several hundreds of logs are missing in elasticsearch. I check it by stopping the logger service and letting syslog-ng agent & relay to finish the process enqueued messages.

I can see in the syslog-agent stats that all logs messages have been processed, so it seems the problem is on the syslog-relay.

Is this behaviour expected? If so, how can I protect against loss of messages in case of a syslog-relay docker container unexpected kill?

Thanks in advance.

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 <+34%20918%2028%2064%2073> | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 <+34%20918%2028%2064%2073> | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

Jose Angel Santiago

14 Sep 14 Sep

2:05 p.m.

Hi, I've got more accurate information about where are my lost messages. Now I'm using disk-buffer in syslog-agent and syslog-relay, and I've checked that lost messages are the ones sent by the syslog-agent when the syslog-relay docker container is being killed. I can see those messages on syslog-agent log (I've got both agent & relay in debug mode) with its corresponding "Outgoing message" line, but those messages never reach the relay. Could it be that the relay docker container still returns ACK to the agent (the agent resolves relay fqdn with a custom DNS) while syslog-ng process within the container is being stopped? I'm about to test again using tcpdump to confirm this theory, BTW, disk-buffers works ok, sometimes I get some duplicated messages when restarting the relay but that's not a problem for me. Forget about my .persist file re-creation theory, it doesn't happen. Regards. 2018-09-13 18:07 GMT+02:00 Péter, Kókai <peter.kokai@oneidentity.com>:

...

Hello,

It would not make sense to replace the persist file after restart, so it is not something that syslog-ng does. Only if that file is corrupted, in that case at startup there should be a log about it, have you checked the syslog-ng logs ? (it would be better to enable debug and/or verbose logs, and if possible share it with us.)

Could you reproduce the same behavior without docker (if possible) ?

Best Regards, Peter Kokai

On Thu, Sep 13, 2018 at 4:44 PM Jose Angel Santiago < jasantiago@stratio.com> wrote:

...
Hi,

I guess I know what is happening, when I start from scratch the docker container, even I provide a persist file and a buffer file within the mapped volume, syslog-ng recreates them so all messages in buffer file which were not processed by the relay are lost.

Is there any way to tell syslog-ng to use an already existing .persist file so it doesn't recreate the .rqf file?

Regards.

2018-09-13 16:23 GMT+02:00 Budai, László <laszlo.budai@oneidentity.com>:

...
Hi,

one problem could be if the flush-limit would be greater than 1... in that case syslog-ng would use a HttpBulkMessageProcessor. In this case syslog-ng pass the message to the HttpBulkMessageProcessor and sends back a positive ACK to the LogSource (so the message is removed from the diskbuffer), and if the dockerimage is killed, all the messages stored in the HttpBulkMessageProcessor are lost. But in your case syslog-ng should use the HttpSingleMessageProcessor... which means that the messages are sent one-by-one... Could you check the diskbuffer with the dqtool?

L.

On Thu, Sep 13, 2018 at 3:50 PM, Jose Angel Santiago < jasantiago@stratio.com> wrote:

...
Hi,

I'm running syslog-ng (with an elasticsearch2 destination configured) within a docker container, and I'm trying to avoid loss of messages if I kill the docker container and I start it again.

This is my scenary:

- A service which produces 20 lines of log per second - A sislog-ng instance reading from a wildcard-file source (but actually it only reads logs from the above service, let's call it syslog-agent), which sends all logs to another syslog-ng instance (the one running in a docker container, let's call it syslog-relay) though a network destination. - The syslog-relay sends messages to an elasticsearch instance, with following configuration:

options { chain-hostnames(no); use-dns(no); keep-hostname(yes); owner("syslog-ng"); group("stratio"); perm(0640); time-reap(30); mark-freq(10); stats-freq(0); bad-hostname("^gconfd$"); flush-lines(100); log-fifo-size(1000); };

*destination d_elastic_default_0 { elasticsearch2( cluster("myelastic") cluster-url("https://myelastic.logs:9200 <https://myelastic.logs:9200>") client_mode("https") index("default") type("log") flush-limit(1) disk-buffer( mem-buf-size(16M) disk-buf-size(16M) reliable(yes) dir("/syslog-ng/log") ) http-auth-type("clientcert") java-keystore-filepath("/etc/syslog-ng/certificates/syslog-relay.jks") java-keystore-password("XXXXXX") java-truststore-filepath("/etc/syslog-ng/certificates/ca-bundle.jks") java-truststore-password("XXXXXXXXXX") );};*

- The dir "/syslog-ng/log" is mapped to a path "/tmp/buffer" from the host where the docker container is running, so when I kill the docker container, the buffer file is not lost. - I've set flush-limit to 1 because I thought that I may lost 1 message only as much.

This architecture is working fine (flush-limit=1 makes very slow, but for this test is ok), but if I kill the syslog-relay docker container, wait 5 to 10 seconds and start it again from scratch, I can see that several hundreds of logs are missing in elasticsearch. I check it by stopping the logger service and letting syslog-ng agent & relay to finish the process enqueued messages.

I can see in the syslog-agent stats that all logs messages have been processed, so it seems the problem is on the syslog-relay.

Is this behaviour expected? If so, how can I protect against loss of messages in case of a syslog-relay docker container unexpected kill?

Thanks in advance.

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 <+34%20918%2028%2064%2073> | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/? product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/? product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 <+34%20918%2028%2064%2073> | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD> ____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/? product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/? product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

Jose Angel Santiago

17 Sep 17 Sep

2:04 p.m.

Hi, After repeating the test without TLS between syslog-agent & syslog-relay, with 2229 messages logged with a 20 msgs/sec logging ratio, in debug mode and with two tcpdumps running, these are the conclussions: - Syslog-agent sends every single message (checked with tcpdump file and syslog-agent log file) - Syslog-relay returns ACK for every single message, but sometimes the package contains more than one message (checked with tcpdump file) - Checking the syslog-relay log after killing & starting the docker container, I found the trace 'Reliable disk-buffer state loaded; filename='/syslog-ng-00000.rqf', queue_length='99', size='173844' and the queued messages (from msg 236 to msg 334 are processed). The next message processed is the nº 440. and it comes from the active network destination again. - Once all logs have been processed, in elasticsearch (syslog-relay destination) I can find messages from 1 to 334, and messages from 440 to 2229. How is it possible that syslog-relay returns ACK for messages 335 to 439 to syslog-agent, but they are not in the queue, nor in the .rqf file? Where did these messages go? I know they are sent before starting to fill syslog-agent disk-buffer file (checked with rqf file from syslog-agent when I kill the syslog-relay container). It seems that there's some kind of race condition that makes those messages are not processed/queued while syslog-relay is being killed. Mi procedure consists of: . Start tcpdump on both syslog-agent & syslog-relay hosts - Start logger process - Once logs are being inserted in elasticsearch, I kill the syslog-relay docker container, wait about 10 seconds, and run a new container (using a mapped volume where .rqf file and .persist file are) - Once logs are being inserted in elasticsearch again, I stop the logger process. - I wait until no log is left to be processed Would you test anything else? I'm running out of ideas. PD: Same test with logger process writing 5 msgs/sec ratio produces from 0 to 5 lost messages. Regards. 2018-09-14 14:05 GMT+02:00 Jose Angel Santiago <jasantiago@stratio.com>:

...

Hi,

I've got more accurate information about where are my lost messages.

Now I'm using disk-buffer in syslog-agent and syslog-relay, and I've checked that lost messages are the ones sent by the syslog-agent when the syslog-relay docker container is being killed. I can see those messages on syslog-agent log (I've got both agent & relay in debug mode) with its corresponding "Outgoing message" line, but those messages never reach the relay.

Could it be that the relay docker container still returns ACK to the agent (the agent resolves relay fqdn with a custom DNS) while syslog-ng process within the container is being stopped? I'm about to test again using tcpdump to confirm this theory,

BTW, disk-buffers works ok, sometimes I get some duplicated messages when restarting the relay but that's not a problem for me. Forget about my .persist file re-creation theory, it doesn't happen.

Regards.

2018-09-13 18:07 GMT+02:00 Péter, Kókai <peter.kokai@oneidentity.com>:

...
Hello,

It would not make sense to replace the persist file after restart, so it is not something that syslog-ng does. Only if that file is corrupted, in that case at startup there should be a log about it, have you checked the syslog-ng logs ? (it would be better to enable debug and/or verbose logs, and if possible share it with us.)

Could you reproduce the same behavior without docker (if possible) ?

Best Regards, Peter Kokai

On Thu, Sep 13, 2018 at 4:44 PM Jose Angel Santiago < jasantiago@stratio.com> wrote:

...
Hi,

I guess I know what is happening, when I start from scratch the docker container, even I provide a persist file and a buffer file within the mapped volume, syslog-ng recreates them so all messages in buffer file which were not processed by the relay are lost.

Is there any way to tell syslog-ng to use an already existing .persist file so it doesn't recreate the .rqf file?

Regards.

2018-09-13 16:23 GMT+02:00 Budai, László <laszlo.budai@oneidentity.com>:

...
Hi,

one problem could be if the flush-limit would be greater than 1... in that case syslog-ng would use a HttpBulkMessageProcessor. In this case syslog-ng pass the message to the HttpBulkMessageProcessor and sends back a positive ACK to the LogSource (so the message is removed from the diskbuffer), and if the dockerimage is killed, all the messages stored in the HttpBulkMessageProcessor are lost. But in your case syslog-ng should use the HttpSingleMessageProcessor... which means that the messages are sent one-by-one... Could you check the diskbuffer with the dqtool?

L.

On Thu, Sep 13, 2018 at 3:50 PM, Jose Angel Santiago < jasantiago@stratio.com> wrote:

...
Hi,

I'm running syslog-ng (with an elasticsearch2 destination configured) within a docker container, and I'm trying to avoid loss of messages if I kill the docker container and I start it again.

This is my scenary:

- A service which produces 20 lines of log per second - A sislog-ng instance reading from a wildcard-file source (but actually it only reads logs from the above service, let's call it syslog-agent), which sends all logs to another syslog-ng instance (the one running in a docker container, let's call it syslog-relay) though a network destination. - The syslog-relay sends messages to an elasticsearch instance, with following configuration:

options { chain-hostnames(no); use-dns(no); keep-hostname(yes); owner("syslog-ng"); group("stratio"); perm(0640); time-reap(30); mark-freq(10); stats-freq(0); bad-hostname("^gconfd$"); flush-lines(100); log-fifo-size(1000); };

*destination d_elastic_default_0 { elasticsearch2( cluster("myelastic") cluster-url("https://myelastic.logs:9200 <https://myelastic.logs:9200>") client_mode("https") index("default") type("log") flush-limit(1) disk-buffer( mem-buf-size(16M) disk-buf-size(16M) reliable(yes) dir("/syslog-ng/log") ) http-auth-type("clientcert") java-keystore-filepath("/etc/syslog-ng/certificates/syslog-relay.jks") java-keystore-password("XXXXXX") java-truststore-filepath("/etc/syslog-ng/certificates/ca-bundle.jks") java-truststore-password("XXXXXXXXXX") );};*

- The dir "/syslog-ng/log" is mapped to a path "/tmp/buffer" from the host where the docker container is running, so when I kill the docker container, the buffer file is not lost. - I've set flush-limit to 1 because I thought that I may lost 1 message only as much.

This architecture is working fine (flush-limit=1 makes very slow, but for this test is ok), but if I kill the syslog-relay docker container, wait 5 to 10 seconds and start it again from scratch, I can see that several hundreds of logs are missing in elasticsearch. I check it by stopping the logger service and letting syslog-ng agent & relay to finish the process enqueued messages.

I can see in the syslog-agent stats that all logs messages have been processed, so it seems the problem is on the syslog-relay.

Is this behaviour expected? If so, how can I protect against loss of messages in case of a syslog-relay docker container unexpected kill?

Thanks in advance.

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 <+34%20918%2028%2064%2073> | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product= syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product= syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 <+34%20918%2028%2064%2073> | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD> ____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product= syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product= syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

Budai, László

2:31 p.m.

Hi, I'll try to reproduce this issue in my test environment. What syslog-ng version do you have? L. On Mon, Sep 17, 2018 at 2:04 PM, Jose Angel Santiago <jasantiago@stratio.com

...

wrote:

...

Hi,

After repeating the test without TLS between syslog-agent & syslog-relay, with 2229 messages logged with a 20 msgs/sec logging ratio, in debug mode and with two tcpdumps running, these are the conclussions:

- Syslog-agent sends every single message (checked with tcpdump file and syslog-agent log file) - Syslog-relay returns ACK for every single message, but sometimes the package contains more than one message (checked with tcpdump file) - Checking the syslog-relay log after killing & starting the docker container, I found the trace 'Reliable disk-buffer state loaded; filename='/syslog-ng-00000.rqf', queue_length='99', size='173844' and the queued messages (from msg 236 to msg 334 are processed). The next message processed is the nº 440. and it comes from the active network destination again. - Once all logs have been processed, in elasticsearch (syslog-relay destination) I can find messages from 1 to 334, and messages from 440 to 2229.

How is it possible that syslog-relay returns ACK for messages 335 to 439 to syslog-agent, but they are not in the queue, nor in the .rqf file? Where did these messages go? I know they are sent before starting to fill syslog-agent disk-buffer file (checked with rqf file from syslog-agent when I kill the syslog-relay container). It seems that there's some kind of race condition that makes those messages are not processed/queued while syslog-relay is being killed.

Mi procedure consists of:

. Start tcpdump on both syslog-agent & syslog-relay hosts - Start logger process - Once logs are being inserted in elasticsearch, I kill the syslog-relay docker container, wait about 10 seconds, and run a new container (using a mapped volume where .rqf file and .persist file are) - Once logs are being inserted in elasticsearch again, I stop the logger process. - I wait until no log is left to be processed

Would you test anything else? I'm running out of ideas.

PD: Same test with logger process writing 5 msgs/sec ratio produces from 0 to 5 lost messages.

Regards.

2018-09-14 14:05 GMT+02:00 Jose Angel Santiago <jasantiago@stratio.com>:

...
Hi,

I've got more accurate information about where are my lost messages.

Now I'm using disk-buffer in syslog-agent and syslog-relay, and I've checked that lost messages are the ones sent by the syslog-agent when the syslog-relay docker container is being killed. I can see those messages on syslog-agent log (I've got both agent & relay in debug mode) with its corresponding "Outgoing message" line, but those messages never reach the relay.

Could it be that the relay docker container still returns ACK to the agent (the agent resolves relay fqdn with a custom DNS) while syslog-ng process within the container is being stopped? I'm about to test again using tcpdump to confirm this theory,

BTW, disk-buffers works ok, sometimes I get some duplicated messages when restarting the relay but that's not a problem for me. Forget about my .persist file re-creation theory, it doesn't happen.

Regards.

2018-09-13 18:07 GMT+02:00 Péter, Kókai <peter.kokai@oneidentity.com>:

...
Hello,

It would not make sense to replace the persist file after restart, so it is not something that syslog-ng does. Only if that file is corrupted, in that case at startup there should be a log about it, have you checked the syslog-ng logs ? (it would be better to enable debug and/or verbose logs, and if possible share it with us.)

Could you reproduce the same behavior without docker (if possible) ?

Best Regards, Peter Kokai

On Thu, Sep 13, 2018 at 4:44 PM Jose Angel Santiago < jasantiago@stratio.com> wrote:

...
Hi,

I guess I know what is happening, when I start from scratch the docker container, even I provide a persist file and a buffer file within the mapped volume, syslog-ng recreates them so all messages in buffer file which were not processed by the relay are lost.

Is there any way to tell syslog-ng to use an already existing .persist file so it doesn't recreate the .rqf file?

Regards.

2018-09-13 16:23 GMT+02:00 Budai, László <laszlo.budai@oneidentity.com> :

...
Hi,

one problem could be if the flush-limit would be greater than 1... in that case syslog-ng would use a HttpBulkMessageProcessor. In this case syslog-ng pass the message to the HttpBulkMessageProcessor and sends back a positive ACK to the LogSource (so the message is removed from the diskbuffer), and if the dockerimage is killed, all the messages stored in the HttpBulkMessageProcessor are lost. But in your case syslog-ng should use the HttpSingleMessageProcessor... which means that the messages are sent one-by-one... Could you check the diskbuffer with the dqtool?

L.

On Thu, Sep 13, 2018 at 3:50 PM, Jose Angel Santiago < jasantiago@stratio.com> wrote:

...
Hi,

I'm running syslog-ng (with an elasticsearch2 destination configured) within a docker container, and I'm trying to avoid loss of messages if I kill the docker container and I start it again.

This is my scenary:

- A service which produces 20 lines of log per second - A sislog-ng instance reading from a wildcard-file source (but actually it only reads logs from the above service, let's call it syslog-agent), which sends all logs to another syslog-ng instance (the one running in a docker container, let's call it syslog-relay) though a network destination. - The syslog-relay sends messages to an elasticsearch instance, with following configuration:

options { chain-hostnames(no); use-dns(no); keep-hostname(yes); owner("syslog-ng"); group("stratio"); perm(0640); time-reap(30); mark-freq(10); stats-freq(0); bad-hostname("^gconfd$"); flush-lines(100); log-fifo-size(1000); };

*destination d_elastic_default_0 { elasticsearch2( cluster("myelastic") cluster-url("https://myelastic.logs:9200 <https://myelastic.logs:9200>") client_mode("https") index("default") type("log") flush-limit(1) disk-buffer( mem-buf-size(16M) disk-buf-size(16M) reliable(yes) dir("/syslog-ng/log") ) http-auth-type("clientcert") java-keystore-filepath("/etc/syslog-ng/certificates/syslog-relay.jks") java-keystore-password("XXXXXX") java-truststore-filepath("/etc/syslog-ng/certificates/ca-bundle.jks") java-truststore-password("XXXXXXXXXX") );};*

- The dir "/syslog-ng/log" is mapped to a path "/tmp/buffer" from the host where the docker container is running, so when I kill the docker container, the buffer file is not lost. - I've set flush-limit to 1 because I thought that I may lost 1 message only as much.

This architecture is working fine (flush-limit=1 makes very slow, but for this test is ok), but if I kill the syslog-relay docker container, wait 5 to 10 seconds and start it again from scratch, I can see that several hundreds of logs are missing in elasticsearch. I check it by stopping the logger service and letting syslog-ng agent & relay to finish the process enqueued messages.

I can see in the syslog-agent stats that all logs messages have been processed, so it seems the problem is on the syslog-relay.

Is this behaviour expected? If so, how can I protect against loss of messages in case of a syslog-relay docker container unexpected kill?

Thanks in advance.

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 <+34%20918%2028%2064%2073> | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support /documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support /documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 <+34%20918%2028%2064%2073> | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD> ____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support /documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support /documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/? product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

Jose Angel Santiago

2:33 p.m.

3.17.2 Thanks in advance. 2018-09-17 14:31 GMT+02:00 Budai, László <laszlo.budai@oneidentity.com>:

...

Hi,

I'll try to reproduce this issue in my test environment. What syslog-ng version do you have?

L.

On Mon, Sep 17, 2018 at 2:04 PM, Jose Angel Santiago < jasantiago@stratio.com> wrote:

...
Hi,

After repeating the test without TLS between syslog-agent & syslog-relay, with 2229 messages logged with a 20 msgs/sec logging ratio, in debug mode and with two tcpdumps running, these are the conclussions:

- Syslog-agent sends every single message (checked with tcpdump file and syslog-agent log file) - Syslog-relay returns ACK for every single message, but sometimes the package contains more than one message (checked with tcpdump file) - Checking the syslog-relay log after killing & starting the docker container, I found the trace 'Reliable disk-buffer state loaded; filename='/syslog-ng-00000.rqf', queue_length='99', size='173844' and the queued messages (from msg 236 to msg 334 are processed). The next message processed is the nº 440. and it comes from the active network destination again. - Once all logs have been processed, in elasticsearch (syslog-relay destination) I can find messages from 1 to 334, and messages from 440 to 2229.

How is it possible that syslog-relay returns ACK for messages 335 to 439 to syslog-agent, but they are not in the queue, nor in the .rqf file? Where did these messages go? I know they are sent before starting to fill syslog-agent disk-buffer file (checked with rqf file from syslog-agent when I kill the syslog-relay container). It seems that there's some kind of race condition that makes those messages are not processed/queued while syslog-relay is being killed.

Mi procedure consists of:

. Start tcpdump on both syslog-agent & syslog-relay hosts - Start logger process - Once logs are being inserted in elasticsearch, I kill the syslog-relay docker container, wait about 10 seconds, and run a new container (using a mapped volume where .rqf file and .persist file are) - Once logs are being inserted in elasticsearch again, I stop the logger process. - I wait until no log is left to be processed

Would you test anything else? I'm running out of ideas.

PD: Same test with logger process writing 5 msgs/sec ratio produces from 0 to 5 lost messages.

Regards.

2018-09-14 14:05 GMT+02:00 Jose Angel Santiago <jasantiago@stratio.com>:

...
Hi,

I've got more accurate information about where are my lost messages.

Now I'm using disk-buffer in syslog-agent and syslog-relay, and I've checked that lost messages are the ones sent by the syslog-agent when the syslog-relay docker container is being killed. I can see those messages on syslog-agent log (I've got both agent & relay in debug mode) with its corresponding "Outgoing message" line, but those messages never reach the relay.

Could it be that the relay docker container still returns ACK to the agent (the agent resolves relay fqdn with a custom DNS) while syslog-ng process within the container is being stopped? I'm about to test again using tcpdump to confirm this theory,

BTW, disk-buffers works ok, sometimes I get some duplicated messages when restarting the relay but that's not a problem for me. Forget about my .persist file re-creation theory, it doesn't happen.

Regards.

2018-09-13 18:07 GMT+02:00 Péter, Kókai <peter.kokai@oneidentity.com>:

...
Hello,

It would not make sense to replace the persist file after restart, so it is not something that syslog-ng does. Only if that file is corrupted, in that case at startup there should be a log about it, have you checked the syslog-ng logs ? (it would be better to enable debug and/or verbose logs, and if possible share it with us.)

Could you reproduce the same behavior without docker (if possible) ?

Best Regards, Peter Kokai

On Thu, Sep 13, 2018 at 4:44 PM Jose Angel Santiago < jasantiago@stratio.com> wrote:

...
Hi,

I guess I know what is happening, when I start from scratch the docker container, even I provide a persist file and a buffer file within the mapped volume, syslog-ng recreates them so all messages in buffer file which were not processed by the relay are lost.

Is there any way to tell syslog-ng to use an already existing .persist file so it doesn't recreate the .rqf file?

Regards.

2018-09-13 16:23 GMT+02:00 Budai, László <laszlo.budai@oneidentity.com

...
:

...
Hi,

one problem could be if the flush-limit would be greater than 1... in that case syslog-ng would use a HttpBulkMessageProcessor. In this case syslog-ng pass the message to the HttpBulkMessageProcessor and sends back a positive ACK to the LogSource (so the message is removed from the diskbuffer), and if the dockerimage is killed, all the messages stored in the HttpBulkMessageProcessor are lost. But in your case syslog-ng should use the HttpSingleMessageProcessor... which means that the messages are sent one-by-one... Could you check the diskbuffer with the dqtool?

L.

On Thu, Sep 13, 2018 at 3:50 PM, Jose Angel Santiago < jasantiago@stratio.com> wrote:

> Hi, > > I'm running syslog-ng (with an elasticsearch2 destination > configured) within a docker container, and I'm trying to avoid loss of > messages if I kill the docker container and I start it again. > > This is my scenary: > > - A service which produces 20 lines of log per second > - A sislog-ng instance reading from a wildcard-file source (but > actually it only reads logs from the above service, let's call it > syslog-agent), which sends all logs to another syslog-ng instance (the one > running in a docker container, let's call it syslog-relay) though a network > destination. > - The syslog-relay sends messages to an elasticsearch instance, with > following configuration: > > options { > chain-hostnames(no); > use-dns(no); > keep-hostname(yes); > owner("syslog-ng"); > group("stratio"); > perm(0640); > time-reap(30); > mark-freq(10); > stats-freq(0); > bad-hostname("^gconfd$"); > flush-lines(100); > log-fifo-size(1000); > }; > > > > > > > > > > > > > > > > > > > > > > *destination d_elastic_default_0 { elasticsearch2( > cluster("myelastic") cluster-url("https://myelastic.logs:9200 > <https://myelastic.logs:9200>") client_mode("https") > index("default") type("log") flush-limit(1) > disk-buffer( mem-buf-size(16M) > disk-buf-size(16M) reliable(yes) > dir("/syslog-ng/log") ) http-auth-type("clientcert") > java-keystore-filepath("/etc/syslog-ng/certificates/syslog-relay.jks") > java-keystore-password("XXXXXX") > java-truststore-filepath("/etc/syslog-ng/certificates/ca-bundle.jks") > java-truststore-password("XXXXXXXXXX") );};* > > - The dir "/syslog-ng/log" is mapped to a path "/tmp/buffer" from > the host where the docker container is running, so when I kill the docker > container, the buffer file is not lost. > - I've set flush-limit to 1 because I thought that I may lost 1 > message only as much. > > This architecture is working fine (flush-limit=1 makes very slow, > but for this test is ok), but if I kill the syslog-relay docker container, > wait 5 to 10 seconds and start it again from scratch, I can see that > several hundreds of logs are missing in elasticsearch. I check it by > stopping the logger service and letting syslog-ng agent & relay to finish > the process enqueued messages. > > I can see in the syslog-agent stats that all logs messages have been > processed, so it seems the problem is on the syslog-relay. > > Is this behaviour expected? If so, how can I protect against loss of > messages in case of a syslog-relay docker container unexpected kill? > > Thanks in advance. > > > -- > > | Jose Angel Santiago > > [image: Logo_signature2.png] <http://www.stratio.com/> > > Vía de las dos Castillas, 33, Ática 4, 3ª Planta > > 28224 Pozuelo de Alarcón, Madrid, Spain > > +34 918 286 473 <+34%20918%2028%2064%2073> | www.stratio.com > <https://twitter.com/stratiobd> > <https://www.linkedin.com/company/stratiobd> > <https://www.youtube.com/c/StratioBD> > > ____________________________________________________________ > __________________ > Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng > Documentation: http://www.balabit.com/support > /documentation/?product=syslog-ng > FAQ: http://www.balabit.com/wiki/syslog-ng-faq > > >

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support /documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 <+34%20918%2028%2064%2073> | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD> ____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support /documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support /documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product= syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/? product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

Jose Angel Santiago

19 Sep 19 Sep

9:23 a.m.

Hi, I've realized that the queue_length of the disk_buffer never goes beyond 99, no matter the ratio message/sec logger process is writing or the value of mem-buf-size or disk-buf-size. Which parameter is the one which would match the queue_length of the disk_buffer? It seems that it should be *mem-buf-length()* but I'm using reliable=yes so I shouldn't use that parameter, and I'd like to test my scenary being able to queue more than 99 messages on disk_buffer. Regards. 2018-09-17 14:33 GMT+02:00 Jose Angel Santiago <jasantiago@stratio.com>:

...

3.17.2

Thanks in advance.

2018-09-17 14:31 GMT+02:00 Budai, László <laszlo.budai@oneidentity.com>:

...
Hi,

I'll try to reproduce this issue in my test environment. What syslog-ng version do you have?

L.

On Mon, Sep 17, 2018 at 2:04 PM, Jose Angel Santiago < jasantiago@stratio.com> wrote:

...
Hi,

After repeating the test without TLS between syslog-agent & syslog-relay, with 2229 messages logged with a 20 msgs/sec logging ratio, in debug mode and with two tcpdumps running, these are the conclussions:

- Syslog-agent sends every single message (checked with tcpdump file and syslog-agent log file) - Syslog-relay returns ACK for every single message, but sometimes the package contains more than one message (checked with tcpdump file) - Checking the syslog-relay log after killing & starting the docker container, I found the trace 'Reliable disk-buffer state loaded; filename='/syslog-ng-00000.rqf', queue_length='99', size='173844' and the queued messages (from msg 236 to msg 334 are processed). The next message processed is the nº 440. and it comes from the active network destination again. - Once all logs have been processed, in elasticsearch (syslog-relay destination) I can find messages from 1 to 334, and messages from 440 to 2229.

How is it possible that syslog-relay returns ACK for messages 335 to 439 to syslog-agent, but they are not in the queue, nor in the .rqf file? Where did these messages go? I know they are sent before starting to fill syslog-agent disk-buffer file (checked with rqf file from syslog-agent when I kill the syslog-relay container). It seems that there's some kind of race condition that makes those messages are not processed/queued while syslog-relay is being killed.

Mi procedure consists of:

. Start tcpdump on both syslog-agent & syslog-relay hosts - Start logger process - Once logs are being inserted in elasticsearch, I kill the syslog-relay docker container, wait about 10 seconds, and run a new container (using a mapped volume where .rqf file and .persist file are) - Once logs are being inserted in elasticsearch again, I stop the logger process. - I wait until no log is left to be processed

Would you test anything else? I'm running out of ideas.

PD: Same test with logger process writing 5 msgs/sec ratio produces from 0 to 5 lost messages.

Regards.

2018-09-14 14:05 GMT+02:00 Jose Angel Santiago <jasantiago@stratio.com>:

...
Hi,

I've got more accurate information about where are my lost messages.

Now I'm using disk-buffer in syslog-agent and syslog-relay, and I've checked that lost messages are the ones sent by the syslog-agent when the syslog-relay docker container is being killed. I can see those messages on syslog-agent log (I've got both agent & relay in debug mode) with its corresponding "Outgoing message" line, but those messages never reach the relay.

Could it be that the relay docker container still returns ACK to the agent (the agent resolves relay fqdn with a custom DNS) while syslog-ng process within the container is being stopped? I'm about to test again using tcpdump to confirm this theory,

BTW, disk-buffers works ok, sometimes I get some duplicated messages when restarting the relay but that's not a problem for me. Forget about my .persist file re-creation theory, it doesn't happen.

Regards.

2018-09-13 18:07 GMT+02:00 Péter, Kókai <peter.kokai@oneidentity.com>:

...
Hello,

It would not make sense to replace the persist file after restart, so it is not something that syslog-ng does. Only if that file is corrupted, in that case at startup there should be a log about it, have you checked the syslog-ng logs ? (it would be better to enable debug and/or verbose logs, and if possible share it with us.)

Could you reproduce the same behavior without docker (if possible) ?

Best Regards, Peter Kokai

On Thu, Sep 13, 2018 at 4:44 PM Jose Angel Santiago < jasantiago@stratio.com> wrote:

...
Hi,

I guess I know what is happening, when I start from scratch the docker container, even I provide a persist file and a buffer file within the mapped volume, syslog-ng recreates them so all messages in buffer file which were not processed by the relay are lost.

Is there any way to tell syslog-ng to use an already existing .persist file so it doesn't recreate the .rqf file?

Regards.

2018-09-13 16:23 GMT+02:00 Budai, László < laszlo.budai@oneidentity.com>:

> Hi, > > one problem could be if the flush-limit would be greater than 1... > in that case syslog-ng would use a HttpBulkMessageProcessor. > In this case syslog-ng pass the message to the > HttpBulkMessageProcessor and sends back a positive ACK to the LogSource (so > the message is removed from the diskbuffer), and if the dockerimage is > killed, all the messages stored in the HttpBulkMessageProcessor are lost. > But in your case syslog-ng should use the > HttpSingleMessageProcessor... which means that the messages are sent > one-by-one... > Could you check the diskbuffer with the dqtool? > > > L. > > On Thu, Sep 13, 2018 at 3:50 PM, Jose Angel Santiago < > jasantiago@stratio.com> wrote: > >> Hi, >> >> I'm running syslog-ng (with an elasticsearch2 destination >> configured) within a docker container, and I'm trying to avoid loss of >> messages if I kill the docker container and I start it again. >> >> This is my scenary: >> >> - A service which produces 20 lines of log per second >> - A sislog-ng instance reading from a wildcard-file source (but >> actually it only reads logs from the above service, let's call it >> syslog-agent), which sends all logs to another syslog-ng instance (the one >> running in a docker container, let's call it syslog-relay) though a network >> destination. >> - The syslog-relay sends messages to an elasticsearch instance, >> with following configuration: >> >> options { >> chain-hostnames(no); >> use-dns(no); >> keep-hostname(yes); >> owner("syslog-ng"); >> group("stratio"); >> perm(0640); >> time-reap(30); >> mark-freq(10); >> stats-freq(0); >> bad-hostname("^gconfd$"); >> flush-lines(100); >> log-fifo-size(1000); >> }; >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> *destination d_elastic_default_0 { elasticsearch2( >> cluster("myelastic") cluster-url("https://myelastic.logs:9200 >> <https://myelastic.logs:9200>") client_mode("https") >> index("default") type("log") flush-limit(1) >> disk-buffer( mem-buf-size(16M) >> disk-buf-size(16M) reliable(yes) >> dir("/syslog-ng/log") ) http-auth-type("clientcert") >> java-keystore-filepath("/etc/syslog-ng/certificates/syslog-relay.jks") >> java-keystore-password("XXXXXX") >> java-truststore-filepath("/etc/syslog-ng/certificates/ca-bundle.jks") >> java-truststore-password("XXXXXXXXXX") );};* >> >> - The dir "/syslog-ng/log" is mapped to a path "/tmp/buffer" from >> the host where the docker container is running, so when I kill the docker >> container, the buffer file is not lost. >> - I've set flush-limit to 1 because I thought that I may lost 1 >> message only as much. >> >> This architecture is working fine (flush-limit=1 makes very slow, >> but for this test is ok), but if I kill the syslog-relay docker container, >> wait 5 to 10 seconds and start it again from scratch, I can see that >> several hundreds of logs are missing in elasticsearch. I check it by >> stopping the logger service and letting syslog-ng agent & relay to finish >> the process enqueued messages. >> >> I can see in the syslog-agent stats that all logs messages have >> been processed, so it seems the problem is on the syslog-relay. >> >> Is this behaviour expected? If so, how can I protect against loss >> of messages in case of a syslog-relay docker container unexpected kill? >> >> Thanks in advance. >> >> >> -- >> >> | Jose Angel Santiago >> >> [image: Logo_signature2.png] <http://www.stratio.com/> >> >> Vía de las dos Castillas, 33, Ática 4, 3ª Planta >> >> 28224 Pozuelo de Alarcón, Madrid, Spain >> >> +34 918 286 473 <+34%20918%2028%2064%2073> | www.stratio.com >> <https://twitter.com/stratiobd> >> <https://www.linkedin.com/company/stratiobd> >> <https://www.youtube.com/c/StratioBD> >> >> ____________________________________________________________ >> __________________ >> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng >> Documentation: http://www.balabit.com/support >> /documentation/?product=syslog-ng >> FAQ: http://www.balabit.com/wiki/syslog-ng-faq >> >> >> > > ____________________________________________________________ > __________________ > Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng > Documentation: http://www.balabit.com/support > /documentation/?product=syslog-ng > FAQ: http://www.balabit.com/wiki/syslog-ng-faq > > >

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 <+34%20918%2028%2064%2073> | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD> ____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support /documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support /documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support /documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product= syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

Budai, László

1:46 p.m.

Hi, what do you mean under 'syslog-relay returns ACK to syslog-ng agent' ? I guess this is TCP level ACK. And TCP level ACK is not the same as what syslog-ng uses for controlling the window size of the LogSources. It is possible that you have pending data in TCP buffers (maintained by the kernel) that is not read by syslog-ng, but they are already ACKd by TCP itself... (Solving this kind of situation requires application level ACK, which syslog-ng has only in the commercial edition.) L. On Wed, Sep 19, 2018 at 9:23 AM, Jose Angel Santiago <jasantiago@stratio.com

...

wrote:

...

Hi,

I've realized that the queue_length of the disk_buffer never goes beyond 99, no matter the ratio message/sec logger process is writing or the value of mem-buf-size or disk-buf-size.

Which parameter is the one which would match the queue_length of the disk_buffer? It seems that it should be *mem-buf-length()* but I'm using reliable=yes so I shouldn't use that parameter, and I'd like to test my scenary being able to queue more than 99 messages on disk_buffer.

Regards.

2018-09-17 14:33 GMT+02:00 Jose Angel Santiago <jasantiago@stratio.com>:

...
3.17.2

Thanks in advance.

2018-09-17 14:31 GMT+02:00 Budai, László <laszlo.budai@oneidentity.com>:

...
Hi,

I'll try to reproduce this issue in my test environment. What syslog-ng version do you have?

L.

On Mon, Sep 17, 2018 at 2:04 PM, Jose Angel Santiago < jasantiago@stratio.com> wrote:

...
Hi,

After repeating the test without TLS between syslog-agent & syslog-relay, with 2229 messages logged with a 20 msgs/sec logging ratio, in debug mode and with two tcpdumps running, these are the conclussions:

- Syslog-agent sends every single message (checked with tcpdump file and syslog-agent log file) - Syslog-relay returns ACK for every single message, but sometimes the package contains more than one message (checked with tcpdump file) - Checking the syslog-relay log after killing & starting the docker container, I found the trace 'Reliable disk-buffer state loaded; filename='/syslog-ng-00000.rqf', queue_length='99', size='173844' and the queued messages (from msg 236 to msg 334 are processed). The next message processed is the nº 440. and it comes from the active network destination again. - Once all logs have been processed, in elasticsearch (syslog-relay destination) I can find messages from 1 to 334, and messages from 440 to 2229.

How is it possible that syslog-relay returns ACK for messages 335 to 439 to syslog-agent, but they are not in the queue, nor in the .rqf file? Where did these messages go? I know they are sent before starting to fill syslog-agent disk-buffer file (checked with rqf file from syslog-agent when I kill the syslog-relay container). It seems that there's some kind of race condition that makes those messages are not processed/queued while syslog-relay is being killed.

Mi procedure consists of:

. Start tcpdump on both syslog-agent & syslog-relay hosts - Start logger process - Once logs are being inserted in elasticsearch, I kill the syslog-relay docker container, wait about 10 seconds, and run a new container (using a mapped volume where .rqf file and .persist file are) - Once logs are being inserted in elasticsearch again, I stop the logger process. - I wait until no log is left to be processed

Would you test anything else? I'm running out of ideas.

PD: Same test with logger process writing 5 msgs/sec ratio produces from 0 to 5 lost messages.

Regards.

2018-09-14 14:05 GMT+02:00 Jose Angel Santiago <jasantiago@stratio.com> :

...
Hi,

I've got more accurate information about where are my lost messages.

Now I'm using disk-buffer in syslog-agent and syslog-relay, and I've checked that lost messages are the ones sent by the syslog-agent when the syslog-relay docker container is being killed. I can see those messages on syslog-agent log (I've got both agent & relay in debug mode) with its corresponding "Outgoing message" line, but those messages never reach the relay.

Could it be that the relay docker container still returns ACK to the agent (the agent resolves relay fqdn with a custom DNS) while syslog-ng process within the container is being stopped? I'm about to test again using tcpdump to confirm this theory,

BTW, disk-buffers works ok, sometimes I get some duplicated messages when restarting the relay but that's not a problem for me. Forget about my .persist file re-creation theory, it doesn't happen.

Regards.

2018-09-13 18:07 GMT+02:00 Péter, Kókai <peter.kokai@oneidentity.com>:

...
Hello,

It would not make sense to replace the persist file after restart, so it is not something that syslog-ng does. Only if that file is corrupted, in that case at startup there should be a log about it, have you checked the syslog-ng logs ? (it would be better to enable debug and/or verbose logs, and if possible share it with us.)

Could you reproduce the same behavior without docker (if possible) ?

Best Regards, Peter Kokai

On Thu, Sep 13, 2018 at 4:44 PM Jose Angel Santiago < jasantiago@stratio.com> wrote:

> Hi, > > I guess I know what is happening, when I start from scratch the > docker container, even I provide a persist file and a buffer file within > the mapped volume, syslog-ng recreates them so all messages in buffer file > which were not processed by the relay are lost. > > Is there any way to tell syslog-ng to use an already existing > .persist file so it doesn't recreate the .rqf file? > > Regards. > > > > 2018-09-13 16:23 GMT+02:00 Budai, László < > laszlo.budai@oneidentity.com>: > >> Hi, >> >> one problem could be if the flush-limit would be greater than 1... >> in that case syslog-ng would use a HttpBulkMessageProcessor. >> In this case syslog-ng pass the message to the >> HttpBulkMessageProcessor and sends back a positive ACK to the LogSource (so >> the message is removed from the diskbuffer), and if the dockerimage is >> killed, all the messages stored in the HttpBulkMessageProcessor are lost. >> But in your case syslog-ng should use the >> HttpSingleMessageProcessor... which means that the messages are sent >> one-by-one... >> Could you check the diskbuffer with the dqtool? >> >> >> L. >> >> On Thu, Sep 13, 2018 at 3:50 PM, Jose Angel Santiago < >> jasantiago@stratio.com> wrote: >> >>> Hi, >>> >>> I'm running syslog-ng (with an elasticsearch2 destination >>> configured) within a docker container, and I'm trying to avoid loss of >>> messages if I kill the docker container and I start it again. >>> >>> This is my scenary: >>> >>> - A service which produces 20 lines of log per second >>> - A sislog-ng instance reading from a wildcard-file source (but >>> actually it only reads logs from the above service, let's call it >>> syslog-agent), which sends all logs to another syslog-ng instance (the one >>> running in a docker container, let's call it syslog-relay) though a network >>> destination. >>> - The syslog-relay sends messages to an elasticsearch instance, >>> with following configuration: >>> >>> options { >>> chain-hostnames(no); >>> use-dns(no); >>> keep-hostname(yes); >>> owner("syslog-ng"); >>> group("stratio"); >>> perm(0640); >>> time-reap(30); >>> mark-freq(10); >>> stats-freq(0); >>> bad-hostname("^gconfd$"); >>> flush-lines(100); >>> log-fifo-size(1000); >>> }; >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> *destination d_elastic_default_0 { elasticsearch2( >>> cluster("myelastic") cluster-url("https://myelastic.logs:9200 >>> <https://myelastic.logs:9200>") client_mode("https") >>> index("default") type("log") flush-limit(1) >>> disk-buffer( mem-buf-size(16M) >>> disk-buf-size(16M) reliable(yes) >>> dir("/syslog-ng/log") ) http-auth-type("clientcert") >>> java-keystore-filepath("/etc/syslog-ng/certificates/syslog-relay.jks") >>> java-keystore-password("XXXXXX") >>> java-truststore-filepath("/etc/syslog-ng/certificates/ca-bundle.jks") >>> java-truststore-password("XXXXXXXXXX") );};* >>> >>> - The dir "/syslog-ng/log" is mapped to a path "/tmp/buffer" from >>> the host where the docker container is running, so when I kill the docker >>> container, the buffer file is not lost. >>> - I've set flush-limit to 1 because I thought that I may lost 1 >>> message only as much. >>> >>> This architecture is working fine (flush-limit=1 makes very slow, >>> but for this test is ok), but if I kill the syslog-relay docker container, >>> wait 5 to 10 seconds and start it again from scratch, I can see that >>> several hundreds of logs are missing in elasticsearch. I check it by >>> stopping the logger service and letting syslog-ng agent & relay to finish >>> the process enqueued messages. >>> >>> I can see in the syslog-agent stats that all logs messages have >>> been processed, so it seems the problem is on the syslog-relay. >>> >>> Is this behaviour expected? If so, how can I protect against loss >>> of messages in case of a syslog-relay docker container unexpected kill? >>> >>> Thanks in advance. >>> >>> >>> -- >>> >>> | Jose Angel Santiago >>> >>> [image: Logo_signature2.png] <http://www.stratio.com/> >>> >>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta >>> >>> 28224 Pozuelo de Alarcón, Madrid, Spain >>> >>> +34 918 286 473 <+34%20918%2028%2064%2073> | www.stratio.com >>> <https://twitter.com/stratiobd> >>> <https://www.linkedin.com/company/stratiobd> >>> <https://www.youtube.com/c/StratioBD> >>> >>> ____________________________________________________________ >>> __________________ >>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng >>> Documentation: http://www.balabit.com/support >>> /documentation/?product=syslog-ng >>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq >>> >>> >>> >> >> ____________________________________________________________ >> __________________ >> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng >> Documentation: http://www.balabit.com/support >> /documentation/?product=syslog-ng >> FAQ: http://www.balabit.com/wiki/syslog-ng-faq >> >> >> > > > -- > > | Jose Angel Santiago > > [image: Logo_signature2.png] <http://www.stratio.com/> > > Vía de las dos Castillas, 33, Ática 4, 3ª Planta > > 28224 Pozuelo de Alarcón, Madrid, Spain > > +34 918 286 473 <+34%20918%2028%2064%2073> | www.stratio.com > <https://twitter.com/stratiobd> > <https://www.linkedin.com/company/stratiobd> > <https://www.youtube.com/c/StratioBD> > ____________________________________________________________ > __________________ > Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng > Documentation: http://www.balabit.com/support > /documentation/?product=syslog-ng > FAQ: http://www.balabit.com/wiki/syslog-ng-faq > > ____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support /documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support /documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support /documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product= syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

Jose Angel Santiago

3:50 p.m.

Hi, You're right, what I saw on my tcpdump file was TCP ACK. I guess I lose so many messages because the TCP buffer is getting bigger and bigger since syslog-relay queue_length from disk_buffer reaches its limit. In fact, when this queue is not full (i.e with very low msg/sec ratio) I lose from 0-5 messages according to my tests. Which parameter is the one which would match the queue_length of the disk_buffer? It seems that it should be *mem-buf-length()* but I'm using reliable=yes so I shouldn't use that parameter, and I'd like to test my scenary being able to queue more than 99 messages on disk_buffer. Thanks. 2018-09-19 13:46 GMT+02:00 Budai, László <laszlo.budai@oneidentity.com>:

...

Hi,

what do you mean under 'syslog-relay returns ACK to syslog-ng agent' ? I guess this is TCP level ACK.

And TCP level ACK is not the same as what syslog-ng uses for controlling the window size of the LogSources.

It is possible that you have pending data in TCP buffers (maintained by the kernel) that is not read by syslog-ng, but they are already ACKd by TCP itself...

(Solving this kind of situation requires application level ACK, which syslog-ng has only in the commercial edition.)

L.

On Wed, Sep 19, 2018 at 9:23 AM, Jose Angel Santiago < jasantiago@stratio.com> wrote:

...
Hi,

I've realized that the queue_length of the disk_buffer never goes beyond 99, no matter the ratio message/sec logger process is writing or the value of mem-buf-size or disk-buf-size.

Which parameter is the one which would match the queue_length of the disk_buffer? It seems that it should be *mem-buf-length()* but I'm using reliable=yes so I shouldn't use that parameter, and I'd like to test my scenary being able to queue more than 99 messages on disk_buffer.

Regards.

2018-09-17 14:33 GMT+02:00 Jose Angel Santiago <jasantiago@stratio.com>:

...
3.17.2

Thanks in advance.

2018-09-17 14:31 GMT+02:00 Budai, László <laszlo.budai@oneidentity.com>:

...
Hi,

I'll try to reproduce this issue in my test environment. What syslog-ng version do you have?

L.

On Mon, Sep 17, 2018 at 2:04 PM, Jose Angel Santiago < jasantiago@stratio.com> wrote:

...
Hi,

After repeating the test without TLS between syslog-agent & syslog-relay, with 2229 messages logged with a 20 msgs/sec logging ratio, in debug mode and with two tcpdumps running, these are the conclussions:

- Syslog-agent sends every single message (checked with tcpdump file and syslog-agent log file) - Syslog-relay returns ACK for every single message, but sometimes the package contains more than one message (checked with tcpdump file) - Checking the syslog-relay log after killing & starting the docker container, I found the trace 'Reliable disk-buffer state loaded; filename='/syslog-ng-00000.rqf', queue_length='99', size='173844' and the queued messages (from msg 236 to msg 334 are processed). The next message processed is the nº 440. and it comes from the active network destination again. - Once all logs have been processed, in elasticsearch (syslog-relay destination) I can find messages from 1 to 334, and messages from 440 to 2229.

How is it possible that syslog-relay returns ACK for messages 335 to 439 to syslog-agent, but they are not in the queue, nor in the .rqf file? Where did these messages go? I know they are sent before starting to fill syslog-agent disk-buffer file (checked with rqf file from syslog-agent when I kill the syslog-relay container). It seems that there's some kind of race condition that makes those messages are not processed/queued while syslog-relay is being killed.

Mi procedure consists of:

. Start tcpdump on both syslog-agent & syslog-relay hosts - Start logger process - Once logs are being inserted in elasticsearch, I kill the syslog-relay docker container, wait about 10 seconds, and run a new container (using a mapped volume where .rqf file and .persist file are) - Once logs are being inserted in elasticsearch again, I stop the logger process. - I wait until no log is left to be processed

Would you test anything else? I'm running out of ideas.

PD: Same test with logger process writing 5 msgs/sec ratio produces from 0 to 5 lost messages.

Regards.

2018-09-14 14:05 GMT+02:00 Jose Angel Santiago <jasantiago@stratio.com

...
:

...
Hi,

I've got more accurate information about where are my lost messages.

Now I'm using disk-buffer in syslog-agent and syslog-relay, and I've checked that lost messages are the ones sent by the syslog-agent when the syslog-relay docker container is being killed. I can see those messages on syslog-agent log (I've got both agent & relay in debug mode) with its corresponding "Outgoing message" line, but those messages never reach the relay.

Could it be that the relay docker container still returns ACK to the agent (the agent resolves relay fqdn with a custom DNS) while syslog-ng process within the container is being stopped? I'm about to test again using tcpdump to confirm this theory,

BTW, disk-buffers works ok, sometimes I get some duplicated messages when restarting the relay but that's not a problem for me. Forget about my .persist file re-creation theory, it doesn't happen.

Regards.

2018-09-13 18:07 GMT+02:00 Péter, Kókai <peter.kokai@oneidentity.com> :

> Hello, > > It would not make sense to replace the persist file after restart, > so it is not something that syslog-ng does. Only if that file is corrupted, > in that case at startup there should be a log about it, have you checked > the syslog-ng logs ? (it would be better to enable debug and/or verbose > logs, and if possible share it with us.) > > Could you reproduce the same behavior without docker (if possible) ? > > > > Best Regards, > Peter Kokai > > On Thu, Sep 13, 2018 at 4:44 PM Jose Angel Santiago < > jasantiago@stratio.com> wrote: > >> Hi, >> >> I guess I know what is happening, when I start from scratch the >> docker container, even I provide a persist file and a buffer file within >> the mapped volume, syslog-ng recreates them so all messages in buffer file >> which were not processed by the relay are lost. >> >> Is there any way to tell syslog-ng to use an already existing >> .persist file so it doesn't recreate the .rqf file? >> >> Regards. >> >> >> >> 2018-09-13 16:23 GMT+02:00 Budai, László < >> laszlo.budai@oneidentity.com>: >> >>> Hi, >>> >>> one problem could be if the flush-limit would be greater than 1... >>> in that case syslog-ng would use a HttpBulkMessageProcessor. >>> In this case syslog-ng pass the message to the >>> HttpBulkMessageProcessor and sends back a positive ACK to the LogSource (so >>> the message is removed from the diskbuffer), and if the dockerimage is >>> killed, all the messages stored in the HttpBulkMessageProcessor are lost. >>> But in your case syslog-ng should use the >>> HttpSingleMessageProcessor... which means that the messages are sent >>> one-by-one... >>> Could you check the diskbuffer with the dqtool? >>> >>> >>> L. >>> >>> On Thu, Sep 13, 2018 at 3:50 PM, Jose Angel Santiago < >>> jasantiago@stratio.com> wrote: >>> >>>> Hi, >>>> >>>> I'm running syslog-ng (with an elasticsearch2 destination >>>> configured) within a docker container, and I'm trying to avoid loss of >>>> messages if I kill the docker container and I start it again. >>>> >>>> This is my scenary: >>>> >>>> - A service which produces 20 lines of log per second >>>> - A sislog-ng instance reading from a wildcard-file source (but >>>> actually it only reads logs from the above service, let's call it >>>> syslog-agent), which sends all logs to another syslog-ng instance (the one >>>> running in a docker container, let's call it syslog-relay) though a network >>>> destination. >>>> - The syslog-relay sends messages to an elasticsearch instance, >>>> with following configuration: >>>> >>>> options { >>>> chain-hostnames(no); >>>> use-dns(no); >>>> keep-hostname(yes); >>>> owner("syslog-ng"); >>>> group("stratio"); >>>> perm(0640); >>>> time-reap(30); >>>> mark-freq(10); >>>> stats-freq(0); >>>> bad-hostname("^gconfd$"); >>>> flush-lines(100); >>>> log-fifo-size(1000); >>>> }; >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> *destination d_elastic_default_0 { elasticsearch2( >>>> cluster("myelastic") cluster-url("https://myelastic.logs:9200 >>>> <https://myelastic.logs:9200>") client_mode("https") >>>> index("default") type("log") flush-limit(1) >>>> disk-buffer( mem-buf-size(16M) >>>> disk-buf-size(16M) reliable(yes) >>>> dir("/syslog-ng/log") ) http-auth-type("clientcert") >>>> java-keystore-filepath("/etc/syslog-ng/certificates/syslog-relay.jks") >>>> java-keystore-password("XXXXXX") >>>> java-truststore-filepath("/etc/syslog-ng/certificates/ca-bundle.jks") >>>> java-truststore-password("XXXXXXXXXX") );};* >>>> >>>> - The dir "/syslog-ng/log" is mapped to a path "/tmp/buffer" from >>>> the host where the docker container is running, so when I kill the docker >>>> container, the buffer file is not lost. >>>> - I've set flush-limit to 1 because I thought that I may lost 1 >>>> message only as much. >>>> >>>> This architecture is working fine (flush-limit=1 makes very slow, >>>> but for this test is ok), but if I kill the syslog-relay docker container, >>>> wait 5 to 10 seconds and start it again from scratch, I can see that >>>> several hundreds of logs are missing in elasticsearch. I check it by >>>> stopping the logger service and letting syslog-ng agent & relay to finish >>>> the process enqueued messages. >>>> >>>> I can see in the syslog-agent stats that all logs messages have >>>> been processed, so it seems the problem is on the syslog-relay. >>>> >>>> Is this behaviour expected? If so, how can I protect against loss >>>> of messages in case of a syslog-relay docker container unexpected kill? >>>> >>>> Thanks in advance. >>>> >>>> >>>> -- >>>> >>>> | Jose Angel Santiago >>>> >>>> [image: Logo_signature2.png] <http://www.stratio.com/> >>>> >>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta >>>> >>>> 28224 Pozuelo de Alarcón, Madrid, Spain >>>> >>>> +34 918 286 473 <+34%20918%2028%2064%2073> | www.stratio.com >>>> <https://twitter.com/stratiobd> >>>> <https://www.linkedin.com/company/stratiobd> >>>> <https://www.youtube.com/c/StratioBD> >>>> >>>> ____________________________________________________________ >>>> __________________ >>>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng >>>> Documentation: http://www.balabit.com/support >>>> /documentation/?product=syslog-ng >>>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq >>>> >>>> >>>> >>> >>> ____________________________________________________________ >>> __________________ >>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng >>> Documentation: http://www.balabit.com/support >>> /documentation/?product=syslog-ng >>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq >>> >>> >>> >> >> >> -- >> >> | Jose Angel Santiago >> >> [image: Logo_signature2.png] <http://www.stratio.com/> >> >> Vía de las dos Castillas, 33, Ática 4, 3ª Planta >> >> 28224 Pozuelo de Alarcón, Madrid, Spain >> >> +34 918 286 473 <+34%20918%2028%2064%2073> | www.stratio.com >> <https://twitter.com/stratiobd> >> <https://www.linkedin.com/company/stratiobd> >> <https://www.youtube.com/c/StratioBD> >> ____________________________________________________________ >> __________________ >> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng >> Documentation: http://www.balabit.com/support >> /documentation/?product=syslog-ng >> FAQ: http://www.balabit.com/wiki/syslog-ng-faq >> >> > ____________________________________________________________ > __________________ > Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng > Documentation: http://www.balabit.com/support > /documentation/?product=syslog-ng > FAQ: http://www.balabit.com/wiki/syslog-ng-faq > > >

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support /documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support /documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support /documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/? product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

Budai, László

5:09 p.m.

Hi, try to increase the log-iw-size (its default value is 100) on the relay. (it's a source option, so add to your source log-iw-size(100000) or something like similar). L. On Wed, Sep 19, 2018 at 3:50 PM, Jose Angel Santiago <jasantiago@stratio.com

...

wrote:

...

Hi,

You're right, what I saw on my tcpdump file was TCP ACK.

I guess I lose so many messages because the TCP buffer is getting bigger and bigger since syslog-relay queue_length from disk_buffer reaches its limit. In fact, when this queue is not full (i.e with very low msg/sec ratio) I lose from 0-5 messages according to my tests.

Which parameter is the one which would match the queue_length of the disk_buffer? It seems that it should be *mem-buf-length()* but I'm using reliable=yes so I shouldn't use that parameter, and I'd like to test my scenary being able to queue more than 99 messages on disk_buffer.

Thanks.

2018-09-19 13:46 GMT+02:00 Budai, László <laszlo.budai@oneidentity.com>:

...
Hi,

what do you mean under 'syslog-relay returns ACK to syslog-ng agent' ? I guess this is TCP level ACK.

And TCP level ACK is not the same as what syslog-ng uses for controlling the window size of the LogSources.

It is possible that you have pending data in TCP buffers (maintained by the kernel) that is not read by syslog-ng, but they are already ACKd by TCP itself...

(Solving this kind of situation requires application level ACK, which syslog-ng has only in the commercial edition.)

L.

On Wed, Sep 19, 2018 at 9:23 AM, Jose Angel Santiago < jasantiago@stratio.com> wrote:

...
Hi,

I've realized that the queue_length of the disk_buffer never goes beyond 99, no matter the ratio message/sec logger process is writing or the value of mem-buf-size or disk-buf-size.

Which parameter is the one which would match the queue_length of the disk_buffer? It seems that it should be *mem-buf-length()* but I'm using reliable=yes so I shouldn't use that parameter, and I'd like to test my scenary being able to queue more than 99 messages on disk_buffer.

Regards.

2018-09-17 14:33 GMT+02:00 Jose Angel Santiago <jasantiago@stratio.com>:

...
3.17.2

Thanks in advance.

2018-09-17 14:31 GMT+02:00 Budai, László <laszlo.budai@oneidentity.com> :

...
Hi,

I'll try to reproduce this issue in my test environment. What syslog-ng version do you have?

L.

On Mon, Sep 17, 2018 at 2:04 PM, Jose Angel Santiago < jasantiago@stratio.com> wrote:

...
Hi,

After repeating the test without TLS between syslog-agent & syslog-relay, with 2229 messages logged with a 20 msgs/sec logging ratio, in debug mode and with two tcpdumps running, these are the conclussions:

- Syslog-agent sends every single message (checked with tcpdump file and syslog-agent log file) - Syslog-relay returns ACK for every single message, but sometimes the package contains more than one message (checked with tcpdump file) - Checking the syslog-relay log after killing & starting the docker container, I found the trace 'Reliable disk-buffer state loaded; filename='/syslog-ng-00000.rqf', queue_length='99', size='173844' and the queued messages (from msg 236 to msg 334 are processed). The next message processed is the nº 440. and it comes from the active network destination again. - Once all logs have been processed, in elasticsearch (syslog-relay destination) I can find messages from 1 to 334, and messages from 440 to 2229.

How is it possible that syslog-relay returns ACK for messages 335 to 439 to syslog-agent, but they are not in the queue, nor in the .rqf file? Where did these messages go? I know they are sent before starting to fill syslog-agent disk-buffer file (checked with rqf file from syslog-agent when I kill the syslog-relay container). It seems that there's some kind of race condition that makes those messages are not processed/queued while syslog-relay is being killed.

Mi procedure consists of:

. Start tcpdump on both syslog-agent & syslog-relay hosts - Start logger process - Once logs are being inserted in elasticsearch, I kill the syslog-relay docker container, wait about 10 seconds, and run a new container (using a mapped volume where .rqf file and .persist file are) - Once logs are being inserted in elasticsearch again, I stop the logger process. - I wait until no log is left to be processed

Would you test anything else? I'm running out of ideas.

PD: Same test with logger process writing 5 msgs/sec ratio produces from 0 to 5 lost messages.

Regards.

2018-09-14 14:05 GMT+02:00 Jose Angel Santiago < jasantiago@stratio.com>:

> Hi, > > I've got more accurate information about where are my lost messages. > > Now I'm using disk-buffer in syslog-agent and syslog-relay, and I've > checked that lost messages are the ones sent by the syslog-agent when the > syslog-relay docker container is being killed. I can see those messages on > syslog-agent log (I've got both agent & relay in debug mode) with its > corresponding "Outgoing message" line, but those messages never reach the > relay. > > Could it be that the relay docker container still returns ACK to the > agent (the agent resolves relay fqdn with a custom DNS) while syslog-ng > process within the container is being stopped? I'm about to test again > using tcpdump to confirm this theory, > > BTW, disk-buffers works ok, sometimes I get some duplicated messages > when restarting the relay but that's not a problem for me. Forget about my > .persist file re-creation theory, it doesn't happen. > > Regards. > > > > 2018-09-13 18:07 GMT+02:00 Péter, Kókai <peter.kokai@oneidentity.com > >: > >> Hello, >> >> It would not make sense to replace the persist file after restart, >> so it is not something that syslog-ng does. Only if that file is corrupted, >> in that case at startup there should be a log about it, have you checked >> the syslog-ng logs ? (it would be better to enable debug and/or verbose >> logs, and if possible share it with us.) >> >> Could you reproduce the same behavior without docker (if possible) ? >> >> >> >> Best Regards, >> Peter Kokai >> >> On Thu, Sep 13, 2018 at 4:44 PM Jose Angel Santiago < >> jasantiago@stratio.com> wrote: >> >>> Hi, >>> >>> I guess I know what is happening, when I start from scratch the >>> docker container, even I provide a persist file and a buffer file within >>> the mapped volume, syslog-ng recreates them so all messages in buffer file >>> which were not processed by the relay are lost. >>> >>> Is there any way to tell syslog-ng to use an already existing >>> .persist file so it doesn't recreate the .rqf file? >>> >>> Regards. >>> >>> >>> >>> 2018-09-13 16:23 GMT+02:00 Budai, László < >>> laszlo.budai@oneidentity.com>: >>> >>>> Hi, >>>> >>>> one problem could be if the flush-limit would be greater than >>>> 1... in that case syslog-ng would use a HttpBulkMessageProcessor. >>>> In this case syslog-ng pass the message to the >>>> HttpBulkMessageProcessor and sends back a positive ACK to the LogSource (so >>>> the message is removed from the diskbuffer), and if the dockerimage is >>>> killed, all the messages stored in the HttpBulkMessageProcessor are lost. >>>> But in your case syslog-ng should use the >>>> HttpSingleMessageProcessor... which means that the messages are sent >>>> one-by-one... >>>> Could you check the diskbuffer with the dqtool? >>>> >>>> >>>> L. >>>> >>>> On Thu, Sep 13, 2018 at 3:50 PM, Jose Angel Santiago < >>>> jasantiago@stratio.com> wrote: >>>> >>>>> Hi, >>>>> >>>>> I'm running syslog-ng (with an elasticsearch2 destination >>>>> configured) within a docker container, and I'm trying to avoid loss of >>>>> messages if I kill the docker container and I start it again. >>>>> >>>>> This is my scenary: >>>>> >>>>> - A service which produces 20 lines of log per second >>>>> - A sislog-ng instance reading from a wildcard-file source (but >>>>> actually it only reads logs from the above service, let's call it >>>>> syslog-agent), which sends all logs to another syslog-ng instance (the one >>>>> running in a docker container, let's call it syslog-relay) though a network >>>>> destination. >>>>> - The syslog-relay sends messages to an elasticsearch instance, >>>>> with following configuration: >>>>> >>>>> options { >>>>> chain-hostnames(no); >>>>> use-dns(no); >>>>> keep-hostname(yes); >>>>> owner("syslog-ng"); >>>>> group("stratio"); >>>>> perm(0640); >>>>> time-reap(30); >>>>> mark-freq(10); >>>>> stats-freq(0); >>>>> bad-hostname("^gconfd$"); >>>>> flush-lines(100); >>>>> log-fifo-size(1000); >>>>> }; >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> *destination d_elastic_default_0 { elasticsearch2( >>>>> cluster("myelastic") cluster-url("https://myelastic.logs:9200 >>>>> <https://myelastic.logs:9200>") client_mode("https") >>>>> index("default") type("log") flush-limit(1) >>>>> disk-buffer( mem-buf-size(16M) >>>>> disk-buf-size(16M) reliable(yes) >>>>> dir("/syslog-ng/log") ) http-auth-type("clientcert") >>>>> java-keystore-filepath("/etc/syslog-ng/certificates/syslog-relay.jks") >>>>> java-keystore-password("XXXXXX") >>>>> java-truststore-filepath("/etc/syslog-ng/certificates/ca-bundle.jks") >>>>> java-truststore-password("XXXXXXXXXX") );};* >>>>> >>>>> - The dir "/syslog-ng/log" is mapped to a path "/tmp/buffer" >>>>> from the host where the docker container is running, so when I kill the >>>>> docker container, the buffer file is not lost. >>>>> - I've set flush-limit to 1 because I thought that I may lost 1 >>>>> message only as much. >>>>> >>>>> This architecture is working fine (flush-limit=1 makes very >>>>> slow, but for this test is ok), but if I kill the syslog-relay docker >>>>> container, wait 5 to 10 seconds and start it again from scratch, I can see >>>>> that several hundreds of logs are missing in elasticsearch. I check it by >>>>> stopping the logger service and letting syslog-ng agent & relay to finish >>>>> the process enqueued messages. >>>>> >>>>> I can see in the syslog-agent stats that all logs messages have >>>>> been processed, so it seems the problem is on the syslog-relay. >>>>> >>>>> Is this behaviour expected? If so, how can I protect against >>>>> loss of messages in case of a syslog-relay docker container unexpected kill? >>>>> >>>>> Thanks in advance. >>>>> >>>>> >>>>> -- >>>>> >>>>> | Jose Angel Santiago >>>>> >>>>> [image: Logo_signature2.png] <http://www.stratio.com/> >>>>> >>>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta >>>>> >>>>> 28224 Pozuelo de Alarcón, Madrid, Spain >>>>> >>>>> +34 918 286 473 <+34%20918%2028%2064%2073> | www.stratio.com >>>>> <https://twitter.com/stratiobd> >>>>> <https://www.linkedin.com/company/stratiobd> >>>>> <https://www.youtube.com/c/StratioBD> >>>>> >>>>> ____________________________________________________________ >>>>> __________________ >>>>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng >>>>> Documentation: http://www.balabit.com/support >>>>> /documentation/?product=syslog-ng >>>>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq >>>>> >>>>> >>>>> >>>> >>>> ____________________________________________________________ >>>> __________________ >>>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng >>>> Documentation: http://www.balabit.com/support >>>> /documentation/?product=syslog-ng >>>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq >>>> >>>> >>>> >>> >>> >>> -- >>> >>> | Jose Angel Santiago >>> >>> [image: Logo_signature2.png] <http://www.stratio.com/> >>> >>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta >>> >>> 28224 Pozuelo de Alarcón, Madrid, Spain >>> >>> +34 918 286 473 <+34%20918%2028%2064%2073> | www.stratio.com >>> <https://twitter.com/stratiobd> >>> <https://www.linkedin.com/company/stratiobd> >>> <https://www.youtube.com/c/StratioBD> >>> ____________________________________________________________ >>> __________________ >>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng >>> Documentation: http://www.balabit.com/support >>> /documentation/?product=syslog-ng >>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq >>> >>> >> ____________________________________________________________ >> __________________ >> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng >> Documentation: http://www.balabit.com/support >> /documentation/?product=syslog-ng >> FAQ: http://www.balabit.com/wiki/syslog-ng-faq >> >> >> > > > -- > > | Jose Angel Santiago > > [image: Logo_signature2.png] <http://www.stratio.com/> > > Vía de las dos Castillas, 33, Ática 4, 3ª Planta > > 28224 Pozuelo de Alarcón, Madrid, Spain > > +34 918 286 473 | www.stratio.com > <https://twitter.com/stratiobd> > <https://www.linkedin.com/company/stratiobd> > <https://www.youtube.com/c/StratioBD> >

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support /documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support /documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support /documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support /documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product= syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

Budai, László

6:36 p.m.

Hi, ... checking your config I found an interesting thing... You set 'bad' values in your config for mem-buf-size and for disk-buf-size :) (it's not your fault... this is something we should change, this should not be so complicated). If you have one source AND disk-buf-size is as big as it is able to store a log-iw-size number of messages then when a message is received, syslog-ng writes it into the diskbuffer and checks the free size of diskbuf. When free_size < mem-buf-size, then the msg won't be acked, which means that the window size of the source is not incremented back, and when window size reach the 0, it won't receive any logs until a msg in the queue is not ACKd back. The question is now what are the defaults? log-iw-size: 100 mem-buf-size: 16M This means that -as you set 16M for diskbuf size- the diskbuffer can contain max. 100 messages, as it won't be acked when inserted into the queue (only when the destination is alive...). I'd increase the diskbuf size in your case: set it to a huge number, if you have enough storage. You can play with setting small mem-buf-size, but that should able to store log-iw-size messages (in bytes, so log-iw-size * mean(msg-size-in-bytes) or something similar). L. On Wed, Sep 19, 2018 at 3:50 PM, Jose Angel Santiago <jasantiago@stratio.com

...

wrote:

...

Hi,

You're right, what I saw on my tcpdump file was TCP ACK.

I guess I lose so many messages because the TCP buffer is getting bigger and bigger since syslog-relay queue_length from disk_buffer reaches its limit. In fact, when this queue is not full (i.e with very low msg/sec ratio) I lose from 0-5 messages according to my tests.

Which parameter is the one which would match the queue_length of the disk_buffer? It seems that it should be *mem-buf-length()* but I'm using reliable=yes so I shouldn't use that parameter, and I'd like to test my scenary being able to queue more than 99 messages on disk_buffer.

Thanks.

2018-09-19 13:46 GMT+02:00 Budai, László <laszlo.budai@oneidentity.com>:

...
Hi,

what do you mean under 'syslog-relay returns ACK to syslog-ng agent' ? I guess this is TCP level ACK.

And TCP level ACK is not the same as what syslog-ng uses for controlling the window size of the LogSources.

It is possible that you have pending data in TCP buffers (maintained by the kernel) that is not read by syslog-ng, but they are already ACKd by TCP itself...

(Solving this kind of situation requires application level ACK, which syslog-ng has only in the commercial edition.)

L.

On Wed, Sep 19, 2018 at 9:23 AM, Jose Angel Santiago < jasantiago@stratio.com> wrote:

...
Hi,

I've realized that the queue_length of the disk_buffer never goes beyond 99, no matter the ratio message/sec logger process is writing or the value of mem-buf-size or disk-buf-size.

Which parameter is the one which would match the queue_length of the disk_buffer? It seems that it should be *mem-buf-length()* but I'm using reliable=yes so I shouldn't use that parameter, and I'd like to test my scenary being able to queue more than 99 messages on disk_buffer.

Regards.

2018-09-17 14:33 GMT+02:00 Jose Angel Santiago <jasantiago@stratio.com>:

...
3.17.2

Thanks in advance.

2018-09-17 14:31 GMT+02:00 Budai, László <laszlo.budai@oneidentity.com> :

...
Hi,

I'll try to reproduce this issue in my test environment. What syslog-ng version do you have?

L.

On Mon, Sep 17, 2018 at 2:04 PM, Jose Angel Santiago < jasantiago@stratio.com> wrote:

...
Hi,

After repeating the test without TLS between syslog-agent & syslog-relay, with 2229 messages logged with a 20 msgs/sec logging ratio, in debug mode and with two tcpdumps running, these are the conclussions:

- Syslog-agent sends every single message (checked with tcpdump file and syslog-agent log file) - Syslog-relay returns ACK for every single message, but sometimes the package contains more than one message (checked with tcpdump file) - Checking the syslog-relay log after killing & starting the docker container, I found the trace 'Reliable disk-buffer state loaded; filename='/syslog-ng-00000.rqf', queue_length='99', size='173844' and the queued messages (from msg 236 to msg 334 are processed). The next message processed is the nº 440. and it comes from the active network destination again. - Once all logs have been processed, in elasticsearch (syslog-relay destination) I can find messages from 1 to 334, and messages from 440 to 2229.

How is it possible that syslog-relay returns ACK for messages 335 to 439 to syslog-agent, but they are not in the queue, nor in the .rqf file? Where did these messages go? I know they are sent before starting to fill syslog-agent disk-buffer file (checked with rqf file from syslog-agent when I kill the syslog-relay container). It seems that there's some kind of race condition that makes those messages are not processed/queued while syslog-relay is being killed.

Mi procedure consists of:

. Start tcpdump on both syslog-agent & syslog-relay hosts - Start logger process - Once logs are being inserted in elasticsearch, I kill the syslog-relay docker container, wait about 10 seconds, and run a new container (using a mapped volume where .rqf file and .persist file are) - Once logs are being inserted in elasticsearch again, I stop the logger process. - I wait until no log is left to be processed

Would you test anything else? I'm running out of ideas.

PD: Same test with logger process writing 5 msgs/sec ratio produces from 0 to 5 lost messages.

Regards.

2018-09-14 14:05 GMT+02:00 Jose Angel Santiago < jasantiago@stratio.com>:

> Hi, > > I've got more accurate information about where are my lost messages. > > Now I'm using disk-buffer in syslog-agent and syslog-relay, and I've > checked that lost messages are the ones sent by the syslog-agent when the > syslog-relay docker container is being killed. I can see those messages on > syslog-agent log (I've got both agent & relay in debug mode) with its > corresponding "Outgoing message" line, but those messages never reach the > relay. > > Could it be that the relay docker container still returns ACK to the > agent (the agent resolves relay fqdn with a custom DNS) while syslog-ng > process within the container is being stopped? I'm about to test again > using tcpdump to confirm this theory, > > BTW, disk-buffers works ok, sometimes I get some duplicated messages > when restarting the relay but that's not a problem for me. Forget about my > .persist file re-creation theory, it doesn't happen. > > Regards. > > > > 2018-09-13 18:07 GMT+02:00 Péter, Kókai <peter.kokai@oneidentity.com > >: > >> Hello, >> >> It would not make sense to replace the persist file after restart, >> so it is not something that syslog-ng does. Only if that file is corrupted, >> in that case at startup there should be a log about it, have you checked >> the syslog-ng logs ? (it would be better to enable debug and/or verbose >> logs, and if possible share it with us.) >> >> Could you reproduce the same behavior without docker (if possible) ? >> >> >> >> Best Regards, >> Peter Kokai >> >> On Thu, Sep 13, 2018 at 4:44 PM Jose Angel Santiago < >> jasantiago@stratio.com> wrote: >> >>> Hi, >>> >>> I guess I know what is happening, when I start from scratch the >>> docker container, even I provide a persist file and a buffer file within >>> the mapped volume, syslog-ng recreates them so all messages in buffer file >>> which were not processed by the relay are lost. >>> >>> Is there any way to tell syslog-ng to use an already existing >>> .persist file so it doesn't recreate the .rqf file? >>> >>> Regards. >>> >>> >>> >>> 2018-09-13 16:23 GMT+02:00 Budai, László < >>> laszlo.budai@oneidentity.com>: >>> >>>> Hi, >>>> >>>> one problem could be if the flush-limit would be greater than >>>> 1... in that case syslog-ng would use a HttpBulkMessageProcessor. >>>> In this case syslog-ng pass the message to the >>>> HttpBulkMessageProcessor and sends back a positive ACK to the LogSource (so >>>> the message is removed from the diskbuffer), and if the dockerimage is >>>> killed, all the messages stored in the HttpBulkMessageProcessor are lost. >>>> But in your case syslog-ng should use the >>>> HttpSingleMessageProcessor... which means that the messages are sent >>>> one-by-one... >>>> Could you check the diskbuffer with the dqtool? >>>> >>>> >>>> L. >>>> >>>> On Thu, Sep 13, 2018 at 3:50 PM, Jose Angel Santiago < >>>> jasantiago@stratio.com> wrote: >>>> >>>>> Hi, >>>>> >>>>> I'm running syslog-ng (with an elasticsearch2 destination >>>>> configured) within a docker container, and I'm trying to avoid loss of >>>>> messages if I kill the docker container and I start it again. >>>>> >>>>> This is my scenary: >>>>> >>>>> - A service which produces 20 lines of log per second >>>>> - A sislog-ng instance reading from a wildcard-file source (but >>>>> actually it only reads logs from the above service, let's call it >>>>> syslog-agent), which sends all logs to another syslog-ng instance (the one >>>>> running in a docker container, let's call it syslog-relay) though a network >>>>> destination. >>>>> - The syslog-relay sends messages to an elasticsearch instance, >>>>> with following configuration: >>>>> >>>>> options { >>>>> chain-hostnames(no); >>>>> use-dns(no); >>>>> keep-hostname(yes); >>>>> owner("syslog-ng"); >>>>> group("stratio"); >>>>> perm(0640); >>>>> time-reap(30); >>>>> mark-freq(10); >>>>> stats-freq(0); >>>>> bad-hostname("^gconfd$"); >>>>> flush-lines(100); >>>>> log-fifo-size(1000); >>>>> }; >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> *destination d_elastic_default_0 { elasticsearch2( >>>>> cluster("myelastic") cluster-url("https://myelastic.logs:9200 >>>>> <https://myelastic.logs:9200>") client_mode("https") >>>>> index("default") type("log") flush-limit(1) >>>>> disk-buffer( mem-buf-size(16M) >>>>> disk-buf-size(16M) reliable(yes) >>>>> dir("/syslog-ng/log") ) http-auth-type("clientcert") >>>>> java-keystore-filepath("/etc/syslog-ng/certificates/syslog-relay.jks") >>>>> java-keystore-password("XXXXXX") >>>>> java-truststore-filepath("/etc/syslog-ng/certificates/ca-bundle.jks") >>>>> java-truststore-password("XXXXXXXXXX") );};* >>>>> >>>>> - The dir "/syslog-ng/log" is mapped to a path "/tmp/buffer" >>>>> from the host where the docker container is running, so when I kill the >>>>> docker container, the buffer file is not lost. >>>>> - I've set flush-limit to 1 because I thought that I may lost 1 >>>>> message only as much. >>>>> >>>>> This architecture is working fine (flush-limit=1 makes very >>>>> slow, but for this test is ok), but if I kill the syslog-relay docker >>>>> container, wait 5 to 10 seconds and start it again from scratch, I can see >>>>> that several hundreds of logs are missing in elasticsearch. I check it by >>>>> stopping the logger service and letting syslog-ng agent & relay to finish >>>>> the process enqueued messages. >>>>> >>>>> I can see in the syslog-agent stats that all logs messages have >>>>> been processed, so it seems the problem is on the syslog-relay. >>>>> >>>>> Is this behaviour expected? If so, how can I protect against >>>>> loss of messages in case of a syslog-relay docker container unexpected kill? >>>>> >>>>> Thanks in advance. >>>>> >>>>> >>>>> -- >>>>> >>>>> | Jose Angel Santiago >>>>> >>>>> [image: Logo_signature2.png] <http://www.stratio.com/> >>>>> >>>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta >>>>> >>>>> 28224 Pozuelo de Alarcón, Madrid, Spain >>>>> >>>>> +34 918 286 473 <+34%20918%2028%2064%2073> | www.stratio.com >>>>> <https://twitter.com/stratiobd> >>>>> <https://www.linkedin.com/company/stratiobd> >>>>> <https://www.youtube.com/c/StratioBD> >>>>> >>>>> ____________________________________________________________ >>>>> __________________ >>>>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng >>>>> Documentation: http://www.balabit.com/support >>>>> /documentation/?product=syslog-ng >>>>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq >>>>> >>>>> >>>>> >>>> >>>> ____________________________________________________________ >>>> __________________ >>>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng >>>> Documentation: http://www.balabit.com/support >>>> /documentation/?product=syslog-ng >>>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq >>>> >>>> >>>> >>> >>> >>> -- >>> >>> | Jose Angel Santiago >>> >>> [image: Logo_signature2.png] <http://www.stratio.com/> >>> >>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta >>> >>> 28224 Pozuelo de Alarcón, Madrid, Spain >>> >>> +34 918 286 473 <+34%20918%2028%2064%2073> | www.stratio.com >>> <https://twitter.com/stratiobd> >>> <https://www.linkedin.com/company/stratiobd> >>> <https://www.youtube.com/c/StratioBD> >>> ____________________________________________________________ >>> __________________ >>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng >>> Documentation: http://www.balabit.com/support >>> /documentation/?product=syslog-ng >>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq >>> >>> >> ____________________________________________________________ >> __________________ >> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng >> Documentation: http://www.balabit.com/support >> /documentation/?product=syslog-ng >> FAQ: http://www.balabit.com/wiki/syslog-ng-faq >> >> >> > > > -- > > | Jose Angel Santiago > > [image: Logo_signature2.png] <http://www.stratio.com/> > > Vía de las dos Castillas, 33, Ática 4, 3ª Planta > > 28224 Pozuelo de Alarcón, Madrid, Spain > > +34 918 286 473 | www.stratio.com > <https://twitter.com/stratiobd> > <https://www.linkedin.com/company/stratiobd> > <https://www.youtube.com/c/StratioBD> >

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support /documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support /documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support /documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product= syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/? product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

Jose Angel Santiago

24 Sep 24 Sep

11:30 a.m.

Hi, I've been playing with different values of log-iw-size & mem-buf-size and I got to avoid loss of messages :-) Bad news is that I'm afraid I've found another case of messages loss, I'll create a new thread since it's a completely different scenary. Thanks. 2018-09-19 18:36 GMT+02:00 Budai, László <laszlo.budai@oneidentity.com>:

...

Hi,

... checking your config I found an interesting thing... You set 'bad' values in your config for mem-buf-size and for disk-buf-size :) (it's not your fault... this is something we should change, this should not be so complicated).

If you have one source AND disk-buf-size is as big as it is able to store a log-iw-size number of messages then when a message is received, syslog-ng writes it into the diskbuffer and checks the free size of diskbuf. When free_size < mem-buf-size, then the msg won't be acked, which means that the window size of the source is not incremented back, and when window size reach the 0, it won't receive any logs until a msg in the queue is not ACKd back.

The question is now what are the defaults? log-iw-size: 100 mem-buf-size: 16M

This means that -as you set 16M for diskbuf size- the diskbuffer can contain max. 100 messages, as it won't be acked when inserted into the queue (only when the destination is alive...).

I'd increase the diskbuf size in your case: set it to a huge number, if you have enough storage. You can play with setting small mem-buf-size, but that should able to store log-iw-size messages (in bytes, so log-iw-size * mean(msg-size-in-bytes) or something similar).

L.

On Wed, Sep 19, 2018 at 3:50 PM, Jose Angel Santiago < jasantiago@stratio.com> wrote:

...
Hi,

You're right, what I saw on my tcpdump file was TCP ACK.

I guess I lose so many messages because the TCP buffer is getting bigger and bigger since syslog-relay queue_length from disk_buffer reaches its limit. In fact, when this queue is not full (i.e with very low msg/sec ratio) I lose from 0-5 messages according to my tests.

Which parameter is the one which would match the queue_length of the disk_buffer? It seems that it should be *mem-buf-length()* but I'm using reliable=yes so I shouldn't use that parameter, and I'd like to test my scenary being able to queue more than 99 messages on disk_buffer.

Thanks.

2018-09-19 13:46 GMT+02:00 Budai, László <laszlo.budai@oneidentity.com>:

...
Hi,

what do you mean under 'syslog-relay returns ACK to syslog-ng agent' ? I guess this is TCP level ACK.

And TCP level ACK is not the same as what syslog-ng uses for controlling the window size of the LogSources.

It is possible that you have pending data in TCP buffers (maintained by the kernel) that is not read by syslog-ng, but they are already ACKd by TCP itself...

(Solving this kind of situation requires application level ACK, which syslog-ng has only in the commercial edition.)

L.

On Wed, Sep 19, 2018 at 9:23 AM, Jose Angel Santiago < jasantiago@stratio.com> wrote:

...
Hi,

I've realized that the queue_length of the disk_buffer never goes beyond 99, no matter the ratio message/sec logger process is writing or the value of mem-buf-size or disk-buf-size.

Which parameter is the one which would match the queue_length of the disk_buffer? It seems that it should be *mem-buf-length()* but I'm using reliable=yes so I shouldn't use that parameter, and I'd like to test my scenary being able to queue more than 99 messages on disk_buffer.

Regards.

2018-09-17 14:33 GMT+02:00 Jose Angel Santiago <jasantiago@stratio.com> :

...
3.17.2

Thanks in advance.

2018-09-17 14:31 GMT+02:00 Budai, László <laszlo.budai@oneidentity.com

...
:

...
Hi,

I'll try to reproduce this issue in my test environment. What syslog-ng version do you have?

L.

On Mon, Sep 17, 2018 at 2:04 PM, Jose Angel Santiago < jasantiago@stratio.com> wrote:

> Hi, > > After repeating the test without TLS between syslog-agent & > syslog-relay, with 2229 messages logged with a 20 msgs/sec logging ratio, > in debug mode and with two tcpdumps running, these are the conclussions: > > - Syslog-agent sends every single message (checked with tcpdump file > and syslog-agent log file) > - Syslog-relay returns ACK for every single message, but sometimes > the package contains more than one message (checked with tcpdump file) > - Checking the syslog-relay log after killing & starting the docker > container, I found the trace 'Reliable disk-buffer state loaded; > filename='/syslog-ng-00000.rqf', queue_length='99', size='173844' > and the queued messages (from msg 236 to msg 334 are processed). The next > message processed is the nº 440. and it comes from the active network > destination again. > - Once all logs have been processed, in elasticsearch (syslog-relay > destination) I can find messages from 1 to 334, and messages from 440 to > 2229. > > How is it possible that syslog-relay returns ACK for messages 335 to > 439 to syslog-agent, but they are not in the queue, nor in the .rqf file? > Where did these messages go? I know they are sent before starting to fill > syslog-agent disk-buffer file (checked with rqf file from syslog-agent when > I kill the syslog-relay container). It seems that there's some kind of race > condition that makes those messages are not processed/queued while > syslog-relay is being killed. > > Mi procedure consists of: > > . Start tcpdump on both syslog-agent & syslog-relay hosts > - Start logger process > - Once logs are being inserted in elasticsearch, I kill the > syslog-relay docker container, wait about 10 seconds, and run a new > container (using a mapped volume where .rqf file and .persist file are) > - Once logs are being inserted in elasticsearch again, I stop the > logger process. > - I wait until no log is left to be processed > > Would you test anything else? I'm running out of ideas. > > PD: Same test with logger process writing 5 msgs/sec ratio produces > from 0 to 5 lost messages. > > Regards. > > > > > 2018-09-14 14:05 GMT+02:00 Jose Angel Santiago < > jasantiago@stratio.com>: > >> Hi, >> >> I've got more accurate information about where are my lost messages. >> >> Now I'm using disk-buffer in syslog-agent and syslog-relay, and >> I've checked that lost messages are the ones sent by the syslog-agent when >> the syslog-relay docker container is being killed. I can see those messages >> on syslog-agent log (I've got both agent & relay in debug mode) with its >> corresponding "Outgoing message" line, but those messages never reach the >> relay. >> >> Could it be that the relay docker container still returns ACK to >> the agent (the agent resolves relay fqdn with a custom DNS) while syslog-ng >> process within the container is being stopped? I'm about to test again >> using tcpdump to confirm this theory, >> >> BTW, disk-buffers works ok, sometimes I get some duplicated >> messages when restarting the relay but that's not a problem for me. Forget >> about my .persist file re-creation theory, it doesn't happen. >> >> Regards. >> >> >> >> 2018-09-13 18:07 GMT+02:00 Péter, Kókai < >> peter.kokai@oneidentity.com>: >> >>> Hello, >>> >>> It would not make sense to replace the persist file after restart, >>> so it is not something that syslog-ng does. Only if that file is corrupted, >>> in that case at startup there should be a log about it, have you checked >>> the syslog-ng logs ? (it would be better to enable debug and/or verbose >>> logs, and if possible share it with us.) >>> >>> Could you reproduce the same behavior without docker (if possible) >>> ? >>> >>> >>> >>> Best Regards, >>> Peter Kokai >>> >>> On Thu, Sep 13, 2018 at 4:44 PM Jose Angel Santiago < >>> jasantiago@stratio.com> wrote: >>> >>>> Hi, >>>> >>>> I guess I know what is happening, when I start from scratch the >>>> docker container, even I provide a persist file and a buffer file within >>>> the mapped volume, syslog-ng recreates them so all messages in buffer file >>>> which were not processed by the relay are lost. >>>> >>>> Is there any way to tell syslog-ng to use an already existing >>>> .persist file so it doesn't recreate the .rqf file? >>>> >>>> Regards. >>>> >>>> >>>> >>>> 2018-09-13 16:23 GMT+02:00 Budai, László < >>>> laszlo.budai@oneidentity.com>: >>>> >>>>> Hi, >>>>> >>>>> one problem could be if the flush-limit would be greater than >>>>> 1... in that case syslog-ng would use a HttpBulkMessageProcessor. >>>>> In this case syslog-ng pass the message to the >>>>> HttpBulkMessageProcessor and sends back a positive ACK to the LogSource (so >>>>> the message is removed from the diskbuffer), and if the dockerimage is >>>>> killed, all the messages stored in the HttpBulkMessageProcessor are lost. >>>>> But in your case syslog-ng should use the >>>>> HttpSingleMessageProcessor... which means that the messages are sent >>>>> one-by-one... >>>>> Could you check the diskbuffer with the dqtool? >>>>> >>>>> >>>>> L. >>>>> >>>>> On Thu, Sep 13, 2018 at 3:50 PM, Jose Angel Santiago < >>>>> jasantiago@stratio.com> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> I'm running syslog-ng (with an elasticsearch2 destination >>>>>> configured) within a docker container, and I'm trying to avoid loss of >>>>>> messages if I kill the docker container and I start it again. >>>>>> >>>>>> This is my scenary: >>>>>> >>>>>> - A service which produces 20 lines of log per second >>>>>> - A sislog-ng instance reading from a wildcard-file source (but >>>>>> actually it only reads logs from the above service, let's call it >>>>>> syslog-agent), which sends all logs to another syslog-ng instance (the one >>>>>> running in a docker container, let's call it syslog-relay) though a network >>>>>> destination. >>>>>> - The syslog-relay sends messages to an elasticsearch instance, >>>>>> with following configuration: >>>>>> >>>>>> options { >>>>>> chain-hostnames(no); >>>>>> use-dns(no); >>>>>> keep-hostname(yes); >>>>>> owner("syslog-ng"); >>>>>> group("stratio"); >>>>>> perm(0640); >>>>>> time-reap(30); >>>>>> mark-freq(10); >>>>>> stats-freq(0); >>>>>> bad-hostname("^gconfd$"); >>>>>> flush-lines(100); >>>>>> log-fifo-size(1000); >>>>>> }; >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> *destination d_elastic_default_0 { elasticsearch2( >>>>>> cluster("myelastic") cluster-url("https://myelastic.logs:9200 >>>>>> <https://myelastic.logs:9200>") client_mode("https") >>>>>> index("default") type("log") flush-limit(1) >>>>>> disk-buffer( mem-buf-size(16M) >>>>>> disk-buf-size(16M) reliable(yes) >>>>>> dir("/syslog-ng/log") ) http-auth-type("clientcert") >>>>>> java-keystore-filepath("/etc/syslog-ng/certificates/syslog-relay.jks") >>>>>> java-keystore-password("XXXXXX") >>>>>> java-truststore-filepath("/etc/syslog-ng/certificates/ca-bundle.jks") >>>>>> java-truststore-password("XXXXXXXXXX") );};* >>>>>> >>>>>> - The dir "/syslog-ng/log" is mapped to a path "/tmp/buffer" >>>>>> from the host where the docker container is running, so when I kill the >>>>>> docker container, the buffer file is not lost. >>>>>> - I've set flush-limit to 1 because I thought that I may lost 1 >>>>>> message only as much. >>>>>> >>>>>> This architecture is working fine (flush-limit=1 makes very >>>>>> slow, but for this test is ok), but if I kill the syslog-relay docker >>>>>> container, wait 5 to 10 seconds and start it again from scratch, I can see >>>>>> that several hundreds of logs are missing in elasticsearch. I check it by >>>>>> stopping the logger service and letting syslog-ng agent & relay to finish >>>>>> the process enqueued messages. >>>>>> >>>>>> I can see in the syslog-agent stats that all logs messages have >>>>>> been processed, so it seems the problem is on the syslog-relay. >>>>>> >>>>>> Is this behaviour expected? If so, how can I protect against >>>>>> loss of messages in case of a syslog-relay docker container unexpected kill? >>>>>> >>>>>> Thanks in advance. >>>>>> >>>>>> >>>>>> -- >>>>>> >>>>>> | Jose Angel Santiago >>>>>> >>>>>> [image: Logo_signature2.png] <http://www.stratio.com/> >>>>>> >>>>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta >>>>>> >>>>>> 28224 Pozuelo de Alarcón, Madrid, Spain >>>>>> >>>>>> +34 918 286 473 <+34%20918%2028%2064%2073> | www.stratio.com >>>>>> <https://twitter.com/stratiobd> >>>>>> <https://www.linkedin.com/company/stratiobd> >>>>>> <https://www.youtube.com/c/StratioBD> >>>>>> >>>>>> ____________________________________________________________ >>>>>> __________________ >>>>>> Member info: https://lists.balabit.hu/mailm >>>>>> an/listinfo/syslog-ng >>>>>> Documentation: http://www.balabit.com/support >>>>>> /documentation/?product=syslog-ng >>>>>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq >>>>>> >>>>>> >>>>>> >>>>> >>>>> ____________________________________________________________ >>>>> __________________ >>>>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng >>>>> Documentation: http://www.balabit.com/support >>>>> /documentation/?product=syslog-ng >>>>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq >>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> >>>> | Jose Angel Santiago >>>> >>>> [image: Logo_signature2.png] <http://www.stratio.com/> >>>> >>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta >>>> >>>> 28224 Pozuelo de Alarcón, Madrid, Spain >>>> >>>> +34 918 286 473 <+34%20918%2028%2064%2073> | www.stratio.com >>>> <https://twitter.com/stratiobd> >>>> <https://www.linkedin.com/company/stratiobd> >>>> <https://www.youtube.com/c/StratioBD> >>>> ____________________________________________________________ >>>> __________________ >>>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng >>>> Documentation: http://www.balabit.com/support >>>> /documentation/?product=syslog-ng >>>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq >>>> >>>> >>> ____________________________________________________________ >>> __________________ >>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng >>> Documentation: http://www.balabit.com/support >>> /documentation/?product=syslog-ng >>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq >>> >>> >>> >> >> >> -- >> >> | Jose Angel Santiago >> >> [image: Logo_signature2.png] <http://www.stratio.com/> >> >> Vía de las dos Castillas, 33, Ática 4, 3ª Planta >> >> 28224 Pozuelo de Alarcón, Madrid, Spain >> >> +34 918 286 473 | www.stratio.com >> <https://twitter.com/stratiobd> >> <https://www.linkedin.com/company/stratiobd> >> <https://www.youtube.com/c/StratioBD> >> > > > > -- > > | Jose Angel Santiago > > [image: Logo_signature2.png] <http://www.stratio.com/> > > Vía de las dos Castillas, 33, Ática 4, 3ª Planta > > 28224 Pozuelo de Alarcón, Madrid, Spain > > +34 918 286 473 | www.stratio.com > <https://twitter.com/stratiobd> > <https://www.linkedin.com/company/stratiobd> > <https://www.youtube.com/c/StratioBD> > > ____________________________________________________________ > __________________ > Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng > Documentation: http://www.balabit.com/support > /documentation/?product=syslog-ng > FAQ: http://www.balabit.com/wiki/syslog-ng-faq > > >

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support /documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support /documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support /documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product= syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/? product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

Jose Angel Santiago

25 Sep 25 Sep

8:48 a.m.

Nevermind, missing messages were not processed because of a bad syslog-ng wildcard-file source configuration. Thank you for your support. 2018-09-24 11:30 GMT+02:00 Jose Angel Santiago <jasantiago@stratio.com>:

...

Hi,

I've been playing with different values of log-iw-size & mem-buf-size and I got to avoid loss of messages :-)

Bad news is that I'm afraid I've found another case of messages loss, I'll create a new thread since it's a completely different scenary.

Thanks.

2018-09-19 18:36 GMT+02:00 Budai, László <laszlo.budai@oneidentity.com>:

...
Hi,

... checking your config I found an interesting thing... You set 'bad' values in your config for mem-buf-size and for disk-buf-size :) (it's not your fault... this is something we should change, this should not be so complicated).

If you have one source AND disk-buf-size is as big as it is able to store a log-iw-size number of messages then when a message is received, syslog-ng writes it into the diskbuffer and checks the free size of diskbuf. When free_size < mem-buf-size, then the msg won't be acked, which means that the window size of the source is not incremented back, and when window size reach the 0, it won't receive any logs until a msg in the queue is not ACKd back.

The question is now what are the defaults? log-iw-size: 100 mem-buf-size: 16M

This means that -as you set 16M for diskbuf size- the diskbuffer can contain max. 100 messages, as it won't be acked when inserted into the queue (only when the destination is alive...).

I'd increase the diskbuf size in your case: set it to a huge number, if you have enough storage. You can play with setting small mem-buf-size, but that should able to store log-iw-size messages (in bytes, so log-iw-size * mean(msg-size-in-bytes) or something similar).

L.

On Wed, Sep 19, 2018 at 3:50 PM, Jose Angel Santiago < jasantiago@stratio.com> wrote:

...
Hi,

You're right, what I saw on my tcpdump file was TCP ACK.

I guess I lose so many messages because the TCP buffer is getting bigger and bigger since syslog-relay queue_length from disk_buffer reaches its limit. In fact, when this queue is not full (i.e with very low msg/sec ratio) I lose from 0-5 messages according to my tests.

Which parameter is the one which would match the queue_length of the disk_buffer? It seems that it should be *mem-buf-length()* but I'm using reliable=yes so I shouldn't use that parameter, and I'd like to test my scenary being able to queue more than 99 messages on disk_buffer.

Thanks.

2018-09-19 13:46 GMT+02:00 Budai, László <laszlo.budai@oneidentity.com>:

...
Hi,

what do you mean under 'syslog-relay returns ACK to syslog-ng agent' ? I guess this is TCP level ACK.

And TCP level ACK is not the same as what syslog-ng uses for controlling the window size of the LogSources.

It is possible that you have pending data in TCP buffers (maintained by the kernel) that is not read by syslog-ng, but they are already ACKd by TCP itself...

(Solving this kind of situation requires application level ACK, which syslog-ng has only in the commercial edition.)

L.

On Wed, Sep 19, 2018 at 9:23 AM, Jose Angel Santiago < jasantiago@stratio.com> wrote:

...
Hi,

I've realized that the queue_length of the disk_buffer never goes beyond 99, no matter the ratio message/sec logger process is writing or the value of mem-buf-size or disk-buf-size.

Which parameter is the one which would match the queue_length of the disk_buffer? It seems that it should be *mem-buf-length()* but I'm using reliable=yes so I shouldn't use that parameter, and I'd like to test my scenary being able to queue more than 99 messages on disk_buffer.

Regards.

2018-09-17 14:33 GMT+02:00 Jose Angel Santiago <jasantiago@stratio.com

...
:

...
3.17.2

Thanks in advance.

2018-09-17 14:31 GMT+02:00 Budai, László < laszlo.budai@oneidentity.com>:

> Hi, > > I'll try to reproduce this issue in my test environment. > What syslog-ng version do you have? > > L. > > On Mon, Sep 17, 2018 at 2:04 PM, Jose Angel Santiago < > jasantiago@stratio.com> wrote: > >> Hi, >> >> After repeating the test without TLS between syslog-agent & >> syslog-relay, with 2229 messages logged with a 20 msgs/sec logging ratio, >> in debug mode and with two tcpdumps running, these are the conclussions: >> >> - Syslog-agent sends every single message (checked with tcpdump >> file and syslog-agent log file) >> - Syslog-relay returns ACK for every single message, but sometimes >> the package contains more than one message (checked with tcpdump file) >> - Checking the syslog-relay log after killing & starting the docker >> container, I found the trace 'Reliable disk-buffer state loaded; >> filename='/syslog-ng-00000.rqf', queue_length='99', size='173844' >> and the queued messages (from msg 236 to msg 334 are processed). The next >> message processed is the nº 440. and it comes from the active network >> destination again. >> - Once all logs have been processed, in elasticsearch (syslog-relay >> destination) I can find messages from 1 to 334, and messages from 440 to >> 2229. >> >> How is it possible that syslog-relay returns ACK for messages 335 >> to 439 to syslog-agent, but they are not in the queue, nor in the .rqf >> file? Where did these messages go? I know they are sent before starting to >> fill syslog-agent disk-buffer file (checked with rqf file from syslog-agent >> when I kill the syslog-relay container). It seems that there's some kind of >> race condition that makes those messages are not processed/queued while >> syslog-relay is being killed. >> >> Mi procedure consists of: >> >> . Start tcpdump on both syslog-agent & syslog-relay hosts >> - Start logger process >> - Once logs are being inserted in elasticsearch, I kill the >> syslog-relay docker container, wait about 10 seconds, and run a new >> container (using a mapped volume where .rqf file and .persist file are) >> - Once logs are being inserted in elasticsearch again, I stop the >> logger process. >> - I wait until no log is left to be processed >> >> Would you test anything else? I'm running out of ideas. >> >> PD: Same test with logger process writing 5 msgs/sec ratio produces >> from 0 to 5 lost messages. >> >> Regards. >> >> >> >> >> 2018-09-14 14:05 GMT+02:00 Jose Angel Santiago < >> jasantiago@stratio.com>: >> >>> Hi, >>> >>> I've got more accurate information about where are my lost >>> messages. >>> >>> Now I'm using disk-buffer in syslog-agent and syslog-relay, and >>> I've checked that lost messages are the ones sent by the syslog-agent when >>> the syslog-relay docker container is being killed. I can see those messages >>> on syslog-agent log (I've got both agent & relay in debug mode) with its >>> corresponding "Outgoing message" line, but those messages never reach the >>> relay. >>> >>> Could it be that the relay docker container still returns ACK to >>> the agent (the agent resolves relay fqdn with a custom DNS) while syslog-ng >>> process within the container is being stopped? I'm about to test again >>> using tcpdump to confirm this theory, >>> >>> BTW, disk-buffers works ok, sometimes I get some duplicated >>> messages when restarting the relay but that's not a problem for me. Forget >>> about my .persist file re-creation theory, it doesn't happen. >>> >>> Regards. >>> >>> >>> >>> 2018-09-13 18:07 GMT+02:00 Péter, Kókai < >>> peter.kokai@oneidentity.com>: >>> >>>> Hello, >>>> >>>> It would not make sense to replace the persist file after >>>> restart, so it is not something that syslog-ng does. Only if that file is >>>> corrupted, in that case at startup there should be a log about it, have you >>>> checked the syslog-ng logs ? (it would be better to enable debug and/or >>>> verbose logs, and if possible share it with us.) >>>> >>>> Could you reproduce the same behavior without docker (if >>>> possible) ? >>>> >>>> >>>> >>>> Best Regards, >>>> Peter Kokai >>>> >>>> On Thu, Sep 13, 2018 at 4:44 PM Jose Angel Santiago < >>>> jasantiago@stratio.com> wrote: >>>> >>>>> Hi, >>>>> >>>>> I guess I know what is happening, when I start from scratch the >>>>> docker container, even I provide a persist file and a buffer file within >>>>> the mapped volume, syslog-ng recreates them so all messages in buffer file >>>>> which were not processed by the relay are lost. >>>>> >>>>> Is there any way to tell syslog-ng to use an already existing >>>>> .persist file so it doesn't recreate the .rqf file? >>>>> >>>>> Regards. >>>>> >>>>> >>>>> >>>>> 2018-09-13 16:23 GMT+02:00 Budai, László < >>>>> laszlo.budai@oneidentity.com>: >>>>> >>>>>> Hi, >>>>>> >>>>>> one problem could be if the flush-limit would be greater than >>>>>> 1... in that case syslog-ng would use a HttpBulkMessageProcessor. >>>>>> In this case syslog-ng pass the message to the >>>>>> HttpBulkMessageProcessor and sends back a positive ACK to the LogSource (so >>>>>> the message is removed from the diskbuffer), and if the dockerimage is >>>>>> killed, all the messages stored in the HttpBulkMessageProcessor are lost. >>>>>> But in your case syslog-ng should use the >>>>>> HttpSingleMessageProcessor... which means that the messages are sent >>>>>> one-by-one... >>>>>> Could you check the diskbuffer with the dqtool? >>>>>> >>>>>> >>>>>> L. >>>>>> >>>>>> On Thu, Sep 13, 2018 at 3:50 PM, Jose Angel Santiago < >>>>>> jasantiago@stratio.com> wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I'm running syslog-ng (with an elasticsearch2 destination >>>>>>> configured) within a docker container, and I'm trying to avoid loss of >>>>>>> messages if I kill the docker container and I start it again. >>>>>>> >>>>>>> This is my scenary: >>>>>>> >>>>>>> - A service which produces 20 lines of log per second >>>>>>> - A sislog-ng instance reading from a wildcard-file source >>>>>>> (but actually it only reads logs from the above service, let's call it >>>>>>> syslog-agent), which sends all logs to another syslog-ng instance (the one >>>>>>> running in a docker container, let's call it syslog-relay) though a network >>>>>>> destination. >>>>>>> - The syslog-relay sends messages to an elasticsearch >>>>>>> instance, with following configuration: >>>>>>> >>>>>>> options { >>>>>>> chain-hostnames(no); >>>>>>> use-dns(no); >>>>>>> keep-hostname(yes); >>>>>>> owner("syslog-ng"); >>>>>>> group("stratio"); >>>>>>> perm(0640); >>>>>>> time-reap(30); >>>>>>> mark-freq(10); >>>>>>> stats-freq(0); >>>>>>> bad-hostname("^gconfd$"); >>>>>>> flush-lines(100); >>>>>>> log-fifo-size(1000); >>>>>>> }; >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> *destination d_elastic_default_0 { elasticsearch2( >>>>>>> cluster("myelastic") cluster-url("https://myelastic.logs:9200 >>>>>>> <https://myelastic.logs:9200>") client_mode("https") >>>>>>> index("default") type("log") flush-limit(1) >>>>>>> disk-buffer( mem-buf-size(16M) >>>>>>> disk-buf-size(16M) reliable(yes) >>>>>>> dir("/syslog-ng/log") ) http-auth-type("clientcert") >>>>>>> java-keystore-filepath("/etc/syslog-ng/certificates/syslog-relay.jks") >>>>>>> java-keystore-password("XXXXXX") >>>>>>> java-truststore-filepath("/etc/syslog-ng/certificates/ca-bundle.jks") >>>>>>> java-truststore-password("XXXXXXXXXX") );};* >>>>>>> >>>>>>> - The dir "/syslog-ng/log" is mapped to a path "/tmp/buffer" >>>>>>> from the host where the docker container is running, so when I kill the >>>>>>> docker container, the buffer file is not lost. >>>>>>> - I've set flush-limit to 1 because I thought that I may lost >>>>>>> 1 message only as much. >>>>>>> >>>>>>> This architecture is working fine (flush-limit=1 makes very >>>>>>> slow, but for this test is ok), but if I kill the syslog-relay docker >>>>>>> container, wait 5 to 10 seconds and start it again from scratch, I can see >>>>>>> that several hundreds of logs are missing in elasticsearch. I check it by >>>>>>> stopping the logger service and letting syslog-ng agent & relay to finish >>>>>>> the process enqueued messages. >>>>>>> >>>>>>> I can see in the syslog-agent stats that all logs messages >>>>>>> have been processed, so it seems the problem is on the syslog-relay. >>>>>>> >>>>>>> Is this behaviour expected? If so, how can I protect against >>>>>>> loss of messages in case of a syslog-relay docker container unexpected kill? >>>>>>> >>>>>>> Thanks in advance. >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> >>>>>>> | Jose Angel Santiago >>>>>>> >>>>>>> [image: Logo_signature2.png] <http://www.stratio.com/> >>>>>>> >>>>>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta >>>>>>> >>>>>>> 28224 Pozuelo de Alarcón, Madrid, Spain >>>>>>> >>>>>>> +34 918 286 473 <+34%20918%2028%2064%2073> | www.stratio.com >>>>>>> <https://twitter.com/stratiobd> >>>>>>> <https://www.linkedin.com/company/stratiobd> >>>>>>> <https://www.youtube.com/c/StratioBD> >>>>>>> >>>>>>> ____________________________________________________________ >>>>>>> __________________ >>>>>>> Member info: https://lists.balabit.hu/mailm >>>>>>> an/listinfo/syslog-ng >>>>>>> Documentation: http://www.balabit.com/support >>>>>>> /documentation/?product=syslog-ng >>>>>>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> ____________________________________________________________ >>>>>> __________________ >>>>>> Member info: https://lists.balabit.hu/mailm >>>>>> an/listinfo/syslog-ng >>>>>> Documentation: http://www.balabit.com/support >>>>>> /documentation/?product=syslog-ng >>>>>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> >>>>> | Jose Angel Santiago >>>>> >>>>> [image: Logo_signature2.png] <http://www.stratio.com/> >>>>> >>>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta >>>>> >>>>> 28224 Pozuelo de Alarcón, Madrid, Spain >>>>> >>>>> +34 918 286 473 <+34%20918%2028%2064%2073> | www.stratio.com >>>>> <https://twitter.com/stratiobd> >>>>> <https://www.linkedin.com/company/stratiobd> >>>>> <https://www.youtube.com/c/StratioBD> >>>>> ____________________________________________________________ >>>>> __________________ >>>>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng >>>>> Documentation: http://www.balabit.com/support >>>>> /documentation/?product=syslog-ng >>>>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq >>>>> >>>>> >>>> ____________________________________________________________ >>>> __________________ >>>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng >>>> Documentation: http://www.balabit.com/support >>>> /documentation/?product=syslog-ng >>>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq >>>> >>>> >>>> >>> >>> >>> -- >>> >>> | Jose Angel Santiago >>> >>> [image: Logo_signature2.png] <http://www.stratio.com/> >>> >>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta >>> >>> 28224 Pozuelo de Alarcón, Madrid, Spain >>> >>> +34 918 286 473 | www.stratio.com >>> <https://twitter.com/stratiobd> >>> <https://www.linkedin.com/company/stratiobd> >>> <https://www.youtube.com/c/StratioBD> >>> >> >> >> >> -- >> >> | Jose Angel Santiago >> >> [image: Logo_signature2.png] <http://www.stratio.com/> >> >> Vía de las dos Castillas, 33, Ática 4, 3ª Planta >> >> 28224 Pozuelo de Alarcón, Madrid, Spain >> >> +34 918 286 473 | www.stratio.com >> <https://twitter.com/stratiobd> >> <https://www.linkedin.com/company/stratiobd> >> <https://www.youtube.com/c/StratioBD> >> >> ____________________________________________________________ >> __________________ >> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng >> Documentation: http://www.balabit.com/support >> /documentation/?product=syslog-ng >> FAQ: http://www.balabit.com/wiki/syslog-ng-faq >> >> >> > > ____________________________________________________________ > __________________ > Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng > Documentation: http://www.balabit.com/support > /documentation/?product=syslog-ng > FAQ: http://www.balabit.com/wiki/syslog-ng-faq > > >

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support /documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support /documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support /documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product= syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq

--

| Jose Angel Santiago

[image: Logo_signature2.png] <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta

28224 Pozuelo de Alarcón, Madrid, Spain

+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>

2568

Age (days ago)

2580

Last active (days ago)

List overview

Download

16 comments

3 participants

participants (3)

Budai, László
Jose Angel Santiago
Péter, Kókai