[syslog-ng] disk-buffer in elasticsearch2 destination loses messages if docker container is killed

Budai, László laszlo.budai at oneidentity.com
Wed Sep 19 16:36:34 UTC 2018


Hi,

... checking your config I found an interesting thing...
You set 'bad' values in your config for mem-buf-size and for disk-buf-size
:) (it's not your fault... this is something we should change, this should
not be so complicated).

If you have one source AND disk-buf-size is as big as it is able to store a
log-iw-size number of messages then when a message is received, syslog-ng
writes it into the diskbuffer and checks the free size of diskbuf.
When free_size < mem-buf-size, then the msg won't be acked, which means
that the window size of the source is not incremented back, and when window
size reach the 0, it won't receive any logs until a msg in the queue is not
ACKd back.

The question is now what are the defaults?
log-iw-size: 100
mem-buf-size: 16M

This means that -as you set 16M for diskbuf size- the diskbuffer can
contain max. 100 messages, as it won't be acked when inserted into the
queue (only when the destination is alive...).

I'd increase the diskbuf size in your case: set it to a huge number, if you
have enough storage. You can play with setting small mem-buf-size, but that
should able to store log-iw-size messages (in bytes, so log-iw-size *
mean(msg-size-in-bytes) or something similar).

L.

On Wed, Sep 19, 2018 at 3:50 PM, Jose Angel Santiago <jasantiago at stratio.com
> wrote:

> Hi,
>
> You're right, what I saw on my tcpdump file was TCP ACK.
>
> I guess I lose so many messages because the TCP buffer is getting bigger
> and bigger since syslog-relay queue_length from disk_buffer reaches its
> limit. In fact, when this queue is not full (i.e with very low msg/sec
> ratio) I lose from 0-5 messages according to my tests.
>
> Which parameter is the one which would match the queue_length of the
> disk_buffer? It seems that it should be *mem-buf-length()* but I'm using
> reliable=yes so I shouldn't use that parameter, and I'd like to test my
> scenary being able to queue more than 99 messages on disk_buffer.
>
> Thanks.
>
> 2018-09-19 13:46 GMT+02:00 Budai, László <laszlo.budai at oneidentity.com>:
>
>> Hi,
>>
>> what do you mean under 'syslog-relay returns ACK to syslog-ng agent' ?
>> I guess this is TCP level ACK.
>>
>> And TCP level ACK is not the same as what syslog-ng uses for controlling
>> the window size of the LogSources.
>>
>> It is possible that you have pending data in TCP buffers (maintained by
>> the kernel) that is not read by syslog-ng, but they are already ACKd by TCP
>> itself...
>>
>> (Solving this kind of situation requires application level ACK, which
>> syslog-ng has only in the commercial edition.)
>>
>> L.
>>
>>
>> On Wed, Sep 19, 2018 at 9:23 AM, Jose Angel Santiago <
>> jasantiago at stratio.com> wrote:
>>
>>> Hi,
>>>
>>> I've realized that the queue_length of the disk_buffer never goes beyond
>>> 99, no matter the ratio message/sec logger process is writing or the value
>>> of mem-buf-size or disk-buf-size.
>>>
>>> Which parameter is the one which would match the queue_length of the
>>> disk_buffer? It seems that it should be *mem-buf-length()* but I'm
>>> using reliable=yes so I shouldn't use that parameter, and I'd like to test
>>> my scenary being able to queue more than 99 messages on disk_buffer.
>>>
>>> Regards.
>>>
>>> 2018-09-17 14:33 GMT+02:00 Jose Angel Santiago <jasantiago at stratio.com>:
>>>
>>>> 3.17.2
>>>>
>>>> Thanks in advance.
>>>>
>>>> 2018-09-17 14:31 GMT+02:00 Budai, László <laszlo.budai at oneidentity.com>
>>>> :
>>>>
>>>>> Hi,
>>>>>
>>>>> I'll try to reproduce this issue in my test environment.
>>>>> What syslog-ng version do you have?
>>>>>
>>>>> L.
>>>>>
>>>>> On Mon, Sep 17, 2018 at 2:04 PM, Jose Angel Santiago <
>>>>> jasantiago at stratio.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> After repeating the test without TLS between syslog-agent &
>>>>>> syslog-relay, with 2229 messages logged with a 20 msgs/sec logging ratio,
>>>>>> in debug mode and with two tcpdumps running, these are the conclussions:
>>>>>>
>>>>>> - Syslog-agent sends every single message (checked with tcpdump file
>>>>>> and syslog-agent log file)
>>>>>> - Syslog-relay returns ACK for every single message, but sometimes
>>>>>> the package contains more than one message (checked with tcpdump file)
>>>>>> - Checking the syslog-relay log after killing & starting the docker
>>>>>> container, I found the trace 'Reliable disk-buffer state loaded;
>>>>>> filename='/syslog-ng-00000.rqf', queue_length='99', size='173844'
>>>>>> and the queued messages (from msg 236 to msg 334 are processed). The next
>>>>>> message processed is the nº 440. and it comes from the active network
>>>>>> destination again.
>>>>>> - Once all logs have been processed, in elasticsearch (syslog-relay
>>>>>> destination) I can find messages from 1 to 334, and messages from 440 to
>>>>>> 2229.
>>>>>>
>>>>>> How is it possible that syslog-relay returns ACK for messages 335 to
>>>>>> 439 to syslog-agent, but they are not in the queue, nor in the .rqf file?
>>>>>> Where did these messages go? I know they are sent before starting to fill
>>>>>> syslog-agent disk-buffer file (checked with rqf file from syslog-agent when
>>>>>> I kill the syslog-relay container). It seems that there's some kind of race
>>>>>> condition that makes those messages are not processed/queued  while
>>>>>> syslog-relay is being killed.
>>>>>>
>>>>>> Mi procedure consists of:
>>>>>>
>>>>>> . Start tcpdump on both syslog-agent & syslog-relay hosts
>>>>>> - Start logger process
>>>>>> - Once logs are being inserted in elasticsearch, I kill the
>>>>>> syslog-relay docker container, wait about 10 seconds, and run a new
>>>>>> container (using a mapped volume where .rqf file and .persist file are)
>>>>>> - Once logs are being inserted in elasticsearch again, I stop the
>>>>>> logger process.
>>>>>> - I wait until no log is left to be processed
>>>>>>
>>>>>> Would you test anything else? I'm running out of ideas.
>>>>>>
>>>>>> PD: Same test with logger process writing 5 msgs/sec ratio produces
>>>>>> from 0 to 5 lost messages.
>>>>>>
>>>>>> Regards.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2018-09-14 14:05 GMT+02:00 Jose Angel Santiago <
>>>>>> jasantiago at stratio.com>:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I've got more accurate information about where are my lost messages.
>>>>>>>
>>>>>>> Now I'm using disk-buffer in syslog-agent and syslog-relay, and I've
>>>>>>> checked that lost messages are the ones sent by the syslog-agent when the
>>>>>>> syslog-relay docker container is being killed. I can see those messages on
>>>>>>> syslog-agent log (I've got both agent & relay in debug mode) with its
>>>>>>> corresponding "Outgoing message" line, but those messages never reach the
>>>>>>> relay.
>>>>>>>
>>>>>>> Could it be that the relay docker container still returns ACK to the
>>>>>>> agent (the agent resolves relay fqdn with a custom DNS) while syslog-ng
>>>>>>> process within the container is being stopped? I'm about to test again
>>>>>>> using tcpdump to confirm this theory,
>>>>>>>
>>>>>>> BTW, disk-buffers works ok, sometimes I get some duplicated messages
>>>>>>> when restarting the relay but that's not a problem for me. Forget about my
>>>>>>> .persist file re-creation theory, it doesn't happen.
>>>>>>>
>>>>>>> Regards.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2018-09-13 18:07 GMT+02:00 Péter, Kókai <peter.kokai at oneidentity.com
>>>>>>> >:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> It would not make sense to replace the persist file after restart,
>>>>>>>> so it is not something that syslog-ng does. Only if that file is corrupted,
>>>>>>>> in that case at startup there should be a log about it, have you checked
>>>>>>>> the syslog-ng logs ? (it would be better to enable debug and/or verbose
>>>>>>>> logs, and if possible share it with us.)
>>>>>>>>
>>>>>>>> Could you reproduce the same behavior without docker (if possible) ?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Peter Kokai
>>>>>>>>
>>>>>>>> On Thu, Sep 13, 2018 at 4:44 PM Jose Angel Santiago <
>>>>>>>> jasantiago at stratio.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I guess I know what is happening, when I start from scratch the
>>>>>>>>> docker container, even I provide a persist file and a buffer file within
>>>>>>>>> the mapped volume, syslog-ng recreates them so all messages in buffer file
>>>>>>>>> which were not processed by the relay are lost.
>>>>>>>>>
>>>>>>>>> Is there any way to tell syslog-ng to use an already existing
>>>>>>>>> .persist file so it doesn't recreate the .rqf file?
>>>>>>>>>
>>>>>>>>> Regards.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2018-09-13 16:23 GMT+02:00 Budai, László <
>>>>>>>>> laszlo.budai at oneidentity.com>:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> one problem could be if the flush-limit would be greater than
>>>>>>>>>> 1... in that case syslog-ng would use a HttpBulkMessageProcessor.
>>>>>>>>>> In this case syslog-ng pass the message to the
>>>>>>>>>> HttpBulkMessageProcessor and sends back a positive ACK to the LogSource (so
>>>>>>>>>> the message is removed from the diskbuffer), and if the dockerimage is
>>>>>>>>>> killed, all the messages stored in the HttpBulkMessageProcessor are lost.
>>>>>>>>>> But in your case syslog-ng should use the
>>>>>>>>>> HttpSingleMessageProcessor... which means that the messages are sent
>>>>>>>>>> one-by-one...
>>>>>>>>>> Could you check the diskbuffer with the dqtool?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> L.
>>>>>>>>>>
>>>>>>>>>> On Thu, Sep 13, 2018 at 3:50 PM, Jose Angel Santiago <
>>>>>>>>>> jasantiago at stratio.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I'm running syslog-ng (with an elasticsearch2 destination
>>>>>>>>>>> configured) within a docker container, and I'm trying to avoid loss of
>>>>>>>>>>> messages if I kill the docker container and I start it again.
>>>>>>>>>>>
>>>>>>>>>>> This is my scenary:
>>>>>>>>>>>
>>>>>>>>>>> - A service which produces 20 lines of log per second
>>>>>>>>>>> - A sislog-ng instance reading from a wildcard-file source (but
>>>>>>>>>>> actually it only reads logs from the above service, let's call it
>>>>>>>>>>> syslog-agent), which sends all logs to another syslog-ng instance (the one
>>>>>>>>>>> running in a docker container, let's call it syslog-relay) though a network
>>>>>>>>>>> destination.
>>>>>>>>>>> - The syslog-relay sends messages to an elasticsearch instance,
>>>>>>>>>>> with following configuration:
>>>>>>>>>>>
>>>>>>>>>>> options {
>>>>>>>>>>>     chain-hostnames(no);
>>>>>>>>>>>     use-dns(no);
>>>>>>>>>>>     keep-hostname(yes);
>>>>>>>>>>>     owner("syslog-ng");
>>>>>>>>>>>     group("stratio");
>>>>>>>>>>>     perm(0640);
>>>>>>>>>>>     time-reap(30);
>>>>>>>>>>>     mark-freq(10);
>>>>>>>>>>>     stats-freq(0);
>>>>>>>>>>>     bad-hostname("^gconfd$");
>>>>>>>>>>>     flush-lines(100);
>>>>>>>>>>>     log-fifo-size(1000);
>>>>>>>>>>>     };
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> *destination d_elastic_default_0 {    elasticsearch2(
>>>>>>>>>>> cluster("myelastic")        cluster-url("https://myelastic.logs:9200
>>>>>>>>>>> <https://myelastic.logs:9200>")        client_mode("https")
>>>>>>>>>>> index("default")        type("log")        flush-limit(1)
>>>>>>>>>>> disk-buffer(            mem-buf-size(16M)
>>>>>>>>>>> disk-buf-size(16M)            reliable(yes)
>>>>>>>>>>> dir("/syslog-ng/log")        )        http-auth-type("clientcert")
>>>>>>>>>>> java-keystore-filepath("/etc/syslog-ng/certificates/syslog-relay.jks")
>>>>>>>>>>> java-keystore-password("XXXXXX")
>>>>>>>>>>> java-truststore-filepath("/etc/syslog-ng/certificates/ca-bundle.jks")
>>>>>>>>>>> java-truststore-password("XXXXXXXXXX")    );};*
>>>>>>>>>>>
>>>>>>>>>>> - The dir "/syslog-ng/log" is mapped to a path "/tmp/buffer"
>>>>>>>>>>> from the host where the docker container is running, so when I kill the
>>>>>>>>>>> docker container, the buffer file is not lost.
>>>>>>>>>>> - I've set flush-limit to 1 because I thought that I may lost 1
>>>>>>>>>>> message only as much.
>>>>>>>>>>>
>>>>>>>>>>> This architecture is working fine (flush-limit=1 makes very
>>>>>>>>>>> slow, but for this test is ok), but if I kill the syslog-relay docker
>>>>>>>>>>> container, wait 5 to 10 seconds and start it again from scratch, I can see
>>>>>>>>>>> that several hundreds of logs are missing in elasticsearch. I check it by
>>>>>>>>>>> stopping the logger service and letting syslog-ng agent & relay to finish
>>>>>>>>>>> the process enqueued messages.
>>>>>>>>>>>
>>>>>>>>>>> I can see in the syslog-agent stats that all logs messages have
>>>>>>>>>>> been processed, so it seems the problem is on the syslog-relay.
>>>>>>>>>>>
>>>>>>>>>>> Is this behaviour expected? If so, how can I protect against
>>>>>>>>>>> loss of messages in case of a syslog-relay docker container unexpected kill?
>>>>>>>>>>>
>>>>>>>>>>> Thanks in advance.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>>> | Jose Angel Santiago
>>>>>>>>>>>
>>>>>>>>>>> [image: Logo_signature2.png] <http://www.stratio.com/>
>>>>>>>>>>>
>>>>>>>>>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>>>>>>>>>>>
>>>>>>>>>>> 28224 Pozuelo de Alarcón, Madrid, Spain
>>>>>>>>>>>
>>>>>>>>>>> +34 918 286 473 <+34%20918%2028%2064%2073> | www.stratio.com
>>>>>>>>>>> <https://twitter.com/stratiobd>
>>>>>>>>>>> <https://www.linkedin.com/company/stratiobd>
>>>>>>>>>>> <https://www.youtube.com/c/StratioBD>
>>>>>>>>>>>
>>>>>>>>>>> ____________________________________________________________
>>>>>>>>>>> __________________
>>>>>>>>>>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
>>>>>>>>>>> Documentation: http://www.balabit.com/support
>>>>>>>>>>> /documentation/?product=syslog-ng
>>>>>>>>>>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ____________________________________________________________
>>>>>>>>>> __________________
>>>>>>>>>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
>>>>>>>>>> Documentation: http://www.balabit.com/support
>>>>>>>>>> /documentation/?product=syslog-ng
>>>>>>>>>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> | Jose Angel Santiago
>>>>>>>>>
>>>>>>>>> [image: Logo_signature2.png] <http://www.stratio.com/>
>>>>>>>>>
>>>>>>>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>>>>>>>>>
>>>>>>>>> 28224 Pozuelo de Alarcón, Madrid, Spain
>>>>>>>>>
>>>>>>>>> +34 918 286 473 <+34%20918%2028%2064%2073> | www.stratio.com
>>>>>>>>> <https://twitter.com/stratiobd>
>>>>>>>>> <https://www.linkedin.com/company/stratiobd>
>>>>>>>>> <https://www.youtube.com/c/StratioBD>
>>>>>>>>> ____________________________________________________________
>>>>>>>>> __________________
>>>>>>>>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
>>>>>>>>> Documentation: http://www.balabit.com/support
>>>>>>>>> /documentation/?product=syslog-ng
>>>>>>>>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
>>>>>>>>>
>>>>>>>>>
>>>>>>>> ____________________________________________________________
>>>>>>>> __________________
>>>>>>>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
>>>>>>>> Documentation: http://www.balabit.com/support
>>>>>>>> /documentation/?product=syslog-ng
>>>>>>>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> | Jose Angel Santiago
>>>>>>>
>>>>>>> [image: Logo_signature2.png] <http://www.stratio.com/>
>>>>>>>
>>>>>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>>>>>>>
>>>>>>> 28224 Pozuelo de Alarcón, Madrid, Spain
>>>>>>>
>>>>>>> +34 918 286 473 | www.stratio.com
>>>>>>> <https://twitter.com/stratiobd>
>>>>>>> <https://www.linkedin.com/company/stratiobd>
>>>>>>> <https://www.youtube.com/c/StratioBD>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> | Jose Angel Santiago
>>>>>>
>>>>>> [image: Logo_signature2.png] <http://www.stratio.com/>
>>>>>>
>>>>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>>>>>>
>>>>>> 28224 Pozuelo de Alarcón, Madrid, Spain
>>>>>>
>>>>>> +34 918 286 473 | www.stratio.com
>>>>>> <https://twitter.com/stratiobd>
>>>>>> <https://www.linkedin.com/company/stratiobd>
>>>>>> <https://www.youtube.com/c/StratioBD>
>>>>>>
>>>>>> ____________________________________________________________
>>>>>> __________________
>>>>>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
>>>>>> Documentation: http://www.balabit.com/support
>>>>>> /documentation/?product=syslog-ng
>>>>>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> ____________________________________________________________
>>>>> __________________
>>>>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
>>>>> Documentation: http://www.balabit.com/support
>>>>> /documentation/?product=syslog-ng
>>>>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> | Jose Angel Santiago
>>>>
>>>> [image: Logo_signature2.png] <http://www.stratio.com/>
>>>>
>>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>>>>
>>>> 28224 Pozuelo de Alarcón, Madrid, Spain
>>>>
>>>> +34 918 286 473 | www.stratio.com
>>>> <https://twitter.com/stratiobd>
>>>> <https://www.linkedin.com/company/stratiobd>
>>>> <https://www.youtube.com/c/StratioBD>
>>>>
>>>
>>>
>>>
>>> --
>>>
>>> | Jose Angel Santiago
>>>
>>> [image: Logo_signature2.png] <http://www.stratio.com/>
>>>
>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>>>
>>> 28224 Pozuelo de Alarcón, Madrid, Spain
>>>
>>> +34 918 286 473 | www.stratio.com
>>> <https://twitter.com/stratiobd>
>>> <https://www.linkedin.com/company/stratiobd>
>>> <https://www.youtube.com/c/StratioBD>
>>>
>>> ____________________________________________________________
>>> __________________
>>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
>>> Documentation: http://www.balabit.com/support
>>> /documentation/?product=syslog-ng
>>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
>>>
>>>
>>>
>>
>> ____________________________________________________________
>> __________________
>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
>> Documentation: http://www.balabit.com/support/documentation/?product=
>> syslog-ng
>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
>>
>>
>>
>
>
> --
>
> | Jose Angel Santiago
>
> [image: Logo_signature2.png] <http://www.stratio.com/>
>
> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>
> 28224 Pozuelo de Alarcón, Madrid, Spain
>
> +34 918 286 473 | www.stratio.com
> <https://twitter.com/stratiobd>
> <https://www.linkedin.com/company/stratiobd>
> <https://www.youtube.com/c/StratioBD>
>
> ____________________________________________________________
> __________________
> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
> Documentation: http://www.balabit.com/support/documentation/?
> product=syslog-ng
> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.balabit.hu/pipermail/syslog-ng/attachments/20180919/863a979e/attachment-0001.html>


More information about the syslog-ng mailing list