[syslog-ng] disk-buffer in elasticsearch2 destination loses messages if docker container is killed

Budai, László laszlo.budai at oneidentity.com
Wed Sep 19 15:09:14 UTC 2018


Hi,

try to increase the log-iw-size (its default value is 100) on the relay.
(it's a source option, so add to your source log-iw-size(100000) or
something like similar).

L.

On Wed, Sep 19, 2018 at 3:50 PM, Jose Angel Santiago <jasantiago at stratio.com
> wrote:

> Hi,
>
> You're right, what I saw on my tcpdump file was TCP ACK.
>
> I guess I lose so many messages because the TCP buffer is getting bigger
> and bigger since syslog-relay queue_length from disk_buffer reaches its
> limit. In fact, when this queue is not full (i.e with very low msg/sec
> ratio) I lose from 0-5 messages according to my tests.
>
> Which parameter is the one which would match the queue_length of the
> disk_buffer? It seems that it should be *mem-buf-length()* but I'm using
> reliable=yes so I shouldn't use that parameter, and I'd like to test my
> scenary being able to queue more than 99 messages on disk_buffer.
>
> Thanks.
>
> 2018-09-19 13:46 GMT+02:00 Budai, László <laszlo.budai at oneidentity.com>:
>
>> Hi,
>>
>> what do you mean under 'syslog-relay returns ACK to syslog-ng agent' ?
>> I guess this is TCP level ACK.
>>
>> And TCP level ACK is not the same as what syslog-ng uses for controlling
>> the window size of the LogSources.
>>
>> It is possible that you have pending data in TCP buffers (maintained by
>> the kernel) that is not read by syslog-ng, but they are already ACKd by TCP
>> itself...
>>
>> (Solving this kind of situation requires application level ACK, which
>> syslog-ng has only in the commercial edition.)
>>
>> L.
>>
>>
>> On Wed, Sep 19, 2018 at 9:23 AM, Jose Angel Santiago <
>> jasantiago at stratio.com> wrote:
>>
>>> Hi,
>>>
>>> I've realized that the queue_length of the disk_buffer never goes beyond
>>> 99, no matter the ratio message/sec logger process is writing or the value
>>> of mem-buf-size or disk-buf-size.
>>>
>>> Which parameter is the one which would match the queue_length of the
>>> disk_buffer? It seems that it should be *mem-buf-length()* but I'm
>>> using reliable=yes so I shouldn't use that parameter, and I'd like to test
>>> my scenary being able to queue more than 99 messages on disk_buffer.
>>>
>>> Regards.
>>>
>>> 2018-09-17 14:33 GMT+02:00 Jose Angel Santiago <jasantiago at stratio.com>:
>>>
>>>> 3.17.2
>>>>
>>>> Thanks in advance.
>>>>
>>>> 2018-09-17 14:31 GMT+02:00 Budai, László <laszlo.budai at oneidentity.com>
>>>> :
>>>>
>>>>> Hi,
>>>>>
>>>>> I'll try to reproduce this issue in my test environment.
>>>>> What syslog-ng version do you have?
>>>>>
>>>>> L.
>>>>>
>>>>> On Mon, Sep 17, 2018 at 2:04 PM, Jose Angel Santiago <
>>>>> jasantiago at stratio.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> After repeating the test without TLS between syslog-agent &
>>>>>> syslog-relay, with 2229 messages logged with a 20 msgs/sec logging ratio,
>>>>>> in debug mode and with two tcpdumps running, these are the conclussions:
>>>>>>
>>>>>> - Syslog-agent sends every single message (checked with tcpdump file
>>>>>> and syslog-agent log file)
>>>>>> - Syslog-relay returns ACK for every single message, but sometimes
>>>>>> the package contains more than one message (checked with tcpdump file)
>>>>>> - Checking the syslog-relay log after killing & starting the docker
>>>>>> container, I found the trace 'Reliable disk-buffer state loaded;
>>>>>> filename='/syslog-ng-00000.rqf', queue_length='99', size='173844'
>>>>>> and the queued messages (from msg 236 to msg 334 are processed). The next
>>>>>> message processed is the nº 440. and it comes from the active network
>>>>>> destination again.
>>>>>> - Once all logs have been processed, in elasticsearch (syslog-relay
>>>>>> destination) I can find messages from 1 to 334, and messages from 440 to
>>>>>> 2229.
>>>>>>
>>>>>> How is it possible that syslog-relay returns ACK for messages 335 to
>>>>>> 439 to syslog-agent, but they are not in the queue, nor in the .rqf file?
>>>>>> Where did these messages go? I know they are sent before starting to fill
>>>>>> syslog-agent disk-buffer file (checked with rqf file from syslog-agent when
>>>>>> I kill the syslog-relay container). It seems that there's some kind of race
>>>>>> condition that makes those messages are not processed/queued  while
>>>>>> syslog-relay is being killed.
>>>>>>
>>>>>> Mi procedure consists of:
>>>>>>
>>>>>> . Start tcpdump on both syslog-agent & syslog-relay hosts
>>>>>> - Start logger process
>>>>>> - Once logs are being inserted in elasticsearch, I kill the
>>>>>> syslog-relay docker container, wait about 10 seconds, and run a new
>>>>>> container (using a mapped volume where .rqf file and .persist file are)
>>>>>> - Once logs are being inserted in elasticsearch again, I stop the
>>>>>> logger process.
>>>>>> - I wait until no log is left to be processed
>>>>>>
>>>>>> Would you test anything else? I'm running out of ideas.
>>>>>>
>>>>>> PD: Same test with logger process writing 5 msgs/sec ratio produces
>>>>>> from 0 to 5 lost messages.
>>>>>>
>>>>>> Regards.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2018-09-14 14:05 GMT+02:00 Jose Angel Santiago <
>>>>>> jasantiago at stratio.com>:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I've got more accurate information about where are my lost messages.
>>>>>>>
>>>>>>> Now I'm using disk-buffer in syslog-agent and syslog-relay, and I've
>>>>>>> checked that lost messages are the ones sent by the syslog-agent when the
>>>>>>> syslog-relay docker container is being killed. I can see those messages on
>>>>>>> syslog-agent log (I've got both agent & relay in debug mode) with its
>>>>>>> corresponding "Outgoing message" line, but those messages never reach the
>>>>>>> relay.
>>>>>>>
>>>>>>> Could it be that the relay docker container still returns ACK to the
>>>>>>> agent (the agent resolves relay fqdn with a custom DNS) while syslog-ng
>>>>>>> process within the container is being stopped? I'm about to test again
>>>>>>> using tcpdump to confirm this theory,
>>>>>>>
>>>>>>> BTW, disk-buffers works ok, sometimes I get some duplicated messages
>>>>>>> when restarting the relay but that's not a problem for me. Forget about my
>>>>>>> .persist file re-creation theory, it doesn't happen.
>>>>>>>
>>>>>>> Regards.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2018-09-13 18:07 GMT+02:00 Péter, Kókai <peter.kokai at oneidentity.com
>>>>>>> >:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> It would not make sense to replace the persist file after restart,
>>>>>>>> so it is not something that syslog-ng does. Only if that file is corrupted,
>>>>>>>> in that case at startup there should be a log about it, have you checked
>>>>>>>> the syslog-ng logs ? (it would be better to enable debug and/or verbose
>>>>>>>> logs, and if possible share it with us.)
>>>>>>>>
>>>>>>>> Could you reproduce the same behavior without docker (if possible) ?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Peter Kokai
>>>>>>>>
>>>>>>>> On Thu, Sep 13, 2018 at 4:44 PM Jose Angel Santiago <
>>>>>>>> jasantiago at stratio.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I guess I know what is happening, when I start from scratch the
>>>>>>>>> docker container, even I provide a persist file and a buffer file within
>>>>>>>>> the mapped volume, syslog-ng recreates them so all messages in buffer file
>>>>>>>>> which were not processed by the relay are lost.
>>>>>>>>>
>>>>>>>>> Is there any way to tell syslog-ng to use an already existing
>>>>>>>>> .persist file so it doesn't recreate the .rqf file?
>>>>>>>>>
>>>>>>>>> Regards.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2018-09-13 16:23 GMT+02:00 Budai, László <
>>>>>>>>> laszlo.budai at oneidentity.com>:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> one problem could be if the flush-limit would be greater than
>>>>>>>>>> 1... in that case syslog-ng would use a HttpBulkMessageProcessor.
>>>>>>>>>> In this case syslog-ng pass the message to the
>>>>>>>>>> HttpBulkMessageProcessor and sends back a positive ACK to the LogSource (so
>>>>>>>>>> the message is removed from the diskbuffer), and if the dockerimage is
>>>>>>>>>> killed, all the messages stored in the HttpBulkMessageProcessor are lost.
>>>>>>>>>> But in your case syslog-ng should use the
>>>>>>>>>> HttpSingleMessageProcessor... which means that the messages are sent
>>>>>>>>>> one-by-one...
>>>>>>>>>> Could you check the diskbuffer with the dqtool?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> L.
>>>>>>>>>>
>>>>>>>>>> On Thu, Sep 13, 2018 at 3:50 PM, Jose Angel Santiago <
>>>>>>>>>> jasantiago at stratio.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I'm running syslog-ng (with an elasticsearch2 destination
>>>>>>>>>>> configured) within a docker container, and I'm trying to avoid loss of
>>>>>>>>>>> messages if I kill the docker container and I start it again.
>>>>>>>>>>>
>>>>>>>>>>> This is my scenary:
>>>>>>>>>>>
>>>>>>>>>>> - A service which produces 20 lines of log per second
>>>>>>>>>>> - A sislog-ng instance reading from a wildcard-file source (but
>>>>>>>>>>> actually it only reads logs from the above service, let's call it
>>>>>>>>>>> syslog-agent), which sends all logs to another syslog-ng instance (the one
>>>>>>>>>>> running in a docker container, let's call it syslog-relay) though a network
>>>>>>>>>>> destination.
>>>>>>>>>>> - The syslog-relay sends messages to an elasticsearch instance,
>>>>>>>>>>> with following configuration:
>>>>>>>>>>>
>>>>>>>>>>> options {
>>>>>>>>>>>     chain-hostnames(no);
>>>>>>>>>>>     use-dns(no);
>>>>>>>>>>>     keep-hostname(yes);
>>>>>>>>>>>     owner("syslog-ng");
>>>>>>>>>>>     group("stratio");
>>>>>>>>>>>     perm(0640);
>>>>>>>>>>>     time-reap(30);
>>>>>>>>>>>     mark-freq(10);
>>>>>>>>>>>     stats-freq(0);
>>>>>>>>>>>     bad-hostname("^gconfd$");
>>>>>>>>>>>     flush-lines(100);
>>>>>>>>>>>     log-fifo-size(1000);
>>>>>>>>>>>     };
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> *destination d_elastic_default_0 {    elasticsearch2(
>>>>>>>>>>> cluster("myelastic")        cluster-url("https://myelastic.logs:9200
>>>>>>>>>>> <https://myelastic.logs:9200>")        client_mode("https")
>>>>>>>>>>> index("default")        type("log")        flush-limit(1)
>>>>>>>>>>> disk-buffer(            mem-buf-size(16M)
>>>>>>>>>>> disk-buf-size(16M)            reliable(yes)
>>>>>>>>>>> dir("/syslog-ng/log")        )        http-auth-type("clientcert")
>>>>>>>>>>> java-keystore-filepath("/etc/syslog-ng/certificates/syslog-relay.jks")
>>>>>>>>>>> java-keystore-password("XXXXXX")
>>>>>>>>>>> java-truststore-filepath("/etc/syslog-ng/certificates/ca-bundle.jks")
>>>>>>>>>>> java-truststore-password("XXXXXXXXXX")    );};*
>>>>>>>>>>>
>>>>>>>>>>> - The dir "/syslog-ng/log" is mapped to a path "/tmp/buffer"
>>>>>>>>>>> from the host where the docker container is running, so when I kill the
>>>>>>>>>>> docker container, the buffer file is not lost.
>>>>>>>>>>> - I've set flush-limit to 1 because I thought that I may lost 1
>>>>>>>>>>> message only as much.
>>>>>>>>>>>
>>>>>>>>>>> This architecture is working fine (flush-limit=1 makes very
>>>>>>>>>>> slow, but for this test is ok), but if I kill the syslog-relay docker
>>>>>>>>>>> container, wait 5 to 10 seconds and start it again from scratch, I can see
>>>>>>>>>>> that several hundreds of logs are missing in elasticsearch. I check it by
>>>>>>>>>>> stopping the logger service and letting syslog-ng agent & relay to finish
>>>>>>>>>>> the process enqueued messages.
>>>>>>>>>>>
>>>>>>>>>>> I can see in the syslog-agent stats that all logs messages have
>>>>>>>>>>> been processed, so it seems the problem is on the syslog-relay.
>>>>>>>>>>>
>>>>>>>>>>> Is this behaviour expected? If so, how can I protect against
>>>>>>>>>>> loss of messages in case of a syslog-relay docker container unexpected kill?
>>>>>>>>>>>
>>>>>>>>>>> Thanks in advance.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>>> | Jose Angel Santiago
>>>>>>>>>>>
>>>>>>>>>>> [image: Logo_signature2.png] <http://www.stratio.com/>
>>>>>>>>>>>
>>>>>>>>>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>>>>>>>>>>>
>>>>>>>>>>> 28224 Pozuelo de Alarcón, Madrid, Spain
>>>>>>>>>>>
>>>>>>>>>>> +34 918 286 473 <+34%20918%2028%2064%2073> | www.stratio.com
>>>>>>>>>>> <https://twitter.com/stratiobd>
>>>>>>>>>>> <https://www.linkedin.com/company/stratiobd>
>>>>>>>>>>> <https://www.youtube.com/c/StratioBD>
>>>>>>>>>>>
>>>>>>>>>>> ____________________________________________________________
>>>>>>>>>>> __________________
>>>>>>>>>>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
>>>>>>>>>>> Documentation: http://www.balabit.com/support
>>>>>>>>>>> /documentation/?product=syslog-ng
>>>>>>>>>>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ____________________________________________________________
>>>>>>>>>> __________________
>>>>>>>>>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
>>>>>>>>>> Documentation: http://www.balabit.com/support
>>>>>>>>>> /documentation/?product=syslog-ng
>>>>>>>>>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> | Jose Angel Santiago
>>>>>>>>>
>>>>>>>>> [image: Logo_signature2.png] <http://www.stratio.com/>
>>>>>>>>>
>>>>>>>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>>>>>>>>>
>>>>>>>>> 28224 Pozuelo de Alarcón, Madrid, Spain
>>>>>>>>>
>>>>>>>>> +34 918 286 473 <+34%20918%2028%2064%2073> | www.stratio.com
>>>>>>>>> <https://twitter.com/stratiobd>
>>>>>>>>> <https://www.linkedin.com/company/stratiobd>
>>>>>>>>> <https://www.youtube.com/c/StratioBD>
>>>>>>>>> ____________________________________________________________
>>>>>>>>> __________________
>>>>>>>>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
>>>>>>>>> Documentation: http://www.balabit.com/support
>>>>>>>>> /documentation/?product=syslog-ng
>>>>>>>>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
>>>>>>>>>
>>>>>>>>>
>>>>>>>> ____________________________________________________________
>>>>>>>> __________________
>>>>>>>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
>>>>>>>> Documentation: http://www.balabit.com/support
>>>>>>>> /documentation/?product=syslog-ng
>>>>>>>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> | Jose Angel Santiago
>>>>>>>
>>>>>>> [image: Logo_signature2.png] <http://www.stratio.com/>
>>>>>>>
>>>>>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>>>>>>>
>>>>>>> 28224 Pozuelo de Alarcón, Madrid, Spain
>>>>>>>
>>>>>>> +34 918 286 473 | www.stratio.com
>>>>>>> <https://twitter.com/stratiobd>
>>>>>>> <https://www.linkedin.com/company/stratiobd>
>>>>>>> <https://www.youtube.com/c/StratioBD>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> | Jose Angel Santiago
>>>>>>
>>>>>> [image: Logo_signature2.png] <http://www.stratio.com/>
>>>>>>
>>>>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>>>>>>
>>>>>> 28224 Pozuelo de Alarcón, Madrid, Spain
>>>>>>
>>>>>> +34 918 286 473 | www.stratio.com
>>>>>> <https://twitter.com/stratiobd>
>>>>>> <https://www.linkedin.com/company/stratiobd>
>>>>>> <https://www.youtube.com/c/StratioBD>
>>>>>>
>>>>>> ____________________________________________________________
>>>>>> __________________
>>>>>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
>>>>>> Documentation: http://www.balabit.com/support
>>>>>> /documentation/?product=syslog-ng
>>>>>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> ____________________________________________________________
>>>>> __________________
>>>>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
>>>>> Documentation: http://www.balabit.com/support
>>>>> /documentation/?product=syslog-ng
>>>>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> | Jose Angel Santiago
>>>>
>>>> [image: Logo_signature2.png] <http://www.stratio.com/>
>>>>
>>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>>>>
>>>> 28224 Pozuelo de Alarcón, Madrid, Spain
>>>>
>>>> +34 918 286 473 | www.stratio.com
>>>> <https://twitter.com/stratiobd>
>>>> <https://www.linkedin.com/company/stratiobd>
>>>> <https://www.youtube.com/c/StratioBD>
>>>>
>>>
>>>
>>>
>>> --
>>>
>>> | Jose Angel Santiago
>>>
>>> [image: Logo_signature2.png] <http://www.stratio.com/>
>>>
>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>>>
>>> 28224 Pozuelo de Alarcón, Madrid, Spain
>>>
>>> +34 918 286 473 | www.stratio.com
>>> <https://twitter.com/stratiobd>
>>> <https://www.linkedin.com/company/stratiobd>
>>> <https://www.youtube.com/c/StratioBD>
>>>
>>> ____________________________________________________________
>>> __________________
>>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
>>> Documentation: http://www.balabit.com/support
>>> /documentation/?product=syslog-ng
>>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
>>>
>>>
>>>
>>
>> ____________________________________________________________
>> __________________
>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
>> Documentation: http://www.balabit.com/support
>> /documentation/?product=syslog-ng
>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
>>
>>
>>
>
>
> --
>
> | Jose Angel Santiago
>
> [image: Logo_signature2.png] <http://www.stratio.com/>
>
> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>
> 28224 Pozuelo de Alarcón, Madrid, Spain
>
> +34 918 286 473 | www.stratio.com
> <https://twitter.com/stratiobd>
> <https://www.linkedin.com/company/stratiobd>
> <https://www.youtube.com/c/StratioBD>
>
> ____________________________________________________________
> __________________
> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
> Documentation: http://www.balabit.com/support/documentation/?product=
> syslog-ng
> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.balabit.hu/pipermail/syslog-ng/attachments/20180919/0b2d6a9a/attachment-0001.html>


More information about the syslog-ng mailing list