Hi, try to increase the log-iw-size (its default value is 100) on the relay. (it's a source option, so add to your source log-iw-size(100000) or something like similar). L. On Wed, Sep 19, 2018 at 3:50 PM, Jose Angel Santiago <jasantiago@stratio.com
wrote:
Hi,
You're right, what I saw on my tcpdump file was TCP ACK.
I guess I lose so many messages because the TCP buffer is getting bigger and bigger since syslog-relay queue_length from disk_buffer reaches its limit. In fact, when this queue is not full (i.e with very low msg/sec ratio) I lose from 0-5 messages according to my tests.
Which parameter is the one which would match the queue_length of the disk_buffer? It seems that it should be *mem-buf-length()* but I'm using reliable=yes so I shouldn't use that parameter, and I'd like to test my scenary being able to queue more than 99 messages on disk_buffer.
Thanks.
2018-09-19 13:46 GMT+02:00 Budai, László <laszlo.budai@oneidentity.com>:
Hi,
what do you mean under 'syslog-relay returns ACK to syslog-ng agent' ? I guess this is TCP level ACK.
And TCP level ACK is not the same as what syslog-ng uses for controlling the window size of the LogSources.
It is possible that you have pending data in TCP buffers (maintained by the kernel) that is not read by syslog-ng, but they are already ACKd by TCP itself...
(Solving this kind of situation requires application level ACK, which syslog-ng has only in the commercial edition.)
L.
On Wed, Sep 19, 2018 at 9:23 AM, Jose Angel Santiago < jasantiago@stratio.com> wrote:
Hi,
I've realized that the queue_length of the disk_buffer never goes beyond 99, no matter the ratio message/sec logger process is writing or the value of mem-buf-size or disk-buf-size.
Which parameter is the one which would match the queue_length of the disk_buffer? It seems that it should be *mem-buf-length()* but I'm using reliable=yes so I shouldn't use that parameter, and I'd like to test my scenary being able to queue more than 99 messages on disk_buffer.
Regards.
2018-09-17 14:33 GMT+02:00 Jose Angel Santiago <jasantiago@stratio.com>:
3.17.2
Thanks in advance.
2018-09-17 14:31 GMT+02:00 Budai, László <laszlo.budai@oneidentity.com> :
Hi,
I'll try to reproduce this issue in my test environment. What syslog-ng version do you have?
L.
On Mon, Sep 17, 2018 at 2:04 PM, Jose Angel Santiago < jasantiago@stratio.com> wrote:
Hi,
After repeating the test without TLS between syslog-agent & syslog-relay, with 2229 messages logged with a 20 msgs/sec logging ratio, in debug mode and with two tcpdumps running, these are the conclussions:
- Syslog-agent sends every single message (checked with tcpdump file and syslog-agent log file) - Syslog-relay returns ACK for every single message, but sometimes the package contains more than one message (checked with tcpdump file) - Checking the syslog-relay log after killing & starting the docker container, I found the trace 'Reliable disk-buffer state loaded; filename='/syslog-ng-00000.rqf', queue_length='99', size='173844' and the queued messages (from msg 236 to msg 334 are processed). The next message processed is the nº 440. and it comes from the active network destination again. - Once all logs have been processed, in elasticsearch (syslog-relay destination) I can find messages from 1 to 334, and messages from 440 to 2229.
How is it possible that syslog-relay returns ACK for messages 335 to 439 to syslog-agent, but they are not in the queue, nor in the .rqf file? Where did these messages go? I know they are sent before starting to fill syslog-agent disk-buffer file (checked with rqf file from syslog-agent when I kill the syslog-relay container). It seems that there's some kind of race condition that makes those messages are not processed/queued while syslog-relay is being killed.
Mi procedure consists of:
. Start tcpdump on both syslog-agent & syslog-relay hosts - Start logger process - Once logs are being inserted in elasticsearch, I kill the syslog-relay docker container, wait about 10 seconds, and run a new container (using a mapped volume where .rqf file and .persist file are) - Once logs are being inserted in elasticsearch again, I stop the logger process. - I wait until no log is left to be processed
Would you test anything else? I'm running out of ideas.
PD: Same test with logger process writing 5 msgs/sec ratio produces from 0 to 5 lost messages.
Regards.
2018-09-14 14:05 GMT+02:00 Jose Angel Santiago < jasantiago@stratio.com>:
> Hi, > > I've got more accurate information about where are my lost messages. > > Now I'm using disk-buffer in syslog-agent and syslog-relay, and I've > checked that lost messages are the ones sent by the syslog-agent when the > syslog-relay docker container is being killed. I can see those messages on > syslog-agent log (I've got both agent & relay in debug mode) with its > corresponding "Outgoing message" line, but those messages never reach the > relay. > > Could it be that the relay docker container still returns ACK to the > agent (the agent resolves relay fqdn with a custom DNS) while syslog-ng > process within the container is being stopped? I'm about to test again > using tcpdump to confirm this theory, > > BTW, disk-buffers works ok, sometimes I get some duplicated messages > when restarting the relay but that's not a problem for me. Forget about my > .persist file re-creation theory, it doesn't happen. > > Regards. > > > > 2018-09-13 18:07 GMT+02:00 Péter, Kókai <peter.kokai@oneidentity.com > >: > >> Hello, >> >> It would not make sense to replace the persist file after restart, >> so it is not something that syslog-ng does. Only if that file is corrupted, >> in that case at startup there should be a log about it, have you checked >> the syslog-ng logs ? (it would be better to enable debug and/or verbose >> logs, and if possible share it with us.) >> >> Could you reproduce the same behavior without docker (if possible) ? >> >> >> >> Best Regards, >> Peter Kokai >> >> On Thu, Sep 13, 2018 at 4:44 PM Jose Angel Santiago < >> jasantiago@stratio.com> wrote: >> >>> Hi, >>> >>> I guess I know what is happening, when I start from scratch the >>> docker container, even I provide a persist file and a buffer file within >>> the mapped volume, syslog-ng recreates them so all messages in buffer file >>> which were not processed by the relay are lost. >>> >>> Is there any way to tell syslog-ng to use an already existing >>> .persist file so it doesn't recreate the .rqf file? >>> >>> Regards. >>> >>> >>> >>> 2018-09-13 16:23 GMT+02:00 Budai, László < >>> laszlo.budai@oneidentity.com>: >>> >>>> Hi, >>>> >>>> one problem could be if the flush-limit would be greater than >>>> 1... in that case syslog-ng would use a HttpBulkMessageProcessor. >>>> In this case syslog-ng pass the message to the >>>> HttpBulkMessageProcessor and sends back a positive ACK to the LogSource (so >>>> the message is removed from the diskbuffer), and if the dockerimage is >>>> killed, all the messages stored in the HttpBulkMessageProcessor are lost. >>>> But in your case syslog-ng should use the >>>> HttpSingleMessageProcessor... which means that the messages are sent >>>> one-by-one... >>>> Could you check the diskbuffer with the dqtool? >>>> >>>> >>>> L. >>>> >>>> On Thu, Sep 13, 2018 at 3:50 PM, Jose Angel Santiago < >>>> jasantiago@stratio.com> wrote: >>>> >>>>> Hi, >>>>> >>>>> I'm running syslog-ng (with an elasticsearch2 destination >>>>> configured) within a docker container, and I'm trying to avoid loss of >>>>> messages if I kill the docker container and I start it again. >>>>> >>>>> This is my scenary: >>>>> >>>>> - A service which produces 20 lines of log per second >>>>> - A sislog-ng instance reading from a wildcard-file source (but >>>>> actually it only reads logs from the above service, let's call it >>>>> syslog-agent), which sends all logs to another syslog-ng instance (the one >>>>> running in a docker container, let's call it syslog-relay) though a network >>>>> destination. >>>>> - The syslog-relay sends messages to an elasticsearch instance, >>>>> with following configuration: >>>>> >>>>> options { >>>>> chain-hostnames(no); >>>>> use-dns(no); >>>>> keep-hostname(yes); >>>>> owner("syslog-ng"); >>>>> group("stratio"); >>>>> perm(0640); >>>>> time-reap(30); >>>>> mark-freq(10); >>>>> stats-freq(0); >>>>> bad-hostname("^gconfd$"); >>>>> flush-lines(100); >>>>> log-fifo-size(1000); >>>>> }; >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> *destination d_elastic_default_0 { elasticsearch2( >>>>> cluster("myelastic") cluster-url("https://myelastic.logs:9200 >>>>> <https://myelastic.logs:9200>") client_mode("https") >>>>> index("default") type("log") flush-limit(1) >>>>> disk-buffer( mem-buf-size(16M) >>>>> disk-buf-size(16M) reliable(yes) >>>>> dir("/syslog-ng/log") ) http-auth-type("clientcert") >>>>> java-keystore-filepath("/etc/syslog-ng/certificates/syslog-relay.jks") >>>>> java-keystore-password("XXXXXX") >>>>> java-truststore-filepath("/etc/syslog-ng/certificates/ca-bundle.jks") >>>>> java-truststore-password("XXXXXXXXXX") );};* >>>>> >>>>> - The dir "/syslog-ng/log" is mapped to a path "/tmp/buffer" >>>>> from the host where the docker container is running, so when I kill the >>>>> docker container, the buffer file is not lost. >>>>> - I've set flush-limit to 1 because I thought that I may lost 1 >>>>> message only as much. >>>>> >>>>> This architecture is working fine (flush-limit=1 makes very >>>>> slow, but for this test is ok), but if I kill the syslog-relay docker >>>>> container, wait 5 to 10 seconds and start it again from scratch, I can see >>>>> that several hundreds of logs are missing in elasticsearch. I check it by >>>>> stopping the logger service and letting syslog-ng agent & relay to finish >>>>> the process enqueued messages. >>>>> >>>>> I can see in the syslog-agent stats that all logs messages have >>>>> been processed, so it seems the problem is on the syslog-relay. >>>>> >>>>> Is this behaviour expected? If so, how can I protect against >>>>> loss of messages in case of a syslog-relay docker container unexpected kill? >>>>> >>>>> Thanks in advance. >>>>> >>>>> >>>>> -- >>>>> >>>>> | Jose Angel Santiago >>>>> >>>>> [image: Logo_signature2.png] <http://www.stratio.com/> >>>>> >>>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta >>>>> >>>>> 28224 Pozuelo de Alarcón, Madrid, Spain >>>>> >>>>> +34 918 286 473 <+34%20918%2028%2064%2073> | www.stratio.com >>>>> <https://twitter.com/stratiobd> >>>>> <https://www.linkedin.com/company/stratiobd> >>>>> <https://www.youtube.com/c/StratioBD> >>>>> >>>>> ____________________________________________________________ >>>>> __________________ >>>>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng >>>>> Documentation: http://www.balabit.com/support >>>>> /documentation/?product=syslog-ng >>>>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq >>>>> >>>>> >>>>> >>>> >>>> ____________________________________________________________ >>>> __________________ >>>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng >>>> Documentation: http://www.balabit.com/support >>>> /documentation/?product=syslog-ng >>>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq >>>> >>>> >>>> >>> >>> >>> -- >>> >>> | Jose Angel Santiago >>> >>> [image: Logo_signature2.png] <http://www.stratio.com/> >>> >>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta >>> >>> 28224 Pozuelo de Alarcón, Madrid, Spain >>> >>> +34 918 286 473 <+34%20918%2028%2064%2073> | www.stratio.com >>> <https://twitter.com/stratiobd> >>> <https://www.linkedin.com/company/stratiobd> >>> <https://www.youtube.com/c/StratioBD> >>> ____________________________________________________________ >>> __________________ >>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng >>> Documentation: http://www.balabit.com/support >>> /documentation/?product=syslog-ng >>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq >>> >>> >> ____________________________________________________________ >> __________________ >> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng >> Documentation: http://www.balabit.com/support >> /documentation/?product=syslog-ng >> FAQ: http://www.balabit.com/wiki/syslog-ng-faq >> >> >> > > > -- > > | Jose Angel Santiago > > [image: Logo_signature2.png] <http://www.stratio.com/> > > Vía de las dos Castillas, 33, Ática 4, 3ª Planta > > 28224 Pozuelo de Alarcón, Madrid, Spain > > +34 918 286 473 | www.stratio.com > <https://twitter.com/stratiobd> > <https://www.linkedin.com/company/stratiobd> > <https://www.youtube.com/c/StratioBD> >
--
| Jose Angel Santiago
[image: Logo_signature2.png] <http://www.stratio.com/>
Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid, Spain
+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>
____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support /documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq
____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support /documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq
--
| Jose Angel Santiago
[image: Logo_signature2.png] <http://www.stratio.com/>
Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid, Spain
+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>
--
| Jose Angel Santiago
[image: Logo_signature2.png] <http://www.stratio.com/>
Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid, Spain
+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>
____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support /documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq
____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support /documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq
--
| Jose Angel Santiago
[image: Logo_signature2.png] <http://www.stratio.com/>
Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid, Spain
+34 918 286 473 | www.stratio.com <https://twitter.com/stratiobd> <https://www.linkedin.com/company/stratiobd> <https://www.youtube.com/c/StratioBD>
____________________________________________________________ __________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product= syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq