[syslog-ng] librdkafka and messages dropped

John Skopis jspam at skopis.com
Tue Jan 14 13:16:57 UTC 2020


I think I can explain the behavior I am observing.

rdkafka has a buffer (queue.buffering.max.messages). I guess this buffer is
drained by rd_kafka_poll, which triggers delivery callback.
When kafka has a failure (spurious or otherwise) _kafka_delivery_report_cb
reinjects the message into the queue. The amount of messages reinjected can
be very large, causing the queue to fill much faster than input log rate
(depends on kafka timeout and max inflight requests I guess).
When the queue is full no further messages are inserted into the queue,
i.e. syslog-ng drop counter increments and queue is stuck at max fifo size.
rd_kafka_poll is only called by _drain_responses, which is scheduled when
messages are inserted into the queue.
Messages can't be inserted to the queue because the queue is full; Messages
can't be drained from the rdkafka buffer because it's not possible to add
new messages, and rd_kafka_poll is only scheduled on insert.


You can reproduce by setting queue.buffering.max.messages to 30 and
log-fifo-size to 10. Stop kafka and log messages until the fifo fills up.
All messages are dropped after this point. When you restart kafka rdkafka
produce into kafka but syslog-ng never gets the delivery_cb because
rd_kafka_poll isn't called.

I am not familiar with ivkis (or syslog-ng tbh) codebase but it seems the
issue is that the timer is not scheduled because the worker is not
scheduled. The only way I can see to fix is in the attached patch -- move
the callback out of the worker and into mainloop. I know it's probably not
ideal but it does seem to fix my issue.

I did try upgrading our ivykis version to 0.42.4 and it didn't seem to help.

Thanks,
John

On Mon, Jan 13, 2020 at 9:53 AM Attila Szakacs (aszakacs) <
Attila.Szakacs at oneidentity.com> wrote:

> Hi John,
>
> Could you gather debug logs with `syslog-ng -Fedtv`? It would provide a
> nice starting point.
>
> Thanks,
> Attila
> ------------------------------
> *From:* syslog-ng <syslog-ng-bounces at lists.balabit.hu> on behalf of John
> Skopis <jspam at skopis.com>
> *Sent:* Monday, January 13, 2020 10:11 AM
> *To:* syslog-ng at lists.balabit.hu <syslog-ng at lists.balabit.hu>
> *Subject:* [syslog-ng] librdkafka and messages dropped
>
> CAUTION: This email originated from outside of the organization. Do not
> follow guidance, click links, or open attachments unless you recognize the
> sender and know the content is safe.
>
> Hello,
>
> We have a syslog-ng aggregator that accepts messages in a few different
> formats and logs the message format to kafka. The mapping looks something
> like:
>
> format1 -> topic1
> format2 -> topic2
> ...
>
> Each agent will emit messages in ~4 different formats and send to the
> aggregator.
> The aggregator is configured with a kafka_c destination with 8
> threads/destination.
>
> We were previously using the java kafka client without issue.
> We recently upgraded to 3.24.1 + librdkafka 1.22
>
> After upgrading we are seeing an issue where (seemingly randomly)
> syslog-ng stops producing messages into kafka.
>
> Checking syslog-ng-ctl stats I can see
> d_kafka_format1 has dropped message counter increasing
> d_kafka_format1 queued messages counter increases to 80000 (10000*8
> threads iw-size)
> d_kafka_format1 processed messages does not increase
>
> There are messages printed to the log that the destination queue is full
> for all of the rdkafka threads,
>
> There are spurious failures to a kafka broker but it seems rdkafka
> reconnects.
>
> Seems like once the destination queue becomes full syslog-ng never
> recovers and the instance must be restarted.
> It also seems like rdkafka stops trying to reconnect to kafka (possibly?).
>
> Is this expected behavior or is there some bug around reconnecting to
> kafka after a spurious timeout?
> Just speculating here but does rdkafka stop trying to reconnect after N
> times?
> Are there any examples of destroy/create destination client after failure
> thresholds?
>
> Thanks
>
>
> ______________________________________________________________________________
> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
> Documentation:
> http://www.balabit.com/support/documentation/?product=syslog-ng
> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.balabit.hu/pipermail/syslog-ng/attachments/20200114/2be25480/attachment.html>


More information about the syslog-ng mailing list