[syslog-ng] librdkafka and messages dropped

Attila Szakacs (aszakacs) Attila.Szakacs at oneidentity.com
Mon Jan 13 09:53:38 UTC 2020


Hi John,

Could you gather debug logs with `syslog-ng -Fedtv`? It would provide a nice starting point.

Thanks,
Attila
________________________________
From: syslog-ng <syslog-ng-bounces at lists.balabit.hu> on behalf of John Skopis <jspam at skopis.com>
Sent: Monday, January 13, 2020 10:11 AM
To: syslog-ng at lists.balabit.hu <syslog-ng at lists.balabit.hu>
Subject: [syslog-ng] librdkafka and messages dropped

Hello,

We have a syslog-ng aggregator that accepts messages in a few different formats and writes each format to its own Kafka topic. The mapping looks something like:

format1 -> topic1
format2 -> topic2
...

Each agent emits messages in ~4 different formats and sends them to the aggregator.
The aggregator is configured with kafka_c destinations, with 8 threads per destination.

We were previously using the Java Kafka client without issue.
We recently upgraded to syslog-ng 3.24.1 + librdkafka 1.22.

After upgrading, we are seeing an issue where (seemingly at random) syslog-ng stops producing messages into Kafka.

Checking `syslog-ng-ctl stats`, I can see:
d_kafka_format1: the dropped messages counter keeps increasing
d_kafka_format1: the queued messages counter increases to 80000 (10000 * 8 threads, iw-size)
d_kafka_format1: the processed messages counter does not increase
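
The three symptoms above (queue at capacity, drops rising, processed flat) can be checked automatically by comparing two stats snapshots. The following is only a minimal sketch: it assumes the semicolon-separated layout `SourceName;SourceId;SourceInstance;State;Type;Number` for `syslog-ng-ctl stats` output, and the function names, sample rows, and counter values are hypothetical, made up for illustration rather than taken from a real instance.

```python
import csv
from io import StringIO

# Assumed capacity, taken from the numbers in this message: 10000 * 8 threads.
QUEUE_CAPACITY = 10000 * 8

def parse_stats(text):
    """Return {(source_id, counter_type): value} from stats text.

    Assumes six semicolon-separated fields per row; other rows are skipped.
    """
    counters = {}
    for row in csv.reader(StringIO(text), delimiter=";"):
        if len(row) != 6 or row[0] == "SourceName":
            continue  # skip the header line and anything malformed
        _name, source_id, _instance, _state, counter_type, number = row
        try:
            value = int(number)
        except ValueError:
            continue
        key = (source_id, counter_type)
        counters[key] = counters.get(key, 0) + value
    return counters

def stalled_destinations(before_text, after_text, capacity=QUEUE_CAPACITY):
    """List source ids whose queue is full, drops grew, and processed did not."""
    before, after = parse_stats(before_text), parse_stats(after_text)
    stalled = []
    for source_id in sorted({sid for sid, _ in after}):
        def delta(kind):
            return after.get((source_id, kind), 0) - before.get((source_id, kind), 0)
        queue_full = after.get((source_id, "queued"), 0) >= capacity
        if queue_full and delta("dropped") > 0 and delta("processed") == 0:
            stalled.append(source_id)
    return stalled

# Hypothetical snapshots, taken some time apart.
SAMPLE_BEFORE = """SourceName;SourceId;SourceInstance;State;Type;Number
destination;d_kafka_format1;;a;processed;500000
destination;d_kafka_format1;;a;queued;80000
destination;d_kafka_format1;;a;dropped;120
destination;d_kafka_format2;;a;processed;300000
destination;d_kafka_format2;;a;queued;15
destination;d_kafka_format2;;a;dropped;0
"""

SAMPLE_AFTER = """SourceName;SourceId;SourceInstance;State;Type;Number
destination;d_kafka_format1;;a;processed;500000
destination;d_kafka_format1;;a;queued;80000
destination;d_kafka_format1;;a;dropped;450
destination;d_kafka_format2;;a;processed;300900
destination;d_kafka_format2;;a;queued;20
destination;d_kafka_format2;;a;dropped;0
"""

if __name__ == "__main__":
    print(stalled_destinations(SAMPLE_BEFORE, SAMPLE_AFTER))
```

In practice the two text snapshots would come from running `syslog-ng-ctl stats` twice, some seconds apart; a check like this only detects the stuck state and does nothing to recover from it.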

The log contains messages saying that the destination queue is full for all of the rdkafka threads.

There are spurious connection failures to a Kafka broker, but rdkafka seems to reconnect.

It seems that once the destination queue becomes full, syslog-ng never recovers and the instance must be restarted.
It also seems that rdkafka stops trying to reconnect to Kafka (possibly).

Is this expected behavior, or is there a bug around reconnecting to Kafka after a spurious timeout?
Just speculating here, but does rdkafka stop trying to reconnect after N attempts?
Are there any examples of destroying and re-creating the destination client after a failure threshold is reached?

Thanks
