[syslog-ng] Destination with disk-buffer and reliable(yes) loses queued messages on restart/crash?

Mon Mar 22 11:13:03 UTC 2021

Hello,

I am probably missing something here. I am trying to configure syslog-ng (3.27.1), running in a Kubernetes Pod with a Persistent Volume mounted at /var/log, so that queued messages are spooled to disk and the queue can be recovered if syslog-ng crashes. I configured the destination like so:

destination d_forwarder {
    syslog(
        "`HEAVY_FORWARDER_HOST`" port(`HEAVY_FORWARDER_PORT`)
        transport("tls")
        tls(
            ca-dir("/etc/syslog-ng/ca.d")
            key-file("/vault/secrets/client_key.pem")
            cert-file("/vault/secrets/client_cert.pem")
            peer-verify(required-trusted)
        )
        disk-buffer(
            mem-buf-size(524288)
            disk-buf-size(104857600)
            reliable(yes)
            dir("/var/log")
        )
    );
};

The documentation for the `reliable` flag says: "If set to yes, syslog-ng OSE cannot lose logs in case of reload/restart, unreachable destination or syslog-ng OSE crash."
I can see several *.rqf files being created in /var/log. As soon as the latest of them reaches roughly 100 MB, messages start to get dropped. So far, everything is as expected.

Stats:
{
   "center_queued_processed": 72031,
   "center_received_processed": 36016,
   "destination_d_forwarder_processed": 36015,
   "destination_d_local_processed": 36016,
   "dst_syslog_d_forwarder_0_tls_heavy-forwarder_shared-services_svc_cluster_local_6514_dropped": 4173,
   "dst_syslog_d_forwarder_0_tls_heavy-forwarder_shared-services_svc_cluster_local_6514_processed": 36015,
   "dst_syslog_d_forwarder_0_tls_heavy-forwarder_shared-services_svc_cluster_local_6514_queued": 31842,
   "dst_syslog_d_forwarder_0_tls_heavy-forwarder_shared-services_svc_cluster_local_6514_written": 0,
   ...
   "source_s_external_tls_processed": 36015
   ...
}
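
As a quick sanity check on these counters, here is a minimal Python sketch. The variable names are made up for illustration, and the values are copied by hand from the stats and the destination config above. It only checks that every processed message is accounted for as queued, dropped, or written, and converts disk-buf-size() to MiB; it does not query syslog-ng itself.

```python
# Counter values copied by hand from the stats output above
# (variable names are illustrative, not syslog-ng identifiers).
processed = 36015
queued = 31842
dropped = 4173
written = 0

# disk-buf-size() from the destination config, in bytes.
DISK_BUF_SIZE = 104857600

# Every processed message should be queued, dropped, or written.
accounted = queued + dropped + written
disk_buf_mib = DISK_BUF_SIZE / (1024 * 1024)

print(f"accounted for: {accounted} of {processed}")
print(f"disk-buf-size: {disk_buf_mib:.0f} MiB")
```

The numbers add up, which fits the observation that messages are only dropped once the roughly 100 MB queue file is full.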

/var/log:
I have no name!@syslog-ng-76f898f5bb-sh9q8:/var/log$ ls -lh
total 213M
-rw------- 1 10001 10001  38K Mar 22 08:33 syslog-ng-00000.rqf
-rw------- 1 10001 10001 4.0K Mar 22 09:25 syslog-ng-00001.rqf
-rw------- 1 10001 10001 101M Mar 22 09:29 syslog-ng-00002.rqf
-rw------- 1 10001 10001 4.0K Mar 22 09:46 syslog-ng-00003.rqf
-rw------- 1 10001 10001 4.0K Mar 22 10:14 syslog-ng-00004.rqf
...

Now, if I kill the syslog-ng Pod or gracefully scale the deployment to 0 and back up to 1, the queue is lost in both cases. The stats all go back to 0, and when I bring the destination up, no queued messages arrive. On every restart a new .rqf file gets created. New messages get spooled to the latest .rqf file until that one reaches the configured 100 MB size limit as well.

What am I missing here?

Thanks in advance!
Ralf