[syslog-ng] Destination with disk-buffer and reliable(yes) looses queued messages on restart/crash?

Mon Mar 22 11:31:17 UTC 2021

Hello,

Saving the queue file alone won't help, as you have experienced. There is also a persist file, which records the mapping between a destination and its disk-queue file.

A destination with a disk-buffer configured first checks the persist file for an entry that says which diskq file it should use.
If there is no entry in the persist file, it simply creates a new diskq file and records the entry.
So if the persist file is not kept, syslog-ng is just going to create a new diskq file on every start.
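
To illustrate the lookup described above, here is a minimal Python sketch. It is only a model of the behaviour, not syslog-ng's actual code: the function name `assign_queue_file`, the dict standing in for the persist file, and the "next free number" naming rule are assumptions made for illustration. The file names follow the pattern seen in the directory listing of the original mail.

```python
import os
import tempfile


def assign_queue_file(persist: dict, dest_key: str, queue_dir: str) -> str:
    """Return the disk-queue file for dest_key.

    `persist` is a plain dict standing in for the persist file
    (hypothetical model, not the real on-disk format).
    """
    # 1. If the persist state already maps this destination to a
    #    queue file, reuse it: the queued messages are recovered.
    if dest_key in persist:
        return persist[dest_key]

    # 2. Otherwise pick the next unused file name (assumed naming rule),
    #    create an empty queue file and record the mapping.
    n = 0
    while True:
        path = os.path.join(queue_dir, f"syslog-ng-{n:05d}.rqf")
        if not os.path.exists(path):
            break
        n += 1
    open(path, "wb").close()
    persist[dest_key] = path
    return path


if __name__ == "__main__":
    queue_dir = tempfile.mkdtemp()

    # Persist state kept across "restarts": the same file is reused.
    kept = {}
    first = assign_queue_file(kept, "d_forwarder", queue_dir)
    again = assign_queue_file(kept, "d_forwarder", queue_dir)
    print(first == again)

    # Persist state lost (empty again): a new file is created and
    # the old one is left behind on disk, orphaned.
    fresh = assign_queue_file({}, "d_forwarder", queue_dir)
    print(sorted(os.listdir(queue_dir)))
```

With the persist state lost on each start, the old queue files stay on disk but nothing points to them any more, which matches the growing list of .rqf files in /var/log. In practice this suggests keeping the persist file on the persistent volume as well; syslog-ng's `--persist-file` (`-R`) command-line option sets its location.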

--
kokan

________________________________________
From: syslog-ng <syslog-ng-bounces at lists.balabit.hu> on behalf of Ralf.Steppacher at swisscom.com <Ralf.Steppacher at swisscom.com>
Sent: 22 March 2021 12:13
To: syslog-ng at lists.balabit.hu
Subject: [syslog-ng] Destination with disk-buffer and reliable(yes) looses queued messages on restart/crash?

Hello,

I am probably missing something here: I am trying to configure syslog-ng (3.27.1) running in a Kubernetes Pod, using a Persistent Volume mounted at /var/log, such that queued messages are spooled to disk and in the event of a crash of syslog-ng the queue can be recovered. I configured the destination like so:

destination d_forwarder {
    syslog(
        "`HEAVY_FORWARDER_HOST`" port(`HEAVY_FORWARDER_PORT`)
        transport("tls")
        tls(
            ca-dir("/etc/syslog-ng/ca.d")
            key-file("/vault/secrets/client_key.pem")
            cert-file("/vault/secrets/client_cert.pem")
            peer-verify(required-trusted)
        )
        disk-buffer(
            mem-buf-size(524288)
            disk-buf-size(104857600)
            reliable(yes)
            dir("/var/log")
        )
    );
};

The documentation for the `reliable` flag says: "If set to yes, syslog-ng OSE cannot lose logs in case of reload/restart, unreachable destination or syslog-ng OSE crash."
I can see several *.rqf files being created in /var/log. As soon as the latest of them reaches roughly 100 MB, messages start to get dropped. So far, everything is as expected.

Stats:
{
   "center_queued_processed": 72031,
   "center_received_processed": 36016,
   "destination_d_forwarder_processed": 36015,
   "destination_d_local_processed": 36016,
   "dst_syslog_d_forwarder_0_tls_heavy-forwarder_shared-services_svc_cluster_local_6514_dropped": 4173,
   "dst_syslog_d_forwarder_0_tls_heavy-forwarder_shared-services_svc_cluster_local_6514_processed": 36015,
   "dst_syslog_d_forwarder_0_tls_heavy-forwarder_shared-services_svc_cluster_local_6514_queued": 31842,
   "dst_syslog_d_forwarder_0_tls_heavy-forwarder_shared-services_svc_cluster_local_6514_written": 0,
   ...
   "source_s_external_tls_processed": 36015
   ...
}

/var/log:
I have no name!@syslog-ng-76f898f5bb-sh9q8:/var/log$ ls -lh
total 213M
-rw------- 1 10001 10001  38K Mar 22 08:33 syslog-ng-00000.rqf
-rw------- 1 10001 10001 4.0K Mar 22 09:25 syslog-ng-00001.rqf
-rw------- 1 10001 10001 101M Mar 22 09:29 syslog-ng-00002.rqf
-rw------- 1 10001 10001 4.0K Mar 22 09:46 syslog-ng-00003.rqf
-rw------- 1 10001 10001 4.0K Mar 22 10:14 syslog-ng-00004.rqf
...

Now, if I kill the syslog-ng Pod or gracefully scale the deployment to 0 and back up to 1, the queue is still lost. The stats all go back to 0, and bringing up the destination shows no (queued) messages coming in. On every restart a new .rqf file gets created. New messages get spooled to the latest .rqf file until that one reaches the configured 100 MB size limit as well.

What am I missing here?

Thanks in advance!
Ralf