[syslog-ng] Reliable syslog and network outages

Fri, 15 Aug 2003 17:27:14 +0100

I'm trying to work out whether syslog-ng can deliver syslog messages
reliably in the face of client reboots during network partitions.

From reading the documentation, it is not clear whether log_fifo stores
its data on disk or in memory. I therefore cannot tell whether log
entries that queue up on a remote syslog client, while a network
partition separates that client from the loghost, will remain queued if
the client host reboots. If my approach is unworkable, pointers or aid
on how to solve the general problem another way would be greatly
appreciated.

Here are the details:

1. Several administered machines are located at diverse locations,
generally where they are used primarily/entirely by local users.

2. Local users can ordinarily continue to do useful work even if they
lose external network connectivity for a while; consequently HA network
connectivity is not a user requirement.

3. The machines do not generally require direct intervention very often,
so from an administration standpoint, 100% reachability is also not a
requirement.

4. As a result of points 2 and 3, highly reliable connectivity is not
available; outages of minutes or hours really do occur.

5. Pre-syslog-ng implementations of syslog using UDP will simply lose
all log entries during the time that a network is partitioned. This is
definitely not OK.

6. Syslog-ng appears to offer TCP-based delivery, which certainly solves
one issue: normal network packet loss is no longer a problem.

7. Syslog-ng appears to maintain an internal FIFO which, if sized large
enough on a syslog relay host at the remote location(s), could collect
all log information generated during a partition and then successfully
deliver it when connectivity is restored.

8. If, however, syslog-ng's internal FIFO is memory-based, then
rebooting the remote location's syslog relay host during the network
partition will lose everything it has accumulated and not yet
transferred. (The same applies if the death of the TCP connection is
treated as a reason to give up rather than to keep retrying.)

9. The information I have found on dealing with this appears to address
loghost unreliability rather than network unreliability, and thus
recommends providing HA through additional, redundant loghosts. Setting
aside the problems in reconciliation of logs that this implies, it
entirely removes the reason for having syslog deliver via IP directly
(rather than, say, email).
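
To make the behaviour described in points 7 and 8 concrete, here is a
minimal Python sketch of a disk-backed store-and-forward queue. It is
purely illustrative and is not a claim about how syslog-ng works
internally; the class name DiskSpool, the spool file layout, and the
send callable are all hypothetical names invented for this example.

```python
import os
import tempfile


class DiskSpool:
    """Hypothetical append-only on-disk queue (illustration only).

    Entries are written to a file, so they survive a process restart
    or host reboot, unlike a purely memory-based FIFO.
    """

    def __init__(self, path):
        self.path = path

    def enqueue(self, message):
        # One entry per line; fsync so the entry is on disk before we return.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(message.replace("\n", " ") + "\n")
            f.flush()
            os.fsync(f.fileno())

    def pending(self):
        # Return all entries that have not yet been delivered, oldest first.
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [line.rstrip("\n") for line in f]

    def drain(self, send):
        """Try to deliver queued entries in order.

        `send` is a callable that raises OSError when delivery fails
        (for example, because the network is partitioned). Undelivered
        entries stay in the spool. Returns the number delivered.
        """
        entries = self.pending()
        sent = 0
        try:
            for entry in entries:
                send(entry)
                sent += 1
        except OSError:
            pass  # link still down: keep the remainder for a later retry
        # Atomically replace the spool with the undelivered remainder.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            for entry in entries[sent:]:
                f.write(entry + "\n")
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)
        return sent


if __name__ == "__main__":
    spool_path = os.path.join(tempfile.mkdtemp(), "spool")
    spool = DiskSpool(spool_path)
    spool.enqueue("sshd: session opened")

    def link_down(_entry):
        raise OSError("network partitioned")

    spool.drain(link_down)          # nothing delivered, entry is kept
    rebooted = DiskSpool(spool_path)  # a new object stands in for a reboot
    rebooted.drain(print)           # connectivity restored: entry delivered
```

Note that this gives at-least-once delivery: a crash after send() but
before the spool is rewritten would cause entries to be sent again when
the relay comes back up.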

Does syslog-ng provide a means to deal with this problem? Does some
other solution exist?

Thanks in advance.

- Raz