[syslog-ng] how to archive logs efficiently

Gergely Nagy algernon at balabit.hu
Tue Feb 12 15:10:18 CET 2013


Daniel Neubacher <daniel.neubacher at xing.com> writes:

> my syslog-ng has gotten quite big with 50k logs per second and the
> server seems to hit the IO limit at night. While a few months ago I
> could run a gzip with ionice over all old logs, the server doesn't like
> it anymore, and quite a lot of logs pile up while the compression
> lasts.

> I'm using the OSE so I've got no logstore. And for a second I've
> thought about writing the logs to a compressed FUSE fs but... FUSE :P
> So how are you guys doing it?

I've used several different approaches over the years. I'll list some of
them, with pros and cons:

Rotate & compress
=================

The first approach I used was to simply rotate log files and compress
them. This quickly killed my CPU and disks.

Pros:
 - Simple as a brick.

Cons:
 - CPU and IO intensive, bogs down the computer
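
For illustration, the compress step of this approach could look like the
Python sketch below. It is not the tooling described above; the function
name `compress_rotated` and the `.gz`/`.tmp` suffixes are assumptions made
for the example. It streams the file in chunks, so memory use stays flat,
but all the CPU and IO cost still lands at once, which is exactly the
drawback listed here.

```python
import gzip
import os
import shutil


def compress_rotated(path, level=6):
    """Gzip `path` to `path + '.gz'`, then delete the original.

    Hypothetical helper: writes to a temporary name first so a crash
    never leaves a half-written .gz under the final name.
    """
    target = path + ".gz"
    tmp = target + ".tmp"
    with open(path, "rb") as src, gzip.open(tmp, "wb", compresslevel=level) as dst:
        # Copy in 1 MiB chunks instead of reading the whole log into memory.
        shutil.copyfileobj(src, dst, 1024 * 1024)
    os.replace(tmp, target)   # atomic rename on the same filesystem
    os.remove(path)           # drop the uncompressed file only after success
    return target
```

Running it under `ionice`/`nice` lowers its priority, but the whole file
still has to be read back from disk.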

Runtime, external compression
=============================

Another option I played with was to write a very small program that
accepts data on stdin and compresses it on the fly, and to send my logs
to it as a destination. I also kept the most recent logs in uncompressed
files.

Pros:
 - Fairly simple
 - The CPU/IO load is better spread
 - Uncompressed logs still available

   I used /var/log/FILENAME-${YEAR}${MONTH}${DAY}.log, and simply
   deleted old ones.
 - You don't need to re-read old logs to archive them; archival happens
   on the fly.

Cons:
 - Requires an external program, which has to be written carefully so
   that it does not lose data.
 - It's much harder to reliably rotate the compressed files.

   My program closed the current file on SIGHUP, and opened a new
   one. Not too elegant, and not really configurable, but got the job
   done.
 - Still bogs down the CPU and IO.

   This can be partially addressed by writing the compressed files to a
   different disk than where you write uncompressed logs.
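
A minimal Python sketch of such a program follows. It is not the original
tool: the class name `RotatingGzipWriter`, the file naming scheme and the
`main()` wiring are all assumptions for the example. It reads lines from
stdin, appends them to a gzip file, and switches to a new file after
SIGHUP. The signal handler only sets a flag, so rotation always happens
between two whole lines.

```python
import gzip
import os
import signal
import sys
import time


class RotatingGzipWriter:
    """Append log lines to a gzip file; start a new file on request."""

    def __init__(self, directory, prefix="archive"):
        self.directory = directory
        self.prefix = prefix
        self.rotate_requested = False
        self._seq = 0
        self._open_new()

    def _open_new(self):
        # Timestamp plus a counter keeps names unique within one second.
        name = "%s-%s-%04d.log.gz" % (
            self.prefix, time.strftime("%Y%m%d-%H%M%S"), self._seq)
        self._seq += 1
        self.path = os.path.join(self.directory, name)
        self._out = gzip.open(self.path, "ab")

    def write(self, line):
        # Rotate only between lines, so no record is split across files.
        if self.rotate_requested:
            self.rotate_requested = False
            self._out.close()  # closing writes the gzip trailer
            self._open_new()
        self._out.write(line)

    def close(self):
        self._out.close()


def pump(stream, writer):
    """Copy lines from a binary stream into the writer until EOF."""
    for line in stream:
        writer.write(line)
    writer.close()


def main():
    writer = RotatingGzipWriter(sys.argv[1])
    # The handler only flips a flag; the real work happens in write().
    signal.signal(
        signal.SIGHUP,
        lambda signum, frame: setattr(writer, "rotate_requested", True))
    pump(sys.stdin.buffer, writer)
```

Calling `main()` at the end of the script and starting it from a
program() destination would be one way to hook it up. Note that data
buffered inside the compressor is lost if the process is killed, so a
real version would also want to close the file cleanly on SIGTERM.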

Runtime archival to external services
=====================================

Since I didn't have the resources to put any more disks into my log
server at the time, IO became a problem. So I moved the archival to a
different server, sending uncompressed logs over the network and
moving the runtime compression to the other box.

This is pretty much the same solution as the one above, but instead of a
local pipe, stuff is sent over the network.

Pros:
 - Still simple
 - IO is done on another box, so it doesn't disturb the local
   uncompressed log storage.

Cons:
 - Needs a separate server
 - Increased network bandwidth
 - If archiving is slow, it can still bog down both machines due to flow
   control.
 - Needs potentially large queues on the sending side, and without a
   disk queue, that's not the most reliable thing.
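
The receiving side of such a setup could be sketched in Python roughly as
below. This is an illustration, not the setup described above: the class
names `ArchiveHandler` and `ArchiveServer` are made up, and it assumes the
sender delivers newline-terminated messages over plain TCP. Every
connection appends to one shared gzip file, guarded by a lock.

```python
import gzip
import socketserver
import threading


class ArchiveHandler(socketserver.StreamRequestHandler):
    """Read newline-terminated messages from one sender connection."""

    def handle(self):
        for line in self.rfile:
            # Several senders may be connected; serialise the writes.
            with self.server.lock:
                self.server.archive.write(line)


class ArchiveServer(socketserver.ThreadingTCPServer):
    """TCP listener that compresses everything it receives on the fly."""

    allow_reuse_address = True

    def __init__(self, address, archive_path):
        self.archive = gzip.open(archive_path, "ab")
        self.lock = threading.Lock()
        super().__init__(address, ArchiveHandler)

    def server_close(self):
        # Stop listening and wait for handler threads, then finish the file.
        super().server_close()
        with self.lock:
            self.archive.close()
```

Something like `ArchiveServer(("0.0.0.0", 5140), "/archive/all.log.gz").serve_forever()`
would start it; the port and path are placeholders. If this process
writes slowly, the sender's TCP connection backs up, which is the flow
control problem mentioned in the list above.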

Database
========

This is my current solution.

I still have my local logs in files for easy access, but the archive is
stored in a MongoDB cluster.

Pros:
 - IO is spread across a number of machines
 - Does not bog down the central server, ever
 - Structured logs, better queryability

Cons:
 - Data is not compressed
 - Needs a higher amount of resources to work reliably and efficiently
 - More complicated to set up
 - MongoDB does have an overhead over simply emitting text & compressing
   it.

A variant of this would be to use AMQP to transfer logs; then you can
attach any number of archival servers to the publisher and spread out
the work nicely. But AMQP adds its own overhead too.

Other solutions
===============

There are a whole lot of other ways to achieve the same thing; the above
are only the few I've personally used in the not too distant
past.

-- 
|8]


