[syslog-ng] how to archive logs efficiently

Daniel Neubacher daniel.neubacher at xing.com
Mon Feb 18 16:04:24 CET 2013


Thanks for your detailed responses. External compression was my last resort, but I may now have found my final solution. I've installed kernel 3.2 and Btrfs with zlib compression. The filesystem isn't marked as stable yet, but I've heard positive reports from fellow admins. Right now I'm getting a real-time compression rate of 84% without any CPU problems. I hope the long-term test goes well, but so far I'm pretty happy.

-----Original Message-----
From: syslog-ng-bounces at lists.balabit.hu [mailto:syslog-ng-bounces at lists.balabit.hu] On Behalf Of Gergely Nagy
Sent: Tuesday, 12 February 2013 15:10
To: Syslog-ng users' and developers' mailing list
Subject: Re: [syslog-ng] how to archive logs efficiently

Daniel Neubacher <daniel.neubacher at xing.com> writes:

> my syslog-ng has gotten quite big with 50k logs per second and the
> server seems to hit the IO limit at night. A few months ago I could
> run gzip with ionice over all old logs, but the server doesn't like
> it anymore, and quite a lot of logs pile up while the compression
> runs.

> I'm using the OSE, so I've got no logstore. For a second I thought
> about writing the logs to a compressed FUSE fs, but... FUSE :P So
> how are you guys doing it?

I've used several different approaches over the years. I'll list some of them, with pros and cons:

Rotate & compress
=================

The first approach I used was to simply rotate log files and compress them. This quickly killed my CPU and disks.

Pros:
 - Simple as a brick.

Cons:
 - CPU and IO intensive, bogs down the computer
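
As an illustration of why this approach is IO heavy, here is a minimal Python sketch of the compress step. It is not the setup described above; the function name and chunk size are assumptions made for the example. Every byte of the rotated file is read back from disk and written out again in compressed form.

```python
import gzip
import os
import shutil


def compress_rotated(path, chunk_size=1024 * 1024):
    """Gzip an already-rotated log file and remove the original.

    `path` is a hypothetical rotated file such as "messages.1".
    The whole file is re-read and re-written, which is the source of
    the extra CPU and IO load.
    """
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        # Copy in fixed-size chunks so memory use stays bounded.
        shutil.copyfileobj(src, dst, chunk_size)
    # Only remove the original once the compressed copy is closed.
    os.remove(path)
    return gz_path
```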

Runtime, external compression
=============================

Another option I played with was to write a very small program that accepts data on stdin and compresses it on the fly, and then send my logs to this destination. I also kept the most recent logs in uncompressed files.

Pros:
 - Fairly simple
 - The CPU/IO load is better spread
 - Uncompressed logs still available

   I used /var/log/FILENAME-${YEAR}${MONTH}${DAY}.log, and simply
   deleted old ones.
 - You don't need to re-read old logs to archive them; archival happens on the fly.
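
Deleting old date-stamped files like the ones mentioned above could be sketched as follows. This is only an illustration: the function names and the `keep_days` parameter are made up for the example, and it assumes the file name ends in -YYYYMMDD.log.

```python
import datetime
import os
import re

# Matches the trailing "-YYYYMMDD.log" part of a file name.
DATED_LOG = re.compile(r"-(\d{4})(\d{2})(\d{2})\.log$")


def expired_logs(directory, keep_days, today=None):
    """Return paths of dated log files older than `keep_days` days."""
    today = today or datetime.date.today()
    cutoff = today - datetime.timedelta(days=keep_days)
    expired = []
    for name in sorted(os.listdir(directory)):
        match = DATED_LOG.search(name)
        if not match:
            continue  # not a dated log file, leave it alone
        try:
            day = datetime.date(*map(int, match.groups()))
        except ValueError:
            continue  # digits that do not form a valid date
        if day < cutoff:
            expired.append(os.path.join(directory, name))
    return expired


def delete_expired(directory, keep_days, today=None):
    """Remove every file reported by expired_logs()."""
    for path in expired_logs(directory, keep_days, today):
        os.remove(path)
```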

Cons:
 - Requires an external program, which has to be written carefully so as not to lose data.
 - It's much harder to reliably rotate the compressed files.

   My program closed the current file on SIGHUP, and opened a new
   one. Not too elegant, and not really configurable, but got the job
   done.
 - Still bogs down the CPU and IO.

   This can be partially addressed by writing the compressed files to a
   different disk than where you write uncompressed logs.
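
For readers who want a starting point, a program of this kind might look like the Python sketch below. It is not the program described above: the class name and the file name pattern are assumptions, and it does nothing special about data loss on crashes. It reads lines from stdin, appends them to a gzip file, and switches to a new file after SIGHUP.

```python
import gzip
import signal
import sys
import time


class RotatingGzipWriter:
    """Write lines to a gzip file; start a new file when asked to rotate."""

    def __init__(self, pattern="archive-%Y%m%d-%H%M%S.log.gz"):
        # `pattern` is a hypothetical strftime-style file name template.
        self.pattern = pattern
        self.rotate_requested = False
        self._open()

    def _open(self):
        self.path = time.strftime(self.pattern)
        # "ab" appends a new gzip member if the file already exists.
        self.out = gzip.open(self.path, "ab")

    def request_rotate(self, signum=None, frame=None):
        # Usable as a signal handler: it only sets a flag.
        self.rotate_requested = True

    def write(self, line):
        # Rotation takes effect when the next line arrives.
        if self.rotate_requested:
            self.rotate_requested = False
            self.out.close()  # flushes and writes the gzip trailer
            self._open()
        self.out.write(line)

    def close(self):
        self.out.close()


def main():
    writer = RotatingGzipWriter()
    signal.signal(signal.SIGHUP, writer.request_rotate)
    try:
        for line in sys.stdin.buffer:
            writer.write(line)
    finally:
        writer.close()
```

Calling main() at the bottom of the script turns this into a filter that can be started from syslog-ng's program() destination, which feeds messages to the program's stdin.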

Runtime archival to external services
=====================================

Since I didn't have the resources to put any more disks into my log server at the time, IO became a problem. So I moved the archival to a different server, sending uncompressed logs over the network and doing the runtime compression on the other box.

This is pretty much the same solution as the one above, but instead of a local pipe, stuff is sent over the network.

Pros:
 - Still simple
 - IO is done on another box, so it doesn't disturb the local uncompressed log storage.

Cons:
 - Needs a separate server
 - Increased network bandwidth
 - If archiving is slow, it can still bog down both machines due to flow control.
 - Needs potentially large queues on the sending side, and without a disk queue, that's not the most reliable thing.
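
The receiving side of such a setup could be sketched in Python as below. This is an assumption-laden illustration, not the setup described above: it takes newline-delimited messages over plain TCP, and the names ArchiveHandler, make_server and archive_path are made up for the example.

```python
import gzip
import socketserver


class ArchiveHandler(socketserver.StreamRequestHandler):
    """Compress every line received on one TCP connection into the archive."""

    def handle(self):
        # self.server.archive_path is set by make_server() below.
        with gzip.open(self.server.archive_path, "ab") as out:
            for line in self.rfile:
                out.write(line)


def make_server(host, port, archive_path):
    """Build a TCP server that archives incoming lines to `archive_path`."""
    # TCPServer handles one connection at a time, so concurrent writers
    # to the same gzip file are not a concern in this sketch.
    server = socketserver.TCPServer((host, port), ArchiveHandler)
    server.archive_path = archive_path
    return server
```

Calling serve_forever() on the returned server keeps it running. A slow receiver like this one is exactly where the flow control and queueing issues listed above show up on the sender.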

Database
========

This is my current solution.

I still have my local logs in files for easy access, but the archive is stored in a MongoDB cluster.

Pros:
 - IO is spread across a number of machines
 - Does not bog down the central server, ever
 - Structured logs, better queryability

Cons:
 - Data is not compressed
 - Needs a higher amount of resources to work reliably and efficiently
 - More complicated to set up
 - MongoDB does have an overhead over simply emitting text & compressing it.

A variant of this would be to use AMQP to transfer logs; you can then attach any number of archival servers to the publisher and spread out the work nicely. But AMQP adds its own overhead too.

Other solutions
===============

There are plenty of other ways to achieve the same thing; the ones above are only the few I've personally used in the not too distant past.

-- 
|8]
