[syslog-ng] Logs out of control

John Hala john.hala at villanova.edu
Wed Oct 10 19:22:04 CEST 2007


I have 170 hosts right now and plan on going up to 250.  Currently getting 1.5 GB of logs per day.  I think I should try to partition by day.  Using php-syslog-ng, I believe most searches will either be by date or by host, or  a combination of both.

What type of partitioning do you recommend?  I was thinking HASH partitioning.  With HASH, I won't have to specify the range or exact days to partition by, whereas with HASH, I just specify the column date.  In the table for the date column, I went with 'datetime'.  I don't quite understand how to partition that out.  Here's my newbie idea of how it should be done, however, I don't quite understand how to implement... Meaning, is it possible to partition a table that already has data in it, and if so, how to specify that particular table to partition because in the examples it just shows creating a table and following it by the partition code...

ie

| host     | varchar(128)        | YES  | MUL | NULL    |                |
| facility | varchar(10)         | YES  | MUL | NULL    |                |
| priority | varchar(10)         | YES  | MUL | NULL    |                |
| level    | varchar(10)         | YES  |     | NULL    |                |
| tag      | varchar(10)         | YES  |     | NULL    |                |
| datetime | datetime            | YES  | MUL | NULL    |                |
| program  | varchar(15)         | YES  | MUL | NULL    |                |
| msg      | text                | YES  | MUL | NULL    |                |
| seq      | bigint(20) unsigned | NO   | PRI | NULL    | auto_increment


| logs  | CREATE TABLE `logs` (
  `host` varchar(128) DEFAULT NULL,
  `facility` varchar(10) DEFAULT NULL,
  `priority` varchar(10) DEFAULT NULL,
  `level` varchar(10) DEFAULT NULL,
  `tag` varchar(10) DEFAULT NULL,
  `datetime` datetime DEFAULT NULL,
  `program` varchar(15) DEFAULT NULL,
  `msg` text,
  `seq` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`seq`),
  KEY `host` (`host`),
  KEY `program` (`program`),
  KEY `datetime` (`datetime`),
  KEY `priority` (`priority`),
  KEY `facility` (`facility`),
  FULLTEXT KEY `msg` (`msg`)
) ENGINE=MyISAM AUTO_INCREMENT=44728025 DEFAULT CHARSET=latin1 MAX_ROWS=4294967295 AVG_ROW_LENGTH=50

so the partition code would go like...

PARTITION BY HASH(DAY(timedate))
PARTITIONS 365;
PARTITION BY HASH(HOST(host))
PARTITIONS 500;     --> (I am just giving a large number that I don't think the number of hosts will reach)

I'm pretty certain, I'm way off, lol.

Any help would be greatly appreciated!

Thanks,
John

-----Original Message-----
From: syslog-ng-bounces at lists.balabit.hu [mailto:syslog-ng-bounces at lists.balabit.hu] On Behalf Of Valdis.Kletnieks at vt.edu
Sent: Saturday, October 06, 2007 12:27 AM
To: Syslog-ng users' and developers' mailing list
Subject: Re: [syslog-ng] Logs out of control

On Fri, 05 Oct 2007 15:33:30 EDT, John Hala said:

> how would you cut up the partition...  by day?

This would of course depend on your local requirements - but units such as "by day", "by week", "by month" certainly come to mind.  You will also want to consider how easy it is to implement local policy/legal requirements such as "logs of XYZ events *must* be kept 180 days", "logs of ABC *must* be discarded after 90 days", and so on.  Also add in things like "how many records per day", and "how many machines", and so on.

You may want to consider using slightly larger time units, such as "month"
or "last 30 days", to simplify your life when you're asked to produce log entries for "ABC for the last week", or consider some other partition such as "per machine per month" if that makes sense in your network.

If it helps any, I also manage a server that tracks IDS incidents, and the useful quantities there are "last 2 hours", "last 24 hours", "this month", and "forever".


More information about the syslog-ng mailing list