[syslog-ng] Playing "catchup"...

Jay Guerette syslog-ng@lists.balabit.hu
Sun, 2 Jan 2005 00:50:08 -0500


Below is a template for a program I've used, in about 20 different
variations, for the past 4 years. I recycle the same code, and just
change the parsing function.

- It runs as a daemon.
- It follows the specified log file, and presents each new line to the
log_parse function.
- It writes an offset in a conf file after each successful line parsed.
- Whenever it starts, it will start exactly where it left off, and
catch up if necessary.
- It follows log files when they rotate (sort of: it assumes the log
file is named '/var/log/foo.$YEAR.$MONTH.$DAY').
- You can debug/test your parsing by running it with the arg '-nodaemon'.
- If you run it, it will kill any already running instances.
- Yeah sure, there should be more error checking, and this or that
should be written differently, but this WORKS and has never failed me
in 4 years. If I ever have enough free time to add error checking or
rewrite this or that, it probably means I've died.
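
For readers who want the offset bookkeeping on its own, here is a minimal sketch of the "write an offset after each line, resume from it on startup" idea from the list above. It is not the code from the script: the paths and the names load_offset, save_offset and catch_up are made up for illustration, it uses ordinary buffered reads instead of sysread/sysseek, and it starts from the beginning of the file when no offset has been saved yet (the script starts at the end of the file in that case).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical paths for the demo; a real daemon would use fixed locations.
my $dir   = tempdir(CLEANUP => 1);
my $log   = "$dir/foo.log";
my $state = "$dir/foo.offset";

# Read the saved byte offset, or 0 if no checkpoint exists yet.
sub load_offset {
    open(my $fh, '<', $state) or return 0;
    my $line = <$fh>;
    close($fh);
    return 0 unless defined $line && $line =~ /^(\d+)/;
    return $1;
}

# Record the offset of the next unread byte.
sub save_offset {
    my ($offset) = @_;
    open(my $fh, '>', $state) or die "cannot write $state: $!";
    print $fh "$offset\n";
    close($fh);
}

# Handle every complete line past the checkpoint; return the lines handled.
sub catch_up {
    my @seen;
    open(my $fh, '<', $log) or die "cannot open $log: $!";
    my $offset = load_offset();
    my $size   = (-s $fh) || 0;
    $offset = $size if $offset > $size;    # file shrank: clamp to EOF
    seek($fh, $offset, 0) or die "seek failed: $!";
    while (my $line = <$fh>) {
        last unless $line =~ /\n\z/;       # partial last line: wait for the rest
        $offset += length($line);
        chomp $line;
        push @seen, $line;                 # stand-in for the real parsing step
        save_offset($offset);              # checkpoint after each line
    }
    close($fh);
    return @seen;
}

# Demo: two lines arrive and are consumed; a third arrives later.
open(my $out, '>>', $log) or die "cannot append to $log: $!";
print $out "one\ntwo\n";
close($out);
my @first = catch_up();

open($out, '>>', $log) or die "cannot append to $log: $!";
print $out "three\n";
close($out);
my @second = catch_up();

print "first: @first\n";     # first: one two
print "second: @second\n";   # second: three
```

The second call only sees "three" because the saved offset already points past the first two lines, which is what keeps a restart from re-importing data.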

I have 8 of these running on 7 different log files on a busy
production system, currently handling about 13M lines a day.

Somewhere I have a fully commented version lying around. :-P  If you
have trouble untangling it, let me know and I'll help you out.

<code>
#! /usr/bin/perl

$log = '/var/log/foo';

$args = join(' ',@ARGV);
($name) = ($0=~/([\s\w\-_]+)$/); $0 = $name;

if ($args!~/-nodaemon/i)  { &daemonize; }
else { select(STDOUT); $|=1; }

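$pidfile = "/var/run/$name.pid";	# set the pid file path before its first use
if (open(PID,"<$pidfile")) {
	if (defined($lastpid = <PID>)) {
		chomp($lastpid);
		kill(15, $lastpid) if ($lastpid =~ /^\d+$/ && $lastpid != $$);
	}
	close(PID);
}
open PID,">$pidfile"; print PID "$$\n"; close PID;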

foreach (`ps axo pid,cmd`) {
	next unless (/$name/i);
	($pid)=(/(\d+)\s+$name/i);
	next if ($pid == $$);
	print "TERM $name [$pid]\n";
	kill('TERM',$pid); sleep(2);
	if (kill(0,$pid)) {
		print "KILL $name [$pid]\n";
		kill('KILL',$pid); sleep(2);
	}
	die "couldn't KILL existing process!\n" if kill(0,$pid);
}

$conf = "/var/run/$name.conf";
$pid = "/var/run/$name.pid";
($hup,$term,$maxread,$maxbytes,$byteoffset)=(0,0,15,0,0);

$SIG{HUP}=sub{ $hup=1; };
$SIG{INT}=sub{ $term=1; };
$SIG{TERM}=sub{ $term=1; };
setpriority(0, $$, -5);	# 0 is PRIO_PROCESS; setpriority expects a number, not a string

while (!$hup && !$term) {

	if (@log_buffer = check_log()) {
		foreach $line (@log_buffer) {
			$maxread = $length if (($length = 1 + length($line)) > $maxread);
			log_parse($line);
			open CONF, ">$conf"; print CONF "$maxread $maxbytes ".($byteoffset += $length); close CONF;
			sysseek(LOG, $byteoffset, 0);
		}
		@log_buffer = ();
	}

	select(undef, undef, undef, 0.05);
}

if ($term || ($args=~/-nodaemon/i)) {
	exit;
}
defined(my $parent=fork) or die;
exit if ($parent);
exec($name);
exit;

sub daemonize {
	chdir '/' or die;
	open STDIN,'</dev/null' or die "Couldn't break from STDIN?\nAborting.\n";
	open STDOUT,'>/dev/null' or die "Couldn't break from STDOUT?\nAborting.\n";
	defined(my $pid=fork) or die "Couldn't fork!\nAborting.\n";
	exit if $pid;
	use POSIX 'setsid';
	POSIX::setsid or die "Couldn't set SID?\nAborting.\n";
	open STDERR,'>/dev/null' or die "Couldn't break from STDERR?\nAborting.\n";
}

sub log_open {
	($mday,$mon,$year) = (localtime)[3..5];
	$current = sprintf("%d.%02d.%02d", 1900+$year, 1+$mon, 0+$mday);
	sysopen(LOG, "$log.$current", 0);
	unlink($conf) if ($args=~/-init/i);
	if (-e $conf) {
		open CONF,"<$conf"; ($maxread,$maxbytes,$byteoffset)=split(' ',<CONF>); close CONF;
		$byteoffset = $byteeof if ($byteoffset > ($byteeof = 0 + sysseek(LOG, 0, 2)));
		$byteoffset = 0 + sysseek(LOG, $byteoffset, 0);
	}
	else {
		$byteoffset = 0 + sysseek(LOG, 0, 2);
	}
	open CONF,">$conf"; print CONF "$maxread $maxbytes $byteoffset"; close CONF;
	print "opened: $log.$current\n";
}

sub log_close {
	utime(undef, undef, $conf);
	close LOG;
	print "closed: $log.$current\n";
}

sub check_log {
	if ($log_wait) {
		return if (time() < $log_wait);
		undef($log_wait);
		log_open();
		return;
	}

	if ($stat_log) {
		return if (time() < $stat_log);
		undef($stat_log);
		($mday,$mon,$year) = (localtime)[3..5];
		my $stamp = sprintf("%d.%02d.%02d", 1900+$year, 1+$mon, 0+$mday);
		if ($stamp ne $current || (stat("$log.$current"))[7] < $byteoffset) {
			log_close();
			$log_wait = time() + 1;
			return;
		}
	}
	
	$bytes = sysread(LOG, $buffer, $maxread);
	if (!defined($bytes)) {
		$log_wait = time() + 1;
		return;
	}
	unless ($bytes) {
		$stat_log = time() + 5;
		return;
	}
	
	while ($bytesread = sysread LOG, $buffer, $maxread, $bytes) {
		last if (($bytes += $bytesread) > $maxbytes && $maxbytes > 0);
	}
	$buffer = substr($buffer, 0, $last) if (($last = rindex($buffer, "\n")) >= 0);

	foreach $part (split('\n', $buffer)) {
		push(@log_buffer, $part);
	}

	return(@log_buffer);
}

sub log_parse {
	my $line = shift;
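	$line =~ s/\s+/ /g;	# collapse runs of whitespace (\s already covers \r and \n)
	return if ($line =~ /^\s*$/);	# skip blank lines, including completely empty ones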
	print "$line\n";

	# do something here
}
</code>
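
Since the parsing function is the only part that changes between variations, here is a rough sketch of what one variation might look like. It assumes the classic syslog line layout ("Mon DD HH:MM:SS host tag[pid]: message"); that layout, the field names and the sample line are assumptions for illustration, not something taken from a real deployment.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical log_parse variation: split a syslog-style line into fields.
# Returns a hash reference, or nothing if the line is blank or does not match.
sub log_parse {
    my $line = shift;
    $line =~ s/\s+/ /g;                  # collapse runs of whitespace
    return if $line =~ /^\s*$/;          # skip blank lines
    my ($stamp, $host, $tag, $pid, $msg) =
        $line =~ /^(\w{3} \d{1,2} \d\d:\d\d:\d\d) (\S+) ([^\s:\[]+)(?:\[(\d+)\])?: ?(.*)$/
        or return;                       # not the assumed layout
    return { stamp => $stamp, host => $host, tag => $tag, pid => $pid, msg => $msg };
}

# Made-up sample line, only to show the fields that come out.
my $rec = log_parse("Jan  2 00:50:08 loghost sshd[4242]: Accepted publickey for jay");
print "$rec->{host} $rec->{tag} $rec->{pid}: $rec->{msg}\n" if $rec;
```

In the daemon, the returned fields would be used where the template says "do something here", for example to build a database insert.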

On Sat, 01 Jan 2005 18:32:01 -0800, Ed Walker <ewalker@surfcity.net> wrote:
> In the event that SQL injection dies on the central loghost, we've thought
> of keeping a copy in a file as well.
> 
> And if, for some reason, syslog-ng on the central server dies, or
> connectivity is lost, we're already keeping a copy of the logging data on
> each remote server.
> 
> So, what's the best way to play "catch up" when something dies, and making
> it as easy as possible to import the missed data, without accidentally
> introducing duplication?