Below is a template for a program I've used, in about 20 different variations, for the past 4 years. I recycle the same code and just change the parsing function.

- It runs as a daemon.
- It follows the specified log file and presents each new line to the log_parse function.
- It writes an offset to a conf file after each line is successfully parsed.
- Whenever it starts, it picks up exactly where it left off and catches up if necessary.
- It follows log files when they rotate (sort of: it assumes the log file is named '/var/log/foo.$YEAR.$MONTH.$DAY').
- You can debug/test your parsing by running it with the arg '-nodaemon'.
- If you run it, it will kill any already running instances.
- Yeah sure, there should be more error checking, and this or that should be written differently, but this WORKS and has never failed me in 4 years. If I ever have enough free time to add error checking or rewrite this or that, it probably means I've died.

I have 8 of these running on 7 different log files on a busy production system, currently handling about 13M lines a day.

Somewhere I have a fully commented version lying around. :-P If you have trouble untangling it, let me know and I'll help you out.

<code>
#!/usr/bin/perl

$log = '/var/log/foo';                  # base name; log_open appends the date

$args = join(' ',@ARGV);
($name) = ($0=~/([\s\w\-_]+)$/);
$0 = $name;

# These paths must be set before the PID file is touched below.
$conf = "/var/run/$name.conf";
$pid  = "/var/run/$name.pid";

if ($args!~/-nodaemon/i) {
    &daemonize;
} else {
    select(STDOUT); $|=1;
}

# Signal the instance recorded in the PID file, then record ourselves.
if (open(PID,"<$pid")) {
    $lastpid = <PID>;
    close(PID);
    kill('TERM', $1) if (defined($lastpid) && $lastpid =~ /^(\d+)/ && $1 > 0);
}
open PID,">$pid";
print PID "$$\n";
close PID;

# Kill any other running instance with the same name.
foreach (`ps axo pid,cmd`) {
    next unless (($other) = (/(\d+)\s+$name/i));   # skip lines without a usable pid
    next if ($other == $$);
    print "TERM $name [$other]\n";
    kill('TERM',$other);
    sleep(2);
    if (kill(0,$other)) {
        print "KILL $name [$other]\n";
        kill('KILL',$other);
        sleep(2);
    }
    die "couldn't KILL existing process!\n" if kill(0,$other);
}

($hup,$term,$maxread,$maxbytes,$byteoffset)=(0,0,15,0,0);
$SIG{HUP}=sub{ $hup=1; };
$SIG{INT}=sub{ $term=1; };
$SIG{TERM}=sub{ $term=1; };
setpriority 'PRIO_PROCESS',$$,-5;       # a negative nice value needs root

# Main loop. The log is opened lazily: the first sysread in check_log
# fails on the unopened handle, which schedules log_open.
while (!$hup && !$term) {
    if (@log_buffer = check_log()) {
        foreach $line (@log_buffer) {
            $maxread = $length if (($length = 1 + length($line)) > $maxread);
            log_parse($line);
            # Save the offset after every parsed line so a restart resumes here.
            open CONF, ">$conf";
            print CONF "$maxread $maxbytes ".($byteoffset += $length);
            close CONF;
            sysseek(LOG, $byteoffset, 0);
        }
        @log_buffer = ();
    }
    select(undef, undef, undef, 0.05);  # sleep 50 ms
}

if ($term || ($args=~/-nodaemon/i)) { exit; }

# On HUP: re-exec ourselves in a fresh process.
defined(my $parent=fork) or die;
exit if ($parent);
exec($name);
exit;

sub daemonize {
    chdir '/' or die;
    open STDIN,'</dev/null' or die "Couldn't break from STDIN?\nAborting.\n";
    open STDOUT,'>/dev/null' or die "Couldn't break from STDOUT?\nAborting.\n";
    defined(my $pid=fork) or die "Couldn't fork!\nAborting.\n";
    exit if $pid;
    use POSIX 'setsid';
    POSIX::setsid or die "Couldn't set SID?\nAborting.\n";
    open STDERR,'>/dev/null' or die "Couldn't break from STDERR?\nAborting.\n";
}

sub log_open {
    ($mday,$mon,$year) = (localtime)[3..5];
    $current = sprintf("%d.%02d.%02d", 1900+$year, 1+$mon, 0+$mday);
    sysopen(LOG, "$log.$current", 0);
    unlink($conf) if ($args=~/-init/i); # '-init' discards the saved offset
    if (-e $conf) {
        open CONF,"<$conf";
        ($maxread,$maxbytes,$byteoffset)=split(' ',<CONF>);
        close CONF;
        # Clamp a saved offset that lies beyond the end of the file.
        $byteoffset = $byteeof if ($byteoffset > ($byteeof = 0 + sysseek(LOG, 0, 2)));
        $byteoffset = 0 + sysseek(LOG, $byteoffset, 0);
    } else {
        $byteoffset = 0 + sysseek(LOG, 0, 2);   # no saved state: start at EOF
    }
    open CONF,">$conf";
    print CONF "$maxread $maxbytes $byteoffset";
    close CONF;
    print "opened: $log.$current\n";
}

sub log_close {
    utime(undef, undef, $conf);
    close LOG;
    print "closed: $log.$current\n";
}

sub check_log {
    if ($log_wait) {
        return if (time() < $log_wait);
        undef($log_wait);
        log_open();
        return;
    }
    if ($stat_log) {
        return if (time() < $stat_log);
        undef($stat_log);
        ($mday,$mon,$year) = (localtime)[3..5];
        my $stamp = sprintf("%d.%02d.%02d", 1900+$year, 1+$mon, 0+$mday);
        # Reopen if the date rolled over or the file shrank.
        if ($stamp ne $current || (stat("$log.$current"))[7] < $byteoffset) {
            log_close();
            $log_wait = time() + 1;
            return;
        }
    }
    $bytes = sysread(LOG, $buffer, $maxread);
    if (!defined($bytes)) { $log_wait = time() + 1; return; }   # read error: reopen
    unless ($bytes) { $stat_log = time() + 5; return; }         # EOF: check rotation later
    while ($bytesread = sysread LOG, $buffer, $maxread, $bytes) {
        last if (($bytes += $bytesread) > $maxbytes && $maxbytes > 0);
    }
    # Drop a trailing partial line; it is re-read on the next pass.
    $buffer = substr($buffer, 0, $last) if (($last = rindex($buffer, "\n")) >= 0);
    foreach $part (split('\n', $buffer)) { push(@log_buffer, $part); }
    return(@log_buffer);
}

sub log_parse {
    my $line = shift;
    $line =~ s/[\r\n\s]+/ /g;
    return if ($line =~ /^\s+$/);
    print "$line\n";
    # do something here
}
</code>

On Sat, 01 Jan 2005 18:32:01 -0800, Ed Walker <ewalker@surfcity.net> wrote:
In the event that SQL insertion dies on the central loghost, we've thought of keeping a copy in a file as well.
And if, for some reason, syslog-ng on the central server dies, or connectivity is lost, we're already keeping a copy of the logging data on each remote server.
So, what's the best way to play "catch up" when something dies, and to make it as easy as possible to import the missed data without accidentally introducing duplication?
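
The "no duplication" part of that question comes down to the offset bookkeeping the template above does: remember how many bytes you have fully handled, and on the next pass start reading from there. Here is a stripped-down sketch of just that idea, not the template itself. The name read_new_lines is made up for illustration, and restarting from 0 when the file has shrunk is my assumption here (the template clamps to end-of-file instead).

```perl
use strict;
use warnings;

# Hypothetical helper: return the offset to save for next time, followed by
# every complete line found after $offset in $path.
sub read_new_lines {
    my ($path, $offset) = @_;
    open(my $fh, '<:raw', $path) or die "open $path: $!";
    my $size = -s $fh;
    $offset = 0 if $offset > $size;     # assumption: a shrunken file is a new file
    seek($fh, $offset, 0) or die "seek $path: $!";
    my @lines;
    while (defined(my $line = <$fh>)) {
        last unless $line =~ /\n\z/;    # partial last line: pick it up next pass
        $offset += length($line);       # count bytes only for complete lines
        chomp($line);
        push @lines, $line;
    }
    close($fh);
    return ($offset, @lines);
}
```

Call it with the offset you saved last time, import the returned lines, then save the new offset. If the import dies halfway, the saved offset hasn't moved past anything that wasn't imported, so the next run catches up on the same lines instead of skipping them. To keep a crash from re-importing a whole batch, save the offset after each line, as the template does.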