[syslog-ng] Perl FIFO eating CPU

Martin Holste mcholste at gmail.com
Sun Jun 21 23:00:48 CEST 2009


Clayton,

I am doing a very similar thing, and you should definitely not be seeing
that kind of CPU utilization with so few messages.  I see about 80% CPU
utilization with my script receiving around 3500 MPS on old 32-bit
hardware.  What I will say is that you want to set up everything possible
outside of your main processing loop and keep the regexp work to a bare
minimum within the loop.

In my first attempt, the script set up all the DBI statement handles it
would possibly need first, then did a $sth->execute(@fields) for each log
message as it came in.  This was able to process about 200-300 MPS, but
the script spent so much time waiting for the DB that it wouldn't scale to
anywhere near 3500 MPS.

If you want it to be scalable past 200 MPS, I recommend doing what I'm
doing now in the second version, which is to open regular filehandles
before the main loop in order to create MySQL infile batches.  I am also
using db-parser and having my log templates tab-separated so that I can do
a split() to get the individual fields.  The key is to let SyslogNG do the
bulk of the parsing work so that when it spits a message out, you already
know the class and rule_id that it matched, or maybe just the tags, which
were just released as a feature.  This saves doing almost all of your own
regexp work, and thereby saves most of the associated CPU time.

Then, when the files are written, use MySQL's much more efficient
"LOAD DATA INFILE" syntax to do frequent bulk batch loads.  With the
"LOW_PRIORITY" modifier, they won't block client queries from executing.
You would need to tune the $batch_limit in the script below or set a
timeout to avoid lag during low-utilization periods.

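use strict;
use warnings;

# One printf format per log type; the keys double as the list of log types.
my $templates   = { log_type_a => "%s\t%s\t%d\t%d ...." };
my @log_types   = keys %{$templates};
my $batch_limit = 10_000;
my $Run         = 1;
my $batch_id    = 0;

open(my $fifo, "<", "/path/to/fifo") or die "Cannot open FIFO: $!";
while ($Run){
  my $batch_files = process_batch($fifo, $batch_id);
  # You would create this sub, which executes a LOAD DATA INFILE for each
  # file.
  mysql_load_data_infile($batch_files);
  $batch_id++;
}

sub process_batch {
  my ($fifo, $batch_id) = @_;
  my (%filehandles, @batch_files);
  # Open one output file per log type before entering the read loop.
  foreach my $log_type (@log_types){
    my $file = $log_type . "." . $batch_id;
    open(my $fh, ">", $file) or die "Cannot open $file: $!";
    $filehandles{$log_type} = $fh;
    push @batch_files, $file;
  }
  my $counter = 0;
  while (my $line = <$fifo>){
    chomp $line;
    # Yields (timestamp, program, log_type, rule_id) etc. based on your
    # SyslogNG template.
    my @fields = split(/\t/, $line);
    my $type = $fields[2];
    next unless defined $type && $filehandles{$type}; # skip unknown types
    printf { $filehandles{$type} } $templates->{$type}, @fields;
    $counter++;
    last if $counter >= $batch_limit;
  }
  close($_) foreach values %filehandles;
  return \@batch_files; # file names to hand to LOAD DATA INFILE
}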
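
The mysql_load_data_infile sub is left to the reader above.  A minimal
sketch of what it might look like follows, under several assumptions that
are not from the script itself: $batch_files is an array ref of file names
of the form "<log_type>.<batch_id>", each file loads into a table named
after its log type, and a connected DBI handle is passed in explicitly as
an extra first argument.  Without LOCAL the MySQL server reads the file
itself, so the paths need to be ones the server can see.

```perl
use strict;
use warnings;

# Build the statement for one batch file.  The "<log_type>.<batch_id>"
# naming and the table-per-log-type mapping are assumptions for this sketch.
sub load_data_sql {
  my ($dbh, $file) = @_;
  my ($table) = $file =~ m{([^/]+)\.\d+\z}
    or die "Unexpected batch file name: $file";
  # LOAD DATA takes the file name as a string literal, so quote it here.
  return sprintf("LOAD DATA LOW_PRIORITY INFILE %s INTO TABLE %s",
                 $dbh->quote($file), $dbh->quote_identifier($table));
}

# $dbh is a connected DBI handle; $batch_files is an array ref of file names.
sub mysql_load_data_infile {
  my ($dbh, $batch_files) = @_;
  foreach my $file (@{$batch_files}) {
    $dbh->do(load_data_sql($dbh, $file)) or die $dbh->errstr;
    unlink $file;   # the batch has been loaded, so drop the file
  }
}
```

The default field and line terminators for LOAD DATA are tab and newline,
which is why no FIELDS or LINES clause is needed for tab-separated batches.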

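For the timeout mentioned earlier, one option is to end a batch when either
the line limit or a deadline is reached.  The sketch below is only an
illustration: the sub name read_batch and its arguments are made up here,
and it reads with sysread and its own line buffer, because mixing select()
with buffered <$fh> reads can leave data stranded in Perl's buffer.

```perl
use strict;
use warnings;
use IO::Select;
use Time::HiRes qw(time);

# Read up to $limit lines from $fh, but return early once $max_wait seconds
# have passed.  $buf_ref holds any partial line between calls.
sub read_batch {
  my ($fh, $limit, $max_wait, $buf_ref) = @_;
  my $sel      = IO::Select->new($fh);
  my $deadline = time() + $max_wait;
  my @lines;
  while (@lines < $limit) {
    # Take complete lines that are already buffered.
    if ($$buf_ref =~ s/\A([^\n]*)\n//) { push @lines, $1; next; }
    my $left = $deadline - time();
    last if $left <= 0;
    my @ready = $sel->can_read($left);
    last unless @ready;                       # timed out
    my $n = sysread($fh, $$buf_ref, 65536, length($$buf_ref));
    last unless $n;                           # EOF or read error
  }
  return \@lines;
}
```

A caller would keep one buffer for the life of the FIFO, for example
my $buf = ''; my $lines = read_batch($fifo, $batch_limit, 5, \$buf);
and then split and write each returned line as in process_batch.
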
To make this really go fast, I'm wrapping the whole thing in a POE event
queue for async processing.  That way the script will be receiving logs at
the same time as writing them to MySQL via forked worker processes.
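
As a rough illustration of the forked-worker idea only, here is a sketch
that uses plain fork() and a non-blocking waitpid() instead of POE.  The
names spawn_loader and reap_loaders are made up for this example.  It
assumes the job opens its own database connection inside the child, since
a DBI handle should not be shared across a fork.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

my %workers;   # pid => batch id, for children that are still running

# Run $job->($batch_id) in a forked child so the parent can keep reading.
sub spawn_loader {
  my ($batch_id, $job) = @_;
  my $pid = fork();
  die "fork failed: $!" unless defined $pid;
  if ($pid == 0) {                       # child
    my $ok = eval { $job->($batch_id); 1 };
    POSIX::_exit($ok ? 0 : 1);           # leave without running parent cleanup
  }
  $workers{$pid} = $batch_id;
  return $pid;
}

# Reap finished children without blocking; returns their batch ids.
sub reap_loaders {
  my @done;
  while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
    push @done, delete $workers{$pid};
  }
  return @done;
}
```

The main loop would call spawn_loader after each batch is written and
reap_loaders once per iteration, so loading overlaps with receiving.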

Feel free to shoot me your script and I'll take a look at it.  I plan on
eventually releasing my scripts when they work the way I want them to.

--Martin

On Sun, Jun 21, 2009 at 2:27 PM, Clayton Dukes <cdukes at gmail.com> wrote:

> Hiya Folks!
> I know this isn't necessarily the purview of this group but I thought
> I'd ask anyways since there are so many smart people here :-)
>
> I have syslog-ng feeding to a pipe which my perl script reads from,
> does some filtering/deduplication of messages, and then inserts into a
> mysql db.
> For some reason, the perl script is running between 85-100% cpu at all
> times (mysql cpu is ok).
> I'm receiving roughly 1-2 messages per second on my test server, but
> plan to use this for a production box that will receive much more
> (around 50 mps)
>
> Is there some perl magic I can do to lower the cpu utilization? caching,
> etc?
> I'm happy to share my script, but a large portion of it depends on
> variables set from within my program (php-syslog-ng) so it won't run
> on outside systems (unless, of course, you install my software :-))
>
> Thanks!
>
>
> --
> ______________________________________________________________
>
> Clayton Dukes
> ______________________________________________________________
>