Clayton,<br><br>I am doing a very similar thing, and you should definitely not be seeing that kind of CPU utilization with so few messages. I see about 80% CPU utilization with my script receiving around 3500 MPS on old 32-bit hardware. What I will say is that you want to set up everything possible outside of your main processing loop and keep the regexps to a bare minimum within the loop.<br><br>
In my first attempt, the script set up all the DBI statement handles it would possibly need first, then did a $sth->execute(@fields) for each log message as it came in. This was able to process about 200-300 MPS, but the script spent so much time waiting for the DB that it wouldn't scale to anywhere near 3500 MPS.<br><br>
If you want it to scale past 200 MPS, I recommend doing what I'm doing now in the second version, which is to open regular filehandles before the main loop in order to create MySQL infile batches. I am also using db-parser and having my log templates tab separated so that I can do a split() to get the individual fields. The key is to let SyslogNG do the bulk of the parsing work so that when it spits a message out, you already know the class and rule_id that it matched, or maybe just the tags, which were just released as a feature. This saves doing almost all of your own regexp work, and thereby saves most of the associated CPU time.<br><br>
Then, when the files are written, use MySQL's much more efficient "LOAD DATA INFILE" syntax to do frequent bulk batch loads. With the LOW_PRIORITY modifier, the load waits until no other clients are reading from the table, so it won't hold up client queries (this only matters for storage engines that use table-level locking, such as MyISAM). You would need to tune the $batch_limit in the script below or set a timeout to avoid lag during low-utilization periods.<br>
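<br>For the bulk-load step, something along these lines could serve as a starting point. It is only a sketch and has not been run against a real server: the sub name load_batch_file, the table name argument, and the use of LOCAL (which assumes the file sits on the client side and that local_infile is permitted) are all assumptions for illustration. It expects an already-connected DBI handle.<br>

```perl
use strict;
use warnings;

# Hypothetical helper (name and arguments are assumptions): run one
# LOAD DATA statement for a single tab-separated batch file.
# $dbh is assumed to be an already-connected DBI handle (DBD::mysql).
sub load_batch_file {
    my ($dbh, $table, $file) = @_;

    # quote() and quote_identifier() are standard DBI methods; they keep
    # odd characters in the file or table name from breaking the statement.
    my $sql = sprintf(
        "LOAD DATA LOW_PRIORITY LOCAL INFILE %s INTO TABLE %s "
      . "FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'",
        $dbh->quote($file),
        $dbh->quote_identifier($table),
    );

    # do() returns the number of rows affected, or undef on failure.
    my $rows = $dbh->do($sql);
    die "LOAD DATA failed for $file: " . $dbh->errstr unless defined $rows;
    return $rows;
}
```

You would call it once per batch file, for example load_batch_file($dbh, "some_table", "log_type_a.0"), where the table name is whatever your schema uses.<br>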
<br>my $templates = { log_type_a => "%s\t%s\t%d\t%d ...." };<br>
my $batch_limit = 10_000;<br>
my $Run = 1;<br>
my $batch_id = 0;<br>
open(my $fifo, "<", "/path/to/fifo") or die "Cannot open fifo: $!";<br>
while ($Run){<br>
 my $batch_files = process_batch($fifo, $batch_id);<br>
 mysql_load_data_infile($batch_files); # you would create this sub, which executes a LOAD DATA INFILE for each file<br>
 $batch_id++;<br>
}<br>
<br>
sub process_batch {<br>
 my ($fifo, $batch_id) = @_;<br>
 my (%filehandles, @batch_files);<br>
 foreach my $log_type (keys %{$templates}){ # one batch file per log type<br>
 my $file = $log_type . "." . $batch_id;<br>
 open(my $fh, ">", $file) or die "Cannot open $file: $!";<br>
 $filehandles{$log_type} = $fh;<br>
 push @batch_files, $file;<br>
 }<br>
 my $counter = 0;<br>
 while (<$fifo>){ # read from the lexical handle that was passed in<br>
 chomp;<br>
 my @fields = split(/\t/, $_); # yields (timestamp, program, log_type, rule_id) etc. based on your SyslogNG template<br>
 next unless defined $fields[2] && exists $filehandles{ $fields[2] }; # skip log types without a template<br>
 printf { $filehandles{ $fields[2] } } $templates->{ $fields[2] }, @fields; # a filehandle expression needs the braces<br>
 $counter++;<br>
 last if $counter >= $batch_limit;<br>
 }<br>
 close($_) foreach values %filehandles; # close the handles (the values), not the keys<br>
 return \@batch_files; # file names for LOAD DATA INFILE<br>
}<br>
<br>
To make this really go fast, I'm wrapping the whole thing in a POE event queue for async processing. That way the script will be receiving logs at the same time as it is writing them to MySQL via forked worker processes.<br>
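<br>On the timeout for quiet periods: a plain blocking read will sit there until the next message shows up, so one option is select() plus sysread() with a small line buffer. The following is only a sketch of that idea, not what my script does; read_batch and its arguments are hypothetical names, and it uses only the core IO::Select and Time::HiRes modules.<br>

```perl
use strict;
use warnings;
use IO::Select;
use Time::HiRes qw(time);

# Hypothetical helper: collect up to $limit lines from $fh, but stop once
# $max_wait seconds have passed so a quiet period still ends the batch.
# $buffer_ref points at a string (start it as '') that keeps any partial
# line between calls.
sub read_batch {
    my ($fh, $limit, $max_wait, $buffer_ref) = @_;
    my $select   = IO::Select->new($fh);
    my $deadline = time() + $max_wait;
    my @lines;

    while (@lines < $limit) {
        # Take complete lines that are already buffered.
        while (@lines < $limit && $$buffer_ref =~ s/\A([^\n]*)\n//) {
            push @lines, $1;
        }
        last if @lines >= $limit;

        my $remaining = $deadline - time();
        last if $remaining <= 0;

        my @ready = $select->can_read($remaining);
        last unless @ready;    # timed out with nothing new to read

        # sysread bypasses Perl's own buffering, which is what makes it
        # safe to combine with select(); append to the existing buffer.
        my $bytes = sysread($fh, $$buffer_ref, 65536, length $$buffer_ref);
        last unless $bytes;    # EOF or read error
    }
    return \@lines;
}
```

Pass the same buffer reference on every call so a half-received line carries over to the next batch, and skip the load when the returned list is empty.<br>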
<br>Feel free to shoot me your script and I'll take a look at it. I plan on eventually releasing my scripts when they work the way I want them to.<br><br>--Martin<br><br><div class="gmail_quote">On Sun, Jun 21, 2009 at 2:27 PM, Clayton Dukes <span dir="ltr"><<a href="mailto:cdukes@gmail.com">cdukes@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Hiya Folks!<br>
I know this isn't necessarily the purview of this group but I thought<br>
I'd ask anyways since there are so many smart people here :-)<br>
<br>
I have syslog-ng feeding to a pipe which my perl script reads from,<br>
does some filtering/deduplication of messages, and then inserts into a<br>
mysql db.<br>
For some reason, the perl script is running between 85-100% cpu at all<br>
times (mysql cpu is ok).<br>
I'm receiving roughly 1-2 messages per second on my test server, but<br>
plan to use this for a production box that will receive much more<br>
(around 50 mps)<br>
<br>
Is there some perl magic I can do to lower the cpu utilization? caching, etc?<br>
I'm happy to share my script, but a large portion of it depends on<br>
variables set from within my program (php-syslog-ng) so it won't run<br>
on outside systems (unless, of course, you install my software :-))<br>
<br>
Thanks!<br>
<br>
<br>
--<br>
______________________________________________________________<br>
<font color="#888888"><br>
Clayton Dukes<br>
______________________________________________________________<br>
<br>
</font></blockquote></div><br>