Hi Martin,

On Wed, Sep 08, 2010 at 01:48:08PM -0500, Martin Holste wrote:
I will share my experience thus far with the exact problem you're tackling and what's been working for us.
Thanks. I appreciate your willingness to jump in and discuss tricky problems.
Use the program() destination and open(FH, "-|") in Perl to read it. This saves the UDP packet creation overhead as well as ensures that there are no lost logs.
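For anyone following along, Martin's approach can be sketched roughly as below. With a syslog-ng destination such as `destination d_perl { program("/usr/local/bin/reader.pl"); };` (the script path here is a hypothetical placeholder), syslog-ng spawns the script once and writes one log message per line to its stdin, so the Perl side is just a read loop. The `massage` step is an invented example of the "simple massaging" mentioned later in the thread, not Martin's actual code:

```perl
#!/usr/bin/perl
# Minimal sketch of a program() destination reader: syslog-ng spawns this
# script and feeds it one log message per line on stdin.
use strict;
use warnings;

$| = 1;   # unbuffered output, so a crash strands as little as possible

# Example massage step (hypothetical): flatten tabs so the output
# stays a valid tab-separated line.
sub massage {
    my ($line) = @_;
    $line =~ s/\t/ /g;
    return $line;
}

while (my $line = <STDIN>) {
    chomp $line;
    print massage($line), "\n";   # or accumulate lines here for batching
}
```

Because the pipe is synchronous, syslog-ng's own output buffer absorbs bursts while this loop is busy, which is why keeping the per-line work cheap matters.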
Good to know. If I use this method, how can I tell when I have collected one of my 60-second batches?
I have experimented with N preforked Perl child workers, each listening on its own "sub" pipe in round-robin order (modulo on Perl's $. variable). But I quickly found what you've already pointed out: this is a synchronous pipe, so there's no point in round-robin-ing, since the parent can't move on to the next child pipe until the first child is done reading anyway.
That method would not work for me anyway, because I need all of my messages in a single memory space so I can crunch them down to look for anomalies. If they ended up scattered across a bunch of child processes, that would not get me very far.
That's fine, since I have never found the Syslog-NG -> Perl end of things to be a bottleneck. In our setup, I have Perl do some simple massaging of the logs and then write out to a tab-separated file in one minute batches.
Good to know where the bottlenecks aren't! :) Note that in my case I am only concerned with making sure I don't bog down the syslog-ng daemon with the slowness of my Perl code. If my code chokes the daemon, that's a disaster; it is OK if I occasionally lose some messages on the way to the Perl end.
I then load the file using MySQL's LOAD DATA INFILE, which can get you a sustained 100k messages per second into a database if you're light on the indexing. There's also no reason you couldn't simply write the logs from Perl to a flat file in SQLite format, which would let you skip the MySQL step entirely. It really depends on what final format you want the logs in.
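As a rough illustration of the LOAD DATA step from Perl (the file path and table name below are hypothetical placeholders, not anything from Martin's setup), the statement for a tab-separated batch file can be built like this:

```perl
#!/usr/bin/perl
# Sketch: build a LOAD DATA LOCAL INFILE statement for a one-minute
# tab-separated batch file. File and table names are placeholders.
use strict;
use warnings;

sub load_sql {
    my ($file, $table) = @_;
    # LOCAL lets the client side read the file; the field/line
    # terminators match the tab-separated batch format.
    return "LOAD DATA LOCAL INFILE '$file' INTO TABLE $table"
         . " FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'";
}

print load_sql('/var/tmp/batch.tsv', 'syslog'), "\n";
```

In practice you would hand that string to `$dbh->do(...)` on a DBI connection (with `mysql_local_infile` enabled for the MySQL driver); keeping indexes minimal on the target table is what makes the high sustained rates possible.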
I have two cases I am trying to solve.

1) Crunch the logs in 60-second batches to look for anomalies. For this I will need:
* all messages available in the memory of a single Perl process / thread / etc. to perform the computations
-and either-
* some way to pull in messages from the next batch while processing the last batch (in Java I used two threads and this worked fine for a past project)
-or-
* some way of batching incoming messages and knowing when a batch is done, so I can spend the next ~55 seconds on processing before preparing to receive a new batch. So far I don't have a reliable way of knowing I've received the entire batch from the daemon.

2) Write the logs to the DB. For this I am hoping to use the daemon's native support if possible; if not, I will do it from Perl. If I do it from Perl, I will still want batching so I can do a bulk write and bulk commit via LOAD DATA INFILE or another high-speed technique such as Oracle bulk load, etc.
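One way the "process the last batch while receiving the next" requirement could be sketched (this is my own illustration, not anything from the thread): the reader assigns each line to a 60-second window, and on a window boundary forks a child that inherits a complete copy of the batch in its own memory space, while the parent keeps reading without pausing. All names below are invented for the sketch:

```perl
#!/usr/bin/perl
# Sketch: collect lines into 60-second batches; on each batch boundary,
# fork a child to crunch the finished batch while the parent keeps reading.
use strict;
use warnings;

$SIG{CHLD} = 'IGNORE';   # auto-reap finished batch children

sub batch_id { return int($_[0] / 60) }   # epoch seconds -> batch number

my $current;
my @batch;
while (my $line = <STDIN>) {
    my $id = batch_id(time());
    if (defined $current && $id != $current) {
        my $pid = fork();
        if (defined $pid && $pid == 0) {
            # child: gets its own copy of @batch in a single memory
            # space; run the anomaly crunching here, then exit
            exit 0;
        }
        @batch = ();   # parent: start collecting the next batch
    }
    $current = $id;
    push @batch, $line;
}
```

The fork gives each batch the "single memory space" property per batch, at the cost of copying; whether that beats two threads as in the Java project depends on batch size.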
In any case, I would discourage you from trying the async framework route as it adds way too much overhead.
Agreed. I looked at Moose, AnyEvent, POE, etc. and concluded they were too complicated and would not provide much benefit over a simple select() loop for my case.
If you do in fact find a bottleneck with pipes, I would think that a solution involving UDP sent to a local port could work with some fancy iptables load balancing. You would be limited to netstat counters to detect losses, but it would probably work. But unless you hit a pipe bottleneck, I think all of that is more trouble than it is worth.
That's not going to help much in my case, because I have no way of crunching the logs to find anomalies if they end up scattered across the memory of different processes.
--Martin
Matthew.
On Wed, Sep 8, 2010 at 12:02 AM, <syslogng@feystorm.net> wrote:
Sent: Tuesday, 7 September 2010 19:42:52
From: Matthew Hall <mhall@mhcomputing.net>
To: Syslog-ng users' and developers' mailing list <syslog-ng@lists.balabit.hu>
Subject: Re: [syslog-ng] Buffering AF_UNIX Destination, Batch Post Processing Messages
Syslog-ng will queue all the destination messages until the oldest message is 60 seconds old, and then flushes them all out at once.
This part is tricky. How do I tell if I have received all the messages? How do I know when I have hit the end of the batch? Is it possible to have the daemon insert a marker message, or is there some other way I can check for this?
I do not believe there is an elegant way. The best idea I can come up with is to put a timeout on the receiving end, so that when the stream goes quiet for more than X seconds or so, the reader treats that as end of batch. You might also be able to request that the mark option be allowed for non-local destinations. That would let you set a mark interval of 1 second; when you receive 2 mark messages back-to-back, that is end-of-batch (it would basically mean there was no data in between).
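The two-marks-in-a-row idea is a tiny state machine, sketched below. The mark text `-- MARK --` is an assumption here; the actual mark message text would depend on the configuration:

```perl
#!/usr/bin/perl
# Sketch: detect end-of-batch as two consecutive mark messages.
# The "-- MARK --" pattern is an assumed mark text.
use strict;
use warnings;

sub make_detector {
    my $prev_was_mark = 0;
    return sub {
        my ($line) = @_;
        my $is_mark    = ($line =~ /-- MARK --/) ? 1 : 0;
        my $batch_done = ($is_mark && $prev_was_mark) ? 1 : 0;
        $prev_was_mark = $is_mark;
        return $batch_done;   # true only on the second back-to-back mark
    };
}

my $end_of_batch = make_detector();
# while (my $line = <STDIN>) {
#     if ($end_of_batch->($line)) { ... process the batch, reset ... }
#     else                        { ... accumulate non-mark lines ... }
# }
```

Compared with the pure-timeout approach, this gives a deterministic boundary: a single stray mark between real messages never triggers it, only two in a row.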
Thanks, Matthew.