[syslog-ng] Buffering AF_UNIX Destination, Batch Post Processing Messages
Matthew Hall
mhall at mhcomputing.net
Wed Sep 8 21:17:51 CEST 2010
Hi Martin,
On Wed, Sep 08, 2010 at 01:48:08PM -0500, Martin Holste wrote:
> I will share my experience thus far with the exact
> problem you're tackling and what's been working for us
Thanks. I appreciate your willingness to jump in and discuss tricky
problems.
> Use the program() destination and open(FH, "-|") in Perl to read it.
> This saves the UDP packet creation overhead as well as ensures that
> there are no lost logs.
Good to know. If I use this method, how can I tell when I have
collected one of my 60-second batches?
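For reference, a program() destination along the lines Martin describes can be wired up roughly like this (the script path, source name, and output template here are hypothetical, not from this thread):

```
destination d_perl {
    program("/usr/local/bin/crunch.pl"
        template("$ISODATE $HOST $MSGHDR$MSG\n"));
};
log { source(s_local); destination(d_perl); };
```

syslog-ng starts the program once and writes newline-delimited messages to its stdin, so the Perl side is just a read loop on STDIN.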
> I have experimented with having N number of preforked Perl child
> workers which all listen on "sub" pipes in a round-robin (modulo on
> Perl's $. variable), but I quickly found what you've already pointed
> out, that this is a sync pipe, so there's no sense in round-robin-ing
> since the parent can't move on to the next child pipe until the first
> child is done reading anyway.
That method would not work for me anyway, because I need all of my
messages in a single memory space so I can crunch them down to look for
anomalies. If they ended up scattered across a bunch of child processes
that would not get me very far.
> That's fine, since I have never found the Syslog-NG -> Perl end of
> things to be a bottleneck. In our setup, I have Perl do some simple
> massaging of the logs and then write out to a tab-separated file in
> one minute batches.
Good to know where the bottlenecks aren't! :) Note that in my case I am
only concerned with making sure I don't bog down the syslog-ng daemon
with the slowness of my Perl code. If my stuff chokes the daemon that's
a disaster. It is OK if I occasionally lose some messages on the way to
the Perl end.
> I then load the file in using MySQL LOAD DATA INFILE, and this can
> get you 100k mps sustained into a database if you're light on the
> indexing. There's also no reason you couldn't simply write the logs
> from Perl to flat file in sqlite format, which would allow you to
> skip the MySQL step entirely. It really depends what you want the
> final format of the logs to be in.
I have two cases I am trying to solve.
1) Crunch on the logs in 60 second batches to look for anomalies.
For this case I will need:
* all messages available in the memory of a single Perl
process / thread / etc. to perform the computations
-and either-
* some way to pull in messages from the next batch while still
processing the last batch (in Java I used two threads for this and it
worked fine on a past project)
-or-
* some way of batching messages coming in, and knowing when a batch is
done, so I can spend the next ~55 seconds doing processing before
preparing to receive a new batch -- so far I don't have a reliable way
of knowing I've received the entire batch from the daemon
2) I want to write logs to the DB. For this I am hoping to use the
native daemon support if possible, but if not I will do it from Perl.
If I end up doing it from Perl, I will still want batching so I can do
a bulk write and bulk commit via LOAD DATA INFILE or another high-speed
technique such as Oracle's bulk loader.
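Putting the end-of-batch timeout idea from later in this thread together with the bulk write, a minimal Perl sketch might look like the following. The DSN, credentials, table name, and 5-second idle threshold are all assumptions, not anything from this thread; the select()/sysread pairing is deliberate, since mixing IO::Select with buffered readline can strand data in Perl's stdio buffer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Select;
use DBI;

# Read newline-delimited messages from syslog-ng on stdin; treat a few
# idle seconds as end-of-batch, then crunch and bulk-load the batch.
my $sel = IO::Select->new(\*STDIN);
my $buf = '';
my @batch;

while (1) {
    if ($sel->can_read(5)) {                      # data ready within 5s?
        my $n = sysread(STDIN, $buf, 65536, length $buf);
        last unless $n;                           # EOF: daemon closed pipe
        push @batch, $1 while $buf =~ s/^([^\n]*)\n//;
    }
    elsif (@batch) {                              # gone quiet: batch done
        process_batch(\@batch);
        @batch = ();
    }
}
process_batch(\@batch) if @batch;

sub process_batch {
    my ($msgs) = @_;
    # ... anomaly crunching over the whole in-memory batch goes here ...

    # Bulk write: dump the batch to a TSV file and load it in one shot.
    my $file = "/tmp/batch.$$.tsv";
    open my $fh, '>', $file or die "open $file: $!";
    print {$fh} "$_\n" for @$msgs;
    close $fh or die "close $file: $!";

    my $dbh = DBI->connect('dbi:mysql:database=logs', 'loguser', 'secret',
                           { RaiseError => 1 });
    $dbh->do("LOAD DATA LOCAL INFILE '$file' INTO TABLE raw_logs (msg)");
    $dbh->disconnect;
    unlink $file;
}
```

This keeps the whole batch in one process's memory for the anomaly pass, and only touches the database once per batch.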
> In any case, I would discourage you from trying the async framework
> route as it adds way too much overhead.
Agreed. I looked at Moose, AnyEvent, POE, etc. and concluded they were
too complicated and would not provide much benefit over a simple
select() loop for my case.
> If you do in fact find a bottleneck with pipes, I would think that a
> solution involving UDP sent to a local port could work with some fancy
> iptables load balancing. You would be limited to netstat counters to
> detect losses, but it would probably work. But unless you hit a pipe
> bottleneck, I think all of that is more trouble than it is worth.
That's not going to help much in my case, because I have no way of
crunching logs to find anomalies if they end up fragmented across the
memory of different processes.
> --Martin
Matthew.
> On Wed, Sep 8, 2010 at 12:02 AM, <syslogng at feystorm.net> wrote:
> >
> >
> > Sent: Tuesday, 7 September 2010 19.42.52
> > From: Matthew Hall <mhall at mhcomputing.net>
> > To: Syslog-ng users' and developers' mailing list
> > <syslog-ng at lists.balabit.hu>
> > Subject: Re: [syslog-ng] Buffering AF_UNIX Destination, Batch Post
> > Processing Messages
> >
> > Syslog-ng will queue all the destination messages until the oldest
> > message is 60 seconds old, and then flush them all out at once.
> >
> >
> > This part is tricky. How do I tell if I have received all the messages?
> > How do I know when I have hit the end of the batch? Is it possible to
> > have the daemon insert a marker message, or is there some other way I
> > can check for this?
> >
> >
> > I do not believe there is an elegant way. Best idea I can come up with is to
> > put a timeout on the receiving end so that when it goes quiet for more than
> > X seconds or whatnot, it sees that as end of batch.
> > You might be able to request that the mark option be allowed for non-local
> > destinations. Basically that would allow you to set a mark of 1 second, and
> > when you receive 2 mark messages back-to-back, that would be end-of-batch
> > (would basically mean there was no data in between).
> >
> > Thanks,
> > Matthew.
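If the mark option suggested above ever became available for non-local destinations, end-of-batch detection would reduce to watching for two consecutive mark messages. A hypothetical Perl sketch, assuming the classic "-- MARK --" message text (both the feature and the format are assumptions here):

```perl
# process_batch(): whatever routine crunches the accumulated batch.
my @batch;
my $prev_was_mark = 0;

while (my $line = <STDIN>) {
    if ($line =~ /-- MARK --/) {          # assumed mark message format
        if ($prev_was_mark && @batch) {   # two marks in a row: no data
            process_batch(\@batch);       # in between, so batch is done
            @batch = ();
        }
        $prev_was_mark = 1;
    }
    else {
        push @batch, $line;
        $prev_was_mark = 0;
    }
}
```

With a 1-second mark interval, this would bound end-of-batch detection latency at about 2 seconds instead of relying on an idle timeout.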