[syslog-ng] Buffering AF_UNIX Destination, Batch Post Processing Messages
Matthew Hall
mhall at mhcomputing.net
Wed Sep 8 21:17:51 CEST 2010
Hi Martin,
On Wed, Sep 08, 2010 at 01:48:08PM -0500, Martin Holste wrote:
> I will share my experience thus far with the exact
> problem you're tackling and what's been working for us
Thanks. I appreciate your willingness to jump in and discuss tricky
problems.
> Use the program() destination and open(FH, "-|") in Perl to read it.
> This saves the UDP packet creation overhead as well as ensures that
> there are no lost logs.
Good to know. If I use this method, how can I tell when I have
collected one of my 60-second batches?
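For reference, a program() destination along the lines Martin describes can be wired up roughly like this (the script path, source name, and output template here are hypothetical, not from this thread):

```
destination d_perl {
    program("/usr/local/bin/crunch.pl"
        template("$ISODATE $HOST $MSGHDR$MSG\n"));
};
log { source(s_local); destination(d_perl); };
```

syslog-ng starts the program once and writes newline-delimited messages to its stdin, so the Perl side is just a read loop on STDIN.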
> I have experimented with having N number of preforked Perl child
> workers which all listen on "sub" pipes in a round-robin (modulo on
> Perl's $. variable), but I quickly found what you've already pointed
> out, that this is a sync pipe, so there's no sense in round-robin-ing
> since the parent can't move on to the next child pipe until the first
> child is done reading anyway.
That method would not work for me anyway, because I need all of my
messages in a single memory space so I can crunch them down to look for
anomalies. If they ended up scattered across a bunch of child processes
that would not get me very far.
> That's fine, since I have never found the Syslog-NG -> Perl end of
> things to be a bottleneck. In our setup, I have Perl do some simple
> massaging of the logs and then write out to a tab-separated file in
> one minute batches.
Good to know where the bottlenecks aren't! :) Note that in my case I am
only concerned with making sure I don't bog down the syslog-ng daemon
with the slowness of my Perl code. If my stuff chokes the daemon that's
a disaster. It is OK if I occasionally lose some messages on the way to
the Perl end.
> I then load the file in using MySQL LOAD DATA INFILE, and this can
> get you 100k mps sustained into a database if you're light on the
> indexing. There's also no reason you couldn't simply write the logs
> from Perl to flat file in sqlite format, which would allow you to
> skip the MySQL step entirely. It really depends what you want the
> final format of the logs to be in.
I have two cases I am trying to solve.
1) Crunch on the logs in 60 second batches to look for anomalies.
For this case I will need:
* all messages available in the memory of a single Perl
process / thread / etc. to perform the computations
-and either-
* some way to pull in messages from the next batch while still
processing the last batch (in Java I used two threads for this and it
worked fine on a past project)
-or-
* some way of batching messages coming in, and knowing when a batch is
done, so I can spend the next ~55 seconds doing processing before
preparing to receive a new batch -- so far I don't have a reliable way
of knowing I've received the entire batch from the daemon
2) I want to write logs to the DB. For this I am hoping to use the
native daemon support if possible, but if not I will do it from Perl.
If I end up doing it from Perl, I will still want batching so I can do
a bulk write and bulk commit via LOAD DATA INFILE or another high-speed
technique such as Oracle's bulk loader.
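Putting the end-of-batch timeout idea from later in this thread together with the bulk write, a minimal Perl sketch might look like the following. The DSN, credentials, table name, and 5-second idle threshold are all assumptions, not anything from this thread; the select()/sysread pairing is deliberate, since mixing IO::Select with buffered readline can strand data in Perl's stdio buffer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Select;
use DBI;

# Read newline-delimited messages from syslog-ng on stdin; treat a few
# idle seconds as end-of-batch, then crunch and bulk-load the batch.
my $sel = IO::Select->new(\*STDIN);
my $buf = '';
my @batch;

while (1) {
    if ($sel->can_read(5)) {                      # data ready within 5s?
        my $n = sysread(STDIN, $buf, 65536, length $buf);
        last unless $n;                           # EOF: daemon closed pipe
        push @batch, $1 while $buf =~ s/^([^\n]*)\n//;
    }
    elsif (@batch) {                              # gone quiet: batch done
        process_batch(\@batch);
        @batch = ();
    }
}
process_batch(\@batch) if @batch;

sub process_batch {
    my ($msgs) = @_;
    # ... anomaly crunching over the whole in-memory batch goes here ...

    # Bulk write: dump the batch to a TSV file and load it in one shot.
    my $file = "/tmp/batch.$$.tsv";
    open my $fh, '>', $file or die "open $file: $!";
    print {$fh} "$_\n" for @$msgs;
    close $fh or die "close $file: $!";

    my $dbh = DBI->connect('dbi:mysql:database=logs', 'loguser', 'secret',
                           { RaiseError => 1 });
    $dbh->do("LOAD DATA LOCAL INFILE '$file' INTO TABLE raw_logs (msg)");
    $dbh->disconnect;
    unlink $file;
}
```

This keeps the whole batch in one process's memory for the anomaly pass, and only touches the database once per batch.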
> In any case, I would discourage you from trying the async framework
> route as it adds way too much overhead.
Agreed. I looked at Moose, AnyEvent, POE, etc. and concluded they were
too complicated and would not provide much benefit over a simple
select() loop for my case.
> If you do in fact find a bottleneck with pipes, I would think that a
> solution involving UDP sent to a local port could work with some fancy
> iptables load balancing. You would be limited to netstat counters to
> detect losses, but it would probably work. But unless you hit a pipe
> bottleneck, I think all of that is more trouble than it is worth.
That's not going to help much in my case, because I have no way of
crunching logs to find anomalies if they end up fragmented across the
memory of different processes.
> --Martin
Matthew.
> On Wed, Sep 8, 2010 at 12:02 AM, <syslogng at feystorm.net> wrote:
> >
> >
> > Sent: Tuesday, 7 September 2010 19.42.52
> > From: Matthew Hall <mhall at mhcomputing.net>
> > To: Syslog-ng users' and developers' mailing list
> > <syslog-ng at lists.balabit.hu>
> > Subject: Re: [syslog-ng] Buffering AF_UNIX Destination, Batch Post
> > Processing Messages
> >
> > Syslog-ng will queue all the destination messages until the oldest
> > message is 60 seconds old, and then flush them all out at once.
> >
> >
> > This part is tricky. How do I tell if I have received all the messages?
> > How do I know when I have hit the end of the batch? Is it possible to
> > have the daemon insert a marker message, or is there some other way I
> > can check for this?
> >
> >
> > I do not believe there is an elegant way. Best idea I can come up with is to
> > put a timeout on the receiving end so that when it goes quiet for more than
> > X seconds or whatnot, it sees that as end of batch.
> > You might be able to request that the mark option be allowed for non-local
> > destinations. Basically that would allow you to set a mark of 1 second, and
> > when you receive 2 mark messages back-to-back, that would be end-of-batch
> > (would basically mean there was no data in between).
> >
> > Thanks,
> > Matthew.
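If the mark option suggested above ever became available for non-local destinations, end-of-batch detection would reduce to watching for two consecutive mark messages. A hypothetical Perl sketch, assuming the classic "-- MARK --" message text (both the feature and the format are assumptions here):

```perl
# process_batch(): whatever routine crunches the accumulated batch.
my @batch;
my $prev_was_mark = 0;

while (my $line = <STDIN>) {
    if ($line =~ /-- MARK --/) {          # assumed mark message format
        if ($prev_was_mark && @batch) {   # two marks in a row: no data
            process_batch(\@batch);       # in between, so batch is done
            @batch = ();
        }
        $prev_was_mark = 1;
    }
    else {
        push @batch, $line;
        $prev_was_mark = 0;
    }
}
```

With a 1-second mark interval, this would bound end-of-batch detection latency at about 2 seconds instead of relying on an idle timeout.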