Buffering AF_UNIX Destination, Batch Post Processing Messages
Hello All,

I want to configure an AF_UNIX SOCK_DGRAM syslog-ng destination which sends certain log messages to an external program for further processing and analysis. This program should batch up the messages into 60 second batches for processing.

Currently I am running into an architectural challenge: how should I process the 60 second batch without slowing down the select which is collecting the messages from the destination?

In the past, when creating a similar kind of application in Java, I handled this by creating a huge dynamic array to store objects created from each incoming message, then passed the array reference to another background thread for processing, and began building a new array in the select thread.

Currently I am trying to solve this same basic problem in Perl, which has poor threading support. I am investigating a few different options:

* use threads anyway -- not recommended by the more expert Perl devs I asked

* prefork a process which listens to the AF_UNIX socket from syslog-ng and writes to some kind of buffered non-blocking pipe with a really big buffer -- not sure if such a pipe device actually exists; many pipes block

* postfork a worker process which handles the 60 second batch -- the problem here is that you want to maintain a lot of long-term state data between batches to help separate the needles from the haystacks, and you could get weird behavior from the duplicated FDs that are still being select()ed in the parent process and are copied into the child

* use some kind of message or job queue to copy things from a producer process to a consumer process -- gearman, theschwartz, beanstalk, rabbitmq, activemq, and poe::component::mq have been suggested -- this would probably cause a lot of context switching and unwanted buffer copies

* see if there is something existing in syslog-ng that can help with this situation -- can it somehow be convinced to buffer things internally for my process while my process is busy on a 60 second batch, or send in 60 second batches, etc.? / whatever other clever people can dream up

Is this a problem other people have dealt with before? What did you do about this one? I want to get this right and avoid making a big mess or reinventing the wheel.

Matthew.
Syslog-ng already has the exact functionality you are looking for (at least as far as I understand what you're wanting). Create a udp destination driver, set flush_timeout to 60000 (60 seconds), and flush_lines to 0 (the default). Syslog-ng will queue all the destination messages until the oldest message is 60 seconds old, and then flushes them all out at once.
On Tue, Sep 07, 2010 at 07:25:37PM -0600, syslogng@feystorm.net wrote:
Syslog-ng already has the exact functionality you are looking for (at least as far as I understand what you're wanting).
Excellent! Thanks for the advice. This is why I ask about things before implementing bad or unnecessary code. ;-)
Create a udp destination driver, set flush_timeout to 60000 (60 seconds), and flush_lines to 0 (the default).
Makes sense except for the caveat I'll ask about below.
Syslog-ng will queue all the destination messages until the oldest message is 60 seconds old, and then flushes them all out at once.
This part is tricky. How do I tell if I have received all the messages? How do I know when I have hit the end of the batch? Is it possible to have the daemon insert a marker message, or is there some other way I can check for this? Thanks, Matthew.
Sent: Tuesday, September 7, 2010 19:42:52 From: Matthew Hall <mhall@mhcomputing.net> To: Syslog-ng users' and developers' mailing list <syslog-ng@lists.balabit.hu> Subject: Re: [syslog-ng] Buffering AF_UNIX Destination, Batch Post Processing Messages
Syslog-ng will queue all the destination messages until the oldest message is 60 seconds old, and then flushes them all out at once.
This part is tricky. How do I tell if I have received all the messages? How do I know when I have hit the end of the batch? Is it possible to have the daemon insert a marker message, or is there some other way I can check for this?
I do not believe there is an elegant way. Best idea I can come up with is to put a timeout on the receiving end so that when it goes quiet for more than X seconds or whatnot, it sees that as end of batch. You might be able to request that the mark option be allowed for non-local destinations. Basically that would allow you to set a mark of 1 second, and when you receive 2 mark messages back-to-back, that would be end-of-batch (would basically mean there was no data in between).
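A minimal sketch of this receive-timeout idea, assuming syslog-ng writes datagrams to a unix-dgram socket; the socket path and the 5 second quiet period are just placeholders, not anything from the thread:

    use strict;
    use warnings;
    use Socket qw(SOCK_DGRAM);
    use IO::Socket::UNIX;
    use IO::Select;

    my $path = '/var/run/mylog.sock';          # hypothetical socket path
    unlink $path;
    my $sock = IO::Socket::UNIX->new(Type => SOCK_DGRAM, Local => $path)
        or die "bind: $!";
    my $sel = IO::Select->new($sock);

    my @batch;
    while (1) {
        if ($sel->can_read(5)) {               # wait up to 5 seconds for the next datagram
            my $buf = '';
            $sock->recv($buf, 65536);
            push @batch, $buf;
        } elsif (@batch) {                     # quiet for 5 seconds: treat that as end of batch
            process_batch(\@batch);            # hand the finished batch off for crunching
            @batch = ();
        }
    }

    sub process_batch { my ($msgs) = @_; warn scalar(@$msgs), " messages in this batch\n"; }

The drawback, as noted above, is that a batch only counts as "done" once the source actually goes quiet for the chosen interval.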
Those are good suggestions. However, we have had some luck with a different method. I will share my experience thus far with the exact problem you're tackling and what's been working for us:

Use the program() destination and open(FH, "-|") in Perl to read it. This saves the UDP packet creation overhead as well as ensures that there are no lost logs.

I have experimented with having N preforked Perl child workers which all listen on "sub" pipes in a round-robin (modulo on Perl's $. variable), but I quickly found what you've already pointed out: this is a sync pipe, so there's no sense in round-robining, since the parent can't move on to the next child pipe until the first child is done reading anyway. That's fine, since I have never found the Syslog-NG -> Perl end of things to be a bottleneck.

In our setup, I have Perl do some simple massaging of the logs and then write out to a tab-separated file in one minute batches. I then load the file in using MySQL LOAD DATA INFILE, and this can get you 100k mps sustained into a database if you're light on the indexing. There's also no reason you couldn't simply write the logs from Perl to a flat file in sqlite format, which would allow you to skip the MySQL step entirely. It really depends what you want the final format of the logs to be in.

In any case, I would discourage you from trying the async framework route as it adds way too much overhead. If you do in fact find a bottleneck with pipes, I would think that a solution involving UDP sent to a local port could work with some fancy iptables load balancing. You would be limited to netstat counters to detect losses, but it would probably work. But unless you hit a pipe bottleneck, I think all of that is more trouble than it is worth.

--Martin
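For reference, a rough sketch of that bulk-load step via DBI; the database name, table name, and TSV path here are assumptions for illustration, not the actual schema described above:

    use strict;
    use warnings;
    use DBI;

    # Load one minute's worth of tab-separated logs in a single statement.
    my $dbh  = DBI->connect('DBI:mysql:database=syslog', 'user', 'password',
                            { RaiseError => 1 });
    my $file = '/var/tmp/logs-batch.tsv';      # the per-minute TSV written by the Perl massager
    $dbh->do('LOAD DATA INFILE ' . $dbh->quote($file) .
             q{ INTO TABLE logs FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'});

As noted above, this only stays fast if the logs table is lightly indexed.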
Hi Martin, On Wed, Sep 08, 2010 at 01:48:08PM -0500, Martin Holste wrote:
I will share my experience thus far with the exact problem you're tackling and what's been working for us
Thanks. I appreciate your willingness to jump in and discuss tricky problems.
Use the program() destination and open(FH, "-|") in Perl to read it. This saves the UDP packet creation overhead as well as ensures that there are no lost logs.
Good to know. If I use this method, how should I see when I have collected one of my 60 second batches?
I have experimented with having N number of preforked Perl child workers which all listen on "sub" pipes in a round-robin (modulo on Perl's $. variable), but I quickly found what you've already pointed out, that this is a sync pipe, so there's no sense in round-robin-ing since the parent can't move on to the next child pipe until the first child is done reading anyway.
That method would not work for me anyway because I need all of my messages in a single memory space so I can crunch them down to look for anomalies. If they ended up littered into a bunch of child processes that would not get me very far.
That's fine, since I have never found the Syslog-NG -> Perl end of things to be a bottleneck. In our setup, I have Perl do some simple massaging of the logs and then write out to a tab-separated file in one minute batches.
Good to know where the bottlenecks aren't! :) Note that in my case I am only concerned about making sure I don't bog down the syslog-ng daemon with slowness of my Perl code. If my stuff chokes the daemon that's a disaster. It is OK if I am forced to lose some things sometimes going to the Perl end.
I then load the file in using MySQL LOAD DATA INFILE, and this can get you 100k mps sustained into a database if you're light on the indexing. There's also no reason you couldn't simply write the logs from Perl to flat file in sqlite format, which would allow you to skip the MySQL step entirely. It really depends what you want the final format of the logs to be in.
I have two cases I am trying to solve.

1) Crunch on the logs in 60 second batches to look for anomalies. For this case I will need:

* all messages available in the memory of a single Perl process / thread / etc. to perform the computations

-and either-

* some way of being able to pull in more messages from the next batch while processing the last batch (in Java I used two threads and this worked fine for a past project)

-or-

* some way of batching messages coming in, and knowing when a batch is done, so I can spend the next ~55 seconds doing processing before preparing again to receive a new batch -- so far I don't have a scientific way of knowing I've gotten the entire batch from the daemon

2) I want to write logs to the DB. For this I am hoping to use the native daemon support if possible, but if not I will do it from Perl. If I do it from Perl I will still want batching so I can do a bulk write and bulk commit via LOAD DATA INFILE or another high speed technique such as Oracle bulk load, etc.
In any case, I would discourage you from trying the async framework route as it adds way too much overhead.
Agreed. I looked at Moose, AnyEvent, POE, etc. and concluded they were too complicated and would not provide much benefit over simple select for my case.
If you do in fact find a bottleneck with pipes, I would think that a solution involving UDP sent to a local port could work with some fancy iptables load balancing. You would be limited to netstat counters to detect losses, but it would probably work. But unless you hit a pipe bottleneck, I think all of that is more trouble than it is worth.
Not going to help much in my case because I don't have a way of crunching logs to find anomalies if they end up in fragmented memory of different processes.
--Martin
Matthew.
I have two cases I am trying to solve.
1) Crunch on the logs in 60 second batches to look for anomalies.
For this case I will need:
* all messages available in the memory of a single Perl process / thread / etc. to perform the computations
This should be no problem for a 60 second batch. The technique was born from my attempt to have N child worker processes. Instead of N, I just have one child process. This way, the Syslog-NG -> Perl parent pipe stays open all the time, and Perl just swaps in a new child process when the 60 second batch is up. Oh, and use the Perl built-in "alarm" command for that, as in:

    while (1) {                               # main daemon loop
        my $fh;
        my $pid = open( $fh, "|-" );          # fork and send to child's STDIN
        if ($pid) {                           # parent
            while (<>) {
                $fh->print($_);               # send logs to child worker
            }                                 # (a real version would notice the child exiting and reopen the pipe)
        } else {                              # child
            my $continue = 1;
            local $SIG{ALRM} = sub { $continue = 0; };
            alarm 60;
            while ($continue and defined(my $line = <STDIN>)) {   # reads what the parent $fh->print()ed; the flag is checked between lines
                # do your log processing on $line
            }
            # done with 60 second batch here, fork the anomaly cruncher and exit
            exit 0;
        }
    }

You will have to tweak this to do exactly what you want, probably with a second fork, but that's a decent skeleton for how to chain processes together without using anything too fancy. Async frameworks like POE and AnyEvent are a good fit for the fork management. Incidentally, I'd be interested in seeing what you come up with for the guts of the anomaly crunching, if you're willing to share.

--Martin
On Wed, Sep 08, 2010 at 03:53:37PM -0500, Martin Holste wrote:
This should be no problem for a 60 second batch. The technique was born from my attempt to have N child worker processes. Instead of N, I just have one child process. This way, the Syslog-NG -> Perl parent pipe stays open all the time, and Perl just swaps in a new child process when the 60 second batch is up. Oh, and use the Perl built-in "alarm" command for that, as in
Thanks for the code example. I was planning to use some kind of alarm-signal-based technique, and it helps to see how one should implement this.

Forking off a new anomaly cruncher each time is problematic because I will be generating a lot of big data structures which need to be maintained across 12-24 hours in order to find the kind of anomalies I need to find. If I fork off new workers, then each worker forgets what has been stored in the structures previously. If I am forking off workers I'd like a way for them to see the data structures without duplicating log messages or data structures due to the multiple buffer copies, context switches, etc. this will cause.

It was this exact part of the problem which motivated my mails. I have been unable to think of a good way to pull things out of a pipe, socket, etc. in 60 second batches in such a way that I could keep pulling into the new batch, while processing the last batch, without forgetting all of the statistics I had collected in between.
Incidentally, I'd be interested in seeing what you come up with for the guts of the anomaly crunching, if you're willing to share.
Personally I have an open source full disclosure approach to security and I do not believe in hiding anything beyond the industry standard 30 day warning window for vulnerabilities. However this tradition is unfortunately not shared by my employers who have been funding my work on anomaly detection. We could however discuss this topic privately and decide if there could be a way to share some of my publicly available knowledge in a way that benefits your curiosity and the community at large.
--Martin
Matthew.
It was this exact part of the problem which motivated my mails. I have been unable to think of a good way to pull things out of a pipe, socket, etc. in 60 second batches in such a way that I could keep pulling into the new batch, while processing the last batch, without forgetting all of the statistics I had collected in between.
Ah, ok, I think I see what you mean. Almost everything I do is forked/threaded, and I have just grown accustomed to doing all state maintenance through a database. So, what do you actually need to retain from batch to batch? I'm assuming it's not the entirety of the raw data, right? Otherwise you'd certainly have to write it to file or DB. So, if it's some basic numbers, can you just write the summaries to a DB? I've taken lately to creating simple tables with just an ID and a BLOB column and writing JSON-serialized blobs to synchronize my workers to get a kind of pseudo-noSQL. I use a DB because it takes care of all of the transactional locking for me.
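A rough sketch of that pattern, assuming a hypothetical worker_state table with an id column and a BLOB column (the table name and connection details are placeholders):

    use strict;
    use warnings;
    use DBI;
    use JSON;

    my $dbh = DBI->connect('DBI:mysql:database=syslog', 'user', 'password',
                           { RaiseError => 1 });

    # Persist a worker's accumulated statistics between 60 second batches.
    sub save_state {
        my ($worker_id, $state) = @_;          # $state is a plain hashref of counters, baselines, etc.
        $dbh->do('REPLACE INTO worker_state (id, state) VALUES (?, ?)',
                 undef, $worker_id, encode_json($state));
    }

    # Reload it when a fresh worker is forked for the next batch.
    sub load_state {
        my ($worker_id) = @_;
        my ($json) = $dbh->selectrow_array(
            'SELECT state FROM worker_state WHERE id = ?', undef, $worker_id);
        return $json ? decode_json($json) : {};   # empty state on first run
    }

That way a freshly forked cruncher can pick up the 12-24 hours of accumulated statistics instead of starting cold.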
On Wed, 2010-09-08 at 13:48 -0500, Martin Holste wrote:
Those are good suggestions. However, we have had some luck with a different method. I will share my experience thus far with the exact problem you're tackling and what's been working for us:
Use the program() destination and open(FH, "-|") in Perl to read it. This saves the UDP packet creation overhead as well as ensures that there are no lost logs. I have experimented with having N number of preforked Perl child workers which all listen on "sub" pipes in a round-robin (modulo on Perl's $. variable), but I quickly found what you've already pointed out, that this is a sync pipe, so there's no sense in round-robin-ing since the parent can't move on to the next child pipe until the first child is done reading anyway. That's fine, since I have never found the Syslog-NG -> Perl end of things to be a bottleneck. In our setup, I have Perl do some simple massaging of the logs and then write out to a tab-separated file in one minute batches.
I guess syslog-ng could also write tab separated data into files and can also do per-minute batches (by using the $MIN macro). Are there any other things the perl stuff does? -- Bazsi
I guess syslog-ng could also write tab separated data into files and can also do per-minute batches (by using the $MIN macro). Are there any other things the perl stuff does?
Yes, you certainly could get Syslog-NG to write TSV in minute batches without any other program's intervention. I pipe to Perl because I do some minor data alterations before writing to file. Specifically, I CRC the program name to generate a program ID so that I can store the program in a programs table and the program_id in the main logs table to conserve space and keep the database in as close to 3rd normal form as I can. I also do basic conversions like INET_ATON all IP addresses to store them in integer columns. In the future, I would put advanced correlation capabilities there (probably nothing like what Matthew is cooking up, though!) as well as any real-time cluster messaging I need to do.
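A small sketch of that massaging step; the choice of String::CRC32 and the exact field layout are assumptions for illustration, not necessarily what is actually run:

    use strict;
    use warnings;
    use String::CRC32 qw(crc32);
    use Socket qw(inet_aton);

    # Turn one parsed log record into a TSV row ready for LOAD DATA INFILE.
    sub massage {
        my ($program, $ip, $msg) = @_;
        my $program_id = crc32($program);              # numeric key for the programs table
        my $ip_int     = unpack('N', inet_aton($ip));  # same value MySQL INET_ATON() would give
        return join("\t", $program_id, $ip_int, $msg) . "\n";
    }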
On Wed, 2010-09-15 at 09:51 -0500, Martin Holste wrote:
I guess syslog-ng could also write tab separated data into files and can also do per-minute batches (by using the $MIN macro). Are there any other things the perl stuff does?
Yes, you certainly could get Syslog-NG to write TSV in minute batches without any other program's intervention. I pipe to Perl because I do some minor data alterations before writing to file. Specifically, I CRC the program name to generate a program ID so that I can store the program in a programs table and the program_id in the main logs table to conserve space and keep the database in as close to 3rd normal form as I can.
Perfect example for a template-function.
I also do basic conversions like INET_ATON all IP addresses to store them in integer columns.
Again, should be a great example for a template-function. It was such a good idea that I've actually coded it here:

commit 70e91556b6af8724334443347fd6488745405344
Author: Balazs Scheidler <bazsi@balabit.hu>
Date: Mon Sep 20 17:12:27 2010 +0200

convertfuncs: new plugin to contain conversion template functions

The plugin now only contains ipv4-to-int which converts an IPv4 address to a long integer.

Usage:

$(ipv4-to-int $SOURCEIP)
In the future, I would put advanced correlation capabilities there (probably nothing like what Matthew is cooking up, though!) as well as any real-time cluster messaging I need to do.
Expect a blog post on this topic; a simple correlation engine is now built into patterndb. I'm afraid there is not much information about this yet, but anyway, here's the patch that implements it:

commit 9d07e274bdf2ba00b0e697a13299140f4bf04ed3
Author: Balazs Scheidler <bazsi@balabit.hu>
Date: Mon Sep 20 15:54:37 2010 +0200

db-parser: initial support for simple message correllation

This feature is not yet complete, will probably leak memory, but if not leak will probably use a _lot_ of memory, but still makes it possible to use simple log event correllation for those who want it.

-- Bazsi
commit 70e91556b6af8724334443347fd6488745405344 Author: Balazs Scheidler <bazsi@balabit.hu> Date: Mon Sep 20 17:12:27 2010 +0200
convertfuncs: new plugin to contain conversion template functions
The plugin now only contains ipv4-to-int which converts an IPv4 address to a long integer.
Usage:
$(ipv4-to-int $SOURCEIP)
Very cool stuff!
Expect a blog post on this topic, a simple correllation engine is now built into patterndb.
Hm, very interesting, I'll be taking a look. Regarding detecting the batches being complete: It seems a little inelegant to have a baby-sitter script that looks for an appropriately named file in a given directory and hoping it's the right buffer. It would be really nice if Syslog-NG could execute program() on a file that has just been written to for the last time.
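For what it's worth, the baby-sitter approach can be as simple as something like the following; the directory and $MIN-based filename pattern are assumptions, and the handler script name is borrowed from the examples later in this thread. It waits for the minute to roll over and then hands off the file that should no longer be growing:

    use strict;
    use warnings;
    use POSIX qw(strftime);

    my $dir  = '/var/log/batches';                       # where syslog-ng writes messages.$MIN
    my $last = strftime('%M', localtime);
    while (1) {
        sleep 5;
        my $now = strftime('%M', localtime);
        next if $now eq $last;                           # still inside the same minute
        my $done = "$dir/messages.$last";                # previous minute's file
        system('/usr/local/bin/messages-file-finished', $done) if -e $done;
        $last = $now;
    }

It works, but it has to trust the clock and guess that syslog-ng is finished with the file, which is exactly why a built-in event hook would be nicer.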
On Mon, 2010-09-20 at 14:35 -0500, Martin Holste wrote:
commit 70e91556b6af8724334443347fd6488745405344 Author: Balazs Scheidler <bazsi@balabit.hu> Date: Mon Sep 20 17:12:27 2010 +0200
convertfuncs: new plugin to contain conversion template functions
The plugin now only contains ipv4-to-int which converts an IPv4 address to a long integer.
Usage:
$(ipv4-to-int $SOURCEIP)
Very cool stuff!
Expect a blog post on this topic, a simple correllation engine is now built into patterndb.
Hm, very interesting, I'll be taking a look.
Regarding detecting the batches being complete: It seems a little inelegant to have a baby-sitter script that looks for an appropriately named file in a given directory and hoping it's the right buffer. It would be really nice if Syslog-NG could execute program() on a file that has just been written to for the last time.
I was thinking about adding "events" to sources/destinations which could invoke 3rd party tools/scripts when something happens. Events could be time based, but other setup/teardown style stuff can come in handy. e.g.

destination d_file {
    file("/var/log/messages.$HOUR"
        events(cron(min(5) hour(*) exec("/usr/local/bin/messages-file-finished"))));
};

Not sure about the syntax though. Also I want it to be able to run processes like tail -f:

source s_follow {
    pipe("/var/run/syslog-ng/tail-pipe"
        events(startup(supervise("/usr/bin/tail -f /var/log/apache.log > /var/run/syslog-ng/tail-pipe"))));
};

I know that syslog-ng is capable of tailing files, but the point is that there are sometimes complex log systems for various applications, and the only sane interface to them is to run a process to tail their otherwise binary logfiles. I want syslog-ng to manage these processes.

-- Bazsi
That could definitely be helpful, but I think the big one I'd be looking for would be something more basic that fires for a log chain when the rollover occurs, such as:

destination d_file {
    file("/var/log/messages.$MIN"
        events( on_rotate( exec("/usr/local/bin/messages-file-finished") ) ) )
};

I'm sure you could replicate this by using your example syntax and making sure that your time macro in the destination file name matches the pseudo-cron entry, but it seems like that might introduce some small issues with synchronization or race conditions.
On Mon, 2010-09-27 at 09:38 -0500, Martin Holste wrote:
That could definitely be helpful, but I think the big one I'd be looking for would be something more basic that fires for a log chain when the rollover occurs, such as:
destination d_file { file("/var/log/messages.$MIN" events( on_rotate( exec("/usr/local/bin/messages-file-finished") ) ) ) };
I'm sure you could replicate this by using your example syntax and making sure that your time macro in the destination file name matches the pseudo-cron entry, but it seems like that might introduce some small issues with synchronization or race conditions.
Yes. The problem with this is that there's no such thing as a rollover. :( syslog-ng keeps expanding the template string to find out which files to write to, and then times out files that do not get written to after time_reap() seconds. The issue is that files may get closed even in the middle of the minute, so the only sane way to react to "rollovers" is based on time.

... and there's also an issue with nonsynchronized clocks: the same file can be written to when the local time is way past that, therefore you have to consider which timestamps to trust. In our SSB product, for example, we tend to use the local time (e.g. R_DATE) for this reason.
-- Bazsi
Yes. The problem with this is that there's no such thing as a rollover. :( syslog-ng keeps expanding the template string to find out which files to write to, and then times out files that do not get written to after time_reap() seconds.
Ok, then can the code that times out files after time_reap() be the trigger? It wouldn't be immediate, but it should be soon after and guaranteed not to have follow-up data, right?
On Wed, 2010-09-29 at 08:35 -0500, Martin Holste wrote:
Yes. The problem with this is that there's no such thing as a rollover. :( syslog-ng keeps expanding the template string to find out which files to write to, and then times out files that do not get written to after time_reap() seconds.
Ok, then can the code that times out files after time_reap() be the trigger? It wouldn't be immediate, but it should be soon after and guaranteed not to have follow-up data, right?
except if you have sparse log data during the interval. time_reap() may close a file within the same hour if the $HOUR macro is used in the filename. (for example 1 message every 5 minutes and time_reap() set to 60 seconds, it'll be closed quite a number of times). -- Bazsi
On Tue, 2010-09-07 at 19:25 -0600, syslogng@feystorm.net wrote:
Syslog-ng already has the exact functionality you are looking for (at least as far as I understand what youre wanting). Create a udp destination driver, set flush_timeout to 60000 (60 seconds), and flush_lines to 0 (the default). Syslog-ng will queue all the destination messages until the oldest message is 60 seconds old, and then flushes them all out at once.
The idea is great; however, flush_timeout() only kicks in if flush_lines() is non-zero, so increase flush_lines() to a large value instead. Also, I'd use a unix-dgram() destination rather than a udp() one to avoid losing messages. -- Bazsi
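Putting those two corrections together with the earlier suggestion, a sketch of the destination discussed in this thread might look like the following; the socket path and the exact flush_lines() value are placeholders:

    destination d_batch {
        unix-dgram("/var/run/mylog.sock"   # example path; the reading program binds the same socket
            flush_lines(1000000)           # large, so the line threshold never fires first
            flush_timeout(60000)           # flush roughly every 60 seconds
        );
    };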
participants (4)

- Balazs Scheidler
- Martin Holste
- Matthew Hall
- syslogng@feystorm.net