Hi,

As you might have noticed, I'm quite unresponsive these days. The reason for this is I was in hospital, and I have to go back this afternoon. I expect to be there for at least a week. I'll try to answer all messages as soon as I return.

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
Balazs Scheidler writes:
> As you might have noticed, I'm quite unresponsive these days. The reason for this is I was in hospital, and I have to go back this afternoon. I expect to be there for at least a week. I'll try to answer all messages as soon as I return.
I hope all goes well and best wishes for a speedy recovery. We'll try to fix our own bugs in the meantime ...

-- Ed

Ed Ravin  | "The way you tell the good socialists from the bad ones is
eravin@   |  that the good ones ride bikes."
panix.com |                                        ---- M.J. Smith
I have an idea for a filter function:

I would like to filter messages through an external program, ideally spawned as a child process like the 'program()' destination target.

This could serve 2 functions:

1. Perform more complex filtering, or dynamic filtering based on input from a database or other 3rd-party source.
2. Reformat the messages as they pass through the filter.

A message could be sent to the external process via STDIN, where it is evaluated and conditionally sent back via STDOUT. If a message is to be dropped according to the filter criteria of the external process, the process would have to send back a NUL character to syslog-ng as an indicator.

A process could send an excessively verbose or obscure message to syslog-ng, and the external filter process could rewrite the message and pass it back to syslog-ng for continued routing and handling.

This would be extremely handy for handling syslog input from Windows NT/2K event logs. For example, right now, events from my Windows boxes start like this in syslog:

    Dec 4 16:04:02 host 275891:Tue Dec 04 16:04:02 2001: host/process...

The message itself contains the time stamp, host name, and process; and the process name supplied to syslog-ng is actually the event ID in the event log. There is no way to modify the output on the Windows end, but I would like to rewrite the message on my end.

Also the event log output can be quite verbose:

    "A trusted logon process has registered with the Local Security Authority. This logon process will be trusted to submit logon requests. Logon Process name:... "

This could easily be condensed by either hard-coding a translation, or writing a filter to drop extraneous words ('a', 'the') and abbreviate others ('proc.', 'w/', 'req.').

I realize that you could set up a program as a destination, and have that program filter, format, and re-send those messages to syslog, but that seems cumbersome, and could also potentially double the processing that syslog-ng would have to do.

Comments?

P.S.: For the curious: I'm using Adiscon Event Reporter http://www.eventreporter.com/ Having evaluated a few event log/syslog wedges, I found this to be the best.
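
The external filter proposed above could look roughly like the following Python sketch. It is only an illustration under several assumptions that are not in the message: one message arrives per line on STDIN, only the message text (not the syslog header) is passed in, a dropped message is signalled by a line holding a single NUL character, and the names `rewrite`, `run`, `EVENT_PREFIX`, `STOP_WORDS`, and `ABBREVIATIONS` are hypothetical. syslog-ng has no such filter hook in this thread; this shows only the external side.

```python
import re

# Assumed shape of the duplicated header on the Windows events quoted above,
# e.g. "275891:Tue Dec 04 16:04:02 2001: " (event ID, then a second timestamp).
EVENT_PREFIX = re.compile(r"^\d+:\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} \d{4}: ")

# Hypothetical word lists, taken from the examples in the message.
STOP_WORDS = {"a", "the"}
ABBREVIATIONS = {"process": "proc.", "with": "w/", "requests": "req."}

# Assumed drop indicator: a single NUL character, as the proposal suggests.
DROP = "\0"


def rewrite(message):
    """Return a condensed message, or DROP if nothing is left of it."""
    message = EVENT_PREFIX.sub("", message)
    words = []
    for word in message.split():
        key = word.lower()
        if key in STOP_WORDS:
            continue  # drop extraneous words
        words.append(ABBREVIATIONS.get(key, word))
    return " ".join(words) if words else DROP


def run(stdin, stdout):
    """Read one message per line and write one result line per message."""
    for line in stdin:
        stdout.write(rewrite(line.rstrip("\n")) + "\n")
        stdout.flush()  # the parent process is waiting on each answer
```

Calling `run(sys.stdin, sys.stdout)` (after `import sys`) turns the sketch into a line-by-line filter. Flushing after every message matters because the parent would block on each reply.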
Jay Guerette on Tue, Dec 04, 2001 at 04:28:24PM -0500:

Jay,

> A message could be sent to the external process via STDIN, where it is evaluated and conditionally sent back via STDOUT. If a message is to be dropped according to the filter criteria of the external process, the process would have to send back a NUL character to syslog-ng as an indicator.
> [..]
> I realize that you could set up a program as a destination, and have that program filter, format, and re-send those messages to syslog, but that seems cumbersome, and could also potentially double the processing that syslog-ng would have to do.

I don't see much difference vs. setting up a FIFO source and a program destination. To avoid overhead, you can direct input from the FIFO source directly to its destination. Have your external filter send output to the pipe and there you go.

Regards,

--
Gregor Binder <gb@(rootnexus.net|sysfive.com)>
Id: 0xE2F31C4B Fp: 8B8A 5CE3 B79B FBF1 5518 8871 0EFB AFA3 E2F3 1C4B
>> I realize that you could set up a program as a destination, and have that program filter, format, and re-send those messages to syslog, but that seems cumbersome, and could also potentially double the processing that syslog-ng would have to do.

> I don't see much difference vs. setting up a FIFO source and a program destination. To avoid overhead, you can direct input from the FIFO source directly to its destination. Have your external filter send output to the pipe and there you go.

I don't understand what you're suggesting here. I have no flexibility for the source; it's good old UDP syslog or nothing. Are you saying take the UDP in, send it to a program, which writes it to a FIFO that syslog-ng reads and writes to a log? I'm confused...
Jay Guerette on Thu, Dec 06, 2001 at 01:22:58AM -0500:

Hi,

>> I don't see much difference vs. setting up a FIFO source and a program destination. To avoid overhead, you can direct input from the FIFO source directly to its destination. Have your external filter send output to the pipe and there you go.

> I don't understand what you're suggesting here. I have no flexibility for the source; it's good old UDP syslog or nothing. Are you saying take the UDP in, send it to a program, which writes it to a FIFO that syslog-ng reads and writes to a log? I'm confused...

Okay, I admit I should have allocated more than one paragraph for the explanation :)

You would need two source definitions, one for "regular" message transport to syslog-ng (would probably contain internal, /dev/log, etc. and your network port(s)), and the other one for messages that are fed back to syslog-ng (which IMO should be a FIFO, because that gives you more flexibility when choosing a language to implement your external filter programs).

The log statement for the first source would then probably be set up so that all syslog-ng filters ("internal" filters in that case) would be applied to the log message, while the log statement for the second source definition could possibly be without any filters, going directly to a specific log.

What I was trying to say was: since you can specify multiple sources, and also which filters get applied to which source, you can get the functionality you ask for without, IMHO, too much overhead vs. the solution you suggest. You'd obviously have to tune the configuration to your environment.

I hope I made myself clear this time .. :)

--
Gregor Binder <gb@(rootnexus.net|sysfive.com)>
Id: 0xE2F31C4B Fp: 8B8A 5CE3 B79B FBF1 5518 8871 0EFB AFA3 E2F3 1C4B
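
The external half of the loopback described above (a filter program fed by a program destination that writes its results into a FIFO read by a second source) might be sketched in Python as follows. This is an assumption-laden illustration, not a tested syslog-ng setup: the function names `ensure_fifo`, `keep`, `feed`, and `run`, the FIFO path passed to `run`, and the "drop anything containing debug" criterion are all hypothetical.

```python
import os
import stat
import sys


def ensure_fifo(path):
    """Create the named pipe if it is missing; refuse to reuse a non-FIFO."""
    if not os.path.exists(path):
        os.mkfifo(path, 0o600)
    elif not stat.S_ISFIFO(os.stat(path).st_mode):
        raise ValueError(f"{path} exists and is not a FIFO")


def keep(message):
    """Hypothetical filter criterion: drop debug chatter."""
    return "debug" not in message.lower()


def feed(lines, out):
    """Copy accepted messages to out, one per line; return how many were written."""
    written = 0
    for line in lines:
        message = line.rstrip("\n")
        if keep(message):
            out.write(message + "\n")
            out.flush()  # hand each message over without buffering delay
            written += 1
    return written


def run(fifo_path, stdin=sys.stdin):
    """Read messages from stdin and feed the accepted ones into the FIFO."""
    ensure_fifo(fifo_path)
    # Opening a FIFO for writing blocks until a reader (here, assumed to be
    # syslog-ng's second source) has opened it.
    with open(fifo_path, "w") as out:
        return feed(stdin, out)
```

Keeping `feed` separate from the FIFO handling means the filtering logic can be exercised against any file-like object, without a reader on the other end of the pipe.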
> You would need two source definitions, one for "regular" message transport to syslog-ng (would probably contain internal, /dev/log, etc. and your network port(s)), and the other one for messages that are fed back to syslog-ng (which IMO should be a FIFO, because that gives you more flexibility when choosing a language to implement your external filter programs).
>
> The log statement for the first source would then probably be set up so that all syslog-ng filters ("internal" filters in that case) would be applied to the log message, while the log statement for the second source definition could possibly be without any filters, going directly to a specific log.

Much better explanation :)

The only problem with this implementation is the double hit for every message. I have 53 hosts that generated 2.5 million log entries yesterday, so having a 'loopback' filter could theoretically throw up to 5 million at syslog-ng. I don't want to overload the system. Also, when an 'event' happens, and a cluster of boxes all start logging furiously for a few seconds, I think that the burst * 2 will temporarily overwhelm it.

Meanwhile, an 'inline external'(?) filter would obviously eliminate the double hit, and in my specific case could help reduce the load by rewriting the messages on the fly and doing a language compression.
Jay Guerette on Thu, Dec 06, 2001 at 08:32:33AM -0500:

Jay,

> The only problem with this implementation is the double hit for every message. I have 53 hosts that generated 2.5 million log entries yesterday, so having a 'loopback' filter could theoretically throw up to 5 million at syslog-ng. I don't want to overload the system. Also, when an 'event' happens, and a cluster of boxes all start logging furiously for a few seconds, I think that the burst * 2 will temporarily overwhelm it.

Are you sure? Actually, one would have to look at the source. But the way I see it, there can't be MUCH difference, given the configuration is optimized, since reading from a FIFO is not much different from reading program output performance-wise. Obviously the message re-enters syslog-ng message processing, but I'd guess this path is short if the log statements are not too exciting (at least not two times the processing). I believe context switching between possibly multiple filter processes with this sort of relatively slow IPC will already cost you so much that the additional overhead produced by this missing feature is negligible. I admit this is just a feeling :)

To really save cycles, message rewriting should be inline. I think there might even be something like this in the development release? I don't know exactly; I believe there was some discussion about this, but I only use production releases. Somebody else?

As another alternative: have you thought about passing the intended filename as an argument to your program() destination and writing directly from the filter program? That should be just about the same thing as the inline filtering. Obviously you'd have to implement sync() et al. yourself.

Regards,

--
Gregor Binder <gb@(rootnexus.net|sysfive.com)>
Id: 0xE2F31C4B Fp: 8B8A 5CE3 B79B FBF1 5518 8871 0EFB AFA3 E2F3 1C4B
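
The last alternative above (the filter program writes straight to the target file and takes care of syncing itself) could be sketched like this in Python. The function name `append_messages` and the `sync_every` batching parameter are assumptions made for illustration; how often to sync is a trade-off between durability and throughput that the thread does not settle.

```python
import os


def append_messages(lines, path, sync_every=100):
    """Append messages to path, forcing them to disk every sync_every messages.

    Returns the number of messages written.
    """
    count = 0
    with open(path, "a", encoding="utf-8") as log:
        for line in lines:
            log.write(line if line.endswith("\n") else line + "\n")
            count += 1
            if count % sync_every == 0:
                log.flush()             # push Python's buffer to the OS
                os.fsync(log.fileno())  # ask the OS to write it to disk
        # Final sync so the tail of the stream is not left in a buffer.
        log.flush()
        os.fsync(log.fileno())
    return count
```

Called as `append_messages(sys.stdin, sys.argv[1])`, the sketch would take the filename as its first command-line argument, matching the idea of passing the intended filename to the program.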
On Thu, Dec 06, 2001 at 01:30:15PM +0100, Gregor Binder wrote:
> Jay Guerette on Thu, Dec 06, 2001 at 01:22:58AM -0500:
>
> You would need two source definitions, one for "regular" message transport to syslog-ng (would probably contain internal, /dev/log, etc. and your network port(s)), and the other one for messages that are fed back to syslog-ng (which IMO should be a FIFO, because that gives you more flexibility when choosing a language to implement your external filter programs).
>
> The log statement for the first source would then probably be so that all syslog-ng filters ("internal" filters in that case) would be applied to the log message, while the log statement for the second source definition could possibly be without any filters, and directly going to a specific log.
>
> What I was trying to say was: Since you can specify multiple sources, and also which filters get applied to what source, you can get the functionality you ask for without IMHO having too much overhead vs. the solution you suggest. You'd obviously have to optimize the configuration suitably to your environment.
>
> I hope I made myself clear this time .. :)
Pardon me while the mail admin within slips out ;)

What you are describing is not too dissimilar from Milter, the mail filter API in Sendmail. You would write programs that listen on an AF_UNIX or AF_INET socket; sendmail connects and talks over the socket, and your filter says "accept", "continue filtering", "reject but try again later", or "reject". Now this works fine for email, as the message is large and overhead, while a concern, is not that big a deal.

In this case you are using a pipe file, not a socket, but there is still the issue of syslog-ng seeing some messages twice. A modification of your idea would get around this, however. If you split the tasks of filtering, receiving a message, and writing a message, we can set up a pipeline (and allow scaling across CPUs). This should give you a model that can have an arbitrary number of filters, as filters could take input from and give output to other filters. It would also give the speed-conscious the ability to have no filters at all (beyond the status quo in syslog-ng maybe).

Another advantage to this approach is that your message writing (or delivering) program/process could take any form you desired. Files, network sockets, databases, and pagers could all have an output program. The downside of this is that filtering for different output "devices" would be difficult (basic priority/facility/program mappings would be easy, more exotic ones would not, or at least I can't think of how to do so easily this close to going to the bar).

--------

My brain informs me that perhaps instead of the same syslog-ng process seeing each message, you could do as I suggest with syslog-ng as the listener and the delivery agent, connected via pipes that snake through your filter programs. This would let you throw CPU power at the problem, but so would threading, I suppose.

--------

All of this hinges on your OS scaling well in a multiprocessor environment. Linux 2.4 isn't bad, but it still kinda sucks. I have a feeling that with this and other solutions using pipes, bottlenecks will form at the kernel. For those running Solaris and Tru64 (maybe AIX and IRIX) this will be less of a problem. I shan't comment on *BSD. Sockets might also scale better despite their appearance of sloth (setup time only, mostly).

I have a feeling that in high-load environments that need fancy filtering to happen in real time (and not offline, which removes any need for making syslog-ng any more complicated than it is), threads and plugins (shared object libraries are likely the best choice) are the way to go. It would let more than one CPU have a crack at filtering and message processing, and it would not rely on the kernel to play nice with pipes.

Ah well, that's my two cents.

----------------------------------------------------------------------------
    __o     Bradley Arlt                           Security Team Lead
  _ \<_     arlt@cpsc.ucalgary.ca                  University Of Calgary
 (_)/(_)    http://pages.cpsc.ucalgary.ca/~arlt/   Computer Science
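
The Milter-style idea above, where each filter answers "accept", "continue filtering", "reject but try again later", or "reject" and filters can be chained, can be sketched in Python as below. This is an in-process illustration only: it is not the Milter API and not a syslog-ng feature, and the names `Verdict`, `run_chain`, and the three example filters are hypothetical. Separate processes connected by pipes or sockets, as the message suggests, would carry the same verdicts over IPC instead of function calls.

```python
from enum import Enum


class Verdict(Enum):
    ACCEPT = "accept"        # deliver now, skip the remaining filters
    CONTINUE = "continue"    # hand the message to the next filter
    TEMPFAIL = "tempfail"    # reject, but the sender may try again later
    REJECT = "reject"        # drop the message


def drop_debug(message):
    """Hypothetical filter: reject anything mentioning 'debug'."""
    if "debug" in message.lower():
        return Verdict.REJECT, message
    return Verdict.CONTINUE, message


def trust_kernel(message):
    """Hypothetical filter: accept kernel messages without further checks."""
    if message.startswith("kernel:"):
        return Verdict.ACCEPT, message
    return Verdict.CONTINUE, message


def squeeze_spaces(message):
    """Hypothetical rewriting filter: collapse runs of whitespace."""
    return Verdict.CONTINUE, " ".join(message.split())


def run_chain(filters, message):
    """Pass message through filters in order; any verdict but CONTINUE stops the chain.

    A message that survives every filter is accepted. An empty filter list
    accepts everything, which matches the "no filters at all" case.
    """
    for message_filter in filters:
        verdict, message = message_filter(message)
        if verdict is not Verdict.CONTINUE:
            return verdict, message
    return Verdict.ACCEPT, message
```

For example, `run_chain([drop_debug, trust_kernel, squeeze_spaces], "sshd:  login   ok")` passes through all three filters and comes back accepted with the whitespace collapsed.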
begin Jay Guerette quotation of Tue, Dec 04, 2001 at 04:28:24PM -0500:
> I have an idea for a filter function:
>
> I would like to filter messages through an external program, ideally spawned as a child process like the 'program()' destination target.
>
> This could serve 2 functions:
> 1. Perform more complex filtering, or dynamic filtering based on input from a database or other 3rd-party source.
> 2. Reformat the messages as they pass through the filter.
>
> A message could be sent to the external process via STDIN, where it is evaluated and conditionally sent back via STDOUT. If a message is to be dropped according to the filter criteria of the external process, the process would have to send back a NUL character to syslog-ng as an indicator.

Of course we would like to have PCRE at our disposal for this filtering, if we're going to go to the trouble to implement this. Just thought I'd mention that.

--
Nate Campi http://www.campin.net
GnuPG key: 0xC17AEF79
Key fingerprint = BF12 722F 8799 E614 33CC FAB7 5A90 C464 C17A EF79

The three Rs of Microsoft support: Retry, Reboot, Reinstall.
>> I would like to filter messages through an external program, ideally spawned as a child process like the 'program()' destination target.
>>
>> This could serve 2 functions:
>> 1. Perform more complex filtering, or dynamic filtering based on input from a database or other 3rd-party source.
>> 2. Reformat the messages as they pass through the filter.
>>
>> A message could be sent to the external process via STDIN, where it is evaluated and conditionally sent back via STDOUT. If a message is to be dropped according to the filter criteria of the external process, the process would have to send back a NUL character to syslog-ng as an indicator.

> Of course we would like to have PCRE at our disposal for this filtering, if we're going to go to the trouble to implement this.

My thought was: keep syslog-ng focused on its core functionality. "Do one thing and do it well". Its current filtering ability is great, and makes sense within the scope of the program. Yes, adding PCRE would be incredibly cool, and modularizing that ability in this way keeps the core clean and light.

BTW, when thinking of external filtering programs, I am ONLY thinking about Perl.... what else would you use!
On Tue, Dec 04, 2001 at 04:28:24PM -0500, Jay Guerette wrote:
> I have an idea for a filter function:
>
> I would like to filter messages through an external program, ideally spawned as a child process like the 'program()' destination target.
>
> 1. Perform more complex filtering, or dynamic filtering based on input from a database or other 3rd source.
> 2. Reformat the messages as they pass through the filter.
Where did we leave off with this? I have a very real need myself to be able to rewrite certain log messages.

My reporting and archiving both get messed up by incorrect hostnames, mostly from Solaris clients (which don't seem to send a hostname in network syslog messages but do include the rest of the syslog header) and the tag/process field has a space in it. This makes syslog-ng think that the first part of the tag field is the hostname (correct behavior for syslog-ng, but still wrong in this case). I could make syslog-ng toss the client-supplied hostname entirely (keep_hostname(no)), but then I lose half of the tag field, which I need to keep the message intact.

Archiving and reporting problems also happen when a "last message repeated XX times" message comes in. I'd rather the messages were recorded correctly in the first place. That seems the right way to do this, rather than coding a bunch of workarounds into all the tools which parse or use the messages.

I ended up writing a Perl daemon sitting in front of syslog-ng to fix these messages before syslog-ng even sees them, but this is no solution. I feel no desire to re-implement the proper "relay" behavior described in http://www.ietf.org/rfc/rfc3164.txt, which I really need to do to get this working right.

I think Balazs might have been in the hospital when this thread came up (BTW, hope you're well). Some kind of rewriting ability would be great. Any thoughts, Balazs?

OBTW, I filed a support ticket with the vendor of the software which sends the space in the tag field, but even if they fix it (not anytime soon) something like this will come up again, I'm sure.

--
Nate Campi http://www.campin.net
GnuPG key: 0xC17AEF79

"If Microsoft can change and compete on quality, I've won." -- L. Torvalds
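
One of the rewrites wished for above, recording repeated messages as real entries instead of a "last message repeated XX times" notice, could be sketched in Python as follows. This is a simplification under stated assumptions: the input is a stream of bare message texts from a single host (real traffic would need the previous message tracked per sending host), the notice has exactly the wording matched by the hypothetical `REPEATED` pattern, and the name `expand_repeats` is made up for the example.

```python
import re

# Assumed wording of the repeat notice; real senders may format it differently.
REPEATED = re.compile(r"^last message repeated (\d+) times?$")


def expand_repeats(messages):
    """Yield messages, replacing each repeat notice with copies of the previous message."""
    previous = None
    for message in messages:
        match = REPEATED.match(message)
        if match is None:
            previous = message
            yield message
        elif previous is None:
            # Nothing seen yet to repeat, so pass the notice through unchanged.
            yield message
        else:
            for _ in range(int(match.group(1))):
                yield previous
```

Because the function is a generator, it can sit in front of any of the other rewriting steps without buffering the whole stream.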
participants (6)
- Balazs Scheidler
- Brad Arlt
- Ed Ravin
- Gregor Binder
- Jay Guerette
- Nate Campi