Overzealous syslog-ng problem
My Setup: I have syslog-ng running on several Solaris 8 machines. Each machine writes log messages to its local disk and also forwards the messages to a central log server (also a Solaris 8 machine) over a UDP connection. The central log server stores everything in a MySQL database via a Perl script I wrote.

My Problem: I am running the UNIX version of Cisco Secure on one of the Solaris boxes. A couple of times it has blown up. When this happens, it generates millions of log messages in a very short period. The problem is that syslog-ng logs most of these messages (I also get the mangled-message problem during these heavy loads). The most recent episode generated 1,930,974 messages that made it into the MySQL database and 49,573 mangled messages on the central log server, but 16,040,886 messages were written to disk on the local machine (see below).

My Questions: Is there any way to throttle syslog-ng, or make syslog-ng not accept all log messages when an app goes crazy? I want to log some of these messages, so I know when to restart the service, but I don't want all 16 million. Also, it seems that around 15 million log messages didn't make it to my central server. Where were they lost? Is this a problem with the UDP transport?

Aaron

    # cat local0.log | grep -c 'ERROR - error on accept'
    16040886

    jackson@auth:/tmp {5} cat sql_errors | grep -c 'INSERT INTO'
    49573

    mysql> delete from logs where host='acs' and facility='local0' and priority='err'
           and msg like '%ERROR - error on accept%';
    Query OK, 1943387 rows affected (1 hour 40.16 sec)
There are a few ways to look at this problem...

1. The box sending the messages. Do the 16,000,000 messages all have the same facility.priority? Traditional syslog on Solaris can only decide what to send based on facility and priority (and maybe the "tag", IIRC). So you may or may not be able to filter them on the sending side, depending on whether the facility.priority of the messages is unique to what you want to filter.

2. The syslog-ng receiving the messages. Have your syslog-ng use a match(regexp) rule to filter out certain messages, but not others. Maybe that will work?

3. Have your Perl program decide what to insert and what not to.

As far as losing the messages... syslog-ng doesn't buffer, so if your MySQL database isn't able to keep up with the flood of messages that are coming into the pipe, and from there to your Perl program, then syslog-ng drops them (AFAIK).

The way that I have handled this in my situation is documented at http://www.muppethouse.com/~ben/ I had syslog-ng format my incoming messages into SQL insert statements in batches by second. Then I have a program come by, pick up each batch to be inserted, and delete the batch file when it finishes. This way, if there is a flood of messages, they queue up in the directory and get pushed into the database ASAP until the queue is empty.

-Ben.
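For example, a pair of match() filters along these lines would keep the flood out of the local file and the UDP forward while still routing it to a separate alert destination. This is only a sketch: the source name, file paths, and the central host are placeholders, and it assumes the flood always arrives on local0 and contains the literal text 'ERROR - error on accept', as Aaron's grep output suggests.

    # Sketch only - placeholder names; adapt to the real syslog-ng.conf.
    filter f_acs_flood { facility(local0) and match("ERROR - error on accept"); };
    filter f_normal    { not match("ERROR - error on accept"); };

    destination d_local   { file("/var/log/local0.log"); };
    destination d_loghost { udp("loghost.example.com" port(514)); };
    destination d_alert   { file("/var/log/acs-alert.log"); };

    # Everything except the flood goes to disk and to the central server.
    log { source(s_local); filter(f_normal); destination(d_local); destination(d_loghost); };

    # The flood goes only to a small local alert file.
    log { source(s_local); filter(f_acs_flood); destination(d_alert); };

Note that syslog-ng itself cannot collapse the flood down to a single message; the alert file still receives every copy, so whatever watches it (the NOC, or a script) has to do the counting.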
Ben Russo wrote:
There are a few ways to look at this problem...
1. The box sending the messages. Do the 16,000,000 messages all have the same facility.priority? Traditional syslog on Solaris can only decide what to send based on facility and priority (and maybe the "tag", IIRC). So you may or may not be able to filter them on the sending side, depending on whether the facility.priority of the messages is unique to what you want to filter.
They are actually 16 million copies of the same message. I would like one to be recorded, but not all 16 million. If I get one, I could trigger an alarm (actually, the network monitoring people could do something if that message appears). The sending machine is running syslog-ng, so I was hoping that I could stop it from writing all the messages to local disk and sending them across the network. I suppose I could use a match rule to trigger an alarm and to filter out the messages, but the NOC people may not like that.
2. The syslog-ng receiving the messages. Have your syslog-ng use a match(regexp) rule to filter out certain messages, but not others. Maybe that will work?

3. Have your Perl program decide what to insert and what not to.

As far as losing the messages... syslog-ng doesn't buffer, so if your MySQL database isn't able to keep up with the flood of messages that are coming into the pipe, and from there to your Perl program, then syslog-ng drops them (AFAIK).

The way that I have handled this in my situation is documented at http://www.muppethouse.com/~ben/
Thanks, I'll take a look.
I had syslog-ng format my incoming messages into SQL insert statements in batches by second. Then I have a program come by, pick up each batch to be inserted, and delete the batch file when it finishes. This way, if there is a flood of messages, they queue up in the directory and get pushed into the database ASAP until the queue is empty.
My Perl script does the same thing. It basically sits asleep and checks every few seconds to see if something has been written to the pipe. If so, it reads one line at a time until nothing else is there, and then goes back to sleep. The pipe entries are also preformatted SQL statements. What is strange to me is that the sending machine seems to have no problem writing 16 million entries to disk, and the receiving machine has the same syslog-ng binary and, for the most part, the same syslog-ng.conf file. So either the messages are getting lost in transport, or the Perl SQL inserts are not blocking and they happen too fast for MySQL to deal with. To me, the latter is more troubling. Either way, my setup needs to be refined.

Aaron
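A rough sketch of the batch-file approach Ben describes, assuming a syslog-ng version with template() support on file destinations; the path is made up, the source name is a placeholder for the central server's network source, and the column names follow the delete statement above plus an illustrative datetime column. Each second's messages land in their own file of ready-to-run INSERT statements, so a burst only makes the queue directory grow instead of overrunning a pipe.

    # Sketch only - assumes template() support; names and paths are illustrative.
    destination d_sqlq {
        file("/var/spool/sqlq/$YEAR$MONTH$DAY$HOUR$MIN$SEC.sql"
             template("INSERT INTO logs (host, facility, priority, datetime, msg) VALUES ('$HOST', '$FACILITY', '$PRIORITY', '$YEAR-$MONTH-$DAY $HOUR:$MIN:$SEC', '$MSG');\n")
             template_escape(yes));
    };
    log { source(s_net); destination(d_sqlq); };

A small loader (cron job or daemon) then loops over /var/spool/sqlq/*.sql, feeds each file to the mysql client, and deletes it on success; if the database falls behind, the backlog sits on disk rather than being dropped.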
On Tue, Dec 31, 2002 at 03:11:45PM -0500, Aaron Jackson wrote:
My Perl script does the same thing. It basically sits asleep and checks every few seconds to see if something has been written to the pipe. If so, it reads one line at a time until nothing else is there, and then goes back to sleep. The pipe entries are also preformatted SQL statements. What is strange to me is that the sending machine seems to have no problem writing 16 million entries to disk, and the receiving machine has the same syslog-ng binary and, for the most part, the same syslog-ng.conf file. So either the messages are getting lost in transport, or the Perl SQL inserts are not blocking and they happen too fast for MySQL to deal with. To me, the latter is more troubling. Either way, my setup needs to be refined.
I think they are lost during transit. UDP is especially lossy on bursts. syslog-ng has no 'last message repeated NNN times' feature, and though it could be implemented, it clearly clobbers messages, especially when multiple hosts generate repetition messages.

--
Bazsi
Hi,
My Perl script does the same thing. It basically sits asleep and checks every few seconds to see if something has been written to the pipe. If so, it reads one line at a time until nothing else is there, and then goes back to sleep. The pipe entries are also preformatted SQL statements. What is strange to me is that the sending machine seems to have no problem writing 16 million entries to disk, and the receiving machine has the same syslog-ng binary and, for the most part, the same syslog-ng.conf file. So either the messages are getting lost in transport, or the Perl SQL inserts are not blocking and they happen too fast for MySQL to deal with. To me, the latter is more troubling. Either way, my setup needs to be refined.
My Questions: Is there any way to throttle syslog-ng, or make syslog-ng not accept all log messages when an app goes crazy? I want to log some of these messages, so I know when to restart the service, but I don't want all 16 million. Also, it seems that around 15 million log messages didn't make it to my central server. Where were they lost? Is this a problem with the UDP transport?
If you are in a LAN environment, UDP is in almost all cases _not_ the problem.

You're losing messages in pipe() destinations, right?

Your problem is Unix-pipe related. A pipe has a limited 'buffer' size (4k on Linux, 5k on Solaris). It seems that your syslogd tries to write more messages to the pipe than your script can read. If your script polls, e.g., every 1 second, it is absolutely possible that syslogd tries to write more than PIPE_BUF bytes to the pipe during that 1-second time slot.

The problem is that the process that read(2)s from the pipe (your script) MUST do this at least as fast as the process that write(2)s to the pipe (syslogd). Otherwise the pipe's buffer overruns, the writing process can't write more than PIPE_BUF bytes to the pipe, and it drops the messages. If this happens, there are two solutions: make the reading process faster, or implement output queueing for the writing process.

Balazs announced a patch that adds output queueing for pipe() in libol. But this patch doesn't fix the mangled-messages problem. I'll explain this in the related list thread.

--
Best regards, Andreas Schulze
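To make that failure mode concrete, here is a small stand-alone Perl sketch (the FIFO path is arbitrary, and this is not how syslog-ng itself is implemented): it holds the read end of a FIFO open without ever reading, then writes 100-byte lines non-blocking until the kernel pipe buffer fills and write(2) starts failing with EAGAIN - the point at which a writer with no internal queue has to drop messages.

    #!/usr/bin/perl
    # Sketch: fill a FIFO's kernel buffer and watch non-blocking writes fail.
    use strict;
    use Fcntl qw(O_RDONLY O_WRONLY O_NONBLOCK);
    use POSIX qw(mkfifo);
    use Errno qw(EAGAIN);

    my $fifo = "/tmp/pipebuf-demo.$$";
    mkfifo($fifo, 0600) or die "mkfifo $fifo: $!\n";

    # Hold the read end open but never read, so nothing drains the buffer.
    sysopen(my $rd, $fifo, O_RDONLY | O_NONBLOCK) or die "open (r): $!\n";
    sysopen(my $wr, $fifo, O_WRONLY | O_NONBLOCK) or die "open (w): $!\n";

    my $line = ("x" x 99) . "\n";        # one 100-byte "log message"
    my $sent = 0;
    while (1) {
        my $n = syswrite($wr, $line);
        if (!defined $n) {
            # Buffer full: a non-blocking writer must either queue the
            # message itself or drop it right here.
            print "buffer full after $sent messages: $!\n" if $! == EAGAIN;
            last;
        }
        $sent++;
    }
    unlink $fifo;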
On Thu, Jan 02, 2003 at 06:36:32PM +0100, Andreas Schulze wrote:
If you are in a LAN environment, UDP is in almost all cases _not_ the problem.

You're losing messages in pipe() destinations, right?

Your problem is Unix-pipe related. A pipe has a limited 'buffer' size (4k on Linux, 5k on Solaris). It seems that your syslogd tries to write more messages to the pipe than your script can read. If your script polls, e.g., every 1 second, it is absolutely possible that syslogd tries to write more than PIPE_BUF bytes to the pipe during that 1-second time slot.
syslog-ng polls, which means that it waits until the pipe is writable, which it is not while the buffer is full. And I don't think pipes would be lossy. What about:

    gunzip -dc somethingbig.tar.gz | tar xvf -

(OK, I know this is not a named pipe, but they are internally the same.)
The problem is that the process that read(2)s from the pipe (your script) MUST do this at least as fast as the process that write(2)s to the pipe (syslogd). Otherwise the pipe's buffer overruns, the writing process can't write more than PIPE_BUF bytes to the pipe, and it drops the messages. If this happens, there are two solutions: make the reading process faster, or implement output queueing for the writing process.
Hmm... could you send me a reference where you read this? I doubt it is true.
Balazs announced a patch that adds output queueing for pipe() in libol.
But this patch doesn't fix the mangled messages problem. I'll explain this in the related list thread.
I would be more than interested.

--
Bazsi
Balazs Scheidler wrote:
syslog-ng polls, which means that it waits until the pipe is writable, which it is not while the buffer is full.
Yepp. Imagine the following worst-case scenario: there isn't any process reading from the pipe. In that case you would in theory need an unlimited queue size to buffer all the information that will never be read...
And I don't think pipes would be lossy. What about: gunzip -dc somethingbig.tar.gz | tar xvf - (OK, I know this is not a named pipe, but they are internally the same.)
IMHO shell pipes work BLOCKING. In that case, you are right, of course.
The problem is that the process that read(2)s from the pipe (your script) MUST do this at least as fast as the process that write(2)s to the pipe (syslogd). Otherwise the pipe's buffer overruns, the writing process can't write more than PIPE_BUF bytes to the pipe, and it drops the messages. If this happens, there are two solutions: make the reading process faster, or implement output queueing for the writing process.
Sorry, I'm afraid I misstated some of the above. My English isn't the best, I know. :(

We only have to worry about information loss in pipes if the writing process write()s to the pipe NONBLOCKING. In that case the reader must be at least as fast as the writer. You can implement queueing, of course, but queueing is only a strategy to absorb short periods of bursty message flow. If, seen over a longer period, your reader is too slow, your writer must drop messages or its queue size blows up.

On the other hand, in theory more than one process can write non-blocking to the same pipe. So it is better to use atomic writes with buffers <= PIPE_BUF, to avoid the processes' writes interleaving.
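A short Perl sketch of that atomic-write rule (paths and counts are arbitrary): two children write complete lines to the same FIFO, each line far below PIPE_BUF and emitted with a single write(2), so the reader never sees two writers' bytes spliced into one line. Losing exactly this property - one logical message per write() - is also what produces the mangled messages discussed below.

    #!/usr/bin/perl
    # Sketch: writes <= PIPE_BUF issued as single write(2) calls never interleave.
    use strict;
    use Fcntl qw(O_WRONLY);
    use POSIX qw(mkfifo);

    my $fifo = "/tmp/atomic-demo.$$";
    mkfifo($fifo, 0600) or die "mkfifo: $!\n";

    for my $id ('A', 'B') {
        next if fork();                               # parent keeps going
        sysopen(my $wr, $fifo, O_WRONLY) or die "child open: $!\n";
        for my $i (1 .. 5000) {
            my $line = "writer $id message $i\n";     # far below PIPE_BUF
            syswrite($wr, $line) == length($line) or die "short write: $!\n";
        }
        exit 0;
    }

    # The parent reads; every line arrives whole, one writer per line.
    open(my $rd, '<', $fifo) or die "open (r): $!\n";
    while (my $line = <$rd>) {
        print "interleaved: $line" unless $line =~ /^writer [AB] message \d+$/;
    }
    waitpid(-1, 0) for 1 .. 2;
    unlink $fifo;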
Hmm... could you send me a reference where you read this? I doubt it is true.
My favorite references are the APUE and UNP books by W. Richard Stevens (the godfather himself :) and many of the Linux/Solaris man(2,3*,4,5,7) pages.

--
Best regards, Andreas Schulze
On Fri, Jan 03, 2003 at 04:02:38PM +0100, Andreas Schulze wrote:
Yepp. Imagine the following worst-case scenario: there isn't any process reading from the pipe. In that case you would in theory need an unlimited queue size to buffer all the information that will never be read...
OK, I see we talked about different things. Pipes as implemented by the core OS are not lossy. syslog-ng itself drops messages when the consumer of messages cannot accept them fast enough. So it is not an OS issue, but a syslog-ng issue.

--
Bazsi
[...]
On the other hand, in theory more than one process can write non-blocking to the same pipe. So it is better to use atomic writes with buffers <= PIPE_BUF, to avoid the processes' writes interleaving.
It does not actually help in the current implementation, because pipe writes in syslog-ng do not preserve message boundaries. So it does not matter whether the OS inserts data from different writers between two syslog-ng writes or in the middle of a 4K block write - in both cases chances are good that at least one line is mangled.
Hmm... could you send me a reference where you read this? I doubt it is true.
My favorite references are the APUE and UNP books by W. Richard Stevens (the godfather himself :) and many of the Linux/Solaris man(2,3*,4,5,7) pages.
Hmm... I prefer The Real Unix Standard :-) http://www.opengroup.org/onlinepubs/007904975/toc.htm

Happy New Year to everybody,
-andrey
Borzenkov Andrey wrote:
On the other hand, in theory more than one process can write non-blocking to the same pipe. So it is better to use atomic writes with buffers <= PIPE_BUF, to avoid the processes' writes interleaving.
It does not actually help in the current implementation, because pipe writes in syslog-ng do not preserve message boundaries. [...]
Yepp. That's the problem.

--
Best regards, Andreas Schulze
On Tue, 2002-12-31 at 15:11, Aaron Jackson wrote:
They are actually 16 million copies of the same message. I would like one to be recorded, but not all 16 million. If I get one, I could trigger an alarm (actually, the network monitoring people could do something if that message appears). The sending machine is running syslog-ng, so I was hoping that I could stop it from writing all the messages to local disk and sending them across the network. I suppose I could use a match rule to trigger an alarm and to filter out the messages, but the NOC people may not like that.
Then you could have syslog-ng filter out these messages based on a match() on the message text, and send them to a pipe destination on the local box that a logsurfer process reads (search Google for logsurfer). You could then configure logsurfer to handle the flow of messages based on the quantity, and re-insert them into the syslog-ng on the local host using logger, but with a different message text (like maybe the number of messages received per 5 seconds?).

-Ben.
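A sketch of that wiring on the sending box (the pipe path, filter and source names are made up, and it assumes a logsurfer rule set that counts matching lines per interval):

    # Sketch only - placeholder names; the FIFO must exist beforehand
    # (mkfifo /var/run/acs-flood.pipe) and logsurfer must be reading it.
    filter f_acs_flood { facility(local0) and match("ERROR - error on accept"); };
    destination d_logsurfer { pipe("/var/run/acs-flood.pipe"); };
    log { source(s_local); filter(f_acs_flood); destination(d_logsurfer); };

A logsurfer context rule can then run something like: logger -p local0.warning 'ACS: error on accept seen N times in the last 5 seconds', which re-enters syslog-ng with a different message text and can safely be logged locally and forwarded to the central server.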
participants (5)

- Aaron Jackson
- Andreas Schulze
- Balazs Scheidler
- Ben Russo
- Borzenkov Andrey