Re: syslog-ng Digest, Vol 15, Issue 12
I am on holiday from the 17th August returning on Monday the 31st August. If your query is urgent please contact the office on +44 (0)1423 508060 or Neil Harvey on +44 (0)15395 64734 (nharvey@harbrook.net) Regards David Warnock
I have syslog-ng sending all logs to a named pipe so that a separate program can monitor and alert on log events real-time. This works great and I am very happy with the setup. However, last night my separate program that monitors the pipe died and stopped retrieving messages from the pipe. What surprises me is that for every subsequent hour after the monitoring program died, syslog-ng reported dropped messages. This is very uncommon for my installation and I suspect it has something to do with the monitoring program failing and abandoning the pipe. My question to the experts of the list is why would syslog-ng start dropping messages after a separate program reading syslog-ng's destination fifo dies? Is it just coincidence that my monitoring program died at exactly the hour I started getting dropped stats? Just for fun, here is a piece of the stat logs: Jul 19 12:05:19 watchout syslog-ng[20350]: STATS: dropped 0 Jul 19 13:05:20 watchout syslog-ng[20350]: STATS: dropped 0 Jul 19 14:05:20 watchout syslog-ng[20350]: STATS: dropped 0 Jul 19 15:05:20 watchout syslog-ng[20350]: STATS: dropped 0 Jul 19 16:05:20 watchout syslog-ng[20350]: STATS: dropped 0 Jul 19 17:05:20 watchout syslog-ng[20350]: STATS: dropped 0 Jul 19 18:05:20 watchout syslog-ng[20350]: STATS: dropped 4220 Jul 19 19:05:20 watchout syslog-ng[20350]: STATS: dropped 3187 Jul 19 20:05:21 watchout syslog-ng[20350]: STATS: dropped 2952 Jul 19 21:05:21 watchout syslog-ng[20350]: STATS: dropped 2510 Jul 19 22:05:21 watchout syslog-ng[20350]: STATS: dropped 2723 Jul 19 23:05:22 watchout syslog-ng[20350]: STATS: dropped 2903 Jul 20 00:05:22 watchout syslog-ng[20350]: STATS: dropped 2774 Jul 20 01:05:22 watchout syslog-ng[20350]: STATS: dropped 2461 As you can see, my monitoring program died sometime before 18:00 hours; or 17:33 to be exact. :) Thanks. Alex This e-mail contains Omaha Public Power District's confidential and proprietary information and is for use only by the intended recipient. Unless explicitly stated otherwise, this e-mail is not a contract offer, amendment, nor acceptance. If you are not the intended recipient you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited.
My question to the experts of the list is why would syslog-ng start dropping messages after a separate program reading syslog-ng's destination fifo dies? Is it just coincidence that my monitoring program died at exactly the hour I started getting dropped stats?
<mode silly> Imagine a cardboard tube, several feet long and several inches in diameter. We give one guy a *big* basket of balls, and he starts putting the balls into one end of the tube, one by one. Meanwhile, we have another guy at the other end, taking balls out one at a time. Now this second guy decides it's time for a nap, so he stops taking balls out and tapes something across the end of the tube so balls won't fall out accidentally. Eventually, the tube gets full, and when the guy with the basket tries to put another ball in, it doesn't fit, and just gets dropped on the floor. </mode> In other words, there's a limit to how much buffering a pipe with no reader will give you before it starts complaining, and yes, it's no coincidence that it started dropping messages when the program reading the pipe went away. After all, what *else* was it supposed to do with them at that point? ;)
Great explanation! I guess I forgot that a pipe is not a file. I thought it would just store the logs until the reader woke up again. Is the pipe is limited by system resources or something else? This explains the dropped stat messages but the fifo is only one destination. Why doesn't syslog-ng drop the messages destined for destination(pipe /foo/fifo/} and continue to send messages to all the other destinations successfully? I have evidence to support that some messages to all destinations are dropped. Why? Thanks again Valdis for your most elementary clarification of FIFO-syslog-ng interaction. :) Alex -----Original Message----- From: syslog-ng-bounces@lists.balabit.hu [mailto:syslog-ng-bounces@lists.balabit.hu] On Behalf Of Valdis.Kletnieks@vt.edu Sent: Thursday, July 20, 2006 9:28 AM To: Syslog-ng users' and developers' mailing list Subject: Re: [syslog-ng] Syslog-ng to Pipe question
My question to the experts of the list is why would syslog-ng start dropping messages after a separate program reading syslog-ng's destination fifo dies? Is it just coincidence that my monitoring program died at exactly the hour I started getting dropped stats?
<mode silly> Imagine a cardboard tube, several feet long and several inches in diameter. We give one guy a *big* basket of balls, and he starts putting the balls into one end of the tube, one by one. Meanwhile, we have another guy at the other end, taking balls out one at a time. Now this second guy decides it's time for a nap, so he stops taking balls out and tapes something across the end of the tube so balls won't fall out accidentally. Eventually, the tube gets full, and when the guy with the basket tries to put another ball in, it doesn't fit, and just gets dropped on the floor. </mode> In other words, there's a limit to how much buffering a pipe with no reader will give you before it starts complaining, and yes, it's no coincidence that it started dropping messages when the program reading the pipe went away. After all, what *else* was it supposed to do with them at that point? ;) This e-mail contains Omaha Public Power District's confidential and proprietary information and is for use only by the intended recipient. Unless explicitly stated otherwise, this e-mail is not a contract offer, amendment, nor acceptance. If you are not the intended recipient you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited.
On Thu, 20 Jul 2006 09:57:44 CDT, "SOLIS, ALEX" said:
I guess I forgot that a pipe is not a file. I thought it would just store the logs until the reader woke up again. Is the pipe is limited by system resources or something else?
The basic problem is that all the buffered data ends up sitting in memory, and usually in locked, unswappable pages. As a result, if you let a pipe have a 128K buffer, then each pipe that gets stuck can potentially nail down 128K of RAM, causing extreme memory pressure. (As an aside, making these pages swappable is often harder than you think, which is why they end up being nailed-down pages). In the Linux 2.16.18-rc1-mm1 kernel I'm running now, a writer to a pipe is allowed up to 16 4K pages (or a total of 64K) of writes before it blocks waiting for a existing reader to actually accept data - and if the reader goes away entirely, it generates an immediate SIGPIPE.
This explains the dropped stat messages but the fifo is only one destination. Why doesn't syslog-ng drop the messages destined for destination(pipe /foo/fifo/} and continue to send messages to all the other destinations successfully? I have evidence to support that some messages to all destinations are dropped. Why?
It *should* send to other destinations OK. I'm pretty sure that "dropped" gets incremented if *any* destination fails (so if 5 success, one fails, that's still a dropped++). What evidence do you have that some are being dropped entirely?
Could you elaborate on the SIGPIPE? Is there a way I can generate one if my reader goes away with my current kernel (2.6.10-gentoo)? It seems to me that I am going to need to implement some sort of solution and I think killing the PIPE until administrator intervention is a good one. The evidence I have is this: I have about 25 servers that send a static log entry every 30 minutes to my loghost. These static log entries are sent to their own destination file. This file has one of the static messages missing which indicates it was dropped. Additionally, the static entry *does* exist at the server's local log file so it was generated and sent from the remote server. Syslog-ng simply did not record it....Dropped. I recognize it is possible that since I am using UDP as a transport, syslog-ng might have never received the message. I consider this highly unlikely since I have not experience a UDP related missed syslog message in over a year. Thanks so much for your help Valdis! Maybe you can suggest a solution for me? I cant afford to drop messages if my reader dies. I need a way to either kill the pipe, extend the pipe, or tell syslog-ng to do something. Re-starting the reader is not an option because it most probably terminated due to an alerting flood, which we want to avoid. Alex -----Original Message----- From: syslog-ng-bounces@lists.balabit.hu [mailto:syslog-ng-bounces@lists.balabit.hu] On Behalf Of Valdis.Kletnieks@vt.edu Sent: Thursday, July 20, 2006 10:29 AM To: Syslog-ng users' and developers' mailing list Subject: Re: [syslog-ng] Syslog-ng to Pipe question On Thu, 20 Jul 2006 09:57:44 CDT, "SOLIS, ALEX" said:
I guess I forgot that a pipe is not a file. I thought it would just store the logs until the reader woke up again. Is the pipe is limited by system resources or something else?
The basic problem is that all the buffered data ends up sitting in memory, and usually in locked, unswappable pages. As a result, if you let a pipe have a 128K buffer, then each pipe that gets stuck can potentially nail down 128K of RAM, causing extreme memory pressure. (As an aside, making these pages swappable is often harder than you think, which is why they end up being nailed-down pages). In the Linux 2.16.18-rc1-mm1 kernel I'm running now, a writer to a pipe is allowed up to 16 4K pages (or a total of 64K) of writes before it blocks waiting for a existing reader to actually accept data - and if the reader goes away entirely, it generates an immediate SIGPIPE.
This explains the dropped stat messages but the fifo is only one destination. Why doesn't syslog-ng drop the messages destined for destination(pipe /foo/fifo/} and continue to send messages to all the other destinations successfully? I have evidence to support that some messages to all destinations are dropped. Why?
It *should* send to other destinations OK. I'm pretty sure that "dropped" gets incremented if *any* destination fails (so if 5 success, one fails, that's still a dropped++). What evidence do you have that some are being dropped entirely? This e-mail contains Omaha Public Power District's confidential and proprietary information and is for use only by the intended recipient. Unless explicitly stated otherwise, this e-mail is not a contract offer, amendment, nor acceptance. If you are not the intended recipient you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited.
On Thu, 20 Jul 2006 11:08:49 CDT, "SOLIS, ALEX" said:
Could you elaborate on the SIGPIPE? Is there a way I can generate one if my reader goes away with my current kernel (2.6.10-gentoo)? It seems to me that I am going to need to implement some sort of solution and I think killing the PIPE until administrator intervention is a good one.
Actually, it isn't something you need to generate - your reader program crashes, the kernel notices, and hands the process at the other end (syslog-ng) a SIGPIPE free of charge.
I recognize it is possible that since I am using UDP as a transport, syslog-ng might have never received the message. I consider this highly unlikely since I have not experience a UDP related missed syslog message in over a year.
Hold that thought for a moment... :)
Thanks so much for your help Valdis! Maybe you can suggest a solution for me? I cant afford to drop messages if my reader dies. I need a way to either kill the pipe, extend the pipe, or tell syslog-ng to do something. Re-starting the reader is not an option because it most probably terminated due to an alerting flood, which we want to avoid.
(I'll skip over "fix the reader so it doesn't crash" as "too obvious" :) On the other hand, you *do* eventually need to re-start that reader, or you won't get any further data read. And presumably, you want that data, or you wouldn't have started that destination in the first place.... Getting back to that held thought... If you're in the muddle of a flood of messages, it *is* possible to have a UDP packet dropped (and if anything, that's the time it's *most* likely to happen). One possibility is to set the destination up as a *named pipe*, as follows: 1) mknod p /tmp/your_pipe_here 2) specify /tmp/your_pipe_here as a destination *FILE* for syslog-ng (yes, this will work...) 3) Someplace else, start your reader program with the pipe as standard *input*: /usr/local/sbin/rdr_prog < /tmp/your_pipe_here If the reader goes away, syslog-ng will get the usual 'unable to write' results back, and retry opening the file(pipe) every minute or so. Then in a cron job that runs every few minutes, you can do something like: #!/bin/bash if ! killall -0 rdr_prog then /usr/local/sbin/rdr_prog < /tmp/your_pipe_here > /dev/console 2>&1 & fi And with any luck at all, the message flood that killed your reader will be over by the time the cron job runs, so you only miss a few minutes...
Sounds good to me. Ill give that a try. Like I said, the reader program is designed to terminate if a certain number of alerts arrive in a certain period of time. If the reader program trips off, I would like to keep it that way until a human can interpret the conditions and prevent syslog-ng from dropping messages. I think first though I need to verify that messages to other destination were/are actually getting dropped. I think I saw somewhere in the list that syslog-ng 2.0 has a new more verbose stats messages, is this true? If it is I think I will upgrade so testing will yield more information; particularly which destination is dropping messages. :) Thanks for the Advice!!! Alex -----Original Message----- From: syslog-ng-bounces@lists.balabit.hu [mailto:syslog-ng-bounces@lists.balabit.hu] On Behalf Of Valdis.Kletnieks@vt.edu Sent: Thursday, July 20, 2006 11:30 AM To: Syslog-ng users' and developers' mailing list Subject: Re: [syslog-ng] Syslog-ng to Pipe question On Thu, 20 Jul 2006 11:08:49 CDT, "SOLIS, ALEX" said:
Could you elaborate on the SIGPIPE? Is there a way I can generate one if my reader goes away with my current kernel (2.6.10-gentoo)? It seems to me that I am going to need to implement some sort of solution and I think killing the PIPE until administrator intervention is a good one.
Actually, it isn't something you need to generate - your reader program crashes, the kernel notices, and hands the process at the other end (syslog-ng) a SIGPIPE free of charge.
I recognize it is possible that since I am using UDP as a transport, syslog-ng might have never received the message. I consider this highly unlikely since I have not experience a UDP related missed syslog message in over a year.
Hold that thought for a moment... :)
Thanks so much for your help Valdis! Maybe you can suggest a solution for me? I cant afford to drop messages if my reader dies. I need a way to either kill the pipe, extend the pipe, or tell syslog-ng to do something. Re-starting the reader is not an option because it most probably terminated due to an alerting flood, which we want to avoid.
(I'll skip over "fix the reader so it doesn't crash" as "too obvious" :) On the other hand, you *do* eventually need to re-start that reader, or you won't get any further data read. And presumably, you want that data, or you wouldn't have started that destination in the first place.... Getting back to that held thought... If you're in the muddle of a flood of messages, it *is* possible to have a UDP packet dropped (and if anything, that's the time it's *most* likely to happen). One possibility is to set the destination up as a *named pipe*, as follows: 1) mknod p /tmp/your_pipe_here 2) specify /tmp/your_pipe_here as a destination *FILE* for syslog-ng (yes, this will work...) 3) Someplace else, start your reader program with the pipe as standard *input*: /usr/local/sbin/rdr_prog < /tmp/your_pipe_here If the reader goes away, syslog-ng will get the usual 'unable to write' results back, and retry opening the file(pipe) every minute or so. Then in a cron job that runs every few minutes, you can do something like: #!/bin/bash if ! killall -0 rdr_prog then /usr/local/sbin/rdr_prog < /tmp/your_pipe_here > /dev/console 2>&1 & fi And with any luck at all, the message flood that killed your reader will be over by the time the cron job runs, so you only miss a few minutes... This e-mail contains Omaha Public Power District's confidential and proprietary information and is for use only by the intended recipient. Unless explicitly stated otherwise, this e-mail is not a contract offer, amendment, nor acceptance. If you are not the intended recipient you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited.
As of nov 27 snapshot, the dnscache.c file is missing the include of sys/socket.h which is required on AIX for the definitions of AF_INET, AF_INET6 ... Additionally, the logmsg.c has errors that prevent compilation. source='logmsg.c' object='logmsg.o' libtool=no depfile='.deps/logmsg.Po' tmpdepfile='.deps/logmsg.TPo' depmode=aix /bin/sh ../depcomp xlc -DHAVE_CONFIG_H -I. -I. -I.. -D_ALL_SOURCE -I/usr/local/include -I/usr/include -I/usr/local/include/glib-2.0 -I/usr/local/lib/glib-2.0/include -I/usr/local/include/eventlog -D_GNU_SOURCE -D_ALL_SOURCE -I/usr/local/include -I/usr/include -g -c `test -f 'logmsg.c' || echo './'`logmsg.c "logmsg.c", line 1.1: 1506-137 (E) Declaration must declare at least one declarator, tag, or the members of an enumeration. "logmsg.c", line 148.7: 1506-068 (E) Operation between types "unsigned char*" and "char*" is not allowed. "logmsg.c", line 211.39: 1506-280 (E) Function argument assignment between types "char*" and "unsigned char*" is not allowed. "logmsg.c", line 219.9: 1506-068 (E) Operation between types "unsigned char*" and "char*" is not allowed. "logmsg.c", line 237.51: 1506-280 (E) Function argument assignment between types "const char*" and "unsigned char*" is not allowed. "logmsg.c", line 276.39: 1506-280 (E) Function argument assignment between types "char*" and "unsigned char*" is not allowed. "logmsg.c", line 283.10: 1506-068 (S) Operation between types "struct tm" and "struct tm*" is not allowed. "logmsg.c", line 314.18: 1506-068 (E) Operation between types "char*" and "unsigned char*" is not allowed. "logmsg.c", line 318.26: 1506-068 (E) Operation between types "char*" and "unsigned char*" is not allowed. "logmsg.c", line 349.18: 1506-068 (E) Operation between types "char*" and "unsigned char*" is not allowed. "logmsg.c", line 383.19: 1506-068 (E) Operation between types "unsigned char*" and "char*" is not allowed. "logmsg.c", line 394.18: 1506-068 (E) Operation between types "char*" and "unsigned char*" is not allowed. "logmsg.c", line 406.15: 1506-068 (E) Operation between types "unsigned char*" and "char*" is not allowed. "logmsg.c", line 418.14: 1506-068 (E) Operation between types "char*" and "unsigned char*" is not allowed. "logmsg.c", line 440.15: 1506-068 (E) Operation between types "unsigned char*" and "char*" is not allowed. "logmsg.c", line 445.15: 1506-068 (E) Operation between types "char*" and "unsigned char*" is not allowed. "logmsg.c", line 450.34: 1506-280 (E) Function argument assignment between types "char*" and "unsigned char*" is not allowed. make: 1254-004 The error code from the last command is 1.
On Thu, 2006-07-20 at 11:29 -0400, Valdis.Kletnieks@vt.edu wrote:
On Thu, 20 Jul 2006 09:57:44 CDT, "SOLIS, ALEX" said:
I guess I forgot that a pipe is not a file. I thought it would just store the logs until the reader woke up again. Is the pipe is limited by system resources or something else?
The basic problem is that all the buffered data ends up sitting in memory, and usually in locked, unswappable pages. As a result, if you let a pipe have a 128K buffer, then each pipe that gets stuck can potentially nail down 128K of RAM, causing extreme memory pressure. (As an aside, making these pages swappable is often harder than you think, which is why they end up being nailed-down pages).
In the Linux 2.16.18-rc1-mm1 kernel I'm running now, a writer to a pipe is allowed up to 16 4K pages (or a total of 64K) of writes before it blocks waiting for a existing reader to actually accept data - and if the reader goes away entirely, it generates an immediate SIGPIPE.
Just one note, when using the pipe() destination in syslog-ng, it will not generate a SIGPIPE as syslog-ng opens its pipe destination in read-write mode, e.g. there's guaranteed to be one writer left: syslog-ng itself. If it is a program() destination, then you are right, if the program goes away, syslog-ng gets a SIGPIPE, and the destination is closed. (in syslog-ng 2.0, it is restarted in this case) syslog-ng should however write messages to all other destinations, and the dropped counter is global, and one last bit: file destinations never fail to write messages, unless a write error occurs (e.g. out of disk space). -- Bazsi
participants (5)
-
Balazs Scheidler
-
dwarnock@harbrook.net
-
Evan Rempel
-
SOLIS, ALEX
-
Valdis.Kletnieks@vt.edu