[syslog-ng] pipe causing lockup?
Balazs Scheidler
bazsi at balabit.hu
Sun Dec 6 16:04:40 CET 2009
On Sat, 2009-12-05 at 18:09 -0600, Tim Rupp wrote:
> Hi list,
>
> I'm hoping someone can explain this to me because it's taken several
> days of strace'ing and debugging for me to draw this conclusion.
>
> I have a source that right now is receiving a relay of syslog from
> another syslog-ng server, version 1.6.11. The new syslog-ng server is
> 3.0.4 using the RHEL 5 amd64 rpm.
>
> The new server has the following source to receive from the old server
>
> source s_net {
> udp(
> ip(0.0.0.0) port(514)
> );
> };
>
>
> I was seeing similar problems as described by Mike in this discussion
>
> https://lists.balabit.hu/pipermail/syslog-ng/2009-September/013371.html
>
>
> I strace'd the syslog-ng process, would wait for it to die, and then
> attempted to see if there were any patterns. In each strace syslog-ng
> appeared to hand right after processing a write to a fifo; the relevant
> strace output is below.
>
> 25461 write(340, "Dec 3 22:32:32 cmsstor52 sshd[28272]: Authorized to
> root, krb5 principal host/cmssrv28.fnal.gov at FNAL.GOV (krb5_kuserok)\n",
> 121) = 121
> 25461 poll([{fd=39, events=POLLOUT}, {fd=6, events=POLLIN}, {fd=3,
> events=POLLIN}, {fd=10, events=POLLIN}, {fd=5, events=POLLIN}, {fd=8,
> events=POLLIN}, {fd=
> 9, events=POLLIN}], 7, 29155) = 1 ([{fd=9, revents=POLLIN}])
> 25461 recvfrom(9, "<14>Dec 3 22:32:30 fndhcp1 dhcp_checkd[1679]: [ID
> 702911 user.info] -- DHCP Watchdog reports: All DHCP registration
> processes are runni
> ng --\n", 8192, 0, {sa_family=AF_INET, sin_port=htons(46911),
> sin_addr=inet_addr("131.225.110.121")}, [16]) = 144
> 25461 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3543, ...}) = 0
> 25461 poll([{fd=39, events=POLLOUT}], 1, 0) = 0 (Timeout)
> 25461 lseek(14, 0, SEEK_END) = 1212423178
> 25461 write(14, "Dec 3 22:32:30 fndhcp1 dhcp_checkd[1679]: [ID 702911
> user.info] -- DHCP Watchdog reports: All DHCP registration processes
> are running --\
> n", 140) = 140
> 25461 lseek(38, 0, SEEK_END) = 641655
> 25461 write(38, "Dec 3 22:32:30 fndhcp1 dhcp_checkd[1679]: [ID 702911
> user.info] -- DHCP Watchdog reports: All DHCP registration processes
> are running --\
> n", 140) = 140
> 25461 lseek(40, 0, SEEK_END) = 641655
> 25461 write(40, "Dec 3 22:32:30 fndhcp1 dhcp_checkd[1679]: [ID 702911
> user.info] -- DHCP Watchdog reports: All DHCP registration processes
> are running --\
> n", 140) = 140
> 25461 lseek(41, 0, SEEK_END) = 641655
> 25461 write(41, "Dec 3 22:32:30 fndhcp1 dhcp_checkd[1679]: [ID 702911
> user.info] -- DHCP Watchdog reports: All DHCP registration processes
> are running --\
> n", 140) = 140
>
>
>
> fd 9 is the UDP socket.
>
> Right after those messages, it would do a bunch of close() calls
>
> ...
> 25461 close(203) = 0
> 25461 close(150) = 0
> 25461 close(565) = 0
> 25461 close(637) = 0
> 25461 close(517) = 0
> ...
>
>
>
> followed by a poll that, this time, didn't include the UDP socket
>
>
> 25461 poll([{fd=39, events=POLLOUT}, {fd=6, events=POLLIN}, {fd=3,
> events=POLLIN}, {fd=10, events=POLLIN}, {fd=5, events=POLLIN}, {fd=8,
> events=POLLIN}], 6,
> 0) = 0 (Timeout)
> 25461 poll([{fd=39, events=POLLOUT}], 1, 0) = 0 (Timeout)
> 25461 lseek(11, 0, SEEK_END) = 107767627
> 25461 write(11, "Dec 3 22:33:01 clogged syslog-ng[25461]: Reaping
> unused destination files;
> template='/logging/syslog-ng/$HOST/messages'\n", 121) = 121
> 25461 lseek(12, 0, SEEK_END) = 105497311
>
>
> I don't have flow-control enabled in my config.
>
>
>
> After several days of head scratching and reading, I narrowed it down to
> having something to do with my DHCP destinations/filters/logs
>
> I have a pipe() destination that points to a fifo
>
> destination d_cstdb { pipe("/opt/dhcp_syslog_cst_db"); };
>
> a filter
>
> filter f_dhcp { facility(local4,user) and host("fndhcp1*"); };
>
> and the corresponding log statement
>
> log { source(s_net); filter(f_dhcp); destination(d_cstdb); };
>
>
>
> I've concluded that the problem is caused by the pipe/fifo.
>
> On the old server I have something reading from the fifo. On the new
> server I hadn't stood that process up yet.
>
> syslog-ng would start just fine, would even write several entries to
> that fifo, and then would reach a point where things would hang. It
> would stop reading from the UDP socket, and just log things like
> destinations expiring.
>
> I looked online at the documentation for FIFOs and linux and noted that
> 3.0.4 source appears to specify that fifo's be opened non-blocking.
>
> So forgive the silly question, but if the fifo is opened non-blocking,
> why is syslog-ng appearing to block when writing to a fifo that isn't
> being read from?
>
> Also, why would the UDP socket file descriptor be removed from the poll
> list? An lsof of the syslog-ng process clearly showed that it was still
> bound to *:syslog
>
> Is this is bug or normal operation? Maybe I don't understand linux's
> behavior in regards to fifos.
On the top of my head, do you have any other flags on any of the
affected log statements? We've found a bug which triggers enabled
flow-control if only the "final" flag is enabled on one of the affected
log statements.
This is the patch which fixes this problem:
diff --git a/src/logmpx.c b/src/logmpx.c
index e7d5bbb..3d34f5f 100644
--- a/src/logmpx.c
+++ b/src/logmpx.c
@@ -29,7 +29,7 @@ log_multiplexer_add_next_hop(LogMultiplexer *self, LogPipe *next_hop)
{
/* NOTE: this flag is currently unused. it'll be used to tell whether we can process the multiplexed pipes in parallel threads. */
if (next_hop->flags & (PIF_FINAL + PIF_FALLBACK))
- self->super.flags &= PIF_MPX_INDEP_PATHS;
+ self->super.flags &= ~PIF_MPX_INDEP_PATHS;
g_ptr_array_add(self->next_hops, next_hop);
}
Can you confirm whether you have "final" flag, or whether this patch
solves the issue for you?
Thanks in advance.
--
Bazsi
More information about the syslog-ng
mailing list