Ah, there might be other differences in your setup then. All our machines
are almost identical: same OS, only 2 hardware models, same syslog
versions, etc., so there wasn't much variance to cause problems.
Definitely seems like an HP issue though, and common enough that I would
have thought they would have caught and corrected it by now :-/
-Patrick
Sent: Wed Jan 26 2011 12:55:47 GMT-0500 (Eastern Standard Time)
From: Sowell, Brett <brett.sowell@amd.com>
To: Patrick H. <syslogng@feystorm.net>, "Krizak, Paul"
<Paul.Krizak@amd.com>, Syslog-ng users' and developers' mailing
list <syslog-ng@lists.balabit.hu>, "Petrini, Bryce"
<Bryce.Petrini@amd.com>, "Hart, Corey" <Corey.Hart@amd.com>
Subject: Re: [syslog-ng] syslog-ng deadlock if /dev/console locks?
Hi Patrick,
I had come across your post during my research and added the suggested
echo 'h' to our break/recovery test scripts.
In our case, echoing 'h' to sysrq-trigger gave a stable recovery of
/dev/console in 2 out of 10 attempts and sometimes a temporary recovery,
but more often it had no effect.
-Brett
On 01/26/11 10:17, Patrick H. wrote:
Nope, just 'echo h', that was it.
-Patrick
Sent: Wed Jan 26 2011 11:16:31 GMT-0500 (Eastern Standard Time)
From: Paul Krizak <paul.krizak@amd.com>
To: Syslog-ng users' and developers' mailing list <syslog-ng@lists.balabit.hu>,
"Patrick H." <syslogng@feystorm.net>,
"Sowell, Brett" <Brett.Sowell@amd.com>,
"Petrini, Bryce" <Bryce.Petrini@amd.com>,
"Hart, Corey" <Corey.Hart@amd.com>
Subject: Re: [syslog-ng] syslog-ng deadlock if /dev/console locks?
Fascinating.
So just triggering the kernel to print something to the console (h is
"help") caused /dev/console to properly realign, and syslog-ng woke back
up? You didn't have to restart syslog-ng or reboot the box or anything?
Paul Krizak 7171 Southwest Pkwy MS B200.3A
MTS Systems Engineer Austin, TX 78735
Advanced Micro Devices Desk: (512) 602-8775
Linux/Unix Systems Engineering Cell: (512) 791-0686
Global IT Infrastructure Fax: (512) 602-0468
On 01/26/11 10:11, Patrick H. wrote:
We ran into this issue when upgrading iLO on all our boxes. When the iLO
was upgraded, /dev/console went completely unresponsive, and things
started to hang. The solution turned out to be
'echo h > /proc/sysrq-trigger'. Apparently, when the kernel tried to
write to the serial port, it ran into problems and reinitialized the
port. After that everything started working fine.
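The recovery step above can be scripted. A minimal sketch, assuming a
Linux host with Magic SysRq enabled and root privileges; the helper name
trigger_sysrq and its path parameter are made up for illustration and are
not part of any tool mentioned in this thread.

```python
import os

SYSRQ_TRIGGER = "/proc/sysrq-trigger"  # standard Linux procfs path


def trigger_sysrq(command="h", path=SYSRQ_TRIGGER):
    """Write one SysRq command character to the trigger file.

    'h' only asks the kernel to print the SysRq help text to the
    console, which forces a kernel-side write to the console device.
    The path parameter is a hypothetical hook so the function can be
    pointed at a scratch file when experimenting without root.
    """
    if len(command) != 1:
        raise ValueError("SysRq commands are a single character")
    # Unbuffered binary mode so the byte goes out in a single write().
    with open(path, "wb", buffering=0) as trigger:
        trigger.write(command.encode("ascii"))


# Typical use, as root, with the same intent as
# 'echo h > /proc/sysrq-trigger':
#     trigger_sysrq("h")
```

Whether this revives a hung console is hardware dependent; Brett's tests
in this thread saw a stable recovery only 2 times out of 10.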
-Patrick
Sent: Wed Jan 26 2011 11:03:37 GMT-0500 (Eastern Standard Time)
From: Sandor Geller <Sandor.Geller@morganstanley.com>
To: Syslog-ng users' and developers' mailing list
<syslog-ng@lists.balabit.hu>,
"Sowell, Brett" <Brett.Sowell@amd.com>,
"Petrini, Bryce" <Bryce.Petrini@amd.com>,
"Hart, Corey" <Corey.Hart@amd.com>
Subject: Re: [syslog-ng] syslog-ng deadlock if /dev/console locks?
Hello,
On Wed, Jan 26, 2011 at 4:12 PM, Paul Krizak <paul.krizak@amd.com> wrote:
Hi, we're using syslog-ng 3.1.2 and have run into what appears to be a
bug, but I'd like to get the community's opinion before we dig further
into it.
We have a bunch of HP servers with iLO2 and iLO3 devices, configured
with their virtual serial ports on COM1 (ttyS0). We then have the OS
(RHEL4, RHEL5) configured to use COM1 as its console (i.e.
/dev/console). This is a very standard configuration that allows us to
get remote access to the machines without having to purchase the iLO
Advanced KVM feature. It also lets us use the Magic SysRq keys to probe
dead systems, so in general it's not something we're keen to change.
What we have found, however, is that in some cases the iLO freezes and
has to be rebooted. When the iLO reboots, the kernel's connection to
/dev/console (through the virtual serial port) hangs and blocks. Any
traffic to /dev/console just sits in the kernel's buffer and is never
delivered. Once the buffer is full, the kernel simply blocks on any
write to /dev/console.
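The buffering behaviour described above can be reproduced on any
Unix-like system with an ordinary pipe standing in for the stuck console.
This is only an analogy, assuming a pipe whose reader never drains it
behaves roughly like the hung virtual serial port; fill_until_full is a
made-up helper name.

```python
import os


def fill_until_full(fd, chunk=b"x" * 4096):
    """Write to fd in non-blocking mode until the kernel buffer is full.

    Returns the number of bytes the kernel accepted before refusing more.
    """
    os.set_blocking(fd, False)  # get EAGAIN instead of hanging
    accepted = 0
    try:
        while True:
            accepted += os.write(fd, chunk)
    except BlockingIOError:
        # The buffer is full. A blocking writer would hang at this point,
        # because nothing ever reads from the other end.
        pass
    return accepted


read_end, write_end = os.pipe()  # read_end is deliberately never read
buffered = fill_until_full(write_end)
print(f"kernel buffered {buffered} bytes before a write would block")
```

Writes succeed for a while, which matches the symptom that the problem
only shows up some time after the iLO reboot.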
Now this is a Bad Thing in general, and we're working with HP to try and
remedy this bug. However, what concerns me is that syslog-ng, when
faced with this behavior, also blocks, even for log messages not bound
for /dev/console.
syslog-ng uses a single thread (with the exception of database
destinations) to run the event loop, so when a read() or a write()
blocks, it stalls all log processing.
What we have observed is that a system with syslog-ng will keep
delivering the occasional console message to /dev/console (e.g. *.emerg
messages), and meanwhile the file-based log paths keep working. But once
/dev/console blocks, the next time a console message is delivered, *all*
of syslog-ng blocks waiting for that message to be delivered, and all of
the file-based paths block as well. The result is that pretty much
everything on the system stops working. For example, you can't log in,
even as root, because the login process blocks on the syslog call that
writes to /var/log/secure. Anything that uses syslog suddenly blocks.
Is this expected behavior? I would think that syslog-ng would be able
to continue accepting and delivering messages, even if one of the log
paths is stalled on a blocked write.
syslog-ng uses non-blocking I/O for all sources and destinations, but
despite this the kernel can still block it. syslog-ng therefore
protects reads and writes in logtransport.c with alarm(), so it should
recover when a timeout is set and a read or write blocks. To me it looks
like the timeout is not set in all cases: only file and program sources
initialise transport->timeout to 10 secs. So I'd say this isn't
expected behaviour - it is a bug.
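The alarm()-based protection described above can be illustrated outside
syslog-ng. The sketch below is not the code in logtransport.c; it is a
Python stand-in for the general pattern, assuming a Unix system and that
it runs in the main thread (Python only runs signal handlers there).
guarded_write and WriteTimeout are made-up names, and the 10-second
default simply mirrors the value mentioned above.

```python
import os
import signal


class WriteTimeout(Exception):
    """Raised when a guarded write does not finish in time."""


def _on_alarm(signum, frame):
    # Raising here aborts the interrupted os.write() instead of letting
    # Python retry it after the signal.
    raise WriteTimeout("write did not complete before the timeout")


def guarded_write(fd, data, timeout=10.0):
    """Write data to fd, giving up after timeout seconds.

    Returns the number of bytes written, or raises WriteTimeout.
    """
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    # setitimer is used rather than alarm() to allow fractional seconds.
    signal.setitimer(signal.ITIMER_REAL, timeout)
    try:
        return os.write(fd, data)
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel pending timer
        signal.signal(signal.SIGALRM, previous)  # restore old handler
```

With a guard like this on every destination, a stuck /dev/console would
cost one timeout per console message rather than hanging the whole
process, which is the behaviour Paul expected.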
Regards,
Sandor
______________________________________________________________________________
Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
FAQ: http://www.campin.net/syslog-ng/faq.html