[syslog-ng] syslog-ng deadlock if /dev/console locks?
Patrick H.
syslogng at feystorm.net
Wed Jan 26 19:13:19 CET 2011
Ah, there might be other differences that you have then. All our
machines are almost identical: same OS, only 2 hardware models, same
syslog versions, etc., so there wasn't much variance to cause problems.
It definitely seems like an HP issue though, and common enough that I
would have thought they would have caught and corrected it by now :-/
-Patrick
Sent: Wed Jan 26 2011 12:55:47 GMT-0500 (Eastern Standard Time)
From: Sowell, Brett <brett.sowell at amd.com>
To: Patrick H. <syslogng at feystorm.net>, "Krizak, Paul"
<Paul.Krizak at amd.com>, Syslog-ng users' and developers' mailing list
<syslog-ng at lists.balabit.hu>, "Petrini, Bryce" <Bryce.Petrini at amd.com>,
"Hart, Corey" <Corey.Hart at amd.com>
Subject: Re: [syslog-ng] syslog-ng deadlock if /dev/console locks?
> Hi Patrick,
>
> I had come across your post during research and added the suggested
> echo 'h' to our break/recovery test scripts.
>
> In our case, echoing 'h' to sysrq-trigger produced a stable recovery
> of /dev/console in about 2 out of 10 attempts and sometimes a
> temporary recovery, but more often it had no effect.
>
> -Brett
>
> On 01/26/11 10:17, Patrick H. wrote:
>> Nope, just 'echo h', that was it.
>>
>> -Patrick
>>
>> Sent: Wed Jan 26 2011 11:16:31 GMT-0500 (Eastern Standard Time)
>> From: Paul Krizak <paul.krizak at amd.com>
>> To: Syslog-ng users' and developers' mailing list
>> <syslog-ng at lists.balabit.hu>, "Patrick H." <syslogng at feystorm.net>,
>> "Sowell, Brett" <Brett.Sowell at amd.com>, "Petrini, Bryce"
>> <Bryce.Petrini at amd.com>, "Hart, Corey" <Corey.Hart at amd.com>
>> Subject: Re: [syslog-ng] syslog-ng deadlock if /dev/console locks?
>>> Fascinating. So just triggering the kernel to print something to
>>> the console (h is "help") caused /dev/console to properly realign
>>> and syslog-ng woke back up? You didn't have to restart syslog-ng or
>>> reboot the box or anything?
>>>
>>>
>>> Paul Krizak 7171 Southwest Pkwy MS B200.3A
>>> MTS Systems Engineer Austin, TX 78735
>>> Advanced Micro Devices Desk: (512) 602-8775
>>> Linux/Unix Systems Engineering Cell: (512) 791-0686
>>> Global IT Infrastructure Fax: (512) 602-0468
>>>
>>> On 01/26/11 10:11, Patrick H. wrote:
>>>> We ran into this issue when upgrading iLO on all our boxes. When the
>>>> iLO was upgraded, /dev/console went completely unresponsive, and
>>>> things started to hang. The solution turned out to be
>>>> 'echo h > /proc/sysrq-trigger'. Apparently when the kernel went to
>>>> write out to the serial port, it ran into problems and would
>>>> reinitialize it. After that everything started working fine.
>>>>
>>>> -Patrick
>>>>
>>>> Sent: Wed Jan 26 2011 11:03:37 GMT-0500 (Eastern Standard Time)
>>>> From: Sandor Geller <Sandor.Geller at morganstanley.com>
>>>> To: Syslog-ng users' and developers' mailing list
>>>> <syslog-ng at lists.balabit.hu>, "Sowell, Brett" <Brett.Sowell at amd.com>,
>>>> "Petrini, Bryce" <Bryce.Petrini at amd.com>, "Hart, Corey"
>>>> <Corey.Hart at amd.com>
>>>> Subject: Re: [syslog-ng] syslog-ng deadlock if /dev/console locks?
>>>>> Hello,
>>>>>
>>>>> On Wed, Jan 26, 2011 at 4:12 PM, Paul Krizak <paul.krizak at amd.com>
>>>>> wrote:
>>>>>
>>>>>> Hi, we're using syslog-ng 3.1.2 and have run into what appears to
>>>>>> be a bug, but I'd like to get the community's opinion before we dig
>>>>>> further into it.
>>>>>>
>>>>>> We have a bunch of HP servers with iLO2 and iLO3 devices, configured
>>>>>> with their virtual serial ports on COM1 (ttyS0). We subsequently
>>>>>> have the OS (RHEL4, RHEL5) configured to use COM1 as its console
>>>>>> (e.g. /dev/console). This is a very standard configuration that
>>>>>> allows us to get remote access to the machines without having to
>>>>>> purchase the iLO Advanced KVM feature. It also lets us use the Magic
>>>>>> SysRq keys to probe dead systems and stuff, so in general it's not
>>>>>> something we're keen to change.
>>>>>>
>>>>>> What we have found, however, is that there are some cases where the
>>>>>> iLO will freeze and require a reboot. When the iLO reboots, however,
>>>>>> the kernel's connection to /dev/console (through the virtual serial
>>>>>> port) hangs and blocks. Any traffic to /dev/console just sits in the
>>>>>> kernel's buffer and is never delivered. Once the buffer is full, the
>>>>>> kernel simply blocks on any write to /dev/console.
>>>>>>
>>>>>> Now this is a Bad Thing in general, and we're working with HP to try
>>>>>> and remedy this bug. However, what concerns me is that syslog-ng,
>>>>>> when faced with this behavior, also blocks, even for log messages
>>>>>> not bound for /dev/console.
>>>>>>
>>>>>
>>>>> syslog-ng uses a single thread (with the exception of database
>>>>> destinations) to run the event loop, so when a read() or a write()
>>>>> blocks, it stalls all log processing.
>>>>>
>>>>>
>>>>>> What we have observed is that a system with syslog-ng will keep
>>>>>> delivering the occasional console message to /dev/console (ex.
>>>>>> *.emerg messages) and meanwhile the file-based log paths keep
>>>>>> working. But once /dev/console blocks, the next time a console
>>>>>> message is delivered, *all* of syslog-ng blocks waiting for that
>>>>>> message to be delivered, and all of the file-based paths block as
>>>>>> well. The result is that pretty much everything on the system stops
>>>>>> working. For example, you can't log in, even as root, because the
>>>>>> login process blocks on the syslog command that writes to
>>>>>> /var/log/secure. Anything that uses syslog suddenly blocks.
>>>>>>
>>>>>> Is this expected behavior? I would think that syslog-ng would be
>>>>>> able to continue accepting and delivering messages, even if one of
>>>>>> the log paths is stalled on a blocked write.
>>>>>>
>>>>>
>>>>> syslog-ng uses non-blocking I/O for all sources and destinations,
>>>>> but the kernel can still block it despite this. syslog-ng therefore
>>>>> protects reads and writes in logtransport.c with alarm(), so it
>>>>> should recover when a timeout is set and a read or write blocks. To
>>>>> me it looks like the timeout is not set in all cases: only file and
>>>>> program sources initialise transport->timeout to 10 secs, so I'd say
>>>>> this isn't expected behaviour - it is a bug.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Sandor
>>>
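
For anyone following Sandor's point in the quoted thread about guarding
reads and writes with alarm(), here is a minimal sketch of that general
technique. It is not syslog-ng's code (which is C, in logtransport.c);
it is a Python illustration, and the names write_with_timeout,
WriteTimeout and _on_alarm are made up for the example. It arms SIGALRM
before a write that may block and gives up if the alarm fires first.

```python
import os
import signal


class WriteTimeout(Exception):
    """Raised by the SIGALRM handler when a guarded write takes too long."""


def _on_alarm(signum, frame):
    # Raising here aborts the blocked os.write() instead of letting
    # Python retry it after the signal (the default EINTR behaviour).
    raise WriteTimeout()


def write_with_timeout(fd, data, timeout_secs):
    """Write data to fd, giving up after timeout_secs whole seconds.

    Returns the number of bytes written, or None if the write was still
    blocked when the alarm fired. Unix only; must run in the main thread,
    because Python only delivers signals there.
    """
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(timeout_secs)  # arm the watchdog
    try:
        return os.write(fd, data)
    except WriteTimeout:
        return None
    finally:
        signal.alarm(0)  # disarm, whether or not the write finished
        signal.signal(signal.SIGALRM, old_handler)
```

Called on a descriptor whose buffer is full and not being drained (a
stuck console, say), write_with_timeout(fd, b"msg\n", 10) should return
None after about ten seconds rather than hang the caller. The 10 second
value simply mirrors the timeout Sandor mentions for file and program
sources; whether it suits a console destination is an open question.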