[syslog-ng] syslog-ng deadlock if /dev/console locks?

Patrick H. syslogng at feystorm.net
Wed Jan 26 19:13:19 CET 2011


Ah, then there may be other differences in your setup. All our machines 
are almost identical (same OS, only 2 hardware models, same syslog 
versions, etc.), so there wasn't much variance to cause problems. It 
definitely seems like an HP issue though, and a common enough one that I 
would have thought they would have caught and corrected it by now :-/

-Patrick

Sent: Wed Jan 26 2011 12:55:47 GMT-0500 (Eastern Standard Time)
From: Sowell, Brett <brett.sowell at amd.com>
To: Patrick H. <syslogng at feystorm.net>, "Krizak, Paul" 
<Paul.Krizak at amd.com>, Syslog-ng users' and developers' mailing list 
<syslog-ng at lists.balabit.hu>, "Petrini, Bryce" <Bryce.Petrini at amd.com>, 
"Hart, Corey" <Corey.Hart at amd.com>
Subject: Re: [syslog-ng] syslog-ng deadlock if /dev/console locks?
> Hi Patrick,
>
> I had come across your post during research and added the suggested 
> echo 'h' to our break/recovery test scripts.
>
> It seemed that, in our case, echoing 'h' to sysrq-trigger resulted in 
> a stable recovery of /dev/console 2 times out of 10 and sometimes in a 
> temporary recovery, but more often it had no effect.
>
>  -Brett
>
> On 01/26/11 10:17, Patrick H. wrote:
>> Nope, just 'echo h', that was it.
>>
>> -Patrick
>>
>> Sent: Wed Jan 26 2011 11:16:31 GMT-0500 (Eastern Standard Time)
>> From: Paul Krizak <paul.krizak at amd.com>
>> To: Syslog-ng users' and developers' mailing list 
>> <syslog-ng at lists.balabit.hu>, "Patrick H." <syslogng at feystorm.net>, 
>> "Sowell, Brett" <Brett.Sowell at amd.com>, "Petrini, Bryce" 
>> <Bryce.Petrini at amd.com>, "Hart, Corey" <Corey.Hart at amd.com>
>> Subject: Re: [syslog-ng] syslog-ng deadlock if /dev/console locks?
>>> Fascinating.  So just triggering the kernel to print something to 
>>> the console (h is "help") caused /dev/console to properly realign 
>>> and syslog-ng woke back up?  You didn't have to restart syslog-ng or 
>>> reboot the box or anything?
>>>
>>>
>>> Paul Krizak                         7171 Southwest Pkwy MS B200.3A
>>> MTS Systems Engineer                Austin, TX  78735
>>> Advanced Micro Devices              Desk:  (512) 602-8775
>>> Linux/Unix Systems Engineering      Cell:  (512) 791-0686
>>> Global IT Infrastructure            Fax:   (512) 602-0468
>>>
>>> On 01/26/11 10:11, Patrick H. wrote:
>>>> We ran into this issue when upgrading iLO on all our boxes. When the
>>>> iLO was upgraded, /dev/console went completely unresponsive, and things
>>>> started to hang. The solution turned out to be 'echo h >
>>>> /proc/sysrq-trigger'. Apparently when the kernel went to write out to
>>>> the serial port, it ran into problems and reinitialized it. After
>>>> that everything started working fine.
>>>>
>>>> -Patrick
>>>>
>>>> Sent: Wed Jan 26 2011 11:03:37 GMT-0500 (Eastern Standard Time)
>>>> From: Sandor Geller <Sandor.Geller at morganstanley.com>
>>>> To: Syslog-ng users' and developers' mailing list
>>>> <syslog-ng at lists.balabit.hu>, "Sowell, Brett" <Brett.Sowell at amd.com>,
>>>> "Petrini, Bryce" <Bryce.Petrini at amd.com>, "Hart, Corey" 
>>>> <Corey.Hart at amd.com>
>>>> Subject: Re: [syslog-ng] syslog-ng deadlock if /dev/console locks?
>>>>> Hello,
>>>>>
>>>>> On Wed, Jan 26, 2011 at 4:12 PM, Paul Krizak <paul.krizak at amd.com>
>>>>> wrote:
>>>>>
>>>>>> Hi, we're using syslog-ng 3.1.2 and have run into what appears to be
>>>>>> a bug, but I'd like to get the community's opinion before we dig
>>>>>> further into it.
>>>>>>
>>>>>> We have a bunch of HP servers with iLO2 and iLO3 devices, configured
>>>>>> with their virtual serial ports on COM1 (ttyS0).  We subsequently have
>>>>>> the OS (RHEL4, RHEL5) configured to use COM1 as its console (e.g.
>>>>>> /dev/console).  This is a very standard configuration that allows us
>>>>>> to get remote access to the machines without having to purchase the
>>>>>> iLO Advanced KVM feature.  It also lets us use the Magic SysRq keys to
>>>>>> probe dead systems and stuff, so in general it's not something we're
>>>>>> keen to change.
>>>>>>
>>>>>> What we have found, however, is that there are some cases where the
>>>>>> iLO will freeze and require a reboot.  When the iLO reboots, however,
>>>>>> the kernel's connection to /dev/console (through the virtual serial
>>>>>> port) hangs and blocks.  Any traffic to /dev/console just sits in the
>>>>>> kernel's buffer and is never delivered.  Once the buffer is full, the
>>>>>> kernel simply blocks on any write to /dev/console.
>>>>>>
>>>>>> Now this is a Bad Thing in general, and we're working with HP to try
>>>>>> and remedy this bug.  However, what concerns me is that syslog-ng,
>>>>>> when faced with this behavior, also blocks, even for log messages not
>>>>>> bound for /dev/console.
>>>>>>
>>>>>
>>>>> syslog-ng uses a single thread (with the exception of database
>>>>> destinations) running the event loop, so when a read() or a write()
>>>>> blocks, it stalls the whole log processing.
>>>>>
>>>>>
>>>>>> What we have observed is that a system with syslog-ng will keep
>>>>>> delivering the occasional console message to /dev/console (ex.
>>>>>> *.emerg messages) and meanwhile the file-based log paths keep working.
>>>>>> But once /dev/console blocks, the next time a console message is
>>>>>> delivered, *all* of syslog-ng blocks waiting for that message to be
>>>>>> delivered, and all of the file-based paths block as well.  The result
>>>>>> is that pretty much everything on the system stops working.  For
>>>>>> example, you can't log in, even as root, because the login process
>>>>>> blocks on the syslog command that writes to /var/log/secure.  Anything
>>>>>> that uses syslog suddenly blocks.
>>>>>>
>>>>>> Is this expected behavior?  I would think that syslog-ng would be able
>>>>>> to continue accepting and delivering messages, even if one of the log
>>>>>> paths is stalled on a blocked write.
>>>>>>
>>>>>
>>>>> syslog-ng uses non-blocking I/O for all sources / destinations, but
>>>>> despite this the kernel could still block it. Therefore syslog-ng
>>>>> protects reads/writes in logtransport.c with alarm(), so it should
>>>>> recover when a timeout is set and a read/write blocks. To me it looks
>>>>> like the timeout is not set in all cases: only file and program
>>>>> sources initialise transport->timeout to 10 secs, so I'd say this
>>>>> isn't expected behaviour - it is a bug.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Sandor
>>>
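
For readers following the thread, the alarm()-guarded write that Sandor
describes can be illustrated with a small sketch. This is not syslog-ng's
code (that lives in logtransport.c and is written in C); it is a Python
analogue under stated assumptions. The names guarded_write, fill_pipe and
WriteTimeout are hypothetical, and a full pipe stands in for a stalled
/dev/console. The idea is the same: arm a timer before the write so that a
write the kernel would otherwise block on is interrupted by SIGALRM.

```python
import os
import signal


class WriteTimeout(Exception):
    """Raised from the SIGALRM handler when a guarded write takes too long."""


def _on_alarm(signum, frame):
    # Raising here makes the interrupted os.write() propagate the exception
    # instead of being retried after EINTR.
    raise WriteTimeout()


def guarded_write(fd, data, timeout):
    """Write data to fd, giving up after `timeout` seconds.

    Returns the number of bytes written, or None if the timer fired first.
    Must be called from the main thread, since it installs a signal handler.
    Sketch only: it ignores the narrow window in which the timer could fire
    just after the write has already completed.
    """
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, timeout)
    try:
        return os.write(fd, data)
    except WriteTimeout:
        return None
    finally:
        # Always disarm the timer and restore the previous handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, old_handler)


def fill_pipe(write_fd):
    """Helper for experiments: fill a pipe so the next blocking write stalls."""
    os.set_blocking(write_fd, False)
    for chunk in (b"x" * 4096, b"x"):
        try:
            while True:
                os.write(write_fd, chunk)
        except BlockingIOError:
            pass  # no room left for a chunk of this size
    os.set_blocking(write_fd, True)
```

To try it, create a pipe with os.pipe(), call fill_pipe() on the write end,
and then call guarded_write() on it with a short timeout; the call should
give up rather than hang. The 10 second value Sandor mentions would
correspond to timeout=10 here. Without such a timeout, a single stalled
destination holds up a single-threaded event loop indefinitely, which
matches the behaviour reported in the original post.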