[syslog-ng] syslog-ng deadlock if /dev/console locks?

Paul Krizak paul.krizak at amd.com
Wed Jan 26 17:16:31 CET 2011


Fascinating.  So just triggering the kernel to print something to the 
console (h is "help") caused /dev/console to properly realign and 
syslog-ng woke back up?  You didn't have to restart syslog-ng or reboot 
the box or anything?


Paul Krizak                         7171 Southwest Pkwy MS B200.3A
MTS Systems Engineer                Austin, TX  78735
Advanced Micro Devices              Desk:  (512) 602-8775
Linux/Unix Systems Engineering      Cell:  (512) 791-0686
Global IT Infrastructure            Fax:   (512) 602-0468

On 01/26/11 10:11, Patrick H. wrote:
> We ran into this issue when upgrading iLO on all our boxes. When the iLO
> was upgraded, /dev/console went completely unresponsive, and things
> started to hang. The solution turned out to be 'echo h >
> /proc/sysrq-trigger'. Apparently when the kernel went to write out to
> the serial port, it ran into problems and would reinitialize it. After
> that everything started working fine.
>
> -Patrick
>
> Sent: Wed Jan 26 2011 11:03:37 GMT-0500 (Eastern Standard Time)
> From: Sandor Geller <Sandor.Geller at morganstanley.com>
> To: Syslog-ng users' and developers' mailing list
> <syslog-ng at lists.balabit.hu>, "Sowell, Brett" <Brett.Sowell at amd.com>,
> "Petrini, Bryce" <Bryce.Petrini at amd.com>, "Hart, Corey" <Corey.Hart at amd.com>
> Subject: Re: [syslog-ng] syslog-ng deadlock if /dev/console locks?
>> Hello,
>>
>> On Wed, Jan 26, 2011 at 4:12 PM, Paul Krizak <paul.krizak at amd.com> wrote:
>>
>>> Hi, we're using syslog-ng 3.1.2 and have run into what appears to be a
>>> bug, but I'd like to get the community's opinion before we dig further
>>> into it.
>>>
>>> We have a bunch of HP servers with iLO2 and iLO3 devices, configured
>>> with their virtual serial ports on COM1 (ttyS0).  We subsequently have
>>> the OS (RHEL4, RHEL5) configured to use COM1 as its console (e.g.
>>> /dev/console).  This is a very standard configuration that allows us to
>>> get remote access to the machines without having to purchase the iLO
>>> Advanced KVM feature.  It also lets us use the Magic SysRq keys to probe
>>> dead systems and stuff, so in general it's not something we're keen to
>>> change.
>>>
>>> What we have found, however, is that there are some cases where the iLO
>>> will freeze and require a reboot.  When the iLO reboots, however, the
>>> kernel's connection to /dev/console (through the virtual serial port)
>>> hangs and blocks.  Any traffic to /dev/console just sits in the kernel's
>>> buffer and is never delivered.  Once the buffer is full, the kernel
>>> simply blocks on any write to /dev/console.
>>>
>>> Now this is a Bad Thing in general, and we're working with HP to try and
>>> remedy this bug.  However, what concerns me is that syslog-ng, when
>>> faced with this behavior, also blocks, even for log messages not bound
>>> for /dev/console.
>>>
>>
>> syslog-ng uses a single thread (with the exception of database
>> destinations) running the event loop, so when a read() or a write()
>> blocks it affects the whole log processing.
>>
>>
>>> What we have observed is that a system with syslog-ng will keep
>>> delivering the occasional console message to /dev/console (ex. *.emerg
>>> messages) and meanwhile the file-based log paths keep working.  But once
>>> /dev/console blocks, the next time a console message is delivered, *all*
>>> of syslog-ng blocks waiting for that message to be delivered, and all of
>>> the file-based paths block as well.  The result is that pretty much
>>> everything on the system stops working.  For example, you can't log in,
>>> even as root, because the login process blocks on the syslog command
>>> that writes to /var/log/secure.  Anything that uses syslog suddenly blocks.
>>>
>>> Is this expected behavior?  I would think that syslog-ng would be able
>>> to continue accepting and delivering messages, even if one of the log
>>> paths is stalled on a blocked write.
>>>
>>
>> syslog-ng uses non-blocking I/O for all sources and destinations, but
>> the kernel can still block it despite this. syslog-ng therefore
>> protects reads/writes in logtransport.c with alarm(), so it should
>> recover when a timeout is set and a read/write blocks. To me it looks
>> like the timeout is not set in all cases: only file and program
>> sources initialise transport->timeout to 10 secs. So I'd say this isn't
>> expected behaviour - it is a bug.
>>
>> Regards,
>>
>> Sandor


