[syslog-ng] [Bug 190] syslog-ng with TCP source, fails to shutdown properly, and generates core dump

bugzilla at bugzilla.balabit.com bugzilla at bugzilla.balabit.com
Wed Aug 29 20:29:01 CEST 2012


https://bugzilla.balabit.com/show_bug.cgi?id=190





--- Comment #14 from Lennert Buytenhek <buytenh at wantstofly.org>  2012-08-29 20:29:01 ---
Marvin, thanks a lot for the truss output, this is quite helpful.

The ivykis stable-v0.30 branch has a patch "port: Properly handle
ETIME returns from port_getn()." on it which makes ivykis deal with
the fact that port_getn() on Solaris, contrary to the available
documentation, can simultaneously return events and claim that a
timeout occurred.

The symptom that this fixes is lost events, but since that wasn't
mentioned in this bug report, I somehow implicitly assumed that
the issue you were seeing couldn't be the same issue.

However, looking at your truss output, this issue does actually
trigger in your situation (even if you may not be aware of it) --
this is port_getn() both returning an event (1) and reporting a
timeout ([62], 62 is ETIME):

        8740/1:         port_getn(3, 0x08043B3C, 1024, 1, 0x08047B7C)   = 1 [62]
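
As a rough illustration of the kind of check this calls for, here is a
minimal sketch.  It is not the actual ivykis patch: usable_events() and
stub_getn() are hypothetical names, and the sketch assumes the timeout
is reported to the caller as a -1 return with errno set to ETIME while
the event count is still filled in.

```c
#include <assert.h>
#include <errno.h>

#ifndef ETIME
#define ETIME 62	/* fallback: the value shown in the truss output */
#endif

/*
 * Hypothetical helper, not ivykis code.  Given the result of a
 * port_getn()-style call, decide how many returned events must be
 * processed.  Assumes a timeout is reported as ret == -1 with
 * errno == ETIME while nget still holds the number of events.
 */
static int usable_events(int ret, int err, unsigned int nget,
			 unsigned int max)
{
	/* The call can never hand back more events than we asked for. */
	assert(nget <= max);

	/* Any failure other than a timeout is treated as a real error. */
	if (ret < 0 && err != ETIME)
		return -1;

	/* Success or timeout: either way, nget events were delivered. */
	return (int)nget;
}

/* Stub imitating the call in the truss output: one event plus ETIME. */
static int stub_getn(unsigned int max, unsigned int *nget)
{
	(void)max;
	*nget = 1;
	errno = ETIME;
	return -1;
}
```

Feeding the stub's result into usable_events() yields 1, so the event
is processed rather than dropped, which is the point of treating ETIME
separately from other errors.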

When port_getn() returns an event for a file descriptor, that
file descriptor is unregistered ('dissociated' in port parlance)
from the port.  So, missing the returned event means losing not only
the notification that the file descriptor is active, but also the
notification that the file descriptor is no longer associated
with the port.

This bites us further down the line, when the file descriptor is
unregistered in the end (by calling iv_fd_unregister()).  ivykis has
not received an event for the file descriptor, and so it thinks that
the file descriptor is still associated with the port, and that it
must call port_dissociate() on it to dissociate it from the port.
However, the kernel _has_ delivered an event, and has already
dissociated the file descriptor from the port, and thus
port_dissociate() will fail with ENOENT when we ask it to dissociate
the file descriptor again.
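
The mismatch between the two views can be sketched with a toy model.
This is not ivykis code: struct model_fd and the model_*() functions
are hypothetical and only mimic the bookkeeping described above, with
model_dissociate() standing in for port_dissociate().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model of one file descriptor, as seen from both sides. */
struct model_fd {
	bool kernel_associated;	/* what the kernel believes */
	bool lib_associated;	/* what the event library believes */
};

/*
 * The kernel delivers an event, which dissociates the fd as a side
 * effect.  'seen' says whether the library processed that event.
 */
static void model_deliver_event(struct model_fd *fd, bool seen)
{
	/* Events are only delivered for fds that are still associated. */
	assert(fd->kernel_associated);

	fd->kernel_associated = false;
	if (seen)
		fd->lib_associated = false;
}

/* Stand-in for port_dissociate(): -1/ENOENT if not associated. */
static int model_dissociate(struct model_fd *fd)
{
	if (!fd->kernel_associated) {
		errno = ENOENT;
		return -1;
	}
	fd->kernel_associated = false;
	return 0;
}

/* Unregister: dissociate only if the library thinks it has to. */
static int model_unregister(struct model_fd *fd)
{
	int ret = 0;

	if (fd->lib_associated) {
		ret = model_dissociate(fd);
		fd->lib_associated = false;
	}
	return ret;
}
```

In this model, calling model_deliver_event() with seen set to false and
then model_unregister() fails with ENOENT, while the same sequence with
seen set to true succeeds without touching the kernel side at all.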

So, even though it did not seem that way at first, the issue you
were seeing is actually the exact same issue that the port_getn()
ETIME patch solves.

I'm sorry for wasting your time on this, I just got confused by the
apparent absence of the primary symptom that the port_getn() ETIME
patch was meant to solve.

