[syslog-ng] down network during reload leads to blocking syslog() calls

Mon Apr 17 22:31:27 UTC 2017

Hi there,
We are using syslog-ng 3.6.4 on Linux. We had an incident where the
network port used for remote logging was inadvertently disabled for a
couple of hours, and during this time a critical process that logs via
syslog() calls had threads hanging for seconds to minutes at a
time.

After some investigation and reproduction efforts, it looks like the
problem is a combination of:
- remote logging to a server specified by hostname (vs. IP)
- loss of the management interface (the one used for syslog traffic and DNS
resolution)
- log rotation triggering a syslog-ng reload

When I reproduce the problem (it seems to take a considerable logging
load and a minute or so), I can see that writes to /dev/log hang:
root@escort-ct0:/etc# strace -ttT logger test
^C
19:38:46.064830 execve("/usr/bin/logger", ["logger", "test"], [/* 26 vars */]) = 0 <0.000135>
...
19:38:46.069757 socket(PF_LOCAL, SOCK_DGRAM|SOCK_CLOEXEC, 0) = 1 <0.000015>
19:38:46.069806 connect(1, {sa_family=AF_LOCAL, sun_path="/dev/log"}, 110) = 0 <0.000010>
19:38:46.069850 sendto(1, "<13>Mar 20 19:38:46 root: test", 30, MSG_NOSIGNAL, NULL, 0) = 30 <11.627190>
19:38:57.697110 close(1) = 0 <0.000490>
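
For anyone who wants to watch for the same symptom without strace, here is a minimal Python sketch that times a single datagram send to the local syslog socket, much like the sendto() line above. The function name timed_syslog_send and the message text are made up for illustration; only the /dev/log path comes from the trace.

```python
import socket
import time


def timed_syslog_send(message: bytes, path: str = "/dev/log") -> float:
    """Send one datagram to a local syslog socket; return seconds spent in send()."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        sock.connect(path)
        start = time.monotonic()
        # A blocking socket waits here if the receiver is not draining its queue.
        sock.send(message)
        return time.monotonic() - start
    finally:
        sock.close()
```

Calling timed_syslog_send(b"<13>probe: test") in a loop and logging any result above, say, one second should show the stalls as they happen.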

Meanwhile, I see the main thread timing out on name resolution:
19:48:18.214182 stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=148, ...}) = 0 <0.000008>
19:48:18.214234 socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 31 <0.000012>
19:48:18.214279 connect(31, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.15.83.11")}, 16) = 0 <0.000024>
19:48:18.214332 poll([{fd=31, events=POLLOUT}], 1, 0) = 1 ([{fd=31, revents=POLLOUT}]) <0.000006>
19:48:18.214370 sendto(31, "\343\235\1\0\0\1\0\0\0\0\0\0\rescdev-syslog\3dev\vpurestorage\3com\0\0\1\0\1", 51, MSG_NOSIGNAL, NULL, 0) = 51 <0.000047>
19:48:18.214444 poll([{fd=31, events=POLLIN}], 1, 5000) = 0 (Timeout) <5.005040>

This repeats for each of the other configured DNS servers.

I cannot reproduce the issue if I configure an IP instead of hostname for
the syslog server.

We are using UDP with no flow-control configuration, with the expectation
that syslog() will never block. And, indeed, until the reload, it works as
expected. After the reload, however, I guess we re-establish the socket for the
remote connection, which requires resolving the hostname. I don’t pretend to
understand how this ultimately backs up processing of /dev/log (note that
internal() and kernel messages are coming through just fine during this
time).
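
One general property of Unix datagram sockets on Linux would be consistent with the symptom, though this is not confirmed against the syslog-ng 3.6.4 source: if the receiving side stops reading, the kernel queue fills and further senders stall (blocking sockets wait, non-blocking ones get EAGAIN). The sketch below shows this with a stand-in receiver that never reads. The socket path, function name, and payload size are all hypothetical.

```python
import os
import socket
import tempfile


def count_sends_until_full(payload: bytes = b"x" * 512, limit: int = 100000) -> int:
    """Count datagrams a non-blocking sender can queue to a receiver that never reads."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "fake-dev-log")  # stand-in for /dev/log
    receiver = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        receiver.bind(path)
        sender.connect(path)
        sender.setblocking(False)
        sent = 0
        while sent < limit:
            try:
                sender.send(payload)
            except BlockingIOError:
                # Linux reports EAGAIN here; a blocking socket would hang instead.
                break
            sent += 1
        return sent
    finally:
        sender.close()
        receiver.close()
        os.unlink(path)
        os.rmdir(tmpdir)
```

The returned count depends on kernel settings such as the socket buffer sizes and the datagram queue length, so treat it as indicative only. A caller that owns its logging path could use the same non-blocking mode to drop a message instead of waiting.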

I guess my question is whether this is a known/expected issue, and/or whether
there’s a resolution other than specifying remote syslog servers by IP or
hardcoding the name resolution in /etc/hosts and pointing to that with
dns-cache-hosts(). Basically, I’d like syslog-ng to simply give up if it
can’t resolve remote syslog server hostnames, rather than let this
interfere with servicing /dev/log, with ramifications for callers.

thanks in advance,
nathan