[syslog-ng] POLLERR occurred while idle
Balazs Scheidler
bazsi at balabit.hu
Mon Jul 5 11:08:58 CEST 2010
On Sun, 2010-07-04 at 23:31 -0400, Tom Ritter wrote:
> I'm getting a large volume of messages that look like this:
>
> 23:09:53 hn syslog-ng[30]: POLLERR occurred while idle; fd='13'
> 23:09:54 hn syslog-ng[30]: POLLERR occurred while idle; fd='14'
> 23:09:55 hn syslog-ng[30]: POLLERR occurred while idle; fd='13'
> 23:09:56 hn syslog-ng[30]: POLLERR occurred while idle; fd='14'
> 23:09:57 hn syslog-ng[30]: POLLERR occurred while idle; fd='13'
> 23:09:58 hn syslog-ng[30]: POLLERR occurred while idle; fd='14'
> 23:09:59 hn syslog-ng[30]: POLLERR occurred while idle; fd='13'
>
> I have no idea what they are but they won't stop =(
>
> I looked at the file descriptors:
>
> # ls /proc/30/fd
> total 0
> lr-x------ 1 root root 64 Jul 4 23:21 0 -> /dev/null
> l-wx------ 1 root root 64 Jul 4 23:21 1 -> /dev/null
> l-wx------ 1 root root 64 Jul 4 23:21 10 -> /var/log/messages/messages
> l-wx------ 1 root root 64 Jul 4 23:21 11 -> /var/log/hn/20100704.log
> l-wx------ 1 root root 64 Jul 4 23:21 13 -> pipe:[28238824]
> l-wx------ 1 root root 64 Jul 4 23:21 2 -> /dev/null
> lrwx------ 1 root root 64 Jul 4 23:21 3 -> socket:[28238645]
> l-wx------ 1 root root 64 Jul 4 23:21 4 -> pipe:[28238641]
> lrwx------ 1 root root 64 Jul 4 23:21 5 -> socket:[28238650]
> lr-x------ 1 root root 64 Jul 4 23:21 6 -> /proc/kmsg
> l-wx------ 1 root root 64 Jul 4 23:21 7 -> /var/log/syslog/syslog.log
> l-wx------ 1 root root 64 Jul 4 23:21 8 -> /var/log/fw/important.log
> l-wx------ 1 root root 64 Jul 4 23:21 9 -> /var/log/debug/debug.log
>
> And I ran strace looking for fd's 13 and 14. Here are some grep results:
>
> strace.log.30964:open("/dev/null", O_WRONLY) = 14
> ----
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 0)
> = 0 (Timeout)
> strace.log.30962:poll([{fd=13, events=POLLOUT}], 1, 0) = 1 ([{fd=13,
> revents=POLLOUT}])
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 29014) = 1 ([{fd=13,
> revents=POLLERR}])
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 0) = 0 (Timeout)
> strace.log.30962:poll([{fd=13, events=POLLOUT}], 1, 0) = 1 ([{fd=13,
> revents=POLLOUT}])
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 27100) = 1 ([{fd=13,
> revents=POLLERR}])
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 0) = 0 (Timeout)
> strace.log.30962:poll([{fd=13, events=POLLOUT}], 1, 0) = 1 ([{fd=13,
> revents=POLLOUT}])
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 25102) = 1 ([{fd=13,
> revents=POLLERR}])
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 0)
> = 0 (Timeout)
> strace.log.30962:poll([{fd=13, events=POLLOUT}], 1, 0) = 1 ([{fd=13,
> revents=POLLOUT}])
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 23291) = 1 ([{fd=13,
> revents=POLLERR}])
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 0)
> = 0 (Timeout)
> strace.log.30962:poll([{fd=13, events=POLLOUT}], 1, 0) = 1 ([{fd=13,
> revents=POLLOUT}])
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 21398) = 1 ([{fd=13,
> revents=POLLERR}])
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 0)
> = 0 (Timeout)
> strace.log.30962:poll([{fd=13, events=POLLOUT}], 1, 0) = 1 ([{fd=13,
> revents=POLLOUT}])
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 19680) = 1 ([{fd=13,
> revents=POLLERR}])
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 0)
> = ? ERESTART_RESTARTBLOCK (To be restarted)
> strace.log.30962:poll([{fd=13, events=POLLOUT}], 1, 0) = 1 ([{fd=13,
> revents=POLLOUT}])
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 18183) = 1 ([{fd=13,
> revents=POLLERR}])
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 0)
> = 0 (Timeout)
> strace.log.30962:poll([{fd=13, events=POLLOUT}], 1, 0) = 1 ([{fd=13,
> revents=POLLOUT}])
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 16697) = 1 ([{fd=13,
> revents=POLLERR}])
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 0) = 0 (Timeout)
> strace.log.30962:poll([{fd=13, events=POLLOUT}], 1, 0) = 1 ([{fd=13,
> revents=POLLOUT}])
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 15221) = 1 ([{fd=13,
> revents=POLLERR}])
> -----
> strace.log.30962:fcntl(13, F_GETFL) = 0x1 (flags O_WRONLY)
> strace.log.30962:fcntl(13, F_SETFL, O_WRONLY|O_NONBLOCK)
> = 0
> strace.log.30962:write(13, "POLLERR occurred while idle; fd="..., 168)
> = 168
> strace.log.30962:close(13) = 0
>
>
>
> I mussed some pid's around to try and format the email as best I could.
>
> Can anyone suggest more debugging steps, or what might be causing this?
> I haven't seen this error elsewhere, the search results are barren,
> except the for commit logs.
This means that the receiver end of the pipe closed it, and syslog-ng
had nothing to send to it while it did so. If there was a message in the
queue, while this happened, you'd probably see this message instead:
if (errno != EAGAIN && errno != EINTR)
{
msg_error("I/O error occurred while writing",
evt_tag_int("fd", self->super.transport->fd),
evt_tag_errno(EVT_TAG_OSERROR, errno),
NULL);
return LPS_ERROR;
}
So the root cause for this problem, is that the end of the pipe closes
this connection. "/dev/null" doesn't do such things, so I doubt that the
original open of /dev/null is the same case as your POLLERR error cases,
however fd=13 seems to be a pipe. It is probably a program() destination
(or a named pipe).
--
Bazsi
More information about the syslog-ng
mailing list