[syslog-ng] POLLERR occurred while idle

Balazs Scheidler bazsi at balabit.hu
Mon Jul 5 11:08:58 CEST 2010


On Sun, 2010-07-04 at 23:31 -0400, Tom Ritter wrote:
> I'm getting a large volume of messages that look like this:
> 
> 23:09:53 hn syslog-ng[30]: POLLERR occurred while idle; fd='13'
> 23:09:54 hn syslog-ng[30]: POLLERR occurred while idle; fd='14'
> 23:09:55 hn syslog-ng[30]: POLLERR occurred while idle; fd='13'
> 23:09:56 hn syslog-ng[30]: POLLERR occurred while idle; fd='14'
> 23:09:57 hn syslog-ng[30]: POLLERR occurred while idle; fd='13'
> 23:09:58 hn syslog-ng[30]: POLLERR occurred while idle; fd='14'
> 23:09:59 hn syslog-ng[30]: POLLERR occurred while idle; fd='13'
> 
> I have no idea what they are but they won't stop =(
> 
> I looked at the file descriptors:
> 
> # ls /proc/30/fd
> total 0
> lr-x------ 1 root root 64 Jul  4 23:21 0 -> /dev/null
> l-wx------ 1 root root 64 Jul  4 23:21 1 -> /dev/null
> l-wx------ 1 root root 64 Jul  4 23:21 10 -> /var/log/messages/messages
> l-wx------ 1 root root 64 Jul  4 23:21 11 -> /var/log/hn/20100704.log
> l-wx------ 1 root root 64 Jul  4 23:21 13 -> pipe:[28238824]
> l-wx------ 1 root root 64 Jul  4 23:21 2 -> /dev/null
> lrwx------ 1 root root 64 Jul  4 23:21 3 -> socket:[28238645]
> l-wx------ 1 root root 64 Jul  4 23:21 4 -> pipe:[28238641]
> lrwx------ 1 root root 64 Jul  4 23:21 5 -> socket:[28238650]
> lr-x------ 1 root root 64 Jul  4 23:21 6 -> /proc/kmsg
> l-wx------ 1 root root 64 Jul  4 23:21 7 -> /var/log/syslog/syslog.log
> l-wx------ 1 root root 64 Jul  4 23:21 8 -> /var/log/fw/important.log
> l-wx------ 1 root root 64 Jul  4 23:21 9 -> /var/log/debug/debug.log
> 
> And I ran strace looking for fd's 13 and 14.  Here are some grep results:
> 
> strace.log.30964:open("/dev/null", O_WRONLY) = 14
> ----
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 0)
>     = 0 (Timeout)
> strace.log.30962:poll([{fd=13, events=POLLOUT}], 1, 0)   = 1 ([{fd=13,
> revents=POLLOUT}])
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 29014) = 1 ([{fd=13,
> revents=POLLERR}])
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 0) = 0 (Timeout)
> strace.log.30962:poll([{fd=13, events=POLLOUT}], 1, 0) = 1 ([{fd=13,
> revents=POLLOUT}])
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 27100) = 1 ([{fd=13,
> revents=POLLERR}])
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 0) = 0 (Timeout)
> strace.log.30962:poll([{fd=13, events=POLLOUT}], 1, 0)   = 1 ([{fd=13,
> revents=POLLOUT}])
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 25102) = 1 ([{fd=13,
> revents=POLLERR}])
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 0)
>     = 0 (Timeout)
> strace.log.30962:poll([{fd=13, events=POLLOUT}], 1, 0)   = 1 ([{fd=13,
> revents=POLLOUT}])
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 23291) = 1 ([{fd=13,
> revents=POLLERR}])
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 0)
>     = 0 (Timeout)
> strace.log.30962:poll([{fd=13, events=POLLOUT}], 1, 0)   = 1 ([{fd=13,
> revents=POLLOUT}])
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 21398) = 1 ([{fd=13,
> revents=POLLERR}])
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 0)
>     = 0 (Timeout)
> strace.log.30962:poll([{fd=13, events=POLLOUT}], 1, 0) = 1 ([{fd=13,
> revents=POLLOUT}])
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 19680) = 1 ([{fd=13,
> revents=POLLERR}])
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 0)
>     = ? ERESTART_RESTARTBLOCK (To be restarted)
> strace.log.30962:poll([{fd=13, events=POLLOUT}], 1, 0)   = 1 ([{fd=13,
> revents=POLLOUT}])
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 18183) = 1 ([{fd=13,
> revents=POLLERR}])
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 0)
>     = 0 (Timeout)
> strace.log.30962:poll([{fd=13, events=POLLOUT}], 1, 0)   = 1 ([{fd=13,
> revents=POLLOUT}])
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 16697) = 1 ([{fd=13,
> revents=POLLERR}])
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 0) = 0 (Timeout)
> strace.log.30962:poll([{fd=13, events=POLLOUT}], 1, 0)   = 1 ([{fd=13,
> revents=POLLOUT}])
> strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=5, events=POLLIN}], 4, 15221) = 1 ([{fd=13,
> revents=POLLERR}])
> -----
> strace.log.30962:fcntl(13, F_GETFL)             = 0x1 (flags O_WRONLY)
> strace.log.30962:fcntl(13, F_SETFL, O_WRONLY|O_NONBLOCK)
>              = 0
> strace.log.30962:write(13, "POLLERR occurred while idle; fd="..., 168)
>                              = 168
> strace.log.30962:close(13) = 0
> 
> 
> 
> I mussed some pid's around to try and format the email as best I could.
> 
> Can anyone suggest more debugging steps, or what might be causing this?
>  I haven't seen this error elsewhere, the search results are barren,
> except the for commit logs.

This means that the receiver end of the pipe closed it, and syslog-ng
had nothing to send to it while it did so. If there was a message in the
queue, while this happened, you'd probably see this message instead:

  if (errno != EAGAIN && errno != EINTR)
    {
      msg_error("I/O error occurred while writing",
                evt_tag_int("fd", self->super.transport->fd),
                evt_tag_errno(EVT_TAG_OSERROR, errno),
                NULL);
      return LPS_ERROR;
    }

So the root cause for this problem, is that the end of the pipe closes
this connection. "/dev/null" doesn't do such things, so I doubt that the
original open of /dev/null is the same case as your POLLERR error cases,
however fd=13 seems to be a pipe. It is probably a program() destination
(or a named pipe).


-- 
Bazsi



More information about the syslog-ng mailing list