On Sun, 2010-07-04 at 23:31 -0400, Tom Ritter wrote:
I'm getting a large volume of messages that look like this:
23:09:53 hn syslog-ng[30]: POLLERR occurred while idle; fd='13' 23:09:54 hn syslog-ng[30]: POLLERR occurred while idle; fd='14' 23:09:55 hn syslog-ng[30]: POLLERR occurred while idle; fd='13' 23:09:56 hn syslog-ng[30]: POLLERR occurred while idle; fd='14' 23:09:57 hn syslog-ng[30]: POLLERR occurred while idle; fd='13' 23:09:58 hn syslog-ng[30]: POLLERR occurred while idle; fd='14' 23:09:59 hn syslog-ng[30]: POLLERR occurred while idle; fd='13'
I have no idea what they are but they won't stop =(
I looked at the file descriptors:
# ls /proc/30/fd total 0 lr-x------ 1 root root 64 Jul 4 23:21 0 -> /dev/null l-wx------ 1 root root 64 Jul 4 23:21 1 -> /dev/null l-wx------ 1 root root 64 Jul 4 23:21 10 -> /var/log/messages/messages l-wx------ 1 root root 64 Jul 4 23:21 11 -> /var/log/hn/20100704.log l-wx------ 1 root root 64 Jul 4 23:21 13 -> pipe:[28238824] l-wx------ 1 root root 64 Jul 4 23:21 2 -> /dev/null lrwx------ 1 root root 64 Jul 4 23:21 3 -> socket:[28238645] l-wx------ 1 root root 64 Jul 4 23:21 4 -> pipe:[28238641] lrwx------ 1 root root 64 Jul 4 23:21 5 -> socket:[28238650] lr-x------ 1 root root 64 Jul 4 23:21 6 -> /proc/kmsg l-wx------ 1 root root 64 Jul 4 23:21 7 -> /var/log/syslog/syslog.log l-wx------ 1 root root 64 Jul 4 23:21 8 -> /var/log/fw/important.log l-wx------ 1 root root 64 Jul 4 23:21 9 -> /var/log/debug/debug.log
And I ran strace looking for fd's 13 and 14. Here are some grep results:
strace.log.30964:open("/dev/null", O_WRONLY) = 14 ---- strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6, events=POLLIN}, {fd=5, events=POLLIN}], 4, 0) = 0 (Timeout) strace.log.30962:poll([{fd=13, events=POLLOUT}], 1, 0) = 1 ([{fd=13, revents=POLLOUT}]) strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6, events=POLLIN}, {fd=5, events=POLLIN}], 4, 29014) = 1 ([{fd=13, revents=POLLERR}]) strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6, events=POLLIN}, {fd=5, events=POLLIN}], 4, 0) = 0 (Timeout) strace.log.30962:poll([{fd=13, events=POLLOUT}], 1, 0) = 1 ([{fd=13, revents=POLLOUT}]) strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6, events=POLLIN}, {fd=5, events=POLLIN}], 4, 27100) = 1 ([{fd=13, revents=POLLERR}]) strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6, events=POLLIN}, {fd=5, events=POLLIN}], 4, 0) = 0 (Timeout) strace.log.30962:poll([{fd=13, events=POLLOUT}], 1, 0) = 1 ([{fd=13, revents=POLLOUT}]) strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6, events=POLLIN}, {fd=5, events=POLLIN}], 4, 25102) = 1 ([{fd=13, revents=POLLERR}]) strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6, events=POLLIN}, {fd=5, events=POLLIN}], 4, 0) = 0 (Timeout) strace.log.30962:poll([{fd=13, events=POLLOUT}], 1, 0) = 1 ([{fd=13, revents=POLLOUT}]) strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6, events=POLLIN}, {fd=5, events=POLLIN}], 4, 23291) = 1 ([{fd=13, revents=POLLERR}]) strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6, events=POLLIN}, {fd=5, events=POLLIN}], 4, 0) = 0 (Timeout) strace.log.30962:poll([{fd=13, events=POLLOUT}], 1, 0) = 1 ([{fd=13, revents=POLLOUT}]) strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6, events=POLLIN}, {fd=5, events=POLLIN}], 4, 21398) = 1 ([{fd=13, revents=POLLERR}]) strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6, events=POLLIN}, {fd=5, events=POLLIN}], 4, 0) = 0 (Timeout) strace.log.30962:poll([{fd=13, events=POLLOUT}], 1, 0) = 1 ([{fd=13, revents=POLLOUT}]) strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6, events=POLLIN}, {fd=5, events=POLLIN}], 4, 19680) = 1 ([{fd=13, revents=POLLERR}]) strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6, events=POLLIN}, {fd=5, events=POLLIN}], 4, 0) = ? ERESTART_RESTARTBLOCK (To be restarted) strace.log.30962:poll([{fd=13, events=POLLOUT}], 1, 0) = 1 ([{fd=13, revents=POLLOUT}]) strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6, events=POLLIN}, {fd=5, events=POLLIN}], 4, 18183) = 1 ([{fd=13, revents=POLLERR}]) strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6, events=POLLIN}, {fd=5, events=POLLIN}], 4, 0) = 0 (Timeout) strace.log.30962:poll([{fd=13, events=POLLOUT}], 1, 0) = 1 ([{fd=13, revents=POLLOUT}]) strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6, events=POLLIN}, {fd=5, events=POLLIN}], 4, 16697) = 1 ([{fd=13, revents=POLLERR}]) strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6, events=POLLIN}, {fd=5, events=POLLIN}], 4, 0) = 0 (Timeout) strace.log.30962:poll([{fd=13, events=POLLOUT}], 1, 0) = 1 ([{fd=13, revents=POLLOUT}]) strace.log.30962:poll([{fd=13, events=0}, {fd=3, events=POLLIN}, {fd=6, events=POLLIN}, {fd=5, events=POLLIN}], 4, 15221) = 1 ([{fd=13, revents=POLLERR}]) ----- strace.log.30962:fcntl(13, F_GETFL) = 0x1 (flags O_WRONLY) strace.log.30962:fcntl(13, F_SETFL, O_WRONLY|O_NONBLOCK) = 0 strace.log.30962:write(13, "POLLERR occurred while idle; fd="..., 168) = 168 strace.log.30962:close(13) = 0
I mussed some pid's around to try and format the email as best I could.
Can anyone suggest more debugging steps, or what might be causing this? I haven't seen this error elsewhere, the search results are barren, except the for commit logs.
This means that the receiver end of the pipe closed it, and syslog-ng had nothing to send to it while it did so. If there was a message in the queue, while this happened, you'd probably see this message instead: if (errno != EAGAIN && errno != EINTR) { msg_error("I/O error occurred while writing", evt_tag_int("fd", self->super.transport->fd), evt_tag_errno(EVT_TAG_OSERROR, errno), NULL); return LPS_ERROR; } So the root cause for this problem, is that the end of the pipe closes this connection. "/dev/null" doesn't do such things, so I doubt that the original open of /dev/null is the same case as your POLLERR error cases, however fd=13 seems to be a pipe. It is probably a program() destination (or a named pipe). -- Bazsi