[syslog-ng] syslog-ng takes 100% CPU when network fails
Balazs Scheidler
bazsi at balabit.hu
Sun Oct 26 17:26:45 CET 2008
On Fri, 2008-10-24 at 05:50 +0000, D S, Manu (STSD) wrote:
> Hi,
>
> We are running syslog-ng 2.0.9 on an HP-UX 11.31 server, configured as a client that forwards logs to a remote server. When there is a network failure (simulated with ifconfig down), syslog-ng starts to consume CPU; even after the network comes back, it does not forward any log messages and continues to hog the CPU.
>
> We traced the system calls with tusc and found that poll() reports a POLLERR event on the TCP socket descriptor, but syslog-ng then makes no socket calls on that connection; it only calls gettimeofday().
>
> In the attached logs, the TCP connection to the server is disconnected at 13:10:30. From that time on, poll() receives POLLERR on the TCP socket (fd=6) and syslog-ng starts looping on gettimeofday(). Attached are the sar, netstat and tusc logs.
First of all, thanks for the detailed error report.
As far as I can see, the problem is caused by the fact that HP-UX returns
POLLERR alone, without any of the other bits (e.g. POLLHUP). syslog-ng would
handle this gracefully if either one of the other bits were set, or if
there were some pending messages to send, in which case a normal write()
error would occur.
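
To make the branch order concrete, here is a minimal standalone sketch of the
dispatch decision with a POLLERR-only branch added. It is not the syslog-ng
source: the names writer_action and dispatch_revents are hypothetical, and the
exact bits tested by the first branch (POLLHUP | POLLIN) are an assumption,
since only POLLHUP is named above.

```c
#include <poll.h>

/* Hypothetical outcome codes, for illustration only. */
typedef enum
{
  WRITER_IDLE,    /* nothing to do */
  WRITER_CLOSED,  /* peer closed the connection */
  WRITER_ERROR,   /* error reported while no write was in flight */
  WRITER_FLUSH    /* pending messages should be written out */
} writer_action;

/* Decide what to do with the events poll() reported for the writer's fd.
 * `pending` is the number of queued messages. */
writer_action
dispatch_revents(short revents, int pending)
{
  if (revents & (POLLHUP | POLLIN))   /* assumed "connection closed" bits */
    return WRITER_CLOSED;
  else if (revents & POLLERR)         /* POLLERR alone: treat as broken */
    return WRITER_ERROR;
  else if (pending > 0)
    return WRITER_FLUSH;
  return WRITER_IDLE;
}
```

Without the middle branch, a call such as dispatch_revents(POLLERR, 0) would
fall through to WRITER_IDLE and the descriptor would never be touched, which
matches the behaviour described in the report.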
This patch should fix the problem, although I have only compile-tested it.
I'd appreciate it if you could test this patch in your environment.
diff --git a/src/logwriter.c b/src/logwriter.c
index bb82b43..7a5fcf7 100644
--- a/src/logwriter.c
+++ b/src/logwriter.c
@@ -139,6 +139,13 @@ log_writer_fd_dispatch(GSource *source,
       log_writer_broken(self->writer, NC_CLOSE);
       return FALSE;
     }
+  else if (self->pollfd.revents & (G_IO_ERR))
+    {
+      msg_error("POLLERR occurred while idle",
+                evt_tag_int("fd", self->fd->fd),
+                NULL);
+      log_writer_broken(self->writer, NC_WRITE_ERROR);
+    }
   else if (self->writer->queue->length || self->writer->partial)
     {
       if (!log_writer_flush_log(self->writer, self->fd))
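
The busy loop itself can be reproduced outside syslog-ng. The sketch below is
an illustration under stated assumptions, not part of the patch: it uses a
pipe whose read end has been closed as a stand-in for the broken TCP
connection, and the function name count_error_wakeups is hypothetical. Which
of POLLERR and POLLHUP gets set for this case differs between platforms, so
the code accepts either.

```c
#include <poll.h>
#include <unistd.h>

/* Returns how many of `rounds` consecutive poll() calls reported an
 * error or hangup condition on a pipe whose read end is closed.
 * Returns -1 if the pipe could not be created. */
int
count_error_wakeups(int rounds)
{
  int fds[2];
  int hits = 0;
  int i;

  if (pipe(fds) != 0)
    return -1;
  close(fds[0]);                /* the peer goes away */

  for (i = 0; i < rounds; i++)
    {
      struct pollfd p;

      p.fd = fds[1];
      p.events = 0;             /* POLLERR/POLLHUP are reported regardless */
      p.revents = 0;

      /* 100 ms timeout; a pending error condition makes poll() return
       * right away instead of waiting. */
      if (poll(&p, 1, 100) == 1 && (p.revents & (POLLERR | POLLHUP)))
        hits++;

      /* Nothing here reads, writes, or closes the descriptor, so the
       * condition is still set on the next round. */
    }

  close(fds[1]);
  return hits;
}
```

Because the condition is never consumed, every round wakes up again. An event
loop that ignores the bit ends up spinning in the same way, which is why the
new branch marks the connection as broken.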
--
Bazsi