[syslog-ng] syslog-ng takes 100% CPU when network fails

D S, Manu (STSD) manu.d-s at hp.com
Wed Oct 29 10:45:07 CET 2008


This patch fixes the issue. Thanks!

- Manu

> > Hi,
> >
> >         We are running syslog-ng 2.0.9 on an HP-UX 11.31 server. We
> > have configured this system as a client that forwards logs to a
> > remote server. When there is a network failure (simulated with
> > ifconfig down), syslog-ng starts to consume CPU, and even after the
> > network comes back it does not forward any log messages and keeps
> > hogging the CPU.
> >
> >         We did system call tracing using tusc and found that poll()
> > gets a POLLERR event on the TCP socket descriptor, but syslog-ng
> > does not make any socket calls on that socket; it only calls
> > gettimeofday().
> >
> >         In the attached logs, the TCP connection to the server is
> > disconnected at 13:10:30. From then on, poll() receives POLLERR on
> > the TCP socket (fd=6) and syslog-ng starts looping on
> > gettimeofday(). Attached are the sar, netstat and tusc logs.
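The loop seen in the tusc trace follows from poll() being level-triggered: while the error condition persists, every call returns immediately, so a handler that does nothing about it makes the process spin without issuing any socket calls. The sketch below reproduces this in miniature. It assumes Linux, where the write end of a pipe whose read end has been closed reports POLLERR even when only POLLIN was requested (most other Unices do not do this for pipes, and the HP-UX TCP case itself is not reproduced here). count_pollerr_wakeups() is a hypothetical helper, not part of syslog-ng.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper (Linux-specific behaviour assumed): polls the write
 * end of a pipe whose reader has gone away and counts how many of
 * `iterations` poll() calls wake up with POLLERR set.  Nothing here clears
 * the error, mimicking a writer that ignores a bare POLLERR.
 * Returns -1 if the pipe cannot be created.
 */
static int
count_pollerr_wakeups(int iterations)
{
  int fds[2];
  int wakeups = 0;

  if (pipe(fds) != 0)
    return -1;
  close(fds[0]);                 /* the "peer" disappears */

  /* Only POLLIN is requested, as for an idle writer; POLLERR is always
     reported by poll() regardless of the requested events. */
  struct pollfd pfd = { .fd = fds[1], .events = POLLIN, .revents = 0 };

  for (int i = 0; i < iterations; i++)
    {
      /* A 1 s timeout, yet each call returns at once because the error
         condition is still pending. */
      int n = poll(&pfd, 1, 1000);
      assert(n >= 0);
      if (n == 1 && (pfd.revents & POLLERR))
        wakeups++;
    }

  close(fds[1]);
  return wakeups;
}
```

Each poll() call has a one-second timeout, yet on Linux all of them return immediately with POLLERR set; a handler that, like the unpatched writer, neither closes nor writes to the descriptor sees the same event on every iteration.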
>
> First of all, thanks for the detailed error report.
>
> As I see it, the problem seems to be caused by the fact that HP-UX
> returns POLLERR alone, without the other bits (e.g. POLLHUP). syslog-ng
> would handle this gracefully if either the other bits were set, or
> there were some pending messages to send, in which case a normal
> write() error would occur.
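Bazsi's explanation amounts to a decision on the revents bits in the writer's dispatch function. A minimal sketch of that decision as a pure function follows. The names are hypothetical, plain poll() flags stand in for GLib's G_IO_* values, and because the diff context does not show the condition of the existing first branch, it is assumed here to test POLLHUP. The apply_fix flag switches the branch added by the patch on or off.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>

/* Hypothetical names; this only mirrors the branch order discussed in the
   thread, not the real log_writer_fd_dispatch(). */
typedef enum
{
  WRITER_KEEP_WAITING,  /* nothing handled: poll() is simply called again */
  WRITER_CLOSE,         /* hang-up seen: the NC_CLOSE path                */
  WRITER_WRITE_ERROR,   /* bare POLLERR: the NC_WRITE_ERROR path (patch)  */
  WRITER_FLUSH          /* messages pending: write() will surface errors  */
} WriterAction;

static WriterAction
classify_revents(short revents, bool have_pending, bool apply_fix)
{
  /* Assumption: the existing first branch reacts to POLLHUP. */
  if (revents & POLLHUP)
    return WRITER_CLOSE;

  /* The branch added by the patch: POLLERR alone is no longer ignored.
     Like the patch, it is checked before the pending-messages branch. */
  if (apply_fix && (revents & POLLERR))
    return WRITER_WRITE_ERROR;

  if (have_pending)
    return WRITER_FLUSH;

  /* Without the fix, a bare POLLERR on an idle writer ends up here; since
     the error stays pending, poll() returns again immediately. */
  return WRITER_KEEP_WAITING;
}
```

With apply_fix false, a bare POLLERR on an idle writer yields WRITER_KEEP_WAITING, which is the spin from the report; if messages are pending the writer flushes instead, and the failing write() takes the normal error path. With apply_fix true, the same bare POLLERR leads straight to the write-error path.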
>
> This patch should fix the problem, although I have only compile-tested
> it. I'd appreciate it if you could test it in your environment.
>
> diff --git a/src/logwriter.c b/src/logwriter.c
> index bb82b43..7a5fcf7 100644
> --- a/src/logwriter.c
> +++ b/src/logwriter.c
> @@ -139,6 +139,13 @@ log_writer_fd_dispatch(GSource *source,
>        log_writer_broken(self->writer, NC_CLOSE);
>        return FALSE;
>      }
> +  else if (self->pollfd.revents & (G_IO_ERR))
> +    {
> +      msg_error("POLLERR occurred while idle",
> +                evt_tag_int("fd", self->fd->fd),
> +                NULL);
> +      log_writer_broken(self->writer, NC_WRITE_ERROR);
> +    }
>    else if (self->writer->queue->length || self->writer->partial)
>      {
>        if (!log_writer_flush_log(self->writer, self->fd))
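The patch logs only the file descriptor. On a socket, the error behind a POLLERR can normally be fetched with getsockopt() and SO_ERROR, which would make the log message more informative. This is not part of Bazsi's patch, and whether HP-UX 11.31 reports a useful value after ifconfig down is untested; pending_socket_error() is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns the error pending on socket `fd` (0 if
   none), or -1 if getsockopt() itself fails (e.g. fd is invalid or not a
   socket).  Reading SO_ERROR also clears the pending error. */
static int
pending_socket_error(int fd)
{
  int err = 0;
  socklen_t len = sizeof(err);

  if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) != 0)
    return -1;
  return err;
}
```

The returned value is an errno code (for example ECONNRESET or ETIMEDOUT) that could be passed through strerror() and logged next to the fd in the new msg_error() call. Because reading SO_ERROR clears the pending error, it should be read only once per event.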
>
> --
> Bazsi
>

