[syslog-ng] syslog-ng suddenly stops logging
bazsi at balabit.hu
Fri Jun 27 10:13:52 CEST 2008
On Wed, 2008-06-25 at 08:49 -0400, Richard Vigeant wrote:
> On 24-Jun-08, at 3:37 AM, Balazs Scheidler wrote:
> > On Thu, 2008-06-19 at 15:54 -0400, Richard Vigeant wrote:
> >> Hi,
> >> I have a configuration where several nodes send all log messages to a
> >> central server. The
> >> applications on remote nodes send their logs locally either via UDP
> >> or
> >> a unix socket. The
> >> syslog-ng running on remote nodes simply picks up all log messages
> >> from
> >> all sources, i.e. TCP, UDP,
> >> /proc/kmsg, /dev/log and internal, and transmits all messages to the
> >> central server using TCP. The
> >> remote node's config file follows.
> >> We've been having intermittent problems where the central server
> >> would
> >> suddenly stop logging messages
> >> from certain nodes. We noticed that very often restarting syslog-ng
> >> on
> >> the central server would fix
> >> the condition and logging would carry on.
> >> However, I discovered a new rare case where restarting the central
> >> syslog-ng didn't work. I found out
> >> by doing a tcpdump that the remote syslog-ng was not sending the log
> >> messages. I have done an strace
> >> on the remote syslog-ng and it shows that nothing happens after a
> >> message has been "recvfrom()" or
> >> "read()". Then I have restarted syslog-ng and things went back to
> >> normal. In the 2nd strace we can see
> >> that there is a "write()" after the "read()".
> > I might be guessing here as I don't really know which fd is which,
> > but I
> > think you've run into an issue that some others have experienced
> > previously.
> > In the case where the traffic does not work, syslog-ng is correctly
> > polling fd 8 for output; I assume that fd 8 is the fd of the
> > connection
> > to the server (it is in the 2nd strace dump).
> > So syslog-ng is polling for writing out on fd 8, but the poll system
> > call does not indicate writability. This usually means that the TCP
> > window is full, i.e. the server is not accepting new data.
> > Stateful firewalls often drop inactive connections after a period
> > of
> > time, and when packets arrive for a connection for which no state
> > exists, those packets are dropped.
> > Do you have a firewall between the client and the server?
> No firewall. Clients and server are all on the same LAN. This is one
> of our local QA environments.
> Note that I have seen similar cases where the problem occurred on the
> server and the output is a file. However I can't currently reproduce it.
Hmmm, and neither the clients nor the server is running connection
If my initial analysis is correct (an lsof output should confirm that),
then the problem is that syslog-ng is unable to send to the TCP
connection and it is the TCP stack of the OS that tells this to
If this is a QA network, can you run tcpdump to sniff the packets and
see what the on-wire traffic looks like?
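
To illustrate the condition I am describing (poll() not reporting
writability while the peer is not reading), here is a minimal sketch in
Python. It is not syslog-ng code: it assumes a local socketpair() as a
stand-in for the TCP connection to the server, and the helper names
(is_writable, fill_send_buffer, drain) are made up for this example.

```python
import select
import socket


def is_writable(sock, timeout_ms=0):
    """Return True if poll() reports POLLOUT for sock within timeout_ms."""
    poller = select.poll()
    poller.register(sock, select.POLLOUT)
    return any(event & select.POLLOUT for _, event in poller.poll(timeout_ms))


def fill_send_buffer(sock):
    """Write to a non-blocking socket until the kernel refuses more data."""
    sent = 0
    chunk = b"x" * 4096
    while True:
        try:
            sent += sock.send(chunk)
        except BlockingIOError:
            # Send buffer is full because the peer is not reading.
            return sent


def drain(sock):
    """Read from a non-blocking socket until nothing is left to read."""
    received = 0
    while True:
        try:
            data = sock.recv(65536)
        except BlockingIOError:
            return received
        if not data:
            return received
        received += len(data)


# Assumption: a socketpair stands in for the client -> server connection.
sender, receiver = socket.socketpair()
sender.setblocking(False)
receiver.setblocking(False)

before = is_writable(sender)       # fresh connection, buffer is empty
queued = fill_send_buffer(sender)  # peer never reads, so the buffer fills up
stalled = is_writable(sender)      # poll() no longer indicates writability
drained = drain(receiver)          # peer finally consumes the queued data
after = is_writable(sender)        # writability comes back

print("before:", before, "stalled:", stalled, "after:", after)
```

The "stalled" state is what the first strace would look like from the
kernel's side if my guess is right: the sender keeps polling for output
and never gets woken up until the other end reads or the connection is
reset.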