[syslog-ng] syslog-ng suddenly stops logging

Balazs Scheidler bazsi at balabit.hu
Fri Jun 27 10:13:52 CEST 2008


On Wed, 2008-06-25 at 08:49 -0400, Richard Vigeant wrote:
> On 24-Jun-08, at 3:37 AM, Balazs Scheidler wrote:
> 
> > On Thu, 2008-06-19 at 15:54 -0400, Richard Vigeant wrote:
> >> Hi,
> >>
> >>
> >> I have a configuration where several nodes send all log messages to a
> >> central server. The applications on remote nodes send their logs
> >> locally either via UDP or a unix socket. The syslog-ng running on
> >> remote nodes simply picks up all log messages from all sources, i.e.
> >> TCP, UDP, /proc/kmsg, /dev/log and internal, and transmits all
> >> messages to the central server using TCP. The remote node's config
> >> file follows.
> >>
> >>
> >> We've been having intermittent problems where the central server would
> >> suddenly stop logging messages from certain nodes. We noticed that very
> >> often restarting syslog-ng on the central server would fix the
> >> condition and logging would carry on.
> >>
> >>
> >> However I discovered a new rare case where restarting the central
> >> syslog-ng didn't work. I found out by doing a tcpdump that the remote
> >> syslog-ng was not sending the log messages. I have done an strace on
> >> the remote syslog-ng and it shows that nothing happens after a message
> >> has been "recvfrom()" or "read()". Then I restarted syslog-ng and
> >> things went back to normal. In the 2nd strace we can see that there is
> >> a "write()" after the "read()".
> >>
> >
> > I might be guessing here as I don't really know which fd is which, but
> > I think you've run into an issue that some others have experienced
> > previously.
> >
> > In the case when the traffic does not work, syslog-ng is correctly
> > polling fd 8 for output. I assumed that fd 8 is the fd of the
> > connection to the server (it is in the 2nd strace dump).
> >
> > So syslog-ng is polling for writing out on fd 8, but the poll system
> > call does not indicate writability. This usually means that the TCP
> > window is full, i.e. the server is not accepting new data.
> >
> > State-based firewalls often drop inactive connections after a period of
> > time, and when packets arrive for a connection for which no state
> > exists, those packets are dropped.
> >
> > Do you have a firewall between the client and the server?
> 
> No firewall. Clients and server are all on the same LAN. This is one of
> our local QA environments.
> 
> Note that I have seen similar cases where the problem occurred on the
> server and the output is a file. However I can't currently reproduce it.
> >

Hmmm, and neither the clients nor the server is running connection
tracking, right?

If my initial analysis is correct (an lsof output should confirm that),
then the problem is that syslog-ng is unable to send on the TCP
connection, and it is the TCP stack of the OS that reports this to
syslog-ng.
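
To illustrate the condition (a minimal sketch only, not syslog-ng code):
the Python script below opens a loopback TCP connection whose receiving
side never reads, writes until the kernel stops reporting the socket as
writable, and then shows that writability returns once the peer drains
its buffer. The helper names (is_writable, fill_until_stalled, drain,
demo) and the timeouts are made up for this example, and select() stands
in for the poll() call seen in the strace.

```python
import select
import socket


def is_writable(sock, timeout):
    """True if the kernel reports sock as writable within timeout seconds."""
    _, writable, _ = select.select([], [sock], [], timeout)
    return bool(writable)


def fill_until_stalled(sock, stall_timeout=0.5, chunk=b"x" * 65536):
    """Send until the socket stays non-writable for stall_timeout seconds."""
    sock.setblocking(False)
    queued = 0
    while is_writable(sock, stall_timeout):
        try:
            queued += sock.send(chunk)
        except BlockingIOError:
            continue  # buffer filled between select() and send(); re-check
    return queued


def drain(sock, expected, timeout=2.0):
    """Read up to expected bytes from sock, as a healthy receiver would."""
    sock.setblocking(False)
    received = 0
    while received < expected:
        readable, _, _ = select.select([sock], [], [], timeout)
        if not readable:
            break
        received += len(sock.recv(1 << 16))
    return received


def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # any free loopback port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    try:
        writable_before = is_writable(client, 0.5)
        # The peer accepts the connection but never reads, so the
        # buffers and the TCP window fill up and the sender stalls.
        queued = fill_until_stalled(client)
        stalled = not is_writable(client, 0.5)
        # Once the peer reads again, the window reopens.
        drain(server, queued)
        recovered = is_writable(client, 2.0)
        return writable_before, stalled, recovered, queued
    finally:
        client.close()
        server.close()
        listener.close()


if __name__ == "__main__":
    print(demo())
```

The second value in the returned tuple corresponds to the state seen in
the first strace: the sender keeps polling for output and is never told
that it may write.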

If this is a QA network, can you run tcpdump to sniff the packets and
see what the on-wire traffic looks like?

-- 
Bazsi


