[syslog-ng] syslog-ng suddenly stops logging

Balazs Scheidler bazsi at balabit.hu
Mon Jun 30 13:33:51 CEST 2008


On Fri, 2008-06-27 at 12:17 -0400, Richard Vigeant wrote:
> On 27-Jun-08, at 4:13 AM, Balazs Scheidler wrote:
> 
> > On Wed, 2008-06-25 at 08:49 -0400, Richard Vigeant wrote:
> >> On 24-Jun-08, at 3:37 AM, Balazs Scheidler wrote:
> >>
> >>> On Thu, 2008-06-19 at 15:54 -0400, Richard Vigeant wrote:
> >>>> Hi,
> >>>>
> >>>>
> >>>> I have a configuration where several nodes send all log messages to a
> >>>> central server. The applications on remote nodes send their logs
> >>>> locally either via UDP or a unix socket. The syslog-ng running on
> >>>> remote nodes simply picks up all log messages from all sources, i.e.
> >>>> TCP, UDP, /proc/kmsg, /dev/log and internal, and transmits all
> >>>> messages to the central server using TCP. The remote node's config
> >>>> file follows.
> >>>>
> >>>>
> >>>> We've been having intermittent problems where the central server
> >>>> would suddenly stop logging messages from certain nodes. We noticed
> >>>> that very often restarting syslog-ng on the central server would fix
> >>>> the condition and logging would carry on.
> >>>>
> >>>>
> >>>> However I discovered a new rare case where restarting the central
> >>>> syslog-ng didn't work. I found out by doing a tcpdump that the remote
> >>>> syslog-ng was not sending the log messages. I have done an strace on
> >>>> the remote syslog-ng and it shows that nothing happens after a
> >>>> message has been "recvfrom()" or "read()". Then I have restarted
> >>>> syslog-ng and things went back to normal. In the 2nd strace we can
> >>>> see that there is a "write()" after the "read()".
> >>>>
> >>>
> >>> I might be guessing here as I don't really know which fd is which,
> >>> but I think you've run into an issue that some others have
> >>> experienced previously.
> >>>
> >>> In the case when the traffic does not work, syslog-ng is correctly
> >>> polling fd 8 for output. I assume that fd 8 is the fd of the
> >>> connection to the server (it is in the 2nd strace dump).
> >>>
> >>> So syslog-ng is polling for writing out on fd 8, but the poll system
> >>> call does not indicate writability. This usually means that the TCP
> >>> window is full, i.e. the server does not accept new data.
> >>>
> >>> State based firewalls often drop inactive connections after a period
> >>> of time, and when packets arrive for a connection for which no state
> >>> exists, those packets are dropped.
> >>>
> >>> Do you have a firewall between the client and the server?
> >>
> >> No firewall. Clients and server are all on the same LAN. This is one
> >> of our local QA environments.
> >>
> >> Note that I have seen similar cases where the problem occurred on the
> >> server and the output is a file. However I can't currently reproduce
> >> it.
> >>>
> >
> > Hmmm, and neither the clients nor the server is running connection
> > tracking, right?
> >
> > If my initial analysis is correct (an lsof output should confirm
> > that), then the problem is that syslog-ng is unable to send to the TCP
> > connection and it is the TCP stack of the OS that tells this to
> > syslog-ng.
> >
> > If this is a QA network, can you run tcpdump to sniff the packets and
> > see what the on-wire traffic looks like?
> >
> Well I had done netstat on both server and client and it showed the
> TCP connection between the server:514 and client as ESTABLISHED.
> 
> I had run some tcpdump and all traffic seemed normal except for the
> absence of syslog-ng traffic. Traffic was not particularly heavy and
> everything else worked normally.
> 
> Unfortunately I cannot get any more info because since then I had to
> re-enable syslog-ng on the QA system. All I did was restart syslog-ng
> on the client node and things went back to normal.
> 


I meant running tcpdump on a complete syslog connection, from start to
hang. E.g. if it works now, please start tcpdump on the client node and
dump everything to a file. If it hangs again, the tcpdump might give
some clues.

I still think that it might be a network related problem, as the strace
showed that syslog-ng was willing to write to the TCP socket, but the
system did not indicate that it would be possible to write, i.e. the
window was probably full.

-- 
Bazsi


