[syslog-ng] syslog-ng suddenly stops logging
Richard Vigeant
richard.vigeant at vantrix.com
Fri Jun 27 18:17:19 CEST 2008
On 27-Jun-08, at 4:13 AM, Balazs Scheidler wrote:
> On Wed, 2008-06-25 at 08:49 -0400, Richard Vigeant wrote:
>> On 24-Jun-08, at 3:37 AM, Balazs Scheidler wrote:
>>
>>> On Thu, 2008-06-19 at 15:54 -0400, Richard Vigeant wrote:
>>>> Hi,
>>>>
>>>>
>>>> I have a configuration where several nodes send all log messages to
>>>> a central server. The applications on the remote nodes send their
>>>> logs locally, either via UDP or a unix socket. The syslog-ng running
>>>> on each remote node simply picks up all log messages from all sources,
>>>> i.e. TCP, UDP, /proc/kmsg, /dev/log and internal, and transmits them
>>>> to the central server using TCP. The remote node's config file
>>>> follows.
>>>>
>>>>
>>>> We've been having intermittent problems where the central server
>>>> would suddenly stop logging messages from certain nodes. We noticed
>>>> that restarting syslog-ng on the central server would very often fix
>>>> the condition and logging would carry on.
>>>>
>>>>
>>>> However, I discovered a new, rare case where restarting the central
>>>> syslog-ng didn't help. By running tcpdump, I found out that the remote
>>>> syslog-ng was not sending the log messages. I ran strace on the
>>>> remote syslog-ng, and it shows that nothing happens after a message
>>>> has been received with "recvfrom()" or "read()". Then I restarted
>>>> syslog-ng and things went back to normal. In the 2nd strace we can
>>>> see that there is a "write()" after the "read()".
>>>>
>>>
>>> I might be guessing here as I don't really know which fd is which, but
>>> I think you've run into an issue that some others have experienced
>>> previously.
>>>
>>> In the case when the traffic does not work, syslog-ng is correctly
>>> polling fd 8 for output. I assumed that fd 8 is the fd of the
>>> connection to the server (it is in the 2nd strace dump).
>>>
>>> So syslog-ng is polling for writing out on fd 8, but the poll system
>>> call does not indicate writability. This usually means that the TCP
>>> window is full: the server does not accept new data.
>>>
>>> State-based firewalls often drop inactive connections after a period
>>> of time, and if packets then arrive for a connection for which no
>>> state exists, those packets are dropped.
>>>
>>> Do you have a firewall between the client and the server?
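To illustrate the flow-control explanation above, here is a minimal Python sketch (not syslog-ng code), assuming a POSIX system where `select.poll` is available. It opens a TCP connection over localhost, never reads on the receiving side, and writes non-blocking data until the kernel refuses more. After that, `poll()` for `POLLOUT` keeps timing out, which is the same symptom as syslog-ng polling fd 8 without ever seeing it become writable. The helper name `stalled_writer_demo` and the 200 ms timeout are hypothetical choices made only for this demo.

```python
import select
import socket


def stalled_writer_demo(timeout_ms=200):
    """Fill a TCP connection whose peer never reads, then show that
    poll() no longer reports it as writable.

    Hypothetical demo helper, not taken from syslog-ng.
    Returns the number of bytes the kernel accepted before stalling.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # let the OS pick a free port
    listener.listen(1)

    sender = socket.create_connection(listener.getsockname())
    receiver, _ = listener.accept()  # accepted, but never read from
    sender.setblocking(False)

    poller = select.poll()
    poller.register(sender.fileno(), select.POLLOUT)

    chunk = b"x" * 65536
    queued = 0
    try:
        while True:
            try:
                queued += sender.send(chunk)
            except BlockingIOError:
                # The kernel refused more data for now. If poll() still
                # reports no POLLOUT after the timeout, the writer is stuck
                # because the receiver is not draining its window.
                if not poller.poll(timeout_ms):
                    break
    finally:
        for s in (sender, receiver, listener):
            s.close()
    return queued
```

Calling `stalled_writer_demo()` returns the number of bytes the kernel accepted before the connection stalled; the exact figure depends on the socket buffer sizes of the machine.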
>>
>> No firewall. Clients and server are all on the same LAN. This is one
>> of our local QA environments.
>>
>> Note that I have seen similar cases where the problem occurred on the
>> server and the output was a file. However, I can't currently
>> reproduce it.
>>>
>
> Hmmm, and neither the clients nor the server is running connection
> tracking, right?
>
> If my initial analysis is correct (an lsof output should confirm that),
> then the problem is that syslog-ng is unable to send to the TCP
> connection, and it is the TCP stack of the OS that tells this to
> syslog-ng.
>
> If this is a QA network, can you run tcpdump to sniff the packets and
> see what the on-wire traffic looks like?
>
> --
> Bazsi
>
Well, I had run netstat on both the server and the client, and it showed
the TCP connection between server:514 and the client as ESTABLISHED.
I had also run tcpdump, and all traffic seemed normal except for the
absence of syslog-ng traffic. Traffic was not particularly heavy and
everything else worked normally.
Unfortunately I cannot get any more info, because I have since had to
re-enable syslog-ng on the QA system. All I did was restart syslog-ng
on the client node, and it went back to normal.
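An ESTABLISHED state alone does not show whether data is piling up on the client side. On Linux, the kernel exposes each IPv4 TCP socket's send and receive queue lengths in `/proc/net/tcp` (IPv6 sockets are listed in `/proc/net/tcp6`); the send-side figure is what `netstat` reports as Send-Q, the count of bytes not yet acknowledged by the remote host. The sketch below uses hypothetical helper names and assumes a Linux client talking IPv4 to the server on port 514. A Send-Q that stays non-zero or keeps growing while syslog-ng sits in `poll()` would be consistent with the full-TCP-window explanation, though it would not prove it on its own.

```python
# Hypothetical helpers; Linux-specific, IPv4 only.
TCP_STATES = {"01": "ESTABLISHED", "0A": "LISTEN"}  # subset of the kernel's state codes


def parse_proc_net_tcp_line(line):
    """Parse one data line of /proc/net/tcp.

    Returns (local_port, remote_port, state, tx_queue, rx_queue).
    Ports and queue lengths are printed in hex. The IP part of each
    address is left undecoded because its byte order is machine-dependent.
    """
    fields = line.split()
    local, remote, state, queues = fields[1:5]
    tx_hex, rx_hex = queues.split(":")
    return (
        int(local.rsplit(":", 1)[1], 16),
        int(remote.rsplit(":", 1)[1], 16),
        TCP_STATES.get(state, state),
        int(tx_hex, 16),
        int(rx_hex, 16),
    )


def connections_with_unsent_data(remote_port=514, path="/proc/net/tcp"):
    """Yield (local_port, tx_queue) for ESTABLISHED sockets to remote_port
    that still have unacknowledged bytes in their send queue."""
    with open(path) as f:
        next(f)  # skip the column header line
        for line in f:
            local, remote, state, tx, _rx = parse_proc_net_tcp_line(line)
            if remote == remote_port and state == "ESTABLISHED" and tx > 0:
                yield local, tx
```

Running `list(connections_with_unsent_data())` a few times on the client while logging is stalled would show whether the queue toward the server is stuck.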