[syslog-ng] syslog-ng lock ups
erempel at uvic.ca
Tue Mar 7 19:00:30 UTC 2017
I have been having a problem with syslog-ng where it just stops processing all input. There are destinations that receive and count all of the messages that come out of syslog-ng, and they stop getting any messages.
This is occurring on two syslog servers that have similar configurations; one configuration is a superset of the simpler one. On these hosts we run two syslog-ng instances. The first handles the regular OS log messages from /dev/log, /proc/kmsg and localhost:1514. The second, which we call our syslog server, listens only on the network port(s). The syslog server sends its internal log messages to localhost:1514, so any internal events such as new connections should be logged by the "OS syslog" instance.
We had this problem on 3.7.x and after upgrading to 3.9.1 we still encounter this problem.
The OS is now the latest Redhat 6 (6.8), but the same problem occurred on any Redhat 6.x system.
Our host monitoring shows that the syslog-ng process did NOT increase its memory footprint, which suggests that it stopped reading its sources rather than buffering input it could not deliver.
All of the hosts that send logs to syslog server showed increased message queueing which confirms that the source stopped being read.
The CPU consumption by the syslog-ng process is zero during this time period.
Our network monitoring infrastructure opens a connection to the syslog server every 30 seconds. Each connection gets logged via the localhost:1514 connection. These "accepted" and "closed" log lines are missing during our problem window, even though the monitoring tool WAS able to connect throughout the entire problem window.
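One possible reading of the successful monitoring connections, offered as an assumption rather than a diagnosis: on Linux the kernel completes the TCP handshake for a listening socket and queues the connection in the listen backlog, so a connect can succeed even if the process never calls accept(). The Python sketch below illustrates that behaviour on an ephemeral loopback port. The function name is made up for the illustration, and nothing in it touches syslog-ng.

```python
import socket


def connect_succeeds_without_accept(backlog: int = 5, timeout: float = 2.0) -> bool:
    """Return True if a TCP connect completes although the server never accepts.

    Illustrative helper (hypothetical name): the server socket only listens,
    so any successful connect was completed by the kernel alone.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 asks the kernel for any free loopback port.
        server.bind(("127.0.0.1", 0))
        server.listen(backlog)
        port = server.getsockname()[1]
        try:
            # accept() is deliberately never called on the server side.
            client = socket.create_connection(("127.0.0.1", port), timeout=timeout)
        except OSError:
            return False
        client.close()
        return True
    finally:
        server.close()
```

If this applies here, a probe that only connects would keep passing while the daemon is stuck. A probe that also sends a test message and checks that it reaches a destination would be a stronger health check.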
We have configured the syslog stats log line to be sent to an external program, which invokes syslog-ng-ctl stats and processes the output. This program did not receive any stats line during the problem window, so the exact syslog-ng stats counters for that period are unavailable. The syslog-ng-ctl program was NOT hung on the socket.
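Because the stats log line itself stops arriving during a hang, a poller that runs independently of the log path might still capture the counters. The following Python sketch assumes the semicolon-separated output of syslog-ng-ctl stats (header SourceName;SourceId;SourceInstance;State;Type;Number) and compares two samples. The helper names are hypothetical, as are any source or destination ids used when trying it out.

```python
import csv
import io
import subprocess


def read_stats(timeout: float = 10.0) -> str:
    """Run `syslog-ng-ctl stats` and return its raw output."""
    result = subprocess.run(
        ["syslog-ng-ctl", "stats"],
        capture_output=True, text=True, check=True, timeout=timeout,
    )
    return result.stdout


def parse_stats(text: str) -> dict:
    """Map (name, id, instance, type) to the integer counter value."""
    counters = {}
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        # Skip the header line and anything that is not a 6-field record.
        if len(row) != 6 or row[0] == "SourceName":
            continue
        name, ident, instance, _state, ctype, number = row
        if number.isdigit():
            counters[(name, ident, instance, ctype)] = int(number)
    return counters


def stalled(before: dict, after: dict, counter_type: str = "processed") -> bool:
    """True when no counter of the given type grew between two samples.

    An empty sample is also reported as stalled, which is an assumption
    of this sketch.
    """
    keys = [key for key in after if key[3] == counter_type]
    return all(after[key] <= before.get(key, 0) for key in keys)
```

A cron job or small loop could call read_stats() every 30 seconds, keep the previous parse_stats() result, and raise an alert when stalled() returns True for consecutive samples.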
Attempting a graceful shutdown of the syslog-ng process, either with syslog-ng-ctl or with a kill -TERM, appears to have no effect. I assume this is because syslog-ng is unable to flush its buffers, so it does not terminate.
Killing syslog-ng and restarting it starts processing again correctly.
Has anyone else had this symptom?
Has anyone found a solution to this?
I realize that this is all anecdotal but was hoping it would trigger someone's memory.
Thanks in advance.