[syslog-ng] supervisor not restarting a failed daemon process

Evan Rempel erempel at uvic.ca
Mon Apr 29 18:46:41 CEST 2013


The logs below show a "standard" syslog-ng instance (process ID 15017) that reads /dev/log and /proc/kmsg.
The second instance of syslog-ng is what we call our "server", which just listens on the network ports
and does all of the complex patterndb, filtering and routing to destination processes.

kern.info kernel: syslog-ng[1561]: segfault at 7f65c0000078 ip 00007f65c0000078 sp 00007f65e1385a48 error 15
--- this is the server instance segfaulting (I assume, see WAIT below)

syslog.notice syslog-ng[15017]: Syslog connection closed; fd='20', client='AF_INET(142.104.47.145:49803)', local='AF_INET(127.0.0.1:1514)'
syslog.notice syslog-ng[15017]: Syslog connection broken; fd='14', server='AF_INET(142.104.47.146:514)', time_reopen='5'
--- this was the standard syslog-ng losing its connection to the server, and detecting that the server instance's destination connection to it had dropped.

daemon.info syslog-ng-stats: server stopping on socket "/var/local/syslog-ng.server.ctl"
daemon.info msgid_profiler[832]: committing residual data
local0.info flare-timer[834]: stopping
local0.info action-handler[833]: stopping
--- these are all program destinations of the server instance shutting down gracefully after the close of their stdin.

daemon.crit supervise/syslog-ng[27221]: Daemon exited due to a deadlock/signal/failure, restarting; exitcode='11'
syslog.err syslog-ng[15017]: Syslog connection failed; fd='14', server='AF_INET(142.104.47.146:514)', error='Connection refused (111)', time_reopen='5'

This "connection failed" message repeats every 5 seconds until I restart the server instance.

syslog.notice syslog-ng[1911]: syslog-ng starting up; version='3.4.1'

So it does not look like there is anything in the logs about attempted restarts.


WAIT...

This is really odd. The line

kern.info kernel: syslog-ng[1561]: segfault at 7f65c0000078 ip 00007f65c0000078 sp 00007f65e1385a48 error 15

implies that there was a process ID 1561 that segfaulted, but that line is the ONLY logged line with that process ID.
We take ps snapshots every 15 minutes, and those snapshots don't show anything for that process ID.
Also, the supervisor process ID is shown in those snapshots:

USER   PID  PPID NI PRI CPU    VSZ    ELAPSED       TIME COMMAND
root 27221     1  0  19   -  26556 9-04:00:59   00:00:00 supervising syslog-ng
root 27222 27221  0  19   - 977064 9-04:00:59 1-02:52:35 /usr/local/sbin/syslog-ng --cfgfile= ...

which matches the line
daemon.crit supervise/syslog-ng[27221]: Daemon exited due to a deadlock/signal/failure, restarting; exitcode='11'

so its child, the one that died, should have been process ID 27222. So why does the log line below show 1561?
kern.info kernel: syslog-ng[1561]: segfault at 7f65c0000078 ip 00007f65c0000078 sp 00007f65e1385a48 error 15


I conclude that 1561 is not the process ID.
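
The cross-check described above can be sketched in a few lines of Python. This is only an illustration, not a tool we actually run: the helper names bracketed_id and children_of are made up for this sketch, and the input is just the log lines and ps rows quoted in this message (the truncated command line is left as it appears).

```python
import re

# Log lines and ps rows copied verbatim from this message.
SEGFAULT_LINE = ("kern.info kernel: syslog-ng[1561]: segfault at 7f65c0000078 "
                 "ip 00007f65c0000078 sp 00007f65e1385a48 error 15")
SUPERVISOR_LINE = ("daemon.crit supervise/syslog-ng[27221]: Daemon exited due to a "
                   "deadlock/signal/failure, restarting; exitcode='11'")
PS_ROWS = [
    "root 27221 1 0 19 - 26556 9-04:00:59 00:00:00 supervising syslog-ng",
    "root 27222 27221 0 19 - 977064 9-04:00:59 1-02:52:35 "
    "/usr/local/sbin/syslog-ng --cfgfile= ...",
]


def bracketed_id(line):
    """Return the number inside the first [...] tag of a log line (hypothetical helper)."""
    match = re.search(r"\[(\d+)\]", line)
    if match is None:
        raise ValueError("no [id] tag in line: " + line)
    return int(match.group(1))


def children_of(parent_pid, ps_rows):
    """Return the PIDs whose PPID column (third field) equals parent_pid (hypothetical helper)."""
    pids = []
    for row in ps_rows:
        fields = row.split()
        if int(fields[2]) == parent_pid:
            pids.append(int(fields[1]))
    return pids


supervisor_pid = bracketed_id(SUPERVISOR_LINE)
kernel_id = bracketed_id(SEGFAULT_LINE)
expected_children = children_of(supervisor_pid, PS_ROWS)

print("supervisor:", supervisor_pid)
print("children in ps snapshot:", expected_children)
print("kernel-reported id matches a child:", kernel_id in expected_children)
```

The last line of output is the mismatch in question: the ID in the kernel message is not among the supervisor's children in the ps snapshot.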

Can you shed any light on this?

Evan.

On 04/26/2013 11:00 PM, Balazs Scheidler wrote:
>
> Strange, indeed. The supervisor gives up if the restarted daemon exits for some reason. E.g. if there's an initialization error, it gives up. Any indication in the logs?
>
> Evan Rempel <erempel at uvic.ca> wrote:
>
>> We are seeing the log line
>>
>> supervise/syslog-ng[27221]: Daemon exited due to a deadlock/signal/failure, restarting; exitcode='11'
>>
>>
>> and it looks like it should restart, but instead of restarting,
>> the supervisor terminates and then no syslog-ng process is running.
>>
>> Is this a bug in the supervisor?
>> ______________________________________________________________________________
>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
>> Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
>>


-- 
Evan Rempel                                      erempel at uvic.ca
Senior Systems Administrator                        250.721.7691
Data Centre Services, University Systems, University of Victoria

