[syslog-ng] supervisor not restarting a failed daemon process

Balazs Scheidler bazsi77 at gmail.com
Mon Apr 29 20:16:25 CEST 2013


Hi,

I don't have the kernel source handy, but my guess is that the kernel is logging the tid value, which is the same as the pid only as long as the process is single-threaded.
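
To illustrate the pid/tid difference, here is a minimal sketch in Python (not taken from syslog-ng; the helper names `ids` and `collect` are made up for this example). It assumes Python 3.8 or later for `threading.get_native_id()`. On Linux the main thread's tid should equal the pid, while any additional thread gets a different tid under the same pid:

```python
import os
import threading


def ids():
    """Return (pid, tid) as seen by the calling thread."""
    return os.getpid(), threading.get_native_id()


def collect():
    """Gather (pid, tid) from the main thread and from one worker thread."""
    seen = {"main": ids()}
    # The worker records its own view of pid/tid into the shared dict.
    worker = threading.Thread(target=lambda: seen.update(worker=ids()))
    worker.start()
    worker.join()
    return seen


if __name__ == "__main__":
    for name, (pid, tid) in collect().items():
        print(f"{name}: pid={pid} tid={tid}")
```

If the segfault happened in a worker thread of the multi-threaded server instance, a number that matches no pid in the ps snapshots would be consistent with this.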

I've checked the supervisor code, and the only way this could happen is a fork/pipe error that did not get logged. It should be logged, but who knows, that message may have been lost. The supervisor attempts to restart 3 times, then gives up.
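
The behaviour described above can be sketched roughly as follows. This is not the actual supervisor code, only an illustration in Python under stated assumptions: `spawn` is a hypothetical callable that starts the daemon, blocks until it exits and returns its exit code, and raises `OSError` when fork/pipe fails; `log` is any callable that takes a message string. The limit of 3 restarts is the figure mentioned above.

```python
def supervise(spawn, log, max_restarts=3):
    """Restart a daemon up to max_restarts times, logging every decision.

    spawn() is assumed to block until the daemon exits and to return its
    exit code; it is assumed to raise OSError if fork()/pipe() fails.
    """
    restarts = 0
    while True:
        try:
            code = spawn()
        except OSError as exc:
            # This is the failure mode that must not go unlogged.
            log(f"cannot start daemon (fork/pipe error): {exc}")
            return "spawn-failed"
        if code == 0:
            log("daemon exited cleanly")
            return "clean-exit"
        if restarts >= max_restarts:
            log(f"daemon exited abnormally, giving up; exitcode='{code}'")
            return "gave-up"
        restarts += 1
        log(f"daemon exited abnormally, restarting "
            f"({restarts}/{max_restarts}); exitcode='{code}'")
```

The point of the sketch is that every path out of the loop leaves a log message, so a supervisor that silently disappears would suggest the message was lost on its way to the log rather than never written.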

Hmm, the supervisor messages may be redirected to syslog after the first startup, which might explain why they don't get logged.

But why does fork/pipe fail?

Hope this helps.

Evan Rempel <erempel at uvic.ca> wrote:

>The logs below show a "standard" syslog-ng processID=15017 that reads /dev/log /proc/kmsg
>The second instance of syslog-ng is what we call our "server" which just listens on the network ports
>and does all of the complex patterndb, filtering and routing to destination processes
>
>kern.info kernel: syslog-ng[1561]: segfault at 7f65c0000078 ip 00007f65c0000078 sp 00007f65e1385a48 error 15
>--- this is the server instance segfaulting (I assume, see WAIT below)
>
>syslog.notice syslog-ng[15017]: Syslog connection closed; fd='20', client='AF_INET(142.104.47.145:49803)', local='AF_INET(127.0.0.1:1514)'
>syslog.notice syslog-ng[15017]: Syslog connection broken; fd='14', server='AF_INET(142.104.47.146:514)', time_reopen='5'
>--- this was the standard syslog losing its connection to the server, and detecting the drop of the server instance that is its destination.
>
>daemon.info syslog-ng-stats: server stopping on socket "/var/local/syslog-ng.server.ctl"
>daemon.info msgid_profiler[832]: committing residual data
>local0.info flare-timer[834]: stopping
>local0.info action-handler[833]: stopping
>--- these are all program destinations of the server instance shutting down gracefully after the close of their stdin.
>
>daemon.crit supervise/syslog-ng[27221]: Daemon exited due to a deadlock/signal/failure, restarting; exitcode='11'
>syslog.err syslog-ng[15017]: Syslog connection failed; fd='14', server='AF_INET(142.104.47.146:514)', error='Connection refused (111)', time_reopen='5'
>
>This "connection failed" message repeats every 5 seconds until I restart the server instance.
>
>syslog.notice syslog-ng[1911]: syslog-ng starting up; version='3.4.1'
>
>So it does not look like there is anything in the logs about attempted restarts.
>
>
>WAIT...
>
>This is really odd. The line
>
>kern.info kernel: syslog-ng[1561]: segfault at 7f65c0000078 ip 00007f65c0000078 sp 00007f65e1385a48 error 15
>
>implies that there was a process ID 1561 that segfaulted, but that line is the ONLY logged line with that process ID.
>We take ps snapshots every 15 minutes, and those snapshots don't show anything for that process ID.
>Also, the supervisor processID is shown
>
>USER   PID   PPID  NI PRI CPU VSZ    ELAPSED    TIME       COMMAND
>root   27221 1     0  19  -   26556  9-04:00:59 00:00:00   supervising syslog-ng
>root   27222 27221 0  19  -   977064 9-04:00:59 1-02:52:35 /usr/local/sbin/syslog-ng --cfgfile= ...
>
>which matches the line
>daemon.crit supervise/syslog-ng[27221]: Daemon exited due to a deadlock/signal/failure, restarting; exitcode='11'
>
>so the child that died should have been processID 27222. Why, then, does this log line show 1561?
>kern.info kernel: syslog-ng[1561]: segfault at 7f65c0000078 ip 00007f65c0000078 sp 00007f65e1385a48 error 15
>
>
>I conclude that the 1561 is not the process ID.
>
>Can you shed any light on this?
>
>Evan.
>
>On 04/26/2013 11:00 PM, Balazs Scheidler wrote:
>>
>> Strange, indeed. The supervisor gives up if the restarted daemon exits for some reason. E.g., if there's an initialization error, it gives up. Any indication in the logs?
>>
>> Evan Rempel <erempel at uvic.ca> wrote:
>>
>>> We are seeing the log line
>>>
>>> supervise/syslog-ng[27221]: Daemon exited due to a deadlock/signal/failure, restarting; exitcode='11'
>>>
>>>
>>> and it looks like it should restart, but instead of restarting,
>>> the supervisor terminates and then no syslog-ng process is running.
>>>
>>> Is this a bug in the supervisor?
>>> ______________________________________________________________________________
>>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
>>> Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
>>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
>>>
>>
>
>
>-- 
>Evan Rempel                                      erempel at uvic.ca
>Senior Systems Administrator                        250.721.7691
>Data Centre Services, University Systems, University of Victoria
>

