Re: [syslog-ng] supervisor not restarting a failed daemon process
Hi, No kernel source handy but my guess is that the kernel is logging the tid value, which is the same as pid as long as the process is single threaded. I've checked the supervisor code and the only way this could happen is a fork/pipe error which is not logged. It should be logged but who knows that message can be lost. The supervisor attempts to restart 3 times then gives up. Hmmm the supervisor messages may be redirected to syslog after the first startup which might explain why they don't get logged. But why does fork/pipe fail? Hope this helps. Evan Rempel <erempel@uvic.ca> wrote:
The logs below show a "standard" syslog-ng processID=15017 that reads /dev/log /proc/kmsg The second instance of syslog-ng is what we call our "server" which just listens on the network ports and does all of the complex patterndb, filtering and routing to destination processes
kern.info kernel: syslog-ng[1561]: segfault at 7f65c0000078 ip 00007f65c0000078 sp 00007f65e1385a48 error 15 --- this is the server instance segfaulting (I assume, see WAIT below)
syslog.notice syslog-ng[15017]: Syslog connection closed; fd='20', client='AF_INET(142.104.47.145:49803)', local='AF_INET(127.0.0.1:1514)' syslog.notice syslog-ng[15017]: Syslog connection broken; fd='14', server='AF_INET(142.104.47.146:514)', time_reopen='5' --- this was the standard syslog loosing connection to the server, and detecting the drop of the server instance destination to it.
daemon.info syslog-ng-stats: server stopping on socket "/var/local/syslog-ng.server.ctl" daemon.info msgid_profiler[832]: committing residual data local0.info flare-timer[834]: stopping local0.info action-handler[833]: stopping --- these are all program destinations of the server instance shutting down gracefully after the close of their stdin.
daemon.crit supervise/syslog-ng[27221]: Daemon exited due to a deadlock/signal/failure, restarting; exitcode='11' syslog.err syslog-ng[15017]: Syslog connection failed; fd='14', server='AF_INET(142.104.47.146:514)', error='Connection refused (111)', time_reopen='5'
This "connection failed" message repeats every 5 seconds until I restart the server instance.
syslog.notice syslog-ng[1911]: syslog-ng starting up; version='3.4.1'
So it does not look like there is anything in the logs about attempted restarts.
WAIT...
This is really odd. The line
kern.info kernel: syslog-ng[1561]: segfault at 7f65c0000078 ip 00007f65c0000078 sp 00007f65e1385a48 error 15
implies that there was a process ID 1561 that segfaulted, but that line is the ONLY logged line with that process ID. We take ps snapshots every 15 minutes, and those snapshots don't show anything for that process ID. Also, the supervisor processID is shown
USER PID PPID NI PRI CPU VSZ ELAPSED TIME COMMAND root 27221 1 0 19 - 26556 9-04:00:59 00:00:00 supervising syslog-ng root 27222 27221 0 19 - 977064 9-04:00:59 1-02:52:35 /usr/local/sbin/syslog-ng --cfgfile= ...
which matches the line daemon.crit supervise/syslog-ng[27221]: Daemon exited due to a deadlock/signal/failure, restarting; exitcode='11'
so its child which dies should have been processID 27222 so why is the log line kern.info kernel: syslog-ng[1561]: segfault at 7f65c0000078 ip 00007f65c0000078 sp 00007f65e1385a48 error 15
I conclude that the 1561 is not the process ID.
Can you shed any light on this?
Evan.
On 04/26/2013 11:00 PM, Balazs Scheidler wrote:
Strange, indeed. The supervisor gives up if the restarted daemon exits for some reason. Eg. If there's an initialization error it gives up. Any indication in the logs?
Evan Rempel <erempel@uvic.ca> wrote:
We are sing the log line
supervise/syslog-ng[27221]: Daemon exited due to a deadlock/signal/failure, restarting; exitcode='11'
and it looks like it should restart, but instead of restarting, the supervisor terminates and then no syslog-ng process is running.
Is this a bug in the supervisor? ______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq
-- Evan Rempel erempel@uvic.ca Senior Systems Administrator 250.721.7691 Data Centre Services, University Systems, University of Victoria ______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq
Where does the supervisor log to? In my case, the syslog-ng process that was reading from /dev/log and /proc/kmsg did NOT experience any problems. If anything was logged via the OS syslog API then I would see it in my log. If it is logged via the "internal()" syslog-ng destination, then does the supervisor process the syslog-ng.conf file as well? I have had two cases recently, one a signal 11 (SIGSEGV) and another signal 6 (SIGABRT). The first is probably be a bug in syslog-ng but the second would have to be root doing a kill -ABRT for the process. That would be weird and isn't in our logs (we are supposed to use sudo for all commands). I'll keep this in mind as I monitor this symptom. Evan. On 04/29/2013 11:16 AM, Balazs Scheidler wrote:
Hi,
No kernel source handy but my guess is that the kernel is logging the tid value, which is the same as pid as long as the process is single threaded.
I've checked the supervisor code and the only way this could happen is a fork/pipe error which is not logged. It should be logged but who knows that message can be lost. The supervisor attempts to restart 3 times then gives up.
Hmmm the supervisor messages may be redirected to syslog after the first startup which might explain why they don't get logged.
But why does fork/pipe fail?
Hope this helps.
I must be misunderstanding the timezone handling of syslog-ng because I can't get it to behave the way I understand the manual to indicate. I have a few hosts that can only syslog in UTS, including the time zone, so all time stamps look like "2013-05-06T15:45:06+00:00". Two things happen. 1. The file name destination /var/syslog/archive/syslog.$S_YEAR$S_MONTH$S_DAY.${S_HOUR}0000 that does not have any timezone specified, and the global option of keep-timestamp is yes. This destination uses the numbers from the UTC time stamp, but the manual indicates in section "2.5. Timezones and daylight saving" ... "Step 5. When macro expansions are used in the destination filenames, the local timezone is used." The local time zone is "2013-05-06T08:45:06-07:00" so I expect to have the log file of /var/syslog/archive/syslog.20130506.080000 but instead, the log file is /var/syslog/archive/syslog.20130506.150000 2. If I am logging from many hosts, in many timezones, and am logging to syslog servers in different time zones, I would like the logs written in the time zone of the syslog server. I realize that there is a time-zone() setting for a destination, but that requires a specific time zone and can not handle daylight savings time. Is there a way to configure a destination time zone that follows the localhost daylight savings time zone? Thanks. Evan.
On May 6, 2013 5:53 PM, "Evan Rempel" <erempel@uvic.ca> wrote:
I must be misunderstanding the timezone handling of syslog-ng because I
can't
get it to behave the way I understand the manual to indicate.
I have a few hosts that can only syslog in UTS, including the time zone, so all time stamps look like "2013-05-06T15:45:06+00:00".
Two things happen.
1. The file name destination /var/syslog/archive/syslog.$S_YEAR$S_MONTH$S_DAY.${S_HOUR}0000 that does not have any timezone specified, and the global option of keep-timestamp is yes.
This destination uses the numbers from the UTC time stamp, but the manual indicates in section "2.5. Timezones and daylight saving" ... "Step 5. When macro expansions are used in the destination filenames, the local timezone is used."
The local time zone is "2013-05-06T08:45:06-07:00" so I expect to have the log file of /var/syslog/archive/syslog.20130506.080000 but instead, the log file is /var/syslog/archive/syslog.20130506.150000
I'd have to check the code for this. ... I tried but it's difficult to read code on a phone. It seems to do what the docs say, convert the timestamp to the local timezone. But t I haven't tried at this time
2. If I am logging from many hosts, in many timezones, and am logging to
syslog servers
in different time zones, I would like the logs written in the time
zone of the syslog server.
I realize that there is a time-zone() setting for a destination, but
that requires a specific
time zone and can not handle daylight savings time.
It's been possible since 3.0 to use city names in timezone options. Eg time-zone("Europe/Budapest") should work just fine.
Is there a way to configure a destination time zone that follows the
localhost daylight savings
time zone?
Thanks.
Evan.
______________________________________________________________________________
Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq
participants (2)
-
Balazs Scheidler
-
Evan Rempel