[syslog-ng] Problem with syslog-ng 1.6.8 and HUP
Tessier.Marc.2 at hydro.qc.ca
Tessier.Marc.2 at hydro.qc.ca
Wed Sep 10 14:47:09 CEST 2008
Hi,
I have been working on an issue for the last few days with one
of our main centralized syslog servers which is a SPARC Solaris 9 box.
It is running syslog-ng 1.6.8. Our log rotation script moves,
compresses and then touches a new file depending on size. Then script
then sends an HUP to syslog-ng in the following way :
/usr/bin/pkill -HUP syslog-ng > /dev/null 2>&1
I have added some debugging to the script to make sure it grabs
the proper PID, and logs it independantly :
Sep 10 07:58:01 machine01 syslogmgr[8304]: info: signaling
Syslog/Syslog-ng (HUP) at PID 4362.
And it is :
root at spdmze21# ps -ef | grep syslog-ng
root 4362 1 2 Sep 07 ? 160:38 /usr/sbin/syslog-ng
Now in the log files, it seems syslog-ng isn't always reacting
to the kill. This is what I get as far as syslog-ng noticing the signal
in the last 2 days :
root at machine01# grep syslog-ng complete.log | grep restarting
Sep 9 09:58:06 machine01 syslog-ng[4362]: SIGHUP received, restarting
syslog-ng
Sep 9 15:58:03 machine01 syslog-ng[4362]: SIGHUP received, restarting
syslog-ng
Sep 9 19:58:07 machine01 syslog-ng[4362]: SIGHUP received, restarting
syslog-ng
Sep 10 01:58:04 machine01 syslog-ng[4362]: SIGHUP received, restarting
syslog-ng
Sep 10 03:58:05 machine01 syslog-ng[4362]: SIGHUP received, restarting
syslog-ng
However, it was signaled everyhour :
Sep 10 00:58:02 machine01 syslogmgr[20532]: [ID 702911 daemon.info]
info: signaling Syslog/Syslog-ng (HUP) at PID 4362.
Sep 10 01:58:04 machine01 syslogmgr[26686]: [ID 702911 daemon.info]
info: signaling Syslog/Syslog-ng (HUP) at PID 4362.
Sep 10 02:58:02 machine01 syslogmgr[3702]: [ID 702911 daemon.info] info:
signaling Syslog/Syslog-ng (HUP) at PID 4362.
Sep 10 03:58:05 machine01 syslogmgr[9935]: [ID 702911 daemon.info] info:
signaling Syslog/Syslog-ng (HUP) at PID 4362.
Sep 10 04:58:04 machine01 syslogmgr[17262]: [ID 702911 daemon.info]
info: signaling Syslog/Syslog-ng (HUP) at PID 4362.
Sep 10 05:58:02 machine01 syslogmgr[24178]: [ID 702911 daemon.info]
info: signaling Syslog/Syslog-ng (HUP) at PID 4362.
Sep 10 06:58:02 machine01 syslogmgr[1748]: [ID 702911 daemon.info] info:
signaling Syslog/Syslog-ng (HUP) at PID 4362.
Sep 10 07:58:01 machine01 syslogmgr[8304]: [ID 702911 daemon.info] info:
signaling Syslog/Syslog-ng (HUP) at PID 4362.
Now the problem this creates for me is that it leaves a bunch of
open file descriptors which don't point to a file anymore :
root at machine01# lsof -p 4362 -s | grep d80
syslog-ng 4362 root 7w VREG 85,80 9699746 40
/var/adm/log (/dev/md/dsk/d80)
syslog-ng 4362 root 17w VREG 85,80 3223056 500412
/var/adm/log (/dev/md/dsk/d80)
syslog-ng 4362 root 18w VREG 85,80 11604669 612873
/var/adm/log (/dev/md/dsk/d80)
syslog-ng 4362 root 20w VREG 85,80 8617243 90
/var/adm/log (/dev/md/dsk/d80)
syslog-ng 4362 root 23w VREG 85,80 11698446 612889
/var/adm/log (/dev/md/dsk/d80)
...
This causes our rotation to mess up and thus the filesystem to
eventually become filled. If I fire pkill manually, it works :
root at machine01# /usr/bin/pkill -HUP syslog-ng > /dev/null 2>&1
root at machine01# lsof -p 4362 -s | grep d80
root at machine01#
The script itself is called every hour from root's crontab. At
this point, I am completely stumped. Even looking at the changelog from
1.6.8 to 1.6.12 (latest 1.6 tree version) I don't see anything that
addresses this issue. Now since I have not been able to reproduce a
non-working HUP, I don't have a truss output of when it happens.
Thank you for any help!
Marc Tessier
Analyste Support Technique
Support Technique des Infrastructures
courriel : tessier.marc.2 at hydro.qc.ca
<mailto:tessier.marc.2 at hydro.qc.ca>
Tel: (514) 840-3000 Poste #6467
More information about the syslog-ng
mailing list