Hello,

I am having an issue with a Solaris installation of syslog-ng. It is configured so that logs are stored in separate per-IP folders. This is my centralized logging host, so it is fairly heavily loaded, receiving logs from a few dozen hosts. The syslog-ng process locks up every two to three weeks, after which no messages are written to any of the files. The only way to get it back is to kill -9 the process and restart it.

Is this a known issue, and is there any workaround other than recycling the daemon every night?

Here is the version info:

bash-3.00# syslog-ng --version
syslog-ng 3.0.4
Revision: ssh+git://bazsi@git.balabit//var/scm/git/syslog-ng/syslog-ng-ose--mainline--3.0#master#1b5d618e301ad94aa20e692ffba16469dece8d10
Compile-Date: Aug 11 2009 10:44:17
Enable-Threads: on
Enable-Debug: off
Enable-GProf: off
Enable-Memtrace: off
Enable-Sun-STREAMS: on
Enable-Sun-Door: on
Enable-IPv6: off
Enable-Spoof-Source: on
Enable-TCP-Wrapper: off
Enable-SSL: on
Enable-SQL: on
Enable-Linux-Caps: off
Enable-Pcre: on

bash-3.00# uname -a
SunOS prelude 5.10 Generic_137137-09 sun4v sparc SUNW,T5240

Thanks!

-igor

Igor Manassypov, M.Eng, P.Eng, CCIE 23032, CCVP
Network Architect
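On the question of alternatives to a nightly restart: one option is to detect the hang and restart only then. The following is a minimal sketch, not something taken from the syslog-ng documentation. It assumes the hang always shows up as no file under the log tree being modified; the function name logs_are_fresh, the marker file, and the example paths are all hypothetical.

```shell
#!/bin/sh
# logs_are_fresh DIR MARKER
# Succeeds (exit 0) if at least one regular file under DIR has been modified
# more recently than MARKER. MARKER must already exist before the first check.
# Only POSIX find options are used; -mmin is avoided because not every find
# implementation supports it.
logs_are_fresh() {
    dir=$1
    marker=$2
    [ -n "$(find "$dir" -type f -newer "$marker" | head -n 1)" ]
}

# Example cron-driven use (hypothetical paths, left commented out):
#   if ! logs_are_fresh /var/log/hosts /var/run/syslog-ng.marker; then
#       echo "syslog-ng appears hung" | mailx -s "syslog-ng watchdog" root
#   fi
#   touch /var/run/syslog-ng.marker
```

The marker is touched after each check, so every run only asks whether anything was written since the previous run. On a quiet log host this would raise false alarms, so the check interval needs to be longer than the longest normal gap between messages.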
Hi Igor,

Can you show me the truss output or a backtrace of the stuck syslog-ng?

truss:

truss -f -p "syslog-ng pid"

backtrace:

kill -11 "syslog-ng pid" (syslog-ng will drop a core file)
gdb syslog-ng core
bt full
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.campin.net/syslog-ng/faq.html
Hi Zoltan,

Here are the traces:

bash-3.00# ps -eaf | grep syslog
root 12694 12616 0 11:37:07 pts/1 0:00 grep syslog
root 13012 1 0 Oct 21 ? 0:00 syslog-ng -v
root 13013 13012 0 Oct 21 ? 0:41 syslog-ng -v
root 13620 1 0 Oct 08 ? 0:00 /usr/local/sbin/syslog-ng
root 13621 13620 0 Oct 08 ? 6:16 /usr/local/sbin/syslog-ng
bash-3.00# truss -f -p "13620"
13620: waitid(P_PID, 13621, 0xFFBFF468, WEXITED|WTRAPPED) (sleeping...)
13620: Received signal #11, SIGSEGV, in waitid() [default]
13620: siginfo: SIGSEGV pid=12717 uid=0
13620: waitid(P_PID, 13621, 0xFFBFF468, WEXITED|WTRAPPED) Err#4 EINTR

Core was generated by `/usr/local/sbin/syslog-ng'.
Program terminated with signal 11, Segmentation fault.
[New process 79156 ]
#0 0xfed4ad80 in _waitid () from /lib/libc.so.1
(gdb) bt full
#0 0xfed4ad80 in _waitid () from /lib/libc.so.1
No symbol table info available.
#1 0xfecee038 in _waitpid () from /lib/libc.so.1
No symbol table info available.
#2 0xfed3a70c in waitpid () from /lib/libc.so.1
No symbol table info available.
#3 0x0003017c in g_process_start () at gprocess.c:1042
rc = 0
deadlock = 0
pid = 13621
__PRETTY_FUNCTION__ = "g_process_start"
#4 0x0001c214 in main (argc=1, argv=0xffbffd14) at main.c:371
cfg = (GlobalConfig *) 0x10034
rc = 310272
ctx = (GOptionContext *) 0x76030
error = (GError *) 0x0

Please let me know if I can provide you with more information,

Thanks!

--- On Tue, 11/3/09, Pallagi Zoltán <pzolee@balabit.hu> wrote:

From: Pallagi Zoltán <pzolee@balabit.hu>
Subject: Re: [syslog-ng] syslog-ng on solaris locks up after a while
To: imanassypov@rogers.com, "Syslog-ng users' and developers' mailing list" <syslog-ng@lists.balabit.hu>
Received: Tuesday, November 3, 2009, 11:10 AM

> Hi Igor,
>
> Can you show me truss output or backtrace of the stuck syslog-ng?:
> truss:
> truss -f -p "syslog-ng pid"
> backtrace:
> kill -11 "syslog-ng pid" (syslog-ng will drop a core file)
> gdb syslog-ng core
> bt full
Hi,

The problem is that you killed the supervisor process, whose job is to restart syslog-ng if it crashes. The hang is not in the supervisor but in its child.

Judging by the ps output, you should have trussed 13621 and not its parent (13620).

On Tue, 2009-11-03 at 08:54 -0800, Igor Manassypov wrote:
> bash-3.00# ps -eaf | grep syslog
> root 12694 12616 0 11:37:07 pts/1 0:00 grep syslog
> root 13012 1 0 Oct 21 ? 0:00 syslog-ng -v
> root 13013 13012 0 Oct 21 ? 0:41 syslog-ng -v
> root 13620 1 0 Oct 08 ? 0:00 /usr/local/sbin/syslog-ng
> root 13621 13620 0 Oct 08 ? 6:16 /usr/local/sbin/syslog-ng
> bash-3.00# truss -f -p "13620"

-- Bazsi
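To make the supervisor/worker distinction concrete: in the ps output, the supervisor is the syslog-ng process whose parent is PID 1, and the worker is the syslog-ng process whose parent is another syslog-ng. Below is a minimal sketch that picks the worker PIDs out of `ps -ef` style output. The function name worker_pids is hypothetical, and it assumes the PID and PPID are in columns 2 and 3, as in the listing quoted in this thread.

```shell
#!/bin/sh
# worker_pids: read `ps -ef` style lines on stdin and print the PIDs of
# syslog-ng processes whose parent is itself a syslog-ng process.
worker_pids() {
    awk '
        /syslog-ng/ { ppid[$2] = $3 }      # remember PID -> PPID for each syslog-ng line
        END {
            for (p in ppid)
                if (ppid[p] in ppid)       # parent is also syslog-ng: this is a worker
                    print p
        }
    ' | sort -n
}

# Example use (commented out, since truss needs a live process):
#   ps -ef | worker_pids
#   truss -f -p <worker pid>
```

A stray `grep syslog-ng` line does not confuse the filter, because its parent is the shell rather than a syslog-ng process.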
I see, I will resend the trace once I catch it locked up.

Thanks for your help,

-igor

--- On Tue, 11/3/09, Balazs Scheidler <bazsi@balabit.hu> wrote:

From: Balazs Scheidler <bazsi@balabit.hu>
Subject: Re: [syslog-ng] syslog-ng on solaris locks up after a while
To: imanassypov@rogers.com, "Syslog-ng users' and developers' mailing list" <syslog-ng@lists.balabit.hu>
Cc: "Pallagi Zoltán" <pzolee@balabit.hu>, network@ci.com
Received: Tuesday, November 3, 2009, 2:11 PM

> Hi,
>
> The problem is that you killed the supervisor process, which restarts syslog-ng in case it crashes. However the hang is not in this part, but in its child.
>
> So by looking at the ps output, I'd say that in this situation you should have trussed 13621 and not its parent.
Would this one make more sense?

bash-3.00# ps -eaf | grep syslog
root 22562 22561 0 Nov 04 ? 0:30 /usr/local/sbin/syslog-ng
root 22561 1 0 Nov 04 ? 0:00 /usr/local/sbin/syslog-ng

bash-3.00# truss -f -p 22562
22562/2: door_return(0x00000000, 0, 0x00000000, 0) (sleeping...)
22562/1: lwp_park(0x00000000, 0) (sleeping...)
22562/1: Received signal #11, SIGSEGV, in lwp_park() [default]
22562/1: siginfo: SIGSEGV pid=12717 uid=0
22562/1: lwp_park(0x00000000, 0) Err#4 EINTR

Core was generated by `/usr/local/sbin/syslog-ng'.
Program terminated with signal 11, Segmentation fault.
[New process 88098 ]
[New process 153634 ]
#0 0xfed46df0 in __lwp_park () from /lib/libc.so.1
#0 0xfed46df0 in __lwp_park () from /lib/libc.so.1

bash-3.00# gdb syslog-ng core
Core was generated by `/usr/local/sbin/syslog-ng'.
Program terminated with signal 11, Segmentation fault.
[New process 88098 ]
[New process 153634 ]
#0 0xfed46df0 in __lwp_park () from /lib/libc.so.1
(gdb)
Igor Manassypov wrote:
> Would this one make more sense?

Please show us the output of "bt full" too.
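Note that the truss output for PID 22562 shows two LWPs (22562/1 and 22562/2), while "bt full" prints only the thread gdb currently has selected. Below is a minimal sketch for dumping every thread from the core in one go. It assumes the gdb build in use accepts a command file via -x; the function name write_bt_cmds and the /tmp paths are hypothetical.

```shell
#!/bin/sh
# write_bt_cmds FILE: write a gdb command file that prints a full backtrace
# for every thread in the core and then exits.
write_bt_cmds() {
    cat > "$1" <<'EOF'
set pagination off
thread apply all bt full
quit
EOF
}

# Example use with the binary and core from this thread (commented out):
#   write_bt_cmds /tmp/bt.gdb
#   gdb -x /tmp/bt.gdb syslog-ng core > /tmp/syslog-ng-bt.txt
```

Redirecting the output to a file makes it easier to attach the complete trace to a list reply without terminal paging cutting it short.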
(gdb) bt full #0 0xfed46df0 in __lwp_park () from /lib/libc.so.1 No symbol table info available. #1 0xfed40c44 in cond_sleep_queue () from /lib/libc.so.1 No symbol table info available. #2 0xfed40e08 in cond_wait_queue () from /lib/libc.so.1 No symbol table info available. #3 0xfed41350 in cond_wait () from /lib/libc.so.1 No symbol table info available. #4 0xfed4138c in pthread_cond_wait () from /lib/libc.so.1 No symbol table info available. #5 0xff119d80 in g_async_queue_pop_intern_unlocked (queue=0x757e0, try=0, end_time=0x75618) at gasyncqueue.c:359 retval = (gpointer) 0xa15b8 __PRETTY_FUNCTION__ = "g_async_queue_pop_intern_unlocked" #6 0xff119e80 in g_async_queue_pop (queue=0x757e0) at gasyncqueue.c:398 retval = (gpointer) 0x757e0 __PRETTY_FUNCTION__ = "g_async_queue_pop" #7 0x0003e984 in afinter_source_dispatch (source=0x8d260, callback=0x3e9dc <afinter_source_dispatch_msg>, user_data=0x8d1e0) at afinter.c:112 msg = (LogMessage *) 0xa0dc0 path_options = {flow_control = -1, matched = 0x0} tv = {tv_sec = 1257363112, tv_usec = 441817} #8 0xff143564 in g_main_context_dispatch (context=0x8d158) at gmain.c:2144 No locals. 
#9 0xff1459a4 in g_main_context_iterate (context=0x8d158, block=1, dispatch=1, self=0x76030) at gmain.c:2778 max_priority = 2147483647 timeout = 4000 some_ready = 1 nfds = 4 allocated_nfds = 1 fds = (GPollFD *) 0x788c8 __PRETTY_FUNCTION__ = "g_main_context_iterate" #10 0xff146050 in g_main_context_iteration (context=0x8d158, may_block=1) at gmain.c:2841 retval = 1 #11 0x0001bc20 in main_loop_run (cfg=0xffbffbc8) at main.c:149 iters = 0 stats_timer_id = 0 #12 0x0001c260 in main (argc=1, argv=0xffbffd44) at main.c:394 cfg = (GlobalConfig *) 0x794d0 rc = 0 ctx = (GOptionContext *) 0x76030 error = (GError *) 0x0 Igor M., M.Eng, P.Eng Network Architect --- On Mon, 11/9/09, Pallagi Zoltán <pzolee@balabit.hu> wrote: From: Pallagi Zoltán <pzolee@balabit.hu> Subject: Re: [syslog-ng] syslog-ng on solaris locks up after a while To: imanassypov@rogers.com, "Syslog-ng users' and developers' mailing list" <syslog-ng@lists.balabit.hu> Date: Monday, November 9, 2009, 11:35 AM Igor Manassypov írta: Would this one make more sense? bash-3.00# ps -eaf | grep syslog root 22562 22561 0 Nov 04 ? 0:30 /usr/local/sbin/syslog-ng root 22561 1 0 Nov 04 ? 0:00 /usr/local/sbin/syslog-ng bash-3.00# truss -f -p 22562 22562/2: door_return(0x00000000, 0, 0x00000000, 0) (sleeping...) 22562/1: lwp_park(0x00000000, 0) (sleeping....) 22562/1: Received signal #11, SIGSEGV, in lwp_park() [default] 22562/1: siginfo: SIGSEGV pid=12717 uid=0 22562/1: lwp_park(0x00000000, 0) Err#4 EINTR Core was generated by `/usr/local/sbin/syslog-ng'. Program terminated with signal 11, Segmentation fault. [New process 88098 ] [New process 153634 ] #0 0xfed46df0 in __lwp_park () from /lib/libc.so.1 #0 0xfed46df0 in __lwp_park () from /lib/libc.so.1 bash-3.00# gdb syslog-ng core Core was generated by `/usr/local/sbin/syslog-ng'. Program terminated with signal 11, Segmentation fault. 
[New process 88098 ] [New process 153634 ] #0 0xfed46df0 in __lwp_park () from /lib/libc.so.1 (gdb) Please show us output of "bt full" too --- On Tue, 11/3/09, Balazs Scheidler <bazsi@balabit.hu> wrote: From: Balazs Scheidler <bazsi@balabit..hu> Subject: Re: [syslog-ng] syslog-ng on solaris locks up after a while To: imanassypov@rogers.com, "Syslog-ng users' and developers' mailing list" <syslog-ng@lists.balabit.hu> Cc: "Pallagi Zoltán" <pzolee@balabit.hu>, network@ci.com Date: Tuesday, November 3, 2009, 2:11 PM Hi, The problem is that you killed the supervisor process, which restarts syslog-ng in case it crashes.. However the hang is not in this part, but in its child. So by looking at the ps output, I'd say that in this situation you should have trussed 13621 and not its parent. On Tue, 2009-11-03 at 08:54 -0800, Igor Manassypov wrote:
Hi Zoltan,
Here are the traces:
bash-3.00# ps -eaf | grep syslog
root 12694 12616 0 11:37:07 pts/1 0:00 grep syslog
root 13012 1 0 Oct 21 ? 0:00 syslog-ng -v
root 13013 13012 0 Oct 21 ? 0:41 syslog-ng -v
root 13620 1 0 Oct 08 ?
0:00 /usr/local/sbin/syslog-ng
root 13621 13620 0 Oct 08 ?
6:16 /usr/local/sbin/syslog-ng
bash-3.00# truss -f -p "13620"
13620: waitid(P_PID, 13621, 0xFFBFF468, WEXITED|WTRAPPED)
(sleeping...)
13620: Received signal #11, SIGSEGV, in waitid() [default]
13620: siginfo: SIGSEGV pid=12717 uid=0
13620: waitid(P_PID, 13621, 0xFFBFF468, WEXITED|WTRAPPED) Err#4 EINTR
Core was generated by `/usr/local/sbin/syslog-ng'.
Program terminated with signal 11, Segmentation fault.
[New process 79156 ]
#0 0xfed4ad80 in _waitid () from /lib/libc.so.1
(gdb) bt full
#0 0xfed4ad80 in _waitid () from /lib/libc.so.1
No symbol table info available.
#1 0xfecee038 in _waitpid () from /lib/libc.so.1
No symbol table info available.
#2 0xfed3a70c in waitpid () from /lib/libc.so.1
No symbol table info available.
#3 0x0003017c in g_process_start () at gprocess.c:1042
rc = 0
deadlock = 0
pid = 13621
__PRETTY_FUNCTION__ = "g_process_start"
#4 0x0001c214 in main (argc=1, argv=0xffbffd14) at main.c:371
cfg = (GlobalConfig *) 0x10034
rc = 310272
ctx = (GOptionContext *) 0x76030
error = (GError *) 0x0
Please let me know if I can provide you with more information,
Thanks!
--- On Tue, 11/3/09, Pallagi Zoltán <pzolee@balabit.hu> wrote:
From: Pallagi Zoltán <pzolee@balabit.hu>
Subject: Re: [syslog-ng] syslog-ng on solaris locks up after a while
To: imanassypov@rogers.com, "Syslog-ng users' and developers' mailing list" <syslog-ng@lists.balabit.hu>
Received: Tuesday, November 3, 2009, 11:10 AM
Hi Igor,
Can you show me the truss output or a backtrace of the stuck syslog-ng?
truss:
truss -f -p "syslog-ng pid"
backtrace:
kill -11 "syslog-ng pid" (syslog-ng will drop a core file)
gdb syslog-ng core
bt full
Igor Manassypov wrote:
> Hello,
>
>
> I am having an issue with a solaris installation of the
> syslog-ng. It is configured such that all the logs are
> stored different per-ip folders. This is my centralized
> logging device, so it is fairly heavily loaded with
> receiving logs from a few dozen hosts. The syslog-ng process
> locks up every two to three weeks, with no messages logging
> to any of the files. The only way of getting it back is kill
> -9 the process and restart it.
>
> Is there any known issue of same sorts and is there any
> other way around it other than recycling the daemon every
> night?
>
>
> here is the version info:
>
> bash-3.00# syslog-ng --version
> syslog-ng 3.0.4
> Revision: ssh+git://bazsi@git.balabit//var/scm/git/syslog-ng/syslog-ng-ose--mainline--3.0#master#1b5d618e301ad94aa20e692ffba16469dece8d10
> Compile-Date: Aug 11 2009 10:44:17
> Enable-Threads: on
> Enable-Debug: off
> Enable-GProf: off
> Enable-Memtrace: off
> Enable-Sun-STREAMS: on
> Enable-Sun-Door: on
> Enable-IPv6: off
> Enable-Spoof-Source: on
> Enable-TCP-Wrapper: off
> Enable-SSL: on
> Enable-SQL: on
> Enable-Linux-Caps: off
> Enable-Pcre: on
>
> bash-3.00# uname -a
> SunOS prelude 5.10 Generic_137137-09 sun4v sparc SUNW,T5240
> Thanks!
>
> -igor
>
> Igor Manassypov., M.Eng, P.Eng, CCIE 23032, CCVP Network
> Architect
>
>
>
>
> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
> Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
>
>
______________________________________________________________________________
Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
-- Bazsi
______________________________________________________________________________
Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
FAQ: http://www.campin.net/syslog-ng/faq.html
Hi,

This seems to be the same issue as the one fixed by this patch:

Author: Balazs Scheidler <bazsi@balabit.hu> 2009-08-30 11:41:24
Committer: Balazs Scheidler <bazsi@balabit.hu> 2009-08-30 11:41:24
Parent: 1ad4da07d5305ba0140ac385d661ab6de25fc5f3 ([patterndb] estring parser length calculation must include ending quote)
Child: c2e8aa58763a89cab58d05fb7a2b2a18021413b4 ([logmsg] added support for ASA timestamps)
Branches: master, remotes/balabit/master, remotes/origin/master
Follows: v3.0.4
Precedes:

[afinter] don't block on the internal_msg_queue even in the threaded case (fixes: pub#48)

A hang was reported in bugzilla ticket #48 which seems to have been caused by MARK messages interfering with local messages:

* if the MARK is due in the same poll iteration as a local message
* the MARK timeout is checked and the internal source is marked as having input available
* then the local message comes in pushing the mark timeout further ahead in time
* then the internal() dispatch callback checks the mark timeout again, but at this time it is already in the future ->
* the dispatch callback falls back to fetching the internal message from internal_msg_queue, assuming it was that which caused the dispatch callback to be scheduled
* this blocks indefinitely.

The solution is very simple: use g_async_queue_try_pop() instead of g_async_queue_pop(), the dispatch code already takes care about a NULL message value.

On Tue, 2009-11-10 at 05:09 -0800, Igor Manassypov wrote:
(gdb) bt full
#0 0xfed46df0 in __lwp_park () from /lib/libc.so.1
No symbol table info available.
#1 0xfed40c44 in cond_sleep_queue () from /lib/libc.so.1
No symbol table info available.
#2 0xfed40e08 in cond_wait_queue () from /lib/libc.so.1
No symbol table info available.
#3 0xfed41350 in cond_wait () from /lib/libc.so.1
No symbol table info available.
#4 0xfed4138c in pthread_cond_wait () from /lib/libc.so.1
No symbol table info available.
#5 0xff119d80 in g_async_queue_pop_intern_unlocked (queue=0x757e0, try=0, end_time=0x75618) at gasyncqueue.c:359
retval = (gpointer) 0xa15b8
__PRETTY_FUNCTION__ = "g_async_queue_pop_intern_unlocked"
#6 0xff119e80 in g_async_queue_pop (queue=0x757e0) at gasyncqueue.c:398
retval = (gpointer) 0x757e0
__PRETTY_FUNCTION__ = "g_async_queue_pop"
#7 0x0003e984 in afinter_source_dispatch (source=0x8d260, callback=0x3e9dc <afinter_source_dispatch_msg>, user_data=0x8d1e0) at afinter.c:112
msg = (LogMessage *) 0xa0dc0
path_options = {flow_control = -1, matched = 0x0}
tv = {tv_sec = 1257363112, tv_usec = 441817}
#8 0xff143564 in g_main_context_dispatch (context=0x8d158) at gmain.c:2144
No locals.
#9 0xff1459a4 in g_main_context_iterate (context=0x8d158, block=1, dispatch=1, self=0x76030) at gmain.c:2778
max_priority = 2147483647
timeout = 4000
some_ready = 1
nfds = 4
allocated_nfds = 1
fds = (GPollFD *) 0x788c8
__PRETTY_FUNCTION__ = "g_main_context_iterate"
#10 0xff146050 in g_main_context_iteration (context=0x8d158, may_block=1) at gmain.c:2841
retval = 1
#11 0x0001bc20 in main_loop_run (cfg=0xffbffbc8) at main.c:149
iters = 0
stats_timer_id = 0
#12 0x0001c260 in main (argc=1, argv=0xffbffd44) at main.c:394
cfg = (GlobalConfig *) 0x794d0
rc = 0
ctx = (GOptionContext *) 0x76030
error = (GError *) 0x0
Igor M., M.Eng, P.Eng Network Architect
-- Bazsi
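
The race described in the patch note can be sketched outside syslog-ng. The following is only an analogy in Python, not the syslog-ng or GLib code: `dispatch_once`, `mark_due_at` and the time values are hypothetical names and numbers chosen for illustration, and `queue.Queue.get_nowait()` stands in for `g_async_queue_try_pop()`. It shows why a non-blocking fetch lets the dispatch pass return when the MARK deadline has moved into the future and the internal queue is empty.

```python
import queue


def dispatch_once(internal_queue, mark_due_at, now):
    """One dispatch pass: emit MARK if it is due, else try the internal queue."""
    if now >= mark_due_at:
        return "MARK"
    try:
        # Non-blocking fetch, analogous to g_async_queue_try_pop().
        # A blocking internal_queue.get() here would wait forever when
        # the queue is empty, which is the hang described in the thread.
        return internal_queue.get_nowait()
    except queue.Empty:
        # Analogous to a NULL message: nothing to deliver in this pass.
        return None


internal_queue = queue.Queue()

# Poll phase: MARK looked due, so the source was flagged as ready.
# Before dispatch ran, a local message pushed the MARK deadline forward.
mark_due_at = 1200.0  # hypothetical new deadline, in seconds
now = 1000.0          # hypothetical current time, in seconds

result = dispatch_once(internal_queue, mark_due_at, now)
print(result)  # None: the pass returns instead of blocking
```

With a blocking `get()` in place of `get_nowait()`, the same call would never return, which matches the `pthread_cond_wait` frames under `g_async_queue_pop` in the backtrace.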
Hi Balazs,

Thanks for your prompt reply. Can you please direct me to the link where I can obtain the patch?

Thanks!

-igor

Igor M., M.Eng, P.Eng
Network Architect

--- On Thu, 11/12/09, Balazs Scheidler <bazsi@balabit.hu> wrote:

From: Balazs Scheidler <bazsi@balabit.hu>
Subject: Re: [syslog-ng] syslog-ng on solaris locks up after a while
To: imanassypov@rogers.com, "Syslog-ng users' and developers' mailing list" <syslog-ng@lists.balabit.hu>
Cc: "Pallagi Zoltán" <pzolee@balabit.hu>
Date: Thursday, November 12, 2009, 11:11 AM

> Hi,
>
> This seems to be the same issue as the one fixed by this patch:
> [...]
Hi,

Grab a daily snapshot from:

http://www.balabit.com/downloads/files/syslog-ng/open-source-edition/3.0/src...

On Thu, 2009-11-12 at 08:28 -0800, Igor Manassypov wrote:
Hi Balazs,
Thanks for your prompt reply. Can you please direct me to the link where I can obtain the patch?
Thanks!
-- Bazsi
Hi,

Not really. Are there multiple threads in the same core file? E.g. what is the response for "info threads"?

It would be nice to have the backtrace for all threads, like this:

(gdb) thread 1
(gdb) bt
(gdb) thread 2
(gdb) bt

and so on, for each thread ID that "info threads" lists.

On Fri, 2009-11-06 at 11:41 -0800, Igor Manassypov wrote:
Would this one make more sense?
bash-3.00# ps -eaf | grep syslog
root 22562 22561 0 Nov 04 ? 0:30 /usr/local/sbin/syslog-ng
root 22561 1 0 Nov 04 ? 0:00 /usr/local/sbin/syslog-ng

bash-3.00# truss -f -p 22562
22562/2: door_return(0x00000000, 0, 0x00000000, 0) (sleeping...)
22562/1: lwp_park(0x00000000, 0) (sleeping....)
22562/1: Received signal #11, SIGSEGV, in lwp_park() [default]
22562/1: siginfo: SIGSEGV pid=12717 uid=0
22562/1: lwp_park(0x00000000, 0) Err#4 EINTR

Core was generated by `/usr/local/sbin/syslog-ng'.
Program terminated with signal 11, Segmentation fault.
[New process 88098 ]
[New process 153634 ]
#0 0xfed46df0 in __lwp_park () from /lib/libc.so.1
#0 0xfed46df0 in __lwp_park () from /lib/libc.so.1

bash-3.00# gdb syslog-ng core

Core was generated by `/usr/local/sbin/syslog-ng'.
Program terminated with signal 11, Segmentation fault.
[New process 88098 ]
[New process 153634 ]
#0 0xfed46df0 in __lwp_park () from /lib/libc.so.1
(gdb)
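
The session above stops at a single frame, while the core holds two threads. As a rough sketch, the per-thread backtraces could be collected in one non-interactive run instead of switching threads by hand. This assumes the installed gdb supports the `-batch` and `-ex` options and the `thread apply all` command, which an older build may not; the helper name `gdb_all_threads_argv` is made up for illustration.

```python
import shlex


def gdb_all_threads_argv(binary, core):
    """Build a gdb command line that dumps every thread's backtrace."""
    return [
        "gdb", "-batch",                    # run the commands, then exit
        "-ex", "info threads",              # list the threads in the core
        "-ex", "thread apply all bt full",  # full backtrace per thread
        binary, core,
    ]


argv = gdb_all_threads_argv("syslog-ng", "core")

# Render a shell-safe command line that can be pasted into a terminal.
command_line = " ".join(shlex.quote(arg) for arg in argv)
print(command_line)
```

The list form can be passed to `subprocess.run` directly; the quoted string is only for copying into a shell.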
--- On Tue, 11/3/09, Balazs Scheidler <bazsi@balabit.hu> wrote:
From: Balazs Scheidler <bazsi@balabit.hu>
Subject: Re: [syslog-ng] syslog-ng on solaris locks up after a while
To: imanassypov@rogers.com, "Syslog-ng users' and developers' mailing list" <syslog-ng@lists.balabit.hu>
Cc: "Pallagi Zoltán" <pzolee@balabit.hu>, network@ci.com
Date: Tuesday, November 3, 2009, 2:11 PM

Hi,

The problem is that you killed the supervisor process, which restarts syslog-ng in case it crashes. However the hang is not in this part, but in its child.
So by looking at the ps output, I'd say that in this situation you should have trussed 13621 and not its parent.
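
To make the supervisor/child distinction concrete, here is a small sketch that picks out the worker PIDs from `ps -eaf` output: a worker is taken to be any syslog-ng process whose parent is also a syslog-ng process. The function name `find_workers` is hypothetical, and the sample text is the ps listing from earlier in this thread with the wrapped lines rejoined. It assumes the PID and PPID are the second and third columns, as in that listing.

```python
PS_SAMPLE = """\
root 12694 12616 0 11:37:07 pts/1 0:00 grep syslog
root 13012 1 0 Oct 21 ? 0:00 syslog-ng -v
root 13013 13012 0 Oct 21 ? 0:41 syslog-ng -v
root 13620 1 0 Oct 08 ? 0:00 /usr/local/sbin/syslog-ng
root 13621 13620 0 Oct 08 ? 6:16 /usr/local/sbin/syslog-ng
"""


def find_workers(ps_output, name="syslog-ng"):
    """Return PIDs of `name` processes whose parent is also a `name` process."""
    parents = {}
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip short lines, the header, and the grep process itself.
        if len(fields) < 3 or "grep" in fields or name not in line:
            continue
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        parents[int(fields[1])] = int(fields[2])
    # A supervisor's parent is init (PID 1); a worker's parent is a supervisor.
    return sorted(pid for pid, ppid in parents.items() if ppid in parents)


workers = find_workers(PS_SAMPLE)
print(workers)  # [13013, 13621]
```

For the listing in this thread, 13621 is the worker of the daemon started from /usr/local/sbin, which is the PID suggested above for truss.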
-- Bazsi
participants (3)
- Balazs Scheidler
- Igor Manassypov
- Pallagi Zoltán