Re: [syslog-ng] syslog-ng on solaris locks up after a while
Thanks. It indeed seems to be the same issue as the one fixed by this commit:

Author: Balazs Scheidler <bazsi@balabit.hu> 2009-08-30 11:41:24
Committer: Balazs Scheidler <bazsi@balabit.hu> 2009-08-30 11:41:24
Parent: 1ad4da07d5305ba0140ac385d661ab6de25fc5f3 ([patterndb] estring parser length calculation must include ending quote)
Child: c2e8aa58763a89cab58d05fb7a2b2a18021413b4 ([logmsg] added support for ASA timestamps)
Branches: master, remotes/balabit/master, remotes/origin/master
Follows: v3.0.4
Precedes:

    [afinter] don't block on the internal_msg_queue even in the threaded case (fixes: pub#48)

    A hang was reported in bugzilla ticket #48, which seems to have been
    caused by MARK messages interfering with local messages:

     * if the MARK is due in the same poll iteration as a local message
     * the MARK timeout is checked and the internal source is marked as
       having input available
     * then the local message comes in, pushing the MARK timeout further
       ahead in time
     * then the internal() dispatch callback checks the MARK timeout
       again, but by this time it is already in the future, so
     * the dispatch callback falls back to fetching the internal message
       from internal_msg_queue, assuming that it was what caused the
       dispatch callback to be scheduled
     * this blocks indefinitely.

    The solution is very simple: use g_async_queue_try_pop() instead of
    g_async_queue_pop(); the dispatch code already handles a NULL message
    value.

    Thanks to the helpful reporters for hunting down the issue.

    Reported-By: Arkadiusz Miśkiewicz, Elan Ruusamäe

A recent snapshot (and the git tree) should contain the fix.

On Thu, 2009-11-12 at 11:10 -0500, Manassypov, Igor wrote:
Here is the information:
(gdb) info thread
  2 process 153634 0xfed4b000 in _door_return () from /lib/libc.so.1
* 1 process 88098 0xfed46df0 in __lwp_park () from /lib/libc.so.1
(gdb) thread 1
[Switching to thread 1 (process 88098 )]#0 0xfed46df0 in __lwp_park () from /lib/libc.so.1
(gdb) bt
#0 0xfed46df0 in __lwp_park () from /lib/libc.so.1
#1 0xfed40c44 in cond_sleep_queue () from /lib/libc.so.1
#2 0xfed40e08 in cond_wait_queue () from /lib/libc.so.1
#3 0xfed41350 in cond_wait () from /lib/libc.so.1
#4 0xfed4138c in pthread_cond_wait () from /lib/libc.so.1
#5 0xff119d80 in g_async_queue_pop_intern_unlocked (queue=0x757e0, try=0, end_time=0x75618) at gasyncqueue.c:359
#6 0xff119e80 in g_async_queue_pop (queue=0x757e0) at gasyncqueue.c:398
#7 0x0003e984 in afinter_source_dispatch (source=0x8d260, callback=0x3e9dc <afinter_source_dispatch_msg>, user_data=0x8d1e0) at afinter.c:112
#8 0xff143564 in g_main_context_dispatch (context=0x8d158) at gmain.c:2144
#9 0xff1459a4 in g_main_context_iterate (context=0x8d158, block=1, dispatch=1, self=0x76030) at gmain.c:2778
#10 0xff146050 in g_main_context_iteration (context=0x8d158, may_block=1) at gmain.c:2841
#11 0x0001bc20 in main_loop_run (cfg=0xffbffbc8) at main.c:149
#12 0x0001c260 in main (argc=1, argv=0xffbffd44) at main.c:394
(gdb) thread 2
[Switching to thread 2 (process 153634 )]#0 0xfed4b000 in _door_return () from /lib/libc.so.1
(gdb) bt
#0 0xfed4b000 in _door_return () from /lib/libc.so.1
#1 0xff370bdc in door_return () from /lib/libdoor.so.1
#2 0xff370c38 in door_create_func () from /lib/libdoor.so.1
#3 0xfed46d54 in _lwp_start () from /lib/libc.so.1
#4 0xfed46d54 in _lwp_start () from /lib/libc.so.1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)
-----------------------------------------------------------------------------------
Igor R. Manassypov, M.Eng., P.Eng., CCIE 23032, CCVP
Network Architect, CI Investments
416.795.3147
-----Original Message-----
From: Balazs Scheidler [mailto:bazsi@balabit.hu]
Sent: Thursday, November 12, 2009 11:02 AM
To: imanassypov@rogers.com
Cc: Syslog-ng users' and developers' mailing list; Pallagi Zoltán; Network
Subject: Re: [syslog-ng] syslog-ng on solaris locks up after a while
Hi,
Not really. Are there multiple threads in the same core file?
For example, what does "info threads" print?
It would be nice to have the backtrace for all threads, like this:

(gdb) thread 1
(gdb) bt
(gdb) thread 2
(gdb) bt

and so on, for each thread ID that "info thread" lists.
On Fri, 2009-11-06 at 11:41 -0800, Igor Manassypov wrote:
Would this one make more sense?
bash-3.00# ps -eaf | grep syslog
root 22562 22561 0 Nov 04 ? 0:30 /usr/local/sbin/syslog-ng
root 22561 1 0 Nov 04 ? 0:00 /usr/local/sbin/syslog-ng
bash-3.00# truss -f -p 22562
22562/2: door_return(0x00000000, 0, 0x00000000, 0) (sleeping...)
22562/1: lwp_park(0x00000000, 0) (sleeping...)
22562/1: Received signal #11, SIGSEGV, in lwp_park() [default]
22562/1: siginfo: SIGSEGV pid=12717 uid=0
22562/1: lwp_park(0x00000000, 0) Err#4 EINTR
bash-3.00# gdb syslog-ng core
Core was generated by `/usr/local/sbin/syslog-ng'.
Program terminated with signal 11, Segmentation fault.
[New process 88098 ]
[New process 153634 ]
#0 0xfed46df0 in __lwp_park () from /lib/libc.so.1
(gdb)
--- On Tue, 11/3/09, Balazs Scheidler <bazsi@balabit.hu> wrote:
From: Balazs Scheidler <bazsi@balabit.hu>
Subject: Re: [syslog-ng] syslog-ng on solaris locks up after a while
To: imanassypov@rogers.com, "Syslog-ng users' and developers' mailing list" <syslog-ng@lists.balabit.hu>
Cc: "Pallagi Zoltán" <pzolee@balabit.hu>, network@ci.com
Date: Tuesday, November 3, 2009, 2:11 PM
Hi,
The problem is that you killed the supervisor process, which restarts syslog-ng in case it crashes. However, the hang is not in that process but in its child.
So by looking at the ps output, I'd say that in this situation you should have trussed 13621 and not its parent.
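The process to attach truss to is therefore the child: a syslog-ng process whose parent is also syslog-ng. Below is a minimal Python sketch of picking it out of "ps -ef" output. The function names are hypothetical, and the parser assumes the Solaris column order UID PID PPID C STIME TTY TIME CMD, where STIME is one token (hh:mm:ss) for processes started today and two tokens (month and day) for older ones.

```python
import os


def parse_ps_ef_line(line):
    """Split one Solaris 'ps -ef' line into (pid, ppid, argv).

    Assumes the column order UID PID PPID C STIME TTY TIME CMD, where
    STIME is a single token ("11:37:07") for processes started today and
    two tokens ("Oct 21") for older ones.
    """
    fields = line.split()
    pid, ppid = int(fields[1]), int(fields[2])
    i = 4                                  # index of the first STIME token
    i += 2 if fields[i].isalpha() else 1   # skip STIME (one or two tokens)
    i += 2                                 # skip TTY and TIME
    return pid, ppid, fields[i:]


def find_worker_pids(ps_lines, name="syslog-ng"):
    """Return the PIDs of 'name' processes whose parent is also 'name'.

    With the supervisor in place, the parent only waits for and restarts
    the daemon; the child is the process that actually handles the logs.
    """
    procs = {}
    for line in ps_lines:
        fields = line.split()
        if not fields or fields[0] == "UID":   # skip blank lines and the header
            continue
        pid, ppid, argv = parse_ps_ef_line(line)
        cmd = os.path.basename(argv[0]) if argv else ""
        procs[pid] = (ppid, cmd)
    return sorted(
        pid
        for pid, (ppid, cmd) in procs.items()
        if cmd == name and procs.get(ppid, (None, ""))[1] == name
    )


# The listing posted on 2009-11-03 in this thread.
LISTING = """\
root 12694 12616 0 11:37:07 pts/1 0:00 grep syslog
root 13012 1 0 Oct 21 ? 0:00 syslog-ng -v
root 13013 13012 0 Oct 21 ? 0:41 syslog-ng -v
root 13620 1 0 Oct 08 ? 0:00 /usr/local/sbin/syslog-ng
root 13621 13620 0 Oct 08 ? 6:16 /usr/local/sbin/syslog-ng
""".splitlines()

print(find_worker_pids(LISTING))  # [13013, 13621]
```

On that listing it returns both children, 13013 and 13621, because two supervisor/child pairs were running; 13621 is the child of the 13620 process that was traced.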
On Tue, 2009-11-03 at 08:54 -0800, Igor Manassypov wrote:
> Hi Zoltan,
>
>
> Here are the traces:
>
> bash-3.00# ps -eaf | grep syslog
> root 12694 12616 0 11:37:07 pts/1 0:00 grep syslog
> root 13012 1 0 Oct 21 ? 0:00 syslog-ng -v
> root 13013 13012 0 Oct 21 ? 0:41 syslog-ng -v
> root 13620 1 0 Oct 08 ? 0:00 /usr/local/sbin/syslog-ng
> root 13621 13620 0 Oct 08 ? 6:16 /usr/local/sbin/syslog-ng
> bash-3.00# truss -f -p "13620"
> 13620: waitid(P_PID, 13621, 0xFFBFF468, WEXITED|WTRAPPED) (sleeping...)
>
> 13620: Received signal #11, SIGSEGV, in waitid() [default]
> 13620: siginfo: SIGSEGV pid=12717 uid=0
> 13620: waitid(P_PID, 13621, 0xFFBFF468, WEXITED|WTRAPPED) Err#4 EINTR
>
> Core was generated by `/usr/local/sbin/syslog-ng'.
> Program terminated with signal 11, Segmentation fault.
> [New process 79156 ]
> #0 0xfed4ad80 in _waitid () from /lib/libc.so.1
> (gdb) bt full
> #0 0xfed4ad80 in _waitid () from /lib/libc.so.1
> No symbol table info available.
> #1 0xfecee038 in _waitpid () from /lib/libc.so.1
> No symbol table info available.
> #2 0xfed3a70c in waitpid () from /lib/libc.so.1
> No symbol table info available.
> #3 0x0003017c in g_process_start () at gprocess.c:1042
>         rc = 0
>         deadlock = 0
>         pid = 13621
>         __PRETTY_FUNCTION__ = "g_process_start"
> #4 0x0001c214 in main (argc=1, argv=0xffbffd14) at main.c:371
>         cfg = (GlobalConfig *) 0x10034
>         rc = 310272
>         ctx = (GOptionContext *) 0x76030
>         error = (GError *) 0x0
>
> Please let me know if I can provide you with more information.
>
> Thanks!
>
> --- On Tue, 11/3/09, Pallagi Zoltán <pzolee@balabit.hu> wrote:
>
> From: Pallagi Zoltán <pzolee@balabit.hu>
> Subject: Re: [syslog-ng] syslog-ng on solaris locks up after a while
> To: imanassypov@rogers.com, "Syslog-ng users' and developers' mailing list" <syslog-ng@lists.balabit.hu>
> Received: Tuesday, November 3, 2009, 11:10 AM
>
> Hi Igor,
>
> Can you show me the truss output or a backtrace of the stuck syslog-ng?
> truss:
>
> truss -f -p "syslog-ng pid"
>
> backtrace:
>
> kill -11 "syslog-ng pid" (syslog-ng will drop a core file)
> gdb syslog-ng core
> bt full
>
> Igor Manassypov wrote:
> > Hello,
> >
> >
> > I am having an issue with a Solaris installation of syslog-ng. It is
> > configured such that the logs are stored in separate per-IP folders.
> > This is my centralized logging device, so it is fairly heavily loaded,
> > receiving logs from a few dozen hosts. The syslog-ng process locks up
> > every two to three weeks, with no messages being logged to any of the
> > files. The only way of getting it back is to kill -9 the process and
> > restart it.
> >
> > Is there any known issue of this sort, and is there any way around it
> > other than recycling the daemon every night?
> >
> >
> > Here is the version info:
> >
> > bash-3.00# syslog-ng --version
> > syslog-ng 3.0.4
> > Revision: ssh+git://bazsi@git.balabit//var/scm/git/syslog-ng/syslog-ng-ose--mainline--3.0#master#1b5d618e301ad94aa20e692ffba16469dece8d10
> > Compile-Date: Aug 11 2009 10:44:17
> > Enable-Threads: on
> > Enable-Debug: off
> > Enable-GProf: off
> > Enable-Memtrace: off
> > Enable-Sun-STREAMS: on
> > Enable-Sun-Door: on
> > Enable-IPv6: off
> > Enable-Spoof-Source: on
> > Enable-TCP-Wrapper: off
> > Enable-SSL: on
> > Enable-SQL: on
> > Enable-Linux-Caps: off
> > Enable-Pcre: on
> >
> > bash-3.00# uname -a
> > SunOS prelude 5.10 Generic_137137-09 sun4v sparc SUNW,T5240
> > Thanks!
> >
> > -igor
> >
> > Igor Manassypov, M.Eng, P.Eng, CCIE 23032, CCVP Network
> > Architect
> >
> > ______________________________________________________________________________
> > Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
> > Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
> > FAQ: http://www.campin.net/syslog-ng/faq.html
>
-- Bazsi
-- Bazsi
###################################################################################
This communication is confidential and may be privileged. If you received it in error, please destroy without copying and advise the sender.
By submitting personal information to CI Investments, you agree to the collection, use and disclosure of such personal information for the purposes described in our Privacy Policy available at www.ci.com.
################################################################
-- Bazsi
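The sequence in the commit message quoted at the top of this thread can be replayed in a small Python model. This is a sketch, not syslog-ng's code: the class, its fields and the "-- MARK --" return value are hypothetical, and Python's queue.Queue stands in for GLib's GAsyncQueue, with get() playing the role of g_async_queue_pop() and get_nowait() that of g_async_queue_try_pop(). It shows why the blocking pop hangs once a local message postpones the MARK between the poll check and the dispatch, and why the non-blocking pop is safe as long as the caller accepts an empty result.

```python
import queue


class InternalSourceModel:
    """Hypothetical toy model of the internal() source's dispatch path.

    msg_queue stands in for internal_msg_queue; blocking=True mimics
    g_async_queue_pop() and blocking=False mimics g_async_queue_try_pop().
    """

    def __init__(self, mark_freq, start=0):
        self.mark_freq = mark_freq
        self.next_mark = start + mark_freq
        self.msg_queue = queue.Queue()

    def note_activity(self, now):
        # A message arrived on another (local) source: the MARK is pushed ahead.
        self.next_mark = now + self.mark_freq

    def check(self, now):
        # Poll phase: report input if a MARK is due or a message is queued.
        return now >= self.next_mark or not self.msg_queue.empty()

    def dispatch(self, now, blocking=False):
        # The MARK timeout is checked again at dispatch time.
        if now >= self.next_mark:
            self.next_mark = now + self.mark_freq
            return "-- MARK --"
        # Otherwise assume a queued internal message caused the dispatch.
        if blocking:
            # Buggy variant: waits forever if the queue is empty.
            return self.msg_queue.get()
        try:
            # Fixed variant: returns immediately.
            return self.msg_queue.get_nowait()
        except queue.Empty:
            return None  # the caller treats None as "nothing to do"


# Replay the sequence from the commit message.
src = InternalSourceModel(mark_freq=10)
assert src.check(now=10)     # the MARK is due, so dispatch gets scheduled
src.note_activity(now=10)    # a local message arrives in the same iteration
print(src.dispatch(now=10))  # None
```

Calling src.dispatch(now=10, blocking=True) at that point would never return, which matches the main thread parked in g_async_queue_pop() under afinter_source_dispatch() in the backtrace posted on 2009-11-12.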