Hi Balazs,

Thanks for your prompt reply. Can you please direct me to the link where I can obtain the patch?

Thanks!

-igor

Igor M., M.Eng, P.Eng
Network Architect

--- On Thu, 11/12/09, Balazs Scheidler <bazsi@balabit.hu> wrote:

From: Balazs Scheidler <bazsi@balabit.hu>
Subject: Re: [syslog-ng] syslog-ng on solaris locks up after a while
To: imanassypov@rogers.com, "Syslog-ng users' and developers' mailing list" <syslog-ng@lists.balabit.hu>
Cc: "Pallagi Zoltán" <pzolee@balabit.hu>
Date: Thursday, November 12, 2009, 11:11 AM

Hi,

This seems to be the same issue as the one fixed by this patch:

Author: Balazs Scheidler <bazsi@balabit.hu> 2009-08-30 11:41:24
Committer: Balazs Scheidler <bazsi@balabit.hu> 2009-08-30 11:41:24
Parent: 1ad4da07d5305ba0140ac385d661ab6de25fc5f3 ([patterndb] estring parser length calculation must include ending quote)
Child: c2e8aa58763a89cab58d05fb7a2b2a18021413b4 ([logmsg] added support for ASA timestamps)
Branches: master, remotes/balabit/master, remotes/origin/master
Follows: v3.0.4
Precedes:

    [afinter] don't block on the internal_msg_queue even in the threaded case (fixes: pub#48)

    A hang was reported in bugzilla ticket #48 which seems to have been
    caused by MARK messages interfering with local messages:

      * if the MARK is due in the same poll iteration as a local message
      * the MARK timeout is checked and the internal source is marked as
        having input available
      * then the local message comes in pushing the mark timeout further
        ahead in time
      * then the internal() dispatch callback checks the mark timeout again,
        but at this time it is already in the future ->
      * the dispatch callback falls back to fetching the internal message
        from internal_msg_queue, assuming it was that which caused the
        dispatch callback to be scheduled
      * this blocks indefinitely.

    The solution is very simple: use g_async_queue_try_pop() instead of
    g_async_queue_pop(), the dispatch code already takes care about a NULL
    message value.
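For readers following the thread, the sequence of events in the commit message above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not syslog-ng code: `dispatch`, `mark_due_at`, the 1200-second MARK interval and the "HANG" marker are hypothetical names and values chosen for the demo. `queue.Queue.get()` stands in for g_async_queue_pop() and `get_nowait()` for g_async_queue_try_pop(); the 0.2 second timeout exists only so the demo terminates instead of hanging for real.

```python
import queue


def dispatch(internal_queue, mark_due_at, now, blocking):
    """Hypothetical stand-in for the internal() source dispatch callback.

    Returns "MARK" if the mark timeout has expired, a queued message if one
    is available, None if there is nothing to do, or "HANG" where the
    blocking variant would have waited forever.
    """
    if now >= mark_due_at:
        return "MARK"
    if blocking:
        # Analogue of g_async_queue_pop(): waits until a message arrives.
        # The timeout is an assumption added so this demo can finish.
        try:
            return internal_queue.get(timeout=0.2)
        except queue.Empty:
            return "HANG"
    # Analogue of g_async_queue_try_pop(): returns immediately.
    try:
        return internal_queue.get_nowait()
    except queue.Empty:
        return None


internal_queue = queue.Queue()  # empty: no internal message is pending
now = 100.0
mark_due_at = 100.0

# 1. The MARK timeout is checked, found due, and the dispatch is scheduled.
scheduled = now >= mark_due_at

# 2. A local message arrives and pushes the MARK timeout into the future.
mark_due_at = now + 1200.0

# 3. The dispatch callback re-checks the timeout, finds it in the future,
#    and falls back to the (empty) internal message queue.
old_result = dispatch(internal_queue, mark_due_at, now, blocking=True)
new_result = dispatch(internal_queue, mark_due_at, now, blocking=False)
print(old_result, new_result)
```

With the blocking call the callback never returns while the queue stays empty, which matches frames #5 to #7 of the backtrace quoted below; with the non-blocking call it simply gets no message and the main loop carries on.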
On Tue, 2009-11-10 at 05:09 -0800, Igor Manassypov wrote:
(gdb) bt full
#0  0xfed46df0 in __lwp_park () from /lib/libc.so.1
No symbol table info available.
#1  0xfed40c44 in cond_sleep_queue () from /lib/libc.so.1
No symbol table info available.
#2  0xfed40e08 in cond_wait_queue () from /lib/libc.so.1
No symbol table info available.
#3  0xfed41350 in cond_wait () from /lib/libc.so.1
No symbol table info available.
#4  0xfed4138c in pthread_cond_wait () from /lib/libc.so.1
No symbol table info available.
#5  0xff119d80 in g_async_queue_pop_intern_unlocked (queue=0x757e0, try=0, end_time=0x75618) at gasyncqueue.c:359
        retval = (gpointer) 0xa15b8
        __PRETTY_FUNCTION__ = "g_async_queue_pop_intern_unlocked"
#6  0xff119e80 in g_async_queue_pop (queue=0x757e0) at gasyncqueue.c:398
        retval = (gpointer) 0x757e0
        __PRETTY_FUNCTION__ = "g_async_queue_pop"
#7  0x0003e984 in afinter_source_dispatch (source=0x8d260, callback=0x3e9dc <afinter_source_dispatch_msg>, user_data=0x8d1e0) at afinter.c:112
        msg = (LogMessage *) 0xa0dc0
        path_options = {flow_control = -1, matched = 0x0}
        tv = {tv_sec = 1257363112, tv_usec = 441817}
#8  0xff143564 in g_main_context_dispatch (context=0x8d158) at gmain.c:2144
No locals.
#9  0xff1459a4 in g_main_context_iterate (context=0x8d158, block=1, dispatch=1, self=0x76030) at gmain.c:2778
        max_priority = 2147483647
        timeout = 4000
        some_ready = 1
        nfds = 4
        allocated_nfds = 1
        fds = (GPollFD *) 0x788c8
        __PRETTY_FUNCTION__ = "g_main_context_iterate"
#10 0xff146050 in g_main_context_iteration (context=0x8d158, may_block=1) at gmain.c:2841
        retval = 1
#11 0x0001bc20 in main_loop_run (cfg=0xffbffbc8) at main.c:149
        iters = 0
        stats_timer_id = 0
#12 0x0001c260 in main (argc=1, argv=0xffbffd44) at main.c:394
        cfg = (GlobalConfig *) 0x794d0
        rc = 0
        ctx = (GOptionContext *) 0x76030
        error = (GError *) 0x0
Igor M., M.Eng, P.Eng Network Architect
--- On Mon, 11/9/09, Pallagi Zoltán <pzolee@balabit.hu> wrote:

From: Pallagi Zoltán <pzolee@balabit.hu>
Subject: Re: [syslog-ng] syslog-ng on solaris locks up after a while
To: imanassypov@rogers.com, "Syslog-ng users' and developers' mailing list" <syslog-ng@lists.balabit.hu>
Date: Monday, November 9, 2009, 11:35 AM

Igor Manassypov wrote:
> Would this one make more sense?
>
> bash-3.00# ps -eaf | grep syslog
>     root 22562 22561 0 Nov 04 ? 0:30 /usr/local/sbin/syslog-ng
>     root 22561 1 0 Nov 04 ? 0:00 /usr/local/sbin/syslog-ng
>
> bash-3.00# truss -f -p 22562
> 22562/2: door_return(0x00000000, 0, 0x00000000, 0) (sleeping...)
> 22562/1: lwp_park(0x00000000, 0) (sleeping...)
> 22562/1: Received signal #11, SIGSEGV, in lwp_park() [default]
> 22562/1: siginfo: SIGSEGV pid=12717 uid=0
> 22562/1: lwp_park(0x00000000, 0) Err#4 EINTR
>
> Core was generated by `/usr/local/sbin/syslog-ng'.
> Program terminated with signal 11, Segmentation fault.
> [New process 88098 ]
> [New process 153634 ]
> #0  0xfed46df0 in __lwp_park () from /lib/libc.so.1
> #0  0xfed46df0 in __lwp_park () from /lib/libc.so.1
>
> bash-3.00# gdb syslog-ng core
>
> Core was generated by `/usr/local/sbin/syslog-ng'.
> Program terminated with signal 11, Segmentation fault.
> [New process 88098 ]
> [New process 153634 ]
> #0  0xfed46df0 in __lwp_park () from /lib/libc.so.1
> (gdb)

Please show us output of "bt full" too

> --- On Tue, 11/3/09, Balazs Scheidler <bazsi@balabit.hu> wrote:
>
> From: Balazs Scheidler <bazsi@balabit.hu>
> Subject: Re: [syslog-ng] syslog-ng on solaris locks up after a while
> To: imanassypov@rogers.com, "Syslog-ng users' and developers' mailing list" <syslog-ng@lists.balabit.hu>
> Cc: "Pallagi Zoltán" <pzolee@balabit.hu>, network@ci.com
> Date: Tuesday, November 3, 2009, 2:11 PM
>
> Hi,
>
> The problem is that you killed the supervisor process, which restarts
> syslog-ng in case it crashes. However the hang is not in this part, but
> in its child.
>
> So by looking at the ps output, I'd say that in this situation you
> should have trussed 13621 and not its parent.
>
> On Tue, 2009-11-03 at 08:54 -0800, Igor Manassypov wrote:
> > Hi Zoltan,
> >
> > Here are the traces:
> >
> > bash-3.00# ps -eaf | grep syslog
> >     root 12694 12616 0 11:37:07 pts/1 0:00 grep syslog
> >     root 13012 1 0 Oct 21 ? 0:00 syslog-ng -v
> >     root 13013 13012 0 Oct 21 ? 0:41 syslog-ng -v
> >     root 13620 1 0 Oct 08 ? 0:00 /usr/local/sbin/syslog-ng
> >     root 13621 13620 0 Oct 08 ? 6:16 /usr/local/sbin/syslog-ng
> > bash-3.00# truss -f -p "13620"
> > 13620: waitid(P_PID, 13621, 0xFFBFF468, WEXITED|WTRAPPED) (sleeping...)
> >
> > 13620: Received signal #11, SIGSEGV, in waitid() [default]
> > 13620: siginfo: SIGSEGV pid=12717 uid=0
> > 13620: waitid(P_PID, 13621, 0xFFBFF468, WEXITED|WTRAPPED) Err#4 EINTR
> >
> > Core was generated by `/usr/local/sbin/syslog-ng'.
> > Program terminated with signal 11, Segmentation fault.
> > [New process 79156 ]
> > #0  0xfed4ad80 in _waitid () from /lib/libc.so.1
> > (gdb) bt full
> > #0  0xfed4ad80 in _waitid () from /lib/libc.so.1
> > No symbol table info available.
> > #1  0xfecee038 in _waitpid () from /lib/libc.so.1
> > No symbol table info available.
> > #2  0xfed3a70c in waitpid () from /lib/libc.so.1
> > No symbol table info available.
> > #3  0x0003017c in g_process_start () at gprocess.c:1042
> >         rc = 0
> >         deadlock = 0
> >         pid = 13621
> >         __PRETTY_FUNCTION__ = "g_process_start"
> > #4  0x0001c214 in main (argc=1, argv=0xffbffd14) at main.c:371
> >         cfg = (GlobalConfig *) 0x10034
> >         rc = 310272
> >         ctx = (GOptionContext *) 0x76030
> >         error = (GError *) 0x0
> >
> > Please let me know if I can provide you with more information,
> >
> > Thanks!
> >
> > --- On Tue, 11/3/09, Pallagi Zoltán <pzolee@balabit.hu> wrote:
> >
> > From: Pallagi Zoltán <pzolee@balabit.hu>
> > Subject: Re: [syslog-ng] syslog-ng on solaris locks up after a while
> > To: imanassypov@rogers.com, "Syslog-ng users' and developers' mailing list" <syslog-ng@lists.balabit.hu>
> > Received: Tuesday, November 3, 2009, 11:10 AM
> >
> > Hi Igor,
> >
> > Can you show me truss output or backtrace of the stuck syslog-ng?:
> > truss:
> >
> > truss -f -p "syslog-ng pid"
> >
> > backtrace:
> >
> > kill -11 "syslog-ng pid" (syslog-ng will drop a core file)
> > gdb syslog-ng core
> > bt full
> >
> > Igor Manassypov wrote:
> > > Hello,
> > >
> > > I am having an issue with a solaris installation of the
> > > syslog-ng. It is configured such that all the logs are
> > > stored in different per-ip folders. This is my centralized
> > > logging device, so it is fairly heavily loaded with
> > > receiving logs from a few dozen hosts. The syslog-ng process
> > > locks up every two to three weeks, with no messages logging
> > > to any of the files. The only way of getting it back is kill
> > > -9 the process and restart it.
> > >
> > > Is there any known issue of same sorts and is there any
> > > other way around it other than recycling the daemon every
> > > night?
> > >
> > > here is the version info:
> > >
> > > bash-3.00# syslog-ng --version
> > > syslog-ng 3.0.4
> > > Revision: ssh+git://bazsi@git.balabit//var/scm/git/syslog-ng/syslog-ng-ose--mainline--3.0#master#1b5d618e301ad94aa20e692ffba16469dece8d10
> > > Compile-Date: Aug 11 2009 10:44:17
> > > Enable-Threads: on
> > > Enable-Debug: off
> > > Enable-GProf: off
> > > Enable-Memtrace: off
> > > Enable-Sun-STREAMS: on
> > > Enable-Sun-Door: on
> > > Enable-IPv6: off
> > > Enable-Spoof-Source: on
> > > Enable-TCP-Wrapper: off
> > > Enable-SSL: on
> > > Enable-SQL: on
> > > Enable-Linux-Caps: off
> > > Enable-Pcre: on
> > >
> > > bash-3.00# uname -a
> > > SunOS prelude 5.10 Generic_137137-09 sun4v sparc SUNW,T5240
> > >
> > > Thanks!
> > >
> > > -igor
> > >
> > > Igor Manassypov, M.Eng, P.Eng, CCIE 23032, CCVP
> > > Network Architect
> > >
> > > ______________________________________________________________________________
> > > Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
> > > Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
> > > FAQ: http://www.campin.net/syslog-ng/faq.html
>
> --
> Bazsi
-- Bazsi