[syslog-ng] syslog-ng on solaris locks up after a while

Igor Manassypov imanassypov at rogers.com
Thu Nov 12 17:28:56 CET 2009


Hi Balazs,


Thanks for your prompt reply. Can you please direct me to the link where I can obtain the patch?

Thanks!

-igor

Igor M., M.Eng, P.Eng Network Architect

--- On Thu, 11/12/09, Balazs Scheidler <bazsi at balabit.hu> wrote:

From: Balazs Scheidler <bazsi at balabit.hu>
Subject: Re: [syslog-ng] syslog-ng on solaris locks up after a while
To: imanassypov at rogers.com, "Syslog-ng users' and developers' mailing list" <syslog-ng at lists.balabit.hu>
Cc: "Pallagi Zoltán" <pzolee at balabit.hu>
Date: Thursday, November 12, 2009, 11:11 AM

Hi,

This seems to be the same issue as the one fixed by this patch:

Author: Balazs Scheidler <bazsi at balabit.hu>  2009-08-30 11:41:24
Committer: Balazs Scheidler <bazsi at balabit.hu>  2009-08-30 11:41:24
Parent: 1ad4da07d5305ba0140ac385d661ab6de25fc5f3 ([patterndb] estring parser length calculation must include ending quote)
Child:  c2e8aa58763a89cab58d05fb7a2b2a18021413b4 ([logmsg] added support for ASA timestamps)
Branches: master, remotes/balabit/master, remotes/origin/master
Follows: v3.0.4
Precedes: 

    [afinter] don't block on the internal_msg_queue even in the threaded case (fixes: pub#48)
    
    A hang was reported in bugzilla ticket #48 which seems to have
    been caused by MARK messages interfering with local messages:
    
      * if the MARK is due in the same poll iteration as a local message
      * the MARK timeout is checked and the internal source is marked as having
        input available
      * then the local message comes in pushing the mark timeout further ahead
        in time
      * then the internal() dispatch callback checks the mark timeout again,
        but at this time it is already in the future ->
      * the dispatch callback falls back to fetching the internal message from
        internal_msg_queue, assuming it was that which caused the dispatch
        callback to be scheduled
      * this blocks indefinitely.
    
    The solution is very simple: use g_async_queue_try_pop() instead of
    g_async_queue_pop(), the dispatch code already takes care about a
    NULL message value.
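
As a rough illustration of the sequence described above, here is a minimal sketch in Python rather than the actual C code from afinter.c. It assumes that Python's queue.Queue.get() can stand in for g_async_queue_pop() (blocks until an item arrives) and get_nowait() for g_async_queue_try_pop() (returns at once). The function name dispatch(), the "WOULD HANG" sentinel and the timestamps are made up for the example; the timeout exists only so that the demonstration terminates.

```python
import queue

# Stand-in for syslog-ng's internal_msg_queue. This is a Python analogy,
# not the GLib GAsyncQueue used by the real code.
internal_msg_queue = queue.Queue()


def dispatch(now, mark_due_at, blocking, timeout=0.2):
    """Mimic the dispatch callback described in the commit message.

    Returns "MARK" if the MARK timeout has expired, otherwise the next
    queued internal message, or None when there is nothing to deliver.
    """
    if now >= mark_due_at:
        return "MARK"
    if blocking:
        # Analogue of g_async_queue_pop(): waits until a message arrives.
        # The timeout and the sentinel are demonstration-only; the real
        # call has no timeout and would simply never return.
        try:
            return internal_msg_queue.get(timeout=timeout)
        except queue.Empty:
            return "WOULD HANG"
    # Analogue of g_async_queue_try_pop(): returns immediately,
    # yielding None (NULL in C) when the queue is empty.
    try:
        return internal_msg_queue.get_nowait()
    except queue.Empty:
        return None


# The race: the source was flagged as ready because MARK was due at t=10,
# but a local message then pushed the MARK deadline out to t=20 before
# the dispatch callback ran. The internal queue is empty.
now, mark_due_at = 10, 20

print(dispatch(now, mark_due_at, blocking=True))   # old behaviour: stuck waiting
print(dispatch(now, mark_due_at, blocking=False))  # patched behaviour: nothing to do
```

With the non-blocking variant the caller just sees an empty result and returns to the main loop, which is the case the existing dispatch code is said to handle already.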


On Tue, 2009-11-10 at 05:09 -0800, Igor Manassypov wrote:
> (gdb) bt full 
> #0  0xfed46df0 in __lwp_park () from /lib/libc.so.1 
> No symbol table info available. 
> #1  0xfed40c44 in cond_sleep_queue () from /lib/libc.so.1 
> No symbol table info available. 
> #2  0xfed40e08 in cond_wait_queue () from /lib/libc.so.1 
> No symbol table info available. 
> #3  0xfed41350 in cond_wait () from /lib/libc.so.1 
> No symbol table info available. 
> #4  0xfed4138c in pthread_cond_wait () from /lib/libc.so.1 
> No symbol table info available. 
> #5  0xff119d80 in g_async_queue_pop_intern_unlocked (queue=0x757e0, try=0, end_time=0x75618) at gasyncqueue.c:359
>         retval = (gpointer) 0xa15b8
>         __PRETTY_FUNCTION__ = "g_async_queue_pop_intern_unlocked"
> #6  0xff119e80 in g_async_queue_pop (queue=0x757e0) at gasyncqueue.c:398
>         retval = (gpointer) 0x757e0
>         __PRETTY_FUNCTION__ = "g_async_queue_pop"
> #7  0x0003e984 in afinter_source_dispatch (source=0x8d260, callback=0x3e9dc <afinter_source_dispatch_msg>, user_data=0x8d1e0) at afinter.c:112
>         msg = (LogMessage *) 0xa0dc0
>         path_options = {flow_control = -1, matched = 0x0}
>         tv = {tv_sec = 1257363112, tv_usec = 441817}
> #8  0xff143564 in g_main_context_dispatch (context=0x8d158) at gmain.c:2144
> No locals.
> #9  0xff1459a4 in g_main_context_iterate (context=0x8d158, block=1, dispatch=1, self=0x76030) at gmain.c:2778
>         max_priority = 2147483647
>         timeout = 4000
>         some_ready = 1
>         nfds = 4
>         allocated_nfds = 1
>         fds = (GPollFD *) 0x788c8
>         __PRETTY_FUNCTION__ = "g_main_context_iterate"
> #10 0xff146050 in g_main_context_iteration (context=0x8d158, may_block=1) at gmain.c:2841
>         retval = 1 
> #11 0x0001bc20 in main_loop_run (cfg=0xffbffbc8) at main.c:149 
>         iters = 0 
>         stats_timer_id = 0 
> #12 0x0001c260 in main (argc=1, argv=0xffbffd44) at main.c:394 
>         cfg = (GlobalConfig *) 0x794d0 
>         rc = 0 
>         ctx = (GOptionContext *) 0x76030 
>         error = (GError *) 0x0
> 
> 
> 
> Igor M., M.Eng, P.Eng Network Architect
> 
> --- On Mon, 11/9/09, Pallagi Zoltán <pzolee at balabit.hu> wrote:
>         
>         From: Pallagi Zoltán <pzolee at balabit.hu>
>         Subject: Re: [syslog-ng] syslog-ng on solaris locks up after a
>         while
>         To: imanassypov at rogers.com, "Syslog-ng users' and developers'
>         mailing list" <syslog-ng at lists.balabit.hu>
>         Date: Monday, November 9, 2009, 11:35 AM
>         
>         Igor Manassypov wrote:
>         > Would this one make more sense?
>         > 
>         > 
>         > 
>         > bash-3.00# ps -eaf | grep syslog
>         >     root 22562 22561   0   Nov 04 ?           0:30 /usr/local/sbin/syslog-ng
>         >     root 22561     1   0   Nov 04 ?           0:00 /usr/local/sbin/syslog-ng
>         > 
>         > bash-3.00# truss -f -p 22562
>         > 22562/2:        door_return(0x00000000, 0, 0x00000000, 0) (sleeping...)
>         > 22562/1:        lwp_park(0x00000000, 0) (sleeping...)
>         > 22562/1:            Received signal #11, SIGSEGV, in lwp_park() [default]
>         > 22562/1:              siginfo: SIGSEGV pid=12717 uid=0
>         > 22562/1:        lwp_park(0x00000000, 0) Err#4 EINTR
>         > 
>         > Core was generated by `/usr/local/sbin/syslog-ng'. 
>         > Program terminated with signal 11, Segmentation fault. 
>         > [New process 88098    ] 
>         > [New process 153634    ] 
>         > #0  0xfed46df0 in __lwp_park () from /lib/libc.so.1
>         > #0  0xfed46df0 in __lwp_park () from /lib/libc.so.1
>         > 
>         > bash-3.00# gdb syslog-ng core 
>         > 
>         > Core was generated by `/usr/local/sbin/syslog-ng'. 
>         > Program terminated with signal 11, Segmentation fault. 
>         > [New process 88098    ] 
>         > [New process 153634    ] 
>         > #0  0xfed46df0 in __lwp_park () from /lib/libc.so.1
>         > (gdb) 
>         Please show us the output of "bt full" too.
>         > 
>         > 
>         > --- On Tue, 11/3/09, Balazs Scheidler <bazsi at balabit.hu>
>         > wrote:
>         >         
>         >         From: Balazs Scheidler <bazsi at balabit.hu>
>         >         Subject: Re: [syslog-ng] syslog-ng on solaris locks
>         >         up after a while
>         >         To: imanassypov at rogers.com, "Syslog-ng users' and
>         >         developers' mailing list"
>         >         <syslog-ng at lists.balabit.hu>
>         >         Cc: "Pallagi Zoltán" <pzolee at balabit.hu>,
>         >         network at ci.com
>         >         Date: Tuesday, November 3, 2009, 2:11 PM
>         >         
>         >         Hi,
>         >         
>         >         The problem is that you killed the supervisor
>         >         process, which restarts
>         >         syslog-ng in case it crashes. However, the hang is
>         >         not in this part, but
>         >         in its child.
>         >         
>         >         So by looking at the ps output, I'd say that in this
>         >         situation you
>         >         should have trussed 13621 and not its parent.
>         >         
>         >         On Tue, 2009-11-03 at 08:54 -0800, Igor Manassypov
>         >         wrote:
>         >         > Hi Zoltan,
>         >         > 
>         >         > 
>         >         > Here are the traces:
>         >         > 
>         >         > bash-3.00# ps -eaf | grep syslog
>         >         >     root 12694 12616   0 11:37:07 pts/1       0:00 grep syslog
>         >         >     root 13012     1   0   Oct 21 ?           0:00 syslog-ng -v
>         >         >     root 13013 13012   0   Oct 21 ?           0:41 syslog-ng -v
>         >         >     root 13620     1   0   Oct 08 ?           0:00 /usr/local/sbin/syslog-ng
>         >         >     root 13621 13620   0   Oct 08 ?           6:16 /usr/local/sbin/syslog-ng
>         >         > bash-3.00# truss -f -p "13620"
>         >         > 13620:  waitid(P_PID, 13621, 0xFFBFF468, WEXITED|WTRAPPED) (sleeping...)
>         >         > 
>         >         > 13620:      Received signal #11, SIGSEGV, in waitid() [default]
>         >         > 13620:        siginfo: SIGSEGV pid=12717 uid=0
>         >         > 13620:  waitid(P_PID, 13621, 0xFFBFF468, WEXITED|WTRAPPED) Err#4 EINTR
>         >         > 
>         >         > Core was generated by `/usr/local/sbin/syslog-ng'.
>         >         > Program terminated with signal 11, Segmentation fault.
>         >         > [New process 79156    ]
>         >         > #0  0xfed4ad80 in _waitid () from /lib/libc.so.1
>         >         > (gdb) bt full
>         >         > #0  0xfed4ad80 in _waitid () from /lib/libc.so.1
>         >         > No symbol table info available.
>         >         > #1  0xfecee038 in _waitpid () from /lib/libc.so.1
>         >         > No symbol table info available.
>         >         > #2  0xfed3a70c in waitpid () from /lib/libc.so.1
>         >         > No symbol table info available.
>         >         > #3  0x0003017c in g_process_start () at gprocess.c:1042
>         >         >         rc = 0
>         >         >         deadlock = 0
>         >         >         pid = 13621
>         >         >         __PRETTY_FUNCTION__ = "g_process_start"
>         >         > #4  0x0001c214 in main (argc=1, argv=0xffbffd14) at main.c:371
>         >         >         cfg = (GlobalConfig *) 0x10034
>         >         >         rc = 310272
>         >         >         ctx = (GOptionContext *) 0x76030
>         >         >         error = (GError *) 0x0
>         >         > 
>         >         > Please let me know if I can provide you with more
>         >         information,
>         >         > 
>         >         > Thanks!
>         >         > 
>         >         > --- On Tue, 11/3/09, Pallagi Zoltán
>         >         <pzolee at balabit.hu> wrote:
>         >         >         
>         >         >         From: Pallagi Zoltán <pzolee at balabit.hu>
>         >         >         Subject: Re: [syslog-ng] syslog-ng on
>         >         solaris locks up after a
>         >         >         while
>         >         >         To: imanassypov at rogers.com, "Syslog-ng
>         >         users' and developers'
>         >         >         mailing list" <syslog-ng at lists.balabit.hu>
>         >         >         Received: Tuesday, November 3, 2009, 11:10
>         >         AM
>         >         >         
>         >         >         Hi Igor,
>         >         >         
>         >         >         Can you show me the truss output or a backtrace of the
>         >         >         stuck syslog-ng?
>         >         >         
>         >         >         truss:
>         >         >         
>         >         >         truss -f -p "syslog-ng pid"
>         >         >         
>         >         >         backtrace:
>         >         >         
>         >         >         kill -11 "syslog-ng pid" (syslog-ng will drop a core file)
>         >         >         gdb syslog-ng core
>         >         >         bt full
>         >         >         
>         >         >         Igor Manassypov wrote:
>         >         >         > Hello,
>         >         >         > 
>         >         >         > 
>         >         >         > I am having an issue with a Solaris installation of
>         >         >         > syslog-ng. It is configured so that all logs are
>         >         >         > stored in separate per-IP folders. This is my
>         >         >         > centralized logging device, so it is fairly heavily
>         >         >         > loaded, receiving logs from a few dozen hosts. The
>         >         >         > syslog-ng process locks up every two to three weeks,
>         >         >         > with no messages being logged to any of the files.
>         >         >         > The only way to get it back is to kill -9 the
>         >         >         > process and restart it.
>         >         >         > 
>         >         >         > Is there a known issue of this sort, and is there any
>         >         >         > way around it other than restarting the daemon every
>         >         >         > night?
>         >         >         > 
>         >         >         > 
>         >         >         > here is the version info:
>         >         >         > 
>         >         >         > bash-3.00# syslog-ng --version
>         >         >         > syslog-ng 3.0.4
>         >         >         > Revision: ssh+git://bazsi@git.balabit//var/scm/git/syslog-ng/syslog-ng-ose--mainline--3.0#master#1b5d618e301ad94aa20e692ffba16469dece8d10
>         >         >         > Compile-Date: Aug 11 2009 10:44:17
>         >         >         > Enable-Threads: on
>         >         >         > Enable-Debug: off
>         >         >         > Enable-GProf: off
>         >         >         > Enable-Memtrace: off
>         >         >         > Enable-Sun-STREAMS: on
>         >         >         > Enable-Sun-Door: on
>         >         >         > Enable-IPv6: off
>         >         >         > Enable-Spoof-Source: on
>         >         >         > Enable-TCP-Wrapper: off
>         >         >         > Enable-SSL: on
>         >         >         > Enable-SQL: on
>         >         >         > Enable-Linux-Caps: off
>         >         >         > Enable-Pcre: on
>         >         >         > 
>         >         >         > bash-3.00# uname -a
>         >         >         > SunOS prelude 5.10 Generic_137137-09 sun4v sparc SUNW,T5240
>         >         >         > Thanks!
>         >         >         > 
>         >         >         > -igor
>         >         >         > 
>         >         >         > Igor Manassypov., M.Eng, P.Eng, CCIE
>         >         23032, CCVP Network
>         >         >         > Architect
>         >         >         > 
>         >         >         >
>         >         ____________________________________________________________
>         >         >         > 
>         >         >         >
>         >         ______________________________________________________________________________
>         >         >         > Member info:
>         >         https://lists.balabit.hu/mailman/listinfo/syslog-ng
>         >         >         > Documentation:
>         >         http://www.balabit.com/support/documentation/?product=syslog-ng
>         >         >         > FAQ:
>         >         http://www.campin.net/syslog-ng/faq.html
>         >         >         > 
>         >         >         >   
>         >         >         
>         >         >         
>         >         > 
>         >         -- 
>         >         Bazsi
>         >         
>         >         
>         >         
>         > 
>         > 
>         >   
>         
>         
> 
-- 
Bazsi
