[syslog-ng] syslog-ng on solaris locks up after a while

Balazs Scheidler bazsi at balabit.hu
Wed Nov 18 12:36:08 CET 2009


Thanks. It indeed seems to be the same issue as the one fixed by this commit:

Author: Balazs Scheidler <bazsi at balabit.hu>  2009-08-30 11:41:24
Committer: Balazs Scheidler <bazsi at balabit.hu>  2009-08-30 11:41:24
Parent: 1ad4da07d5305ba0140ac385d661ab6de25fc5f3 ([patterndb] estring parser length calculation must include ending quote)
Child:  c2e8aa58763a89cab58d05fb7a2b2a18021413b4 ([logmsg] added support for ASA timestamps)
Branches: master, remotes/balabit/master, remotes/origin/master
Follows: v3.0.4
Precedes: 

    [afinter] don't block on the internal_msg_queue even in the threaded case (fixes: pub#48)
    
    A hang was reported in bugzilla ticket #48 which seems to have
    been caused by MARK messages interfering with local messages:
    
      * if the MARK is due in the same poll iteration as a local message
      * the MARK timeout is checked and the internal source is marked as having
        input available
      * then the local message comes in pushing the mark timeout further ahead
        in time
      * then the internal() dispatch callback checks the mark timeout again,
        but at this time it is already in the future ->
      * the dispatch callback falls back to fetching the internal message from
        internal_msg_queue, assuming it was that which caused the dispatch
        callback to be scheduled
      * this blocks indefinitely.
    
    The solution is very simple: use g_async_queue_try_pop() instead of
    g_async_queue_pop(), the dispatch code already takes care about a
    NULL message value.
    
    Thanks for the helpful reporters to hunt down the issue.
    
    Reported-By: Arkadiusz Miśkiewicz, Elan Ruusamäe


A recent snapshot (and the git tree) should contain the fix.
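
To make the sequence in the commit message concrete, here is a minimal
sketch in Python. It is not the actual afinter.c code. It assumes
queue.Queue as a stand-in for GAsyncQueue; the InternalSource class, its
methods, the replay() helper and the 10-second MARK interval are all
hypothetical names chosen for illustration. The blocking pop is modelled
with a short timeout so that the demo terminates instead of hanging.

```python
import queue


class InternalSource:
    """Toy model of the internal() source; NOT syslog-ng's real code."""

    def __init__(self, mark_freq):
        self.msg_queue = queue.Queue()  # stand-in for internal_msg_queue
        self.mark_freq = mark_freq
        self.next_mark = mark_freq      # time at which the next MARK is due

    def check(self, now):
        # Poll "check" phase: ready if a MARK is due or a message is queued.
        return now >= self.next_mark or not self.msg_queue.empty()

    def local_message_arrived(self, now):
        # A local message pushes the MARK timeout further ahead in time.
        self.next_mark = now + self.mark_freq

    def dispatch(self, now, blocking):
        if now >= self.next_mark:
            self.next_mark = now + self.mark_freq
            return "-- MARK --"
        # MARK is no longer due, so fall back to the message queue.
        if blocking:
            # Models g_async_queue_pop(); the timeout stands in for
            # "blocks indefinitely" so that this demo can finish.
            try:
                return self.msg_queue.get(timeout=0.1)
            except queue.Empty:
                return "HANG"
        # Models g_async_queue_try_pop(): returns at once, None if empty.
        try:
            return self.msg_queue.get_nowait()
        except queue.Empty:
            return None


def replay(blocking):
    """Replay the race: MARK due and local message in one poll iteration."""
    src = InternalSource(mark_freq=10)
    now = 10
    ready = src.check(now)          # MARK is due, source is marked ready
    src.local_message_arrived(now)  # MARK timeout moves to t=20
    assert ready
    return src.dispatch(now, blocking)


if __name__ == "__main__":
    print(replay(blocking=True))   # prints HANG
    print(replay(blocking=False))  # prints None
```

With blocking=True the dispatch step waits on an empty queue, which
corresponds to frame #6 (g_async_queue_pop) in the backtrace quoted
below. With blocking=False it gets None and simply returns.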

On Thu, 2009-11-12 at 11:10 -0500, Manassypov, Igor wrote:
> Here is the information:
> 
> (gdb) info thread
>   2 process 153634      0xfed4b000 in _door_return () from /lib/libc.so.1
> * 1 process 88098      0xfed46df0 in __lwp_park () from /lib/libc.so.1
> (gdb) thread 1  
> [Switching to thread 1 (process 88098    )]#0  0xfed46df0 in __lwp_park () from /lib/libc.so.1
> (gdb) bt
> #0  0xfed46df0 in __lwp_park () from /lib/libc.so.1
> #1  0xfed40c44 in cond_sleep_queue () from /lib/libc.so.1
> #2  0xfed40e08 in cond_wait_queue () from /lib/libc.so.1
> #3  0xfed41350 in cond_wait () from /lib/libc.so.1
> #4  0xfed4138c in pthread_cond_wait () from /lib/libc.so.1
> #5  0xff119d80 in g_async_queue_pop_intern_unlocked (queue=0x757e0, try=0, end_time=0x75618) at gasyncqueue.c:359
> #6  0xff119e80 in g_async_queue_pop (queue=0x757e0) at gasyncqueue.c:398
> #7  0x0003e984 in afinter_source_dispatch (source=0x8d260, callback=0x3e9dc <afinter_source_dispatch_msg>, user_data=0x8d1e0)
>     at afinter.c:112
> #8  0xff143564 in g_main_context_dispatch (context=0x8d158) at gmain.c:2144
> #9  0xff1459a4 in g_main_context_iterate (context=0x8d158, block=1, dispatch=1, self=0x76030) at gmain.c:2778
> #10 0xff146050 in g_main_context_iteration (context=0x8d158, may_block=1) at gmain.c:2841
> #11 0x0001bc20 in main_loop_run (cfg=0xffbffbc8) at main.c:149
> #12 0x0001c260 in main (argc=1, argv=0xffbffd44) at main.c:394
> (gdb) thread 2
> [Switching to thread 2 (process 153634    )]#0  0xfed4b000 in _door_return () from /lib/libc.so.1
> (gdb) bt
> #0  0xfed4b000 in _door_return () from /lib/libc.so.1
> #1  0xff370bdc in door_return () from /lib/libdoor.so.1
> #2  0xff370c38 in door_create_func () from /lib/libdoor.so.1
> #3  0xfed46d54 in _lwp_start () from /lib/libc.so.1
> #4  0xfed46d54 in _lwp_start () from /lib/libc.so.1
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> (gdb) 
> 
>  
> 
> 
> -----------------------------------------------------------------------------------
> Igor R. Manassypov, M.Eng., P.Eng., CCIE 23032, CCVP
> Network Architect, CI Investments
> 416.795.3147
> 
> -----Original Message-----
> From: Balazs Scheidler [mailto:bazsi at balabit.hu] 
> Sent: Thursday, November 12, 2009 11:02 AM
> To: imanassypov at rogers.com
> Cc: Syslog-ng users' and developers' mailing list; Pallagi Zoltán; Network
> Subject: Re: [syslog-ng] syslog-ng on solaris locks up after a while
> 
> Hi,
> 
> not really, are there multiple threads in the same core file?
> 
> e.g. what is the response for "info threads"?
> 
> It would be nice to have the backtrace for all threads, like this:
> 
> (gdb) thread 1
> (gdb) bt
> (gdb) thread 2
> (gdb) bt
> 
> and so on, for each threadid that "info thread" lists.
> 
> 
> On Fri, 2009-11-06 at 11:41 -0800, Igor Manassypov wrote:
> > Would this one make more sense?
> > 
> > 
> > 
> > bash-3.00# ps -eaf | grep syslog 
> >     root 22562 22561   0   Nov 04 ?           0:30 /usr/local/sbin/syslog-ng
> >     root 22561     1   0   Nov 04 ?           0:00 /usr/local/sbin/syslog-ng
> > 
> > bash-3.00# truss -f -p 22562 
> > 22562/2:        door_return(0x00000000, 0, 0x00000000, 0) (sleeping...)
> > 22562/1:        lwp_park(0x00000000, 0)         (sleeping....)
> > 22562/1:            Received signal #11, SIGSEGV, in lwp_park() [default]
> > 22562/1:              siginfo: SIGSEGV pid=12717 uid=0
> > 22562/1:        lwp_park(0x00000000, 0)                         Err#4 EINTR
> > 
> > Core was generated by `/usr/local/sbin/syslog-ng'. 
> > Program terminated with signal 11, Segmentation fault. 
> > [New process 88098    ] 
> > [New process 153634    ] 
> > #0  0xfed46df0 in __lwp_park () from /lib/libc.so.1 #0  0xfed46df0 in 
> > __lwp_park () from /lib/libc.so.1
> > 
> > bash-3.00# gdb syslog-ng core
> > 
> > Core was generated by `/usr/local/sbin/syslog-ng'. 
> > Program terminated with signal 11, Segmentation fault. 
> > [New process 88098    ] 
> > [New process 153634    ] 
> > #0  0xfed46df0 in __lwp_park () from /lib/libc.so.1
> > (gdb)
> > 
> > --- On Tue, 11/3/09, Balazs Scheidler <bazsi at balabit.hu> wrote:
> >         
> >         From: Balazs Scheidler <bazsi at balabit.hu>
> >         Subject: Re: [syslog-ng] syslog-ng on solaris locks up after a
> >         while
> >         To: imanassypov at rogers.com, "Syslog-ng users' and developers'
> >         mailing list" <syslog-ng at lists.balabit.hu>
> >         Cc: "Pallagi Zoltán" <pzolee at balabit.hu>, network at ci.com
> >         Date: Tuesday, November 3, 2009, 2:11 PM
> >         
> >         Hi,
> >         
> >         The problem is that you killed the supervisor process, which restarts
> >         syslog-ng in case it crashes. However, the hang is not in this part,
> >         but in its child.
> >         
> >         So by looking at the ps output, I'd say that in this situation you
> >         should have trussed 13621 and not its parent.
> >         
> >         On Tue, 2009-11-03 at 08:54 -0800, Igor Manassypov wrote:
> >         > Hi Zoltan,
> >         > 
> >         > 
> >         > Here are the traces:
> >         > 
> >         > bash-3.00# ps -eaf | grep syslog
> >         >     root 12694 12616   0 11:37:07 pts/1       0:00 grep syslog
> >         >     root 13012     1   0   Oct 21 ?           0:00 syslog-ng -v
> >         >     root 13013 13012   0   Oct 21 ?           0:41 syslog-ng -v
> >         >     root 13620     1   0   Oct 08 ?           0:00 /usr/local/sbin/syslog-ng
> >         >     root 13621 13620   0   Oct 08 ?           6:16 /usr/local/sbin/syslog-ng
> >         > bash-3.00# truss -f -p "13620"
> >         > 13620:  waitid(P_PID, 13621, 0xFFBFF468, WEXITED|WTRAPPED) (sleeping...)
> >         > 
> >         > 13620:      Received signal #11, SIGSEGV, in waitid() [default]
> >         > 13620:        siginfo: SIGSEGV pid=12717 uid=0
> >         > 13620:  waitid(P_PID, 13621, 0xFFBFF468, WEXITED|WTRAPPED) Err#4 EINTR
> >         > 
> >         > Core was generated by `/usr/local/sbin/syslog-ng'.
> >         > Program terminated with signal 11, Segmentation fault.
> >         > [New process 79156    ]
> >         > #0  0xfed4ad80 in _waitid () from /lib/libc.so.1
> >         > (gdb) bt full
> >         > #0  0xfed4ad80 in _waitid () from /lib/libc.so.1
> >         > No symbol table info available.
> >         > #1  0xfecee038 in _waitpid () from /lib/libc.so.1
> >         > No symbol table info available.
> >         > #2  0xfed3a70c in waitpid () from /lib/libc.so.1
> >         > No symbol table info available.
> >         > #3  0x0003017c in g_process_start () at gprocess.c:1042
> >         >         rc = 0
> >         >         deadlock = 0
> >         >         pid = 13621
> >         >         __PRETTY_FUNCTION__ = "g_process_start"
> >         > #4  0x0001c214 in main (argc=1, argv=0xffbffd14) at main.c:371
> >         >         cfg = (GlobalConfig *) 0x10034
> >         >         rc = 310272
> >         >         ctx = (GOptionContext *) 0x76030
> >         >         error = (GError *) 0x0
> >         > 
> >         > Please let me know if I can provide you with more
> >         information,
> >         > 
> >         > Thanks!
> >         > 
> >         > --- On Tue, 11/3/09, Pallagi Zoltán <pzolee at balabit.hu>
> >         wrote:
> >         >         
> >         >         From: Pallagi Zoltán <pzolee at balabit.hu>
> >         >         Subject: Re: [syslog-ng] syslog-ng on solaris locks
> >         up after a
> >         >         while
> >         >         To: imanassypov at rogers.com, "Syslog-ng users' and
> >         developers'
> >         >         mailing list" <syslog-ng at lists.balabit.hu>
> >         >         Received: Tuesday, November 3, 2009, 11:10 AM
> >         >         
> >         >         Hi Igor,
> >         >         
> >         >         Can you show me truss output or a backtrace of the stuck
> >         >         syslog-ng?
> >         >         truss:
> >         >         
> >         >         truss -f -p "syslog-ng pid"
> >         >         
> >         >         backtrace:
> >         >         
> >         >         kill -11 "syslog-ng pid" (syslog-ng will drop a core file)
> >         >         gdb syslog-ng core
> >         >         bt full
> >         >         
> >         >         Igor Manassypov wrote: 
> >         >         > Hello,
> >         >         > 
> >         >         > 
> >         >         > I am having an issue with a Solaris installation of
> >         >         > syslog-ng. It is configured such that all the logs are
> >         >         > stored in different per-IP folders. This is my centralized
> >         >         > logging device, so it is fairly heavily loaded with
> >         >         > receiving logs from a few dozen hosts. The syslog-ng process
> >         >         > locks up every two to three weeks, with no messages logged
> >         >         > to any of the files. The only way of getting it back is to
> >         >         > kill -9 the process and restart it.
> >         >         > 
> >         >         > Is there any known issue of this sort, and is there any
> >         >         > way around it other than recycling the daemon every
> >         >         > night?
> >         >         > 
> >         >         > 
> >         >         > here is the version info:
> >         >         > 
> >         >         > bash-3.00# syslog-ng --version
> >         >         > syslog-ng 3.0.4
> >         >         > Revision: ssh+git://bazsi@git.balabit//var/scm/git/syslog-ng/syslog-ng-ose--mainline--3.0#master#1b5d618e301ad94aa20e692ffba16469dece8d10
> >         >         > Compile-Date: Aug 11 2009 10:44:17
> >         >         > Enable-Threads: on
> >         >         > Enable-Debug: off
> >         >         > Enable-GProf: off
> >         >         > Enable-Memtrace: off
> >         >         > Enable-Sun-STREAMS: on
> >         >         > Enable-Sun-Door: on
> >         >         > Enable-IPv6: off
> >         >         > Enable-Spoof-Source: on
> >         >         > Enable-TCP-Wrapper: off
> >         >         > Enable-SSL: on
> >         >         > Enable-SQL: on
> >         >         > Enable-Linux-Caps: off
> >         >         > Enable-Pcre: on
> >         >         > 
> >         >         > bash-3.00# uname -a
> >         >         > SunOS prelude 5.10 Generic_137137-09 sun4v sparc SUNW,T5240
> >         >         > Thanks!
> >         >         > 
> >         >         > -igor
> >         >         > 
> >         >         > Igor Manassypov., M.Eng, P.Eng, CCIE 23032, CCVP
> >         Network
> >         >         > Architect
> >         >         > 
> >         >         >
> >         ____________________________________________________________
> >         >         > 
> >         >         >
> >         ______________________________________________________________________________
> >         >         > Member info:
> >         https://lists.balabit.hu/mailman/listinfo/syslog-ng
> >         >         > Documentation:
> >         http://www.balabit.com/support/documentation/?product=syslog-ng
> >         >         > FAQ: http://www.campin.net/syslog-ng/faq.html
> >         >         > 
> >         >         >   
> >         >         
> >         >         
> >         > 
> >         -- 
> >         Bazsi
> >         
> >         
> >         
> --
> Bazsi
> 
-- 
Bazsi



