[syslog-ng] ERROR:lib/dnscache.c:112:dns_cache_key_hash: code should not be reached
Balazs Scheidler
bazsi77 at gmail.com
Sat Oct 1 17:10:01 CEST 2016
Hi,
I could reproduce the issue in a localized environment. Thanks for your
efforts, hopefully the fix is underway, once I can diagnose it properly.
On Fri, Sep 30, 2016 at 8:47 AM, Claus Albøge <ca at tdchosting.dk> wrote:
> Hi,
>
> Just a quick update, after changing "dns-cache-size” to 10000, syslog-ng
> has been running for 3 days, without any restarts due to the SIGABRT.
>
> I guess the issues can be easily reproduced by setting “dns-cache-size” to
> 1.
>
> This issues is by the way related to the “Listen backlog issue” reported
> in: https://lists.balabit.hu/pipermail/syslog-ng/2016-
> September/023228.html (and https://github.com/balabit/syslog-ng/pull/1211).
> The sudden surge in SYN packets was caused because syslog-ng restarted due
> to the SIGABRT, causing all connected clients to reconnect. With a couple
> of thousand clients, it was easy to trigger a situation, where more than
> 255 clients reconnected at once, causing the hardcoded listen backlog to
> run full.
>
>
>
> /Claus A
>
>
> > On 26. sep. 2016, at 13.53, Claus Albøge <ca at tdchosting.dk> wrote:
> >
> > Hi,
> >
> > I tried to minimize my config as much as possible, but was unable to
> reproduce - had it run for a couple of hours.
> >
> > Looking at the source code pointed my towards
> “effective_dns_cache_options" and dns-cache-size (https://www.balabit.com/
> documents/syslog-ng-ose-latest-guides/en/syslog-ng-
> ose-guide-admin/html/examples-dns.html)
> >
> > Setting dns-cache-size to 1 on my minimized config triggered the SIGABRT
> within a few seconds. (Default value seems to be 1007!, which is higher
> that the number of hosts sending logs to my syslog-ng instance with the
> minimized config)
> >
> > Setting dns-cache-size to 10000 on a syslog-instance with the “full”
> number of log sources, seems to work around the issue, as syslog-ng has
> been running for almost a hour now without any restarts - normally it
> restarts every few minutes.
> >
> > So it does indeed seems like an issue with the dnscache logic. You might
> be able to reproduce by setting dns-cache-size to a very low number.
> >
> > If you still need a core dump to debug this further, I might need a bit
> of help creating the coredump correctly. After the SIGABRT, running gcore
> in gdb creates a core, but if I try to read this core again using gdb, I’m
> unable to do a backtrace. Do I need to do anything special?
> >
> >
> > /Claus A
> >
> >
> >> On 23. sep. 2016, at 08.37, Claus Albøge <ca at tdchosting.dk> wrote:
> >>
> >> Hi,
> >>
> >> I’ll try to reproduce with a minimal configuration, without ssl-certs,
> etc.
> >>
> >> Syslog-ng binaries are from: https://copr-be.cloud.
> fedoraproject.org/results/czanik/syslog-ng38/epel-7-x86_
> 64/00449829-syslog-ng/
> >>
> >>
> >> /Claus A
> >>
> >>> On 23. sep. 2016, at 08.25, Balazs Scheidler <bazsi77 at gmail.com>
> wrote:
> >>>
> >>> Is there a possibility for me to receive the core file and the
> binaries? (Syslog-ng plus dependent libs)
> >>>
> >>> This seems to be some kind of dns cache corruption, which is difficult
> to diagnose without those.
> >>>
> >>> Bazsi
> >>>
> >>>
> >>> On Sep 22, 2016 3:29 PM, "Claus Albøge" <ca at tdchosting.dk> wrote:
> >>> Hi,
> >>>
> >>> # gdb --args /usr/sbin/syslog-ng -F -p /var/run/syslogd.pid
> >>> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-80.el7
> >>> Copyright (C) 2013 Free Software Foundation, Inc.
> >>> License GPLv3+: GNU GPL version 3 or later <
> http://gnu.org/licenses/gpl.html>
> >>> This is free software: you are free to change and redistribute it.
> >>> There is NO WARRANTY, to the extent permitted by law. Type "show
> copying"
> >>> and "show warranty" for details.
> >>> This GDB was configured as "x86_64-redhat-linux-gnu".
> >>> For bug reporting instructions, please see:
> >>> <http://www.gnu.org/software/gdb/bugs/>...
> >>> Reading symbols from /usr/sbin/syslog-ng...Reading symbols from
> /usr/lib/debug/usr/sbin/syslog-ng.debug...done.
> >>> done.
> >>> (gdb) set pagination off
> >>> (gdb) run
> >>> Starting program: /usr/sbin/syslog-ng -F -p /var/run/syslogd.pid
> >>> [Thread debugging using libthread_db enabled]
> >>> Using host libthread_db library "/lib64/libthread_db.so.1".
> >>> [New Thread 0x7ffff1e8a700 (LWP 12861)]
> >>> [New Thread 0x7ffff1689700 (LWP 12862)]
> >>> [New Thread 0x7ffff0e88700 (LWP 12863)]
> >>> [New Thread 0x7fffe3fff700 (LWP 13103)]
> >>> [New Thread 0x7fffe37fe700 (LWP 13108)]
> >>> [New Thread 0x7fffe2ffd700 (LWP 13109)]
> >>> [New Thread 0x7fffe27fc700 (LWP 13110)]
> >>> [New Thread 0x7fffe1ffb700 (LWP 13111)]
> >>> [Thread 0x7ffff1689700 (LWP 12862) exited]
> >>> [Thread 0x7ffff1e8a700 (LWP 12861) exited]
> >>> [New Thread 0x7ffff1e8a700 (LWP 13395)]
> >>> [New Thread 0x7ffff1689700 (LWP 13396)]
> >>> [Thread 0x7fffe37fe700 (LWP 13108) exited]
> >>> [Thread 0x7fffe2ffd700 (LWP 13109) exited]
> >>> [New Thread 0x7fffe37fe700 (LWP 13646)]
> >>> [New Thread 0x7fffe2ffd700 (LWP 13647)]
> >>> [Thread 0x7ffff1689700 (LWP 13396) exited]
> >>> [Thread 0x7fffe3fff700 (LWP 13103) exited]
> >>> [New Thread 0x7fffe3fff700 (LWP 13959)]
> >>> [New Thread 0x7ffff1689700 (LWP 13960)]
> >>> [Thread 0x7ffff1e8a700 (LWP 13395) exited]
> >>> [Thread 0x7fffe1ffb700 (LWP 13111) exited]
> >>> [New Thread 0x7fffe1ffb700 (LWP 14040)]
> >>> [New Thread 0x7ffff1e8a700 (LWP 14041)]
> >>> [Thread 0x7fffe3fff700 (LWP 13959) exited]
> >>> [Thread 0x7ffff0e88700 (LWP 12863) exited]
> >>> [New Thread 0x7ffff0e88700 (LWP 14182)]
> >>> [New Thread 0x7fffe3fff700 (LWP 14183)]
> >>> **
> >>> ERROR:lib/dnscache.c:112:dns_cache_key_hash: code should not be
> reached
> >>>
> >>> Program received signal SIGABRT, Aborted.
> >>> [Switching to Thread 0x7ffff0e88700 (LWP 14182)]
> >>> 0x00007ffff5fb05f7 in raise () from /lib64/libc.so.6
> >>> Missing separate debuginfos, use: debuginfo-install
> bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.163-3.el7.x86_64
> elfutils-libs-0.163-3.el7.x86_64 eventlog-0.2.13-4.el7.x86_64
> glib2-2.42.2-5.el7.x86_64 glibc-2.17-106.el7_2.8.x86_64
> ivykis-0.36.2-2.el7.x86_64 json-c-0.11-4.el7_0.x86_64
> keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.13.2-12.el7_2.x86_64
> libattr-2.4.46-12.el7.x86_64 libcap-2.22-8.el7.x86_64
> libcom_err-1.42.9-7.el7.x86_64 libgcc-4.8.5-4.el7.x86_64
> libgcrypt-1.5.3-12.el7_1.1.x86_64 libgpg-error-1.12-3.el7.x86_64
> libnet-1.1.6-7.el7.x86_64 libselinux-2.2.2-6.el7.x86_64
> libuuid-2.23.2-26.el7_2.3.x86_64 openssl-libs-1.0.1e-51.el7_2.5.x86_64
> pcre-8.32-15.el7_2.1.x86_64 systemd-libs-219-19.el7_2.13.x86_64
> tcp_wrappers-libs-7.6-77.el7.x86_64 xz-libs-5.1.2-12alpha.el7.x86_64
> zlib-1.2.7-15.el7.x86_64
> >>> (gdb) bt full
> >>> #0 0x00007ffff5fb05f7 in raise () from /lib64/libc.so.6
> >>> No symbol table info available.
> >>> #1 0x00007ffff5fb1ce8 in abort () from /lib64/libc.so.6
> >>> No symbol table info available.
> >>> #2 0x00007ffff70426d5 in g_assertion_message () from
> /lib64/libglib-2.0.so.0
> >>> No symbol table info available.
> >>> #3 0x00007ffff704276a in g_assertion_message_expr () from
> /lib64/libglib-2.0.so.0
> >>> No symbol table info available.
> >>> #4 0x00007ffff7b62cc4 in dns_cache_key_hash (e=<optimized out>) at
> lib/dnscache.c:112
> >>> e = <optimized out>
> >>> #5 0x00007ffff700c4fc in g_hash_table_remove_internal () from
> /lib64/libglib-2.0.so.0
> >>> No symbol table info available.
> >>> #6 0x00007ffff7b67584 in resolve_sockaddr_to_inet_or_inet6_hostname
> (host_resolve_options=0x651754, saddr=0x183d350,
> result_len=0x7ffff0e85370) at lib/host-resolve.c:332
> >>> hname = 0x7ffff0e864b8 "213.174.76.60"
> >>> hname_len = 5
> >>> positive = 0
> >>> dnscache_key = 0x183d368
> >>> #7 resolve_sockaddr_to_hostname (result_len=result_len at entry=0x7ffff0e85370,
> saddr=0x183d350, host_resolve_options=0x651754) at lib/host-resolve.c:343
> >>> No locals.
> >>> #8 0x00007ffff7b6c03a in log_source_mangle_hostname (self=self at entry=0x1868c10,
> msg=msg at entry=0x7fffe87cdd40) at lib/logsource.c:156
> >>> resolved_name = <optimized out>
> >>> resolved_name_len = 140737093885344
> >>> #9 0x00007ffff7b6c280 in log_source_queue (s=0x1868c10,
> msg=0x7fffe87cdd40, path_options=0x7ffff0e85550, user_data=<optimized out>)
> at lib/logsource.c:266
> >>> self = 0x1868c10
> >>> i = <optimized out>
> >>> __func__ = "log_source_queue"
> >>> #10 0x00007ffff7b6ca0a in log_pipe_queue (path_options=<optimized
> out>, msg=<optimized out>, s=<optimized out>) at lib/logpipe.h:339
> >>> No locals.
> >>> #11 log_pipe_forward_msg (self=<optimized out>,
> path_options=<optimized out>, msg=<optimized out>) at lib/logpipe.h:302
> >>> No locals.
> >>> #12 log_pipe_queue (path_options=<optimized out>, msg=<optimized out>,
> s=<optimized out>) at lib/logpipe.h:343
> >>> No locals.
> >>> #13 log_pipe_forward_msg (self=<optimized out>,
> path_options=<optimized out>, msg=<optimized out>) at lib/logpipe.h:302
> >>> No locals.
> >>> #14 log_pipe_queue (path_options=<optimized out>, msg=0x7fffe87cdd40,
> s=0x7fffe87cdd40) at lib/logpipe.h:343
> >>> No locals.
> >>> #15 log_source_post (self=self at entry=0x1868c10, msg=msg at entry=0x7fffe87cdd40)
> at lib/logsource.c:249
> >>> path_options = {ack_needed = 1, flow_control_requested = 0,
> matched = 0x0}
> >>> old_window_size = <optimized out>
> >>> __func__ = "log_source_post"
> >>> #16 0x00007ffff7b6b179 in log_reader_handle_line (aux=0x7ffff0e85600,
> length=<optimized out>, line=<optimized out>, self=0x1868c10) at
> lib/logreader.c:320
> >>> m = 0x7fffe87cdd40
> >>> #17 log_reader_fetch_log (self=self at entry=0x1868c10) at
> lib/logreader.c:381
> >>> bookmark = <optimized out>
> >>> msg = 0x7fffe445cef5 "<14>1 2016-09-22T10:51:07.178337+02:00
> pafodev02.int.patientforsikringen.dk Microsoft-Windows-Security-Auditing
> 608 - [NXLOG at 47121 AssetID=\"105939\" CustomerID=\"0002592\"
> Location=\"Hyper-V - Farm 01 (ps"...
> >>> msg_len = 4245
> >>> status = <optimized out>
> >>> msg_count = 1
> >>> may_read = 1
> >>> aux = {peer_addr = 0x0, data = "\000\207\350\360\377\177\000\
> 000)\000\000\000\000\000\000\000\304\335U\366\377\177\000\
> 000\000Y\350\360\377\177\000\000\230\235\332\001\000\000\000\000@
> \245\350\377\177\000\000h\313U\366\377\177\000\000 at V\
> 350\360\377\177\000\000 at V\350\360\377\177\000\000\000Y\350\
> 360\377\177\000\000\340\037\245\350\377\177\000\000\b\000\
> 000\000\000\000\000\000\250\037\245\350\377\177\000\000\
> 200V\350\360\377\177\000\000\223\341U\366\377\177\000\000\
> 001\000\000\000\000\000\000\000\006Y\350\360\377\177\000\
> 000\aY\350\360\377\177\000\000\000Y\350\360\377\177\000\
> 000\aY\350\360\377\177\000\000gk\266\367\377\177\000\000\220", '\000'
> <repeats 15 times>, "\240 E\001\000\000\000\000\301\326\003\367\377\177\000\000"...,
> end_ptr = 0}
> >>> __func__ = "log_reader_fetch_log"
> >>> #18 0x00007ffff7b6b299 in log_reader_work_perform (s=0x1868c10) at
> lib/logreader.c:88
> >>> self = 0x1868c10
> >>> #19 0x00007ffff7b7307d in _work (self=<optimized out>) at
> lib/mainloop-io-worker.c:52
> >>> No locals.
> >>> #20 0x00007ffff655dc67 in iv_work_thread_do_work () from
> /lib64/libivykis.so.0
> >>> No symbol table info available.
> >>> #21 0x00007ffff655cfc3 in iv_run_tasks () from /lib64/libivykis.so.0
> >>> No symbol table info available.
> >>> #22 0x00007ffff655f7dc in iv_main () from /lib64/libivykis.so.0
> >>> No symbol table info available.
> >>> #23 0x00007ffff655dab4 in iv_work_thread () from /lib64/libivykis.so.0
> >>> No symbol table info available.
> >>> #24 0x00007ffff6560270 in iv_thread_handler () from
> /lib64/libivykis.so.0
> >>> No symbol table info available.
> >>> #25 0x00007ffff6344dc5 in start_thread () from /lib64/libpthread.so.0
> >>> No symbol table info available.
> >>> #26 0x00007ffff6071ced in clone () from /lib64/libc.so.6
> >>> No symbol table info available.
> >>> (gdb) frame 6
> >>> #6 0x00007ffff7b67584 in resolve_sockaddr_to_inet_or_inet6_hostname
> (host_resolve_options=0x651754, saddr=0x183d350,
> result_len=0x7ffff0e85370) at lib/host-resolve.c:332
> >>> 332 dns_caching_store(saddr->sa.sa_family, dnscache_key,
> hname, positive);
> >>> (gdb) print *saddr
> >>> $1 = {refcnt = {counter = 3}, flags = 0, sa_funcs = 0x7ffff7dd7060
> <inet_sockaddr_funcs>, salen = 16, sa = {sa_family = 2, sa_data =
> "\225\327ծL<\000\000\000\000\000\000\000"}}
> >>> (gdb) print *saddr->sa
> >>> Structure has no component named operator*.
> >>> (gdb) print saddr
> >>> $2 = (GSockAddr *) 0x183d350
> >>>
> >>>
> >>> /Claus A
> >>>
> >>>
> >>>> On 21. sep. 2016, at 15.37, Szalai, Attila <
> Attila.Szalai at morganstanley.com> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> If it is convenient to you, just for the sake of my curiosity, can
> you run some more command in the gdb session, after you reproduced the
> issue?
> >>>>
> >>>> * bt full
> >>>> * frame 6 (you should search for the line with
> resolve_sockaddr_to_inet_or_inet6_hostname and write the number written
> in that line. In the case of your previous email, it was #6)
> >>>> * print *saddr
> >>>> * print *saddr->sa
> >>>>
> >>>>
> >>
> >>
> >>
> >> ____________________________________________________________
> __________________
> >> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
> >> Documentation: http://www.balabit.com/support/documentation/?
> product=syslog-ng
> >> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
> >>
> >
> >
> > ____________________________________________________________
> __________________
> > Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
> > Documentation: http://www.balabit.com/support/documentation/?
> product=syslog-ng
> > FAQ: http://www.balabit.com/wiki/syslog-ng-faq
> >
>
>
> ____________________________________________________________
> __________________
> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
> Documentation: http://www.balabit.com/support/documentation/?
> product=syslog-ng
> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
>
>
--
Bazsi
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.balabit.hu/pipermail/syslog-ng/attachments/20161001/b42e3d5e/attachment-0001.htm
More information about the syslog-ng
mailing list