[syslog-ng] ERROR:lib/dnscache.c:112:dns_cache_key_hash: code should not be reached

Claus Albøge ca at tdchosting.dk
Fri Sep 30 08:47:00 CEST 2016


Hi,

Just a quick update, after changing "dns-cache-size” to 10000, syslog-ng has been running for 3 days, without any restarts due to the SIGABRT.

I guess the issues can be easily reproduced by setting “dns-cache-size” to 1.

This issues is by the way related to the “Listen backlog issue” reported in: https://lists.balabit.hu/pipermail/syslog-ng/2016-September/023228.html (and https://github.com/balabit/syslog-ng/pull/1211). The sudden surge in SYN packets was caused because syslog-ng restarted due to the SIGABRT, causing all connected clients to reconnect. With a couple of thousand clients, it was easy to trigger a situation, where more than 255 clients reconnected at once, causing the hardcoded listen backlog to run full.



/Claus A


> On 26. sep. 2016, at 13.53, Claus Albøge <ca at tdchosting.dk> wrote:
> 
> Hi,
> 
> I tried to minimize my config as much as possible, but was unable to reproduce - had it run for a couple of hours. 
> 
> Looking at the source code pointed my towards “effective_dns_cache_options" and dns-cache-size (https://www.balabit.com/documents/syslog-ng-ose-latest-guides/en/syslog-ng-ose-guide-admin/html/examples-dns.html)
> 
> Setting dns-cache-size to 1 on my minimized config triggered the SIGABRT within a few seconds. (Default value seems to be 1007!, which is higher that the number of hosts sending logs to my syslog-ng instance with the minimized config)
> 
> Setting dns-cache-size to 10000 on a syslog-instance with the “full” number of log sources, seems to work around the issue, as syslog-ng has been running for almost a hour now without any restarts - normally it restarts every few minutes.
> 
> So it does indeed seems like an issue with the dnscache logic. You might be able to reproduce by setting dns-cache-size to a very low number.
> 
> If you still need a core dump to debug this further, I might need a bit of help creating the coredump correctly. After the SIGABRT, running gcore in gdb creates a core, but if I try to read this core again using gdb, I’m unable to do a backtrace. Do I need to do anything special?
> 
> 
> /Claus A
> 
> 
>> On 23. sep. 2016, at 08.37, Claus Albøge <ca at tdchosting.dk> wrote:
>> 
>> Hi,
>> 
>> I’ll try to reproduce with a minimal configuration, without ssl-certs, etc.
>> 
>> Syslog-ng binaries are from: https://copr-be.cloud.fedoraproject.org/results/czanik/syslog-ng38/epel-7-x86_64/00449829-syslog-ng/
>> 
>> 
>> /Claus A
>> 
>>> On 23. sep. 2016, at 08.25, Balazs Scheidler <bazsi77 at gmail.com> wrote:
>>> 
>>> Is there a possibility for me to receive the core file and the binaries? (Syslog-ng plus dependent libs)
>>> 
>>> This seems to be some kind of dns cache corruption, which is difficult to diagnose without those.
>>> 
>>> Bazsi
>>> 
>>> 
>>> On Sep 22, 2016 3:29 PM, "Claus Albøge" <ca at tdchosting.dk> wrote:
>>> Hi,
>>> 
>>> # gdb --args /usr/sbin/syslog-ng -F -p /var/run/syslogd.pid
>>> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-80.el7
>>> Copyright (C) 2013 Free Software Foundation, Inc.
>>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>>> This is free software: you are free to change and redistribute it.
>>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>>> and "show warranty" for details.
>>> This GDB was configured as "x86_64-redhat-linux-gnu".
>>> For bug reporting instructions, please see:
>>> <http://www.gnu.org/software/gdb/bugs/>...
>>> Reading symbols from /usr/sbin/syslog-ng...Reading symbols from /usr/lib/debug/usr/sbin/syslog-ng.debug...done.
>>> done.
>>> (gdb) set pagination off
>>> (gdb) run
>>> Starting program: /usr/sbin/syslog-ng -F -p /var/run/syslogd.pid
>>> [Thread debugging using libthread_db enabled]
>>> Using host libthread_db library "/lib64/libthread_db.so.1".
>>> [New Thread 0x7ffff1e8a700 (LWP 12861)]
>>> [New Thread 0x7ffff1689700 (LWP 12862)]
>>> [New Thread 0x7ffff0e88700 (LWP 12863)]
>>> [New Thread 0x7fffe3fff700 (LWP 13103)]
>>> [New Thread 0x7fffe37fe700 (LWP 13108)]
>>> [New Thread 0x7fffe2ffd700 (LWP 13109)]
>>> [New Thread 0x7fffe27fc700 (LWP 13110)]
>>> [New Thread 0x7fffe1ffb700 (LWP 13111)]
>>> [Thread 0x7ffff1689700 (LWP 12862) exited]
>>> [Thread 0x7ffff1e8a700 (LWP 12861) exited]
>>> [New Thread 0x7ffff1e8a700 (LWP 13395)]
>>> [New Thread 0x7ffff1689700 (LWP 13396)]
>>> [Thread 0x7fffe37fe700 (LWP 13108) exited]
>>> [Thread 0x7fffe2ffd700 (LWP 13109) exited]
>>> [New Thread 0x7fffe37fe700 (LWP 13646)]
>>> [New Thread 0x7fffe2ffd700 (LWP 13647)]
>>> [Thread 0x7ffff1689700 (LWP 13396) exited]
>>> [Thread 0x7fffe3fff700 (LWP 13103) exited]
>>> [New Thread 0x7fffe3fff700 (LWP 13959)]
>>> [New Thread 0x7ffff1689700 (LWP 13960)]
>>> [Thread 0x7ffff1e8a700 (LWP 13395) exited]
>>> [Thread 0x7fffe1ffb700 (LWP 13111) exited]
>>> [New Thread 0x7fffe1ffb700 (LWP 14040)]
>>> [New Thread 0x7ffff1e8a700 (LWP 14041)]
>>> [Thread 0x7fffe3fff700 (LWP 13959) exited]
>>> [Thread 0x7ffff0e88700 (LWP 12863) exited]
>>> [New Thread 0x7ffff0e88700 (LWP 14182)]
>>> [New Thread 0x7fffe3fff700 (LWP 14183)]
>>> **
>>> ERROR:lib/dnscache.c:112:dns_cache_key_hash: code should not be reached
>>> 
>>> Program received signal SIGABRT, Aborted.
>>> [Switching to Thread 0x7ffff0e88700 (LWP 14182)]
>>> 0x00007ffff5fb05f7 in raise () from /lib64/libc.so.6
>>> Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.163-3.el7.x86_64 elfutils-libs-0.163-3.el7.x86_64 eventlog-0.2.13-4.el7.x86_64 glib2-2.42.2-5.el7.x86_64 glibc-2.17-106.el7_2.8.x86_64 ivykis-0.36.2-2.el7.x86_64 json-c-0.11-4.el7_0.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.13.2-12.el7_2.x86_64 libattr-2.4.46-12.el7.x86_64 libcap-2.22-8.el7.x86_64 libcom_err-1.42.9-7.el7.x86_64 libgcc-4.8.5-4.el7.x86_64 libgcrypt-1.5.3-12.el7_1.1.x86_64 libgpg-error-1.12-3.el7.x86_64 libnet-1.1.6-7.el7.x86_64 libselinux-2.2.2-6.el7.x86_64 libuuid-2.23.2-26.el7_2.3.x86_64 openssl-libs-1.0.1e-51.el7_2.5.x86_64 pcre-8.32-15.el7_2.1.x86_64 systemd-libs-219-19.el7_2.13.x86_64 tcp_wrappers-libs-7.6-77.el7.x86_64 xz-libs-5.1.2-12alpha.el7.x86_64 zlib-1.2.7-15.el7.x86_64
>>> (gdb) bt full
>>> #0  0x00007ffff5fb05f7 in raise () from /lib64/libc.so.6
>>> No symbol table info available.
>>> #1  0x00007ffff5fb1ce8 in abort () from /lib64/libc.so.6
>>> No symbol table info available.
>>> #2  0x00007ffff70426d5 in g_assertion_message () from /lib64/libglib-2.0.so.0
>>> No symbol table info available.
>>> #3  0x00007ffff704276a in g_assertion_message_expr () from /lib64/libglib-2.0.so.0
>>> No symbol table info available.
>>> #4  0x00007ffff7b62cc4 in dns_cache_key_hash (e=<optimized out>) at lib/dnscache.c:112
>>>       e = <optimized out>
>>> #5  0x00007ffff700c4fc in g_hash_table_remove_internal () from /lib64/libglib-2.0.so.0
>>> No symbol table info available.
>>> #6  0x00007ffff7b67584 in resolve_sockaddr_to_inet_or_inet6_hostname (host_resolve_options=0x651754, saddr=0x183d350, result_len=0x7ffff0e85370) at lib/host-resolve.c:332
>>>       hname = 0x7ffff0e864b8 "213.174.76.60"
>>>       hname_len = 5
>>>       positive = 0
>>>       dnscache_key = 0x183d368
>>> #7  resolve_sockaddr_to_hostname (result_len=result_len at entry=0x7ffff0e85370, saddr=0x183d350, host_resolve_options=0x651754) at lib/host-resolve.c:343
>>> No locals.
>>> #8  0x00007ffff7b6c03a in log_source_mangle_hostname (self=self at entry=0x1868c10, msg=msg at entry=0x7fffe87cdd40) at lib/logsource.c:156
>>>       resolved_name = <optimized out>
>>>       resolved_name_len = 140737093885344
>>> #9  0x00007ffff7b6c280 in log_source_queue (s=0x1868c10, msg=0x7fffe87cdd40, path_options=0x7ffff0e85550, user_data=<optimized out>) at lib/logsource.c:266
>>>       self = 0x1868c10
>>>       i = <optimized out>
>>>       __func__ = "log_source_queue"
>>> #10 0x00007ffff7b6ca0a in log_pipe_queue (path_options=<optimized out>, msg=<optimized out>, s=<optimized out>) at lib/logpipe.h:339
>>> No locals.
>>> #11 log_pipe_forward_msg (self=<optimized out>, path_options=<optimized out>, msg=<optimized out>) at lib/logpipe.h:302
>>> No locals.
>>> #12 log_pipe_queue (path_options=<optimized out>, msg=<optimized out>, s=<optimized out>) at lib/logpipe.h:343
>>> No locals.
>>> #13 log_pipe_forward_msg (self=<optimized out>, path_options=<optimized out>, msg=<optimized out>) at lib/logpipe.h:302
>>> No locals.
>>> #14 log_pipe_queue (path_options=<optimized out>, msg=0x7fffe87cdd40, s=0x7fffe87cdd40) at lib/logpipe.h:343
>>> No locals.
>>> #15 log_source_post (self=self at entry=0x1868c10, msg=msg at entry=0x7fffe87cdd40) at lib/logsource.c:249
>>>       path_options = {ack_needed = 1, flow_control_requested = 0, matched = 0x0}
>>>       old_window_size = <optimized out>
>>>       __func__ = "log_source_post"
>>> #16 0x00007ffff7b6b179 in log_reader_handle_line (aux=0x7ffff0e85600, length=<optimized out>, line=<optimized out>, self=0x1868c10) at lib/logreader.c:320
>>>       m = 0x7fffe87cdd40
>>> #17 log_reader_fetch_log (self=self at entry=0x1868c10) at lib/logreader.c:381
>>>       bookmark = <optimized out>
>>>       msg = 0x7fffe445cef5 "<14>1 2016-09-22T10:51:07.178337+02:00 pafodev02.int.patientforsikringen.dk Microsoft-Windows-Security-Auditing 608 - [NXLOG at 47121 AssetID=\"105939\" CustomerID=\"0002592\" Location=\"Hyper-V - Farm 01 (ps"...
>>>       msg_len = 4245
>>>       status = <optimized out>
>>>       msg_count = 1
>>>       may_read = 1
>>>       aux = {peer_addr = 0x0, data = "\000\207\350\360\377\177\000\000)\000\000\000\000\000\000\000\304\335U\366\377\177\000\000\000Y\350\360\377\177\000\000\230\235\332\001\000\000\000\000@ \245\350\377\177\000\000h\313U\366\377\177\000\000 at V\350\360\377\177\000\000 at V\350\360\377\177\000\000\000Y\350\360\377\177\000\000\340\037\245\350\377\177\000\000\b\000\000\000\000\000\000\000\250\037\245\350\377\177\000\000\200V\350\360\377\177\000\000\223\341U\366\377\177\000\000\001\000\000\000\000\000\000\000\006Y\350\360\377\177\000\000\aY\350\360\377\177\000\000\000Y\350\360\377\177\000\000\aY\350\360\377\177\000\000gk\266\367\377\177\000\000\220", '\000' <repeats 15 times>, "\240 E\001\000\000\000\000\301\326\003\367\377\177\000\000"..., end_ptr = 0}
>>>       __func__ = "log_reader_fetch_log"
>>> #18 0x00007ffff7b6b299 in log_reader_work_perform (s=0x1868c10) at lib/logreader.c:88
>>>       self = 0x1868c10
>>> #19 0x00007ffff7b7307d in _work (self=<optimized out>) at lib/mainloop-io-worker.c:52
>>> No locals.
>>> #20 0x00007ffff655dc67 in iv_work_thread_do_work () from /lib64/libivykis.so.0
>>> No symbol table info available.
>>> #21 0x00007ffff655cfc3 in iv_run_tasks () from /lib64/libivykis.so.0
>>> No symbol table info available.
>>> #22 0x00007ffff655f7dc in iv_main () from /lib64/libivykis.so.0
>>> No symbol table info available.
>>> #23 0x00007ffff655dab4 in iv_work_thread () from /lib64/libivykis.so.0
>>> No symbol table info available.
>>> #24 0x00007ffff6560270 in iv_thread_handler () from /lib64/libivykis.so.0
>>> No symbol table info available.
>>> #25 0x00007ffff6344dc5 in start_thread () from /lib64/libpthread.so.0
>>> No symbol table info available.
>>> #26 0x00007ffff6071ced in clone () from /lib64/libc.so.6
>>> No symbol table info available.
>>> (gdb) frame 6
>>> #6  0x00007ffff7b67584 in resolve_sockaddr_to_inet_or_inet6_hostname (host_resolve_options=0x651754, saddr=0x183d350, result_len=0x7ffff0e85370) at lib/host-resolve.c:332
>>> 332         dns_caching_store(saddr->sa.sa_family, dnscache_key, hname, positive);
>>> (gdb) print *saddr
>>> $1 = {refcnt = {counter = 3}, flags = 0, sa_funcs = 0x7ffff7dd7060 <inet_sockaddr_funcs>, salen = 16, sa = {sa_family = 2, sa_data = "\225\327ծL<\000\000\000\000\000\000\000"}}
>>> (gdb) print *saddr->sa
>>> Structure has no component named operator*.
>>> (gdb) print saddr
>>> $2 = (GSockAddr *) 0x183d350
>>> 
>>> 
>>> /Claus A
>>> 
>>> 
>>>> On 21. sep. 2016, at 15.37, Szalai, Attila <Attila.Szalai at morganstanley.com> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> If it is convenient to you, just for the sake of my curiosity, can you run some more command in the gdb session, after you reproduced the issue?
>>>> 
>>>> * bt full
>>>> * frame 6 (you should search for the line with resolve_sockaddr_to_inet_or_inet6_hostname and write the number written in that line. In the case of your previous email, it was #6)
>>>> * print *saddr
>>>> * print *saddr->sa
>>>> 
>>>> 
>> 
>> 
>> 
>> ______________________________________________________________________________
>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
>> Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
>> 
> 
> 
> ______________________________________________________________________________
> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
> Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
> 




More information about the syslog-ng mailing list