[syslog-ng] ERROR:lib/dnscache.c:112:dns_cache_key_hash: code should not be reached

Claus Albøge ca at tdchosting.dk
Mon Sep 26 13:53:13 CEST 2016


Hi,

I tried to minimize my config as much as possible, but was unable to reproduce - had it run for a couple of hours. 

Looking at the source code pointed my towards “effective_dns_cache_options" and dns-cache-size (https://www.balabit.com/documents/syslog-ng-ose-latest-guides/en/syslog-ng-ose-guide-admin/html/examples-dns.html)

Setting dns-cache-size to 1 on my minimized config triggered the SIGABRT within a few seconds. (Default value seems to be 1007!, which is higher that the number of hosts sending logs to my syslog-ng instance with the minimized config)

Setting dns-cache-size to 10000 on a syslog-instance with the “full” number of log sources, seems to work around the issue, as syslog-ng has been running for almost a hour now without any restarts - normally it restarts every few minutes.

So it does indeed seems like an issue with the dnscache logic. You might be able to reproduce by setting dns-cache-size to a very low number.

If you still need a core dump to debug this further, I might need a bit of help creating the coredump correctly. After the SIGABRT, running gcore in gdb creates a core, but if I try to read this core again using gdb, I’m unable to do a backtrace. Do I need to do anything special?


/Claus A


> On 23. sep. 2016, at 08.37, Claus Albøge <ca at tdchosting.dk> wrote:
> 
> Hi,
> 
> I’ll try to reproduce with a minimal configuration, without ssl-certs, etc.
> 
> Syslog-ng binaries are from: https://copr-be.cloud.fedoraproject.org/results/czanik/syslog-ng38/epel-7-x86_64/00449829-syslog-ng/
> 
> 
> /Claus A
> 
>> On 23. sep. 2016, at 08.25, Balazs Scheidler <bazsi77 at gmail.com> wrote:
>> 
>> Is there a possibility for me to receive the core file and the binaries? (Syslog-ng plus dependent libs)
>> 
>> This seems to be some kind of dns cache corruption, which is difficult to diagnose without those.
>> 
>> Bazsi
>> 
>> 
>> On Sep 22, 2016 3:29 PM, "Claus Albøge" <ca at tdchosting.dk> wrote:
>> Hi,
>> 
>> # gdb --args /usr/sbin/syslog-ng -F -p /var/run/syslogd.pid
>> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-80.el7
>> Copyright (C) 2013 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>> and "show warranty" for details.
>> This GDB was configured as "x86_64-redhat-linux-gnu".
>> For bug reporting instructions, please see:
>> <http://www.gnu.org/software/gdb/bugs/>...
>> Reading symbols from /usr/sbin/syslog-ng...Reading symbols from /usr/lib/debug/usr/sbin/syslog-ng.debug...done.
>> done.
>> (gdb) set pagination off
>> (gdb) run
>> Starting program: /usr/sbin/syslog-ng -F -p /var/run/syslogd.pid
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> [New Thread 0x7ffff1e8a700 (LWP 12861)]
>> [New Thread 0x7ffff1689700 (LWP 12862)]
>> [New Thread 0x7ffff0e88700 (LWP 12863)]
>> [New Thread 0x7fffe3fff700 (LWP 13103)]
>> [New Thread 0x7fffe37fe700 (LWP 13108)]
>> [New Thread 0x7fffe2ffd700 (LWP 13109)]
>> [New Thread 0x7fffe27fc700 (LWP 13110)]
>> [New Thread 0x7fffe1ffb700 (LWP 13111)]
>> [Thread 0x7ffff1689700 (LWP 12862) exited]
>> [Thread 0x7ffff1e8a700 (LWP 12861) exited]
>> [New Thread 0x7ffff1e8a700 (LWP 13395)]
>> [New Thread 0x7ffff1689700 (LWP 13396)]
>> [Thread 0x7fffe37fe700 (LWP 13108) exited]
>> [Thread 0x7fffe2ffd700 (LWP 13109) exited]
>> [New Thread 0x7fffe37fe700 (LWP 13646)]
>> [New Thread 0x7fffe2ffd700 (LWP 13647)]
>> [Thread 0x7ffff1689700 (LWP 13396) exited]
>> [Thread 0x7fffe3fff700 (LWP 13103) exited]
>> [New Thread 0x7fffe3fff700 (LWP 13959)]
>> [New Thread 0x7ffff1689700 (LWP 13960)]
>> [Thread 0x7ffff1e8a700 (LWP 13395) exited]
>> [Thread 0x7fffe1ffb700 (LWP 13111) exited]
>> [New Thread 0x7fffe1ffb700 (LWP 14040)]
>> [New Thread 0x7ffff1e8a700 (LWP 14041)]
>> [Thread 0x7fffe3fff700 (LWP 13959) exited]
>> [Thread 0x7ffff0e88700 (LWP 12863) exited]
>> [New Thread 0x7ffff0e88700 (LWP 14182)]
>> [New Thread 0x7fffe3fff700 (LWP 14183)]
>> **
>> ERROR:lib/dnscache.c:112:dns_cache_key_hash: code should not be reached
>> 
>> Program received signal SIGABRT, Aborted.
>> [Switching to Thread 0x7ffff0e88700 (LWP 14182)]
>> 0x00007ffff5fb05f7 in raise () from /lib64/libc.so.6
>> Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.163-3.el7.x86_64 elfutils-libs-0.163-3.el7.x86_64 eventlog-0.2.13-4.el7.x86_64 glib2-2.42.2-5.el7.x86_64 glibc-2.17-106.el7_2.8.x86_64 ivykis-0.36.2-2.el7.x86_64 json-c-0.11-4.el7_0.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.13.2-12.el7_2.x86_64 libattr-2.4.46-12.el7.x86_64 libcap-2.22-8.el7.x86_64 libcom_err-1.42.9-7.el7.x86_64 libgcc-4.8.5-4.el7.x86_64 libgcrypt-1.5.3-12.el7_1.1.x86_64 libgpg-error-1.12-3.el7.x86_64 libnet-1.1.6-7.el7.x86_64 libselinux-2.2.2-6.el7.x86_64 libuuid-2.23.2-26.el7_2.3.x86_64 openssl-libs-1.0.1e-51.el7_2.5.x86_64 pcre-8.32-15.el7_2.1.x86_64 systemd-libs-219-19.el7_2.13.x86_64 tcp_wrappers-libs-7.6-77.el7.x86_64 xz-libs-5.1.2-12alpha.el7.x86_64 zlib-1.2.7-15.el7.x86_64
>> (gdb) bt full
>> #0  0x00007ffff5fb05f7 in raise () from /lib64/libc.so.6
>> No symbol table info available.
>> #1  0x00007ffff5fb1ce8 in abort () from /lib64/libc.so.6
>> No symbol table info available.
>> #2  0x00007ffff70426d5 in g_assertion_message () from /lib64/libglib-2.0.so.0
>> No symbol table info available.
>> #3  0x00007ffff704276a in g_assertion_message_expr () from /lib64/libglib-2.0.so.0
>> No symbol table info available.
>> #4  0x00007ffff7b62cc4 in dns_cache_key_hash (e=<optimized out>) at lib/dnscache.c:112
>>        e = <optimized out>
>> #5  0x00007ffff700c4fc in g_hash_table_remove_internal () from /lib64/libglib-2.0.so.0
>> No symbol table info available.
>> #6  0x00007ffff7b67584 in resolve_sockaddr_to_inet_or_inet6_hostname (host_resolve_options=0x651754, saddr=0x183d350, result_len=0x7ffff0e85370) at lib/host-resolve.c:332
>>        hname = 0x7ffff0e864b8 "213.174.76.60"
>>        hname_len = 5
>>        positive = 0
>>        dnscache_key = 0x183d368
>> #7  resolve_sockaddr_to_hostname (result_len=result_len at entry=0x7ffff0e85370, saddr=0x183d350, host_resolve_options=0x651754) at lib/host-resolve.c:343
>> No locals.
>> #8  0x00007ffff7b6c03a in log_source_mangle_hostname (self=self at entry=0x1868c10, msg=msg at entry=0x7fffe87cdd40) at lib/logsource.c:156
>>        resolved_name = <optimized out>
>>        resolved_name_len = 140737093885344
>> #9  0x00007ffff7b6c280 in log_source_queue (s=0x1868c10, msg=0x7fffe87cdd40, path_options=0x7ffff0e85550, user_data=<optimized out>) at lib/logsource.c:266
>>        self = 0x1868c10
>>        i = <optimized out>
>>        __func__ = "log_source_queue"
>> #10 0x00007ffff7b6ca0a in log_pipe_queue (path_options=<optimized out>, msg=<optimized out>, s=<optimized out>) at lib/logpipe.h:339
>> No locals.
>> #11 log_pipe_forward_msg (self=<optimized out>, path_options=<optimized out>, msg=<optimized out>) at lib/logpipe.h:302
>> No locals.
>> #12 log_pipe_queue (path_options=<optimized out>, msg=<optimized out>, s=<optimized out>) at lib/logpipe.h:343
>> No locals.
>> #13 log_pipe_forward_msg (self=<optimized out>, path_options=<optimized out>, msg=<optimized out>) at lib/logpipe.h:302
>> No locals.
>> #14 log_pipe_queue (path_options=<optimized out>, msg=0x7fffe87cdd40, s=0x7fffe87cdd40) at lib/logpipe.h:343
>> No locals.
>> #15 log_source_post (self=self at entry=0x1868c10, msg=msg at entry=0x7fffe87cdd40) at lib/logsource.c:249
>>        path_options = {ack_needed = 1, flow_control_requested = 0, matched = 0x0}
>>        old_window_size = <optimized out>
>>        __func__ = "log_source_post"
>> #16 0x00007ffff7b6b179 in log_reader_handle_line (aux=0x7ffff0e85600, length=<optimized out>, line=<optimized out>, self=0x1868c10) at lib/logreader.c:320
>>        m = 0x7fffe87cdd40
>> #17 log_reader_fetch_log (self=self at entry=0x1868c10) at lib/logreader.c:381
>>        bookmark = <optimized out>
>>        msg = 0x7fffe445cef5 "<14>1 2016-09-22T10:51:07.178337+02:00 pafodev02.int.patientforsikringen.dk Microsoft-Windows-Security-Auditing 608 - [NXLOG at 47121 AssetID=\"105939\" CustomerID=\"0002592\" Location=\"Hyper-V - Farm 01 (ps"...
>>        msg_len = 4245
>>        status = <optimized out>
>>        msg_count = 1
>>        may_read = 1
>>        aux = {peer_addr = 0x0, data = "\000\207\350\360\377\177\000\000)\000\000\000\000\000\000\000\304\335U\366\377\177\000\000\000Y\350\360\377\177\000\000\230\235\332\001\000\000\000\000@ \245\350\377\177\000\000h\313U\366\377\177\000\000 at V\350\360\377\177\000\000 at V\350\360\377\177\000\000\000Y\350\360\377\177\000\000\340\037\245\350\377\177\000\000\b\000\000\000\000\000\000\000\250\037\245\350\377\177\000\000\200V\350\360\377\177\000\000\223\341U\366\377\177\000\000\001\000\000\000\000\000\000\000\006Y\350\360\377\177\000\000\aY\350\360\377\177\000\000\000Y\350\360\377\177\000\000\aY\350\360\377\177\000\000gk\266\367\377\177\000\000\220", '\000' <repeats 15 times>, "\240 E\001\000\000\000\000\301\326\003\367\377\177\000\000"..., end_ptr = 0}
>>        __func__ = "log_reader_fetch_log"
>> #18 0x00007ffff7b6b299 in log_reader_work_perform (s=0x1868c10) at lib/logreader.c:88
>>        self = 0x1868c10
>> #19 0x00007ffff7b7307d in _work (self=<optimized out>) at lib/mainloop-io-worker.c:52
>> No locals.
>> #20 0x00007ffff655dc67 in iv_work_thread_do_work () from /lib64/libivykis.so.0
>> No symbol table info available.
>> #21 0x00007ffff655cfc3 in iv_run_tasks () from /lib64/libivykis.so.0
>> No symbol table info available.
>> #22 0x00007ffff655f7dc in iv_main () from /lib64/libivykis.so.0
>> No symbol table info available.
>> #23 0x00007ffff655dab4 in iv_work_thread () from /lib64/libivykis.so.0
>> No symbol table info available.
>> #24 0x00007ffff6560270 in iv_thread_handler () from /lib64/libivykis.so.0
>> No symbol table info available.
>> #25 0x00007ffff6344dc5 in start_thread () from /lib64/libpthread.so.0
>> No symbol table info available.
>> #26 0x00007ffff6071ced in clone () from /lib64/libc.so.6
>> No symbol table info available.
>> (gdb) frame 6
>> #6  0x00007ffff7b67584 in resolve_sockaddr_to_inet_or_inet6_hostname (host_resolve_options=0x651754, saddr=0x183d350, result_len=0x7ffff0e85370) at lib/host-resolve.c:332
>> 332         dns_caching_store(saddr->sa.sa_family, dnscache_key, hname, positive);
>> (gdb) print *saddr
>> $1 = {refcnt = {counter = 3}, flags = 0, sa_funcs = 0x7ffff7dd7060 <inet_sockaddr_funcs>, salen = 16, sa = {sa_family = 2, sa_data = "\225\327ծL<\000\000\000\000\000\000\000"}}
>> (gdb) print *saddr->sa
>> Structure has no component named operator*.
>> (gdb) print saddr
>> $2 = (GSockAddr *) 0x183d350
>> 
>> 
>> /Claus A
>> 
>> 
>>> On 21. sep. 2016, at 15.37, Szalai, Attila <Attila.Szalai at morganstanley.com> wrote:
>>> 
>>> Hi,
>>> 
>>> If it is convenient to you, just for the sake of my curiosity, can you run some more command in the gdb session, after you reproduced the issue?
>>> 
>>> * bt full
>>> * frame 6 (you should search for the line with resolve_sockaddr_to_inet_or_inet6_hostname and write the number written in that line. In the case of your previous email, it was #6)
>>> * print *saddr
>>> * print *saddr->sa
>>> 
>>> 
> 
> 
> 
> ______________________________________________________________________________
> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
> Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
> 




More information about the syslog-ng mailing list