[Bug 253] New: Segfault when reloading configuration
https://bugzilla.balabit.com/show_bug.cgi?id=253 Summary: Segfault when reloading configuration Product: syslog-ng Version: 3.4.x Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: unspecified Component: syslog-ng AssignedTo: bazsi@balabit.hu ReportedBy: gonzalo.paniagua+slng1@acquia.com Type of the Report: bug Estimated Hours: 0.0 syslog-ng version: 3.4.3 built from sources syslog-ng -V: syslog-ng 3.4.3 Installer-Version: 3.4.3 Revision: ssh+git://algernon@git.balabit/var/scm/git/syslog-ng/syslog-ng-ose--mainline--3.4#no_branch#64d670f3cbfb90769f3c7f0fdd9c70bb9136ec5b Compile-Date: Sep 11 2013 05:33:29 Available-Modules: confgen,afsocket-tls,csvparser,affile,system-source,syslogformat,afuser,dbparser,afsocket-notls,basicfuncs,cryptofuncs,afsocket,afprog Enable-Debug: off Enable-GProf: off Enable-Memtrace: off Enable-IPv6: on Enable-Spoof-Source: off Enable-TCP-Wrapper: on Enable-Linux-Caps: off Enable-Pcre: on On rare occasions (I can't reproduce it reliably) syslog-ng segfaults when reloading (syslog-ng-ctl reload). I have a core dump file that gave me this stack trace: #0 0x00007fbd30102131 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007fbd30101d76 in strdup () from /lib/x86_64-linux-gnu/libc.so.6 #2 0x00007fbd2f868b19 in evt_tag_str (tag=0x7fbd309a42f0 "encoding", value=0xffffffffffffffff <Address 0xffffffffffffffff out of bounds>) at evttags.c:76 #3 0x00007fbd3097918c in log_proto_server_options_validate (options=<optimized out>) at logproto-server.c:138 #4 0x00007fbd3097d191 in log_proto_server_validate_options (self=<optimized out>) at logproto-server.h:82 #5 log_reader_init (s=0x14a6a30) at logreader.c:638 #6 0x00007fbd2e9d5b07 in log_pipe_init (cfg=0x0, s=0x14a6a30) at ../../lib/logpipe.h:253 #7 log_pipe_init (cfg=0x0, s=0x14a6a30) at afsocket-source.c:763 #8 afsocket_sc_init (s=0x14a68f0) at afsocket-source.c:149 #9 0x00007fbd2e9d67fd in log_pipe_init (cfg=0x0, s=0x14a68f0) at ../../lib/logpipe.h:253 #10 log_pipe_init (cfg=0x0, s=0x14a68f0) at afsocket-source.c:763 #11 afsocket_sd_init (s=0x1489630) at afsocket-source.c:548 #12 0x00007fbd2e9d9719 in afunix_sd_init (s=<optimized out>) at afunix-source.c:194 #13 0x00007fbd3096d27a in log_pipe_init (cfg=<optimized out>, s=0x1489630) at logpipe.h:253 #14 cfg_tree_start (self=0x1c15c48) at cfg-tree.c:1064 #15 0x00007fbd309685de in cfg_init (cfg=0x1c15ae0) at cfg.c:220 #16 0x00007fbd30984faf in main_loop_reload_config_apply () at mainloop.c:501 #17 0x00007fbd3096d699 in control_connection_reload (self=0x1d09e90, command=<optimized out>) at control.c:150 #18 0x00007fbd3096dae8 in control_connection_io_input (s=0x1d09e90) at control.c:242 #19 control_connection_io_input (s=0x1d09e90) at control.c:172 #20 0x00007fbd309a106c in iv_fd_poll_and_run (st=0x1482580, to=<optimized out>) at iv_fd.c:163 #21 0x00007fbd309a1aac in iv_main () at iv_main_posix.c:117 #22 0x00007fbd309859fb in main_loop_run () at mainloop.c:736 #23 0x000000000040142b in main (argc=1, argv=0x7fffd8212a58) at main.c:267 When at frame #5, I got this: (gdb) print *self->proto->options $19 = {destroy = 0, initialized = -1, encoding = 0xffffffffffffffff <Address 0xffffffffffffffff out of bounds>, convert = 0xffffffffffffffff, max_msg_size = -1, max_buffer_size = 0, init_buffer_size = 0} I could not find anywhere in the code where initialized is set to -1, so I assume this is some kind of memory corruption. Let me know if you need anything else from me. -- Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching all bug changes.
https://bugzilla.balabit.com/show_bug.cgi?id=253 --- Comment #1 from Gonzalo Paniagua <gonzalo.paniagua+slng1@acquia.com> 2013-09-15 18:18:39 --- FYI, the same tests under which syslog-ng 3.4.3 segfaults occassionally work fine with 3.3.11. -- Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching all bug changes.
https://bugzilla.balabit.com/show_bug.cgi?id=253 Gergely Nagy <algernon@balabit.hu> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |algernon@balabit.hu AssignedTo|bazsi@balabit.hu |algernon@balabit.hu -- Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching all bug changes.
https://bugzilla.balabit.com/show_bug.cgi?id=253 Gergely Nagy <algernon@balabit.hu> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED --- Comment #2 from Gergely Nagy <algernon@balabit.hu> 2013-09-16 10:39:39 --- This sounds like as if proto->options would get casted to a different type, and get the -1 that way by incident. I'll check the code to see if that's the case, thanks for the report! -- Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching all bug changes.
https://bugzilla.balabit.com/show_bug.cgi?id=253 Gergely Nagy <algernon@balabit.hu> changed: What |Removed |Added ---------------------------------------------------------------------------- Resolution| |FIXED Status|ASSIGNED |RESOLVED --- Comment #3 from Gergely Nagy <algernon@balabit.hu> 2013-10-09 16:47:16 --- Found the problem, and it is fixed on master in the following commit: https://github.com/balabit/syslog-ng-3.4/commit/1b99bf02f6656f13b6bf2905391a... -- Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching all bug changes.
https://bugzilla.balabit.com/show_bug.cgi?id=253 Gergely Nagy <algernon@balabit.hu> changed: What |Removed |Added ---------------------------------------------------------------------------- Resolution|FIXED | Status|RESOLVED |REOPENED --- Comment #4 from Gergely Nagy <algernon@balabit.hu> 2013-10-09 16:48:01 --- Erm, sorry. Closed the wrong bug. Reopening for now. -- Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching all bug changes.
https://bugzilla.balabit.com/show_bug.cgi?id=253 Gergely Nagy <algernon@balabit.hu> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|REOPENED |NEEDINFO --- Comment #5 from Gergely Nagy <algernon@balabit.hu> 2013-10-10 10:32:01 --- I'm having trouble reproducing this, and nothing seems to be particularly wrong with the source, either... Can you upload a config, by any chance? And how often does the crash occur? -- Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching all bug changes.
https://bugzilla.balabit.com/show_bug.cgi?id=253 --- Comment #6 from Gonzalo Paniagua <gonzalo.paniagua+slng1@acquia.com> 2013-10-13 03:36:48 --- Created an attachment (id=83) --> (https://bugzilla.balabit.com/attachment.cgi?id=83) syslog-ng.conf I've attached the main configuration file. Nothing special about it. In sites.d/ individual files with content like: ======BEGIN destination df_watchdog_simpletest { file("/var/log/sites/simpletest/logs/srv-17/drupal-watchdog.log" owner(10002) group(33)); }; filter f_watchdog_simpletest { facility(local0) and program(^simpletest$); }; log { source(s_all); filter(f_watchdog_simpletest); destination(df_watchdog_simpletest); }; ======END are added and removed frequently, and each time this happens we run "/etc/init.d/syslog-ng reload". During our tests, there could be, say, 30 reloads in a server over 3 hours, and at some point one of those reloads causes a segfault. It does not happen in all the servers where we have the same configuration. I would say that it's close to 1 in 10 servers. -- Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching all bug changes.
https://bugzilla.balabit.com/show_bug.cgi?id=253 Gonzalo Paniagua <gonzalo.paniagua+slng1@acquia.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEEDINFO |REOPENED --- Comment #7 from Gonzalo Paniagua <gonzalo.paniagua+slng1@acquia.com> 2013-10-13 03:37:17 --- Added configuration and more info. -- Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching all bug changes.
https://bugzilla.balabit.com/show_bug.cgi?id=253 Balazs Scheidler <bazsi@balabit.hu> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |bazsi@balabit.hu --- Comment #8 from Balazs Scheidler <bazsi@balabit.hu> 2013-10-14 14:04:31 --- can I perhaps have the core dump and the binaries involved? I would muck a little bit around in the backtrace to check how this could happen. you can send it in private to me. alternatively, I would login to a box running this code where I could inspect syslog-ng and the core file. though this ones requires more trust from your side :) Thanks in advance. -- Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching all bug changes.
https://bugzilla.balabit.com/show_bug.cgi?id=253 --- Comment #9 from Balazs Scheidler <bazsi@balabit.hu> 2013-10-14 14:32:26 --- the funny thing about the backtrace, is that it is quite unlikely to happen. (althou' that's what memory corruptions usually are), but anyway, the LogProtoServerOptions struct is filled single-threadedly right before the crash happens The whole LogReaderOptions struct (where LogProtoServerOptions is embedded) is zero-initialized when the new configuration is loaded (when allocating AFUnixSoureDriver instance, in afunix-source.c:222). Once zero-initialized, log_reader_options_defaults() is called in afsocket-source.c:736, which in turn calls log_proto_server_options_defaults() for the embedded struct. It sets a couple of fields to -1, but also sets the encoding field to NULL by the use of a wholesale memset as the first routine in logproto-server.c:150 Once the config loaded encoding should be NULL, then syslog-ng commences initializing the new configuration, and calls log_reader_options_init() in afsocket-source.c:535, the role of this function is to inherit global options and to initialize derived fields in the options structure, this happens quite close before the offending log_pipe_init() call that is the root for the crash. The address of the just initialized struct is propagated to the kept-alive source connection using log_reader_set_options() in afsocket-source.c:146, it uses self->owner to get the address of the LogReaderOptions struct, which is properly set by an earlier call to afsocket_sc_set_owner(). I'd really need a peek at the core files, to get some further clues, certainly it is possible that something frees the initialized struct under us, but at this time syslog-ng is single-threaded (because of the config reload), it is doing its stuff during config initialization strictly sequentially and the struct is initialized right before passed on to the crash-site. The crash is btw related to /dev/log having outstanding connections when the reload happens, maybe you can trigger the crash a bit easier if you keep a netcat hanging there on /dev/log # keep this running during reload $ nc -U /dev/log -- Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching all bug changes.
https://bugzilla.balabit.com/show_bug.cgi?id=253 --- Comment #10 from Gonzalo Paniagua <gonzalo.paniagua+slng1@acquia.com> 2013-10-17 09:44:30 --- Sadly, I lost the core dump file. I've been trying to reproduce the issue with a simpler set up and test, but no luck. At this moment, going back to the old set up and running a full test like the ones that experienced the problem is not an option :-(. -- Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching all bug changes.
https://bugzilla.balabit.com/show_bug.cgi?id=253 --- Comment #11 from Balazs Scheidler <bazsi@balabit.hu> 2013-11-02 20:23:46 --- Evan was fighting with an unrelated issue in bug #257 and in his valgrind output I did find hints to fix this bug as well. I could find a way to reproduce and the patch below fixes the issue for me, so it probably will fix it for you. Can you please retest and confirm? Thanks in advance. The fix is here on the f/fix-logproto-server-validate-crash branch, available here: https://github.com/balabit/syslog-ng-3.4/commit/61cade70128beef0cbc5dd66ccf9... The problem affects both 3.4 and 3.5. Algernon, can you merge that to both releases? Thanks. -- Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching all bug changes.
https://bugzilla.balabit.com/show_bug.cgi?id=253 Gergely Nagy <algernon@balabit.hu> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|REOPENED |ASSIGNED --- Comment #12 from Gergely Nagy <algernon@balabit.hu> 2013-11-03 15:59:32 ---
The problem affects both 3.4 and 3.5.
Algernon, can you merge that to both releases? Thanks.
Merged to both. Will leave this open for a bit longer, though. -- Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching all bug changes.
https://bugzilla.balabit.com/show_bug.cgi?id=253 Evan Rempel <erempel@uvic.ca> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |erempel@uvic.ca --- Comment #13 from Evan Rempel <erempel@uvic.ca> 2013-11-06 06:34:31 --- Not exactly a segmentation fault, but I can't explain it. I just did a reload on version 3.4.4 (with latest patches) and triggered 2013-11-05T21:24:33-08:00 host daemon.crit supervise/syslog-ng[25895]: Daemon exited due to a deadlock/signal/failure, restarting; exitcode='134' Is it possible that the return code of syslog-ng is somehow linked to the return code of a child program destination? -- Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching all bug changes.
This seems like a failing assertion as 138 indicates a SIGABRT Do you happen to have a core file for that? If you do can you pls open a new ticket, seems unrelated to this. Thanks. On Nov 6, 2013 6:34 AM, <bugzilla@bugzilla.balabit.com> wrote:
https://bugzilla.balabit.com/show_bug.cgi?id=253
Evan Rempel <erempel@uvic.ca> changed:
What |Removed |Added
---------------------------------------------------------------------------- CC| |erempel@uvic.ca
--- Comment #13 from Evan Rempel <erempel@uvic.ca> 2013-11-06 06:34:31 --- Not exactly a segmentation fault, but I can't explain it.
I just did a reload on version 3.4.4 (with latest patches) and triggered
2013-11-05T21:24:33-08:00 host daemon.crit supervise/syslog-ng[25895]: Daemon exited due to a deadlock/signal/failure, restarting; exitcode='134'
Is it possible that the return code of syslog-ng is somehow linked to the return code of a child program destination?
-- Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching all bug changes.
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq
https://bugzilla.balabit.com/show_bug.cgi?id=253 Gergely Nagy <algernon@balabit.hu> changed: What |Removed |Added ---------------------------------------------------------------------------- Resolution| |FIXED Status|ASSIGNED |RESOLVED --- Comment #14 from Gergely Nagy <algernon@balabit.hu> 2013-11-21 17:32:21 --- This has been fixed in 3.4.5, closing. (A separate issue is open for Evan's issue) -- Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching all bug changes.
participants (2)
-
Balazs Scheidler
-
bugzilla@bugzilla.balabit.com