[Bug 190] New: syslog-ng with TCP source, fails to shutdown properly, and generates core dump
https://bugzilla.balabit.com/show_bug.cgi?id=190

Summary: syslog-ng with TCP source, fails to shutdown properly, and generates core dump
Product: syslog-ng
Version: 3.3.x
Platform: PC
OS/Version: Solaris
Status: NEW
Severity: major
Priority: unspecified
Component: syslog-ng
AssignedTo: bazsi@balabit.hu
ReportedBy: marvin.nipper@stream.com
Type of the Report: bug
Estimated Hours: 0.0

OK. I decided to jump to 3.3.6, so that I could get rid of the source patching that I was doing to fix the address spoofing bug in 3.3.5. It compiled cleanly, and I was hoping that I was finally going to be on an "unpatched" version of 3.3.x!! However, there is now a completely new bug that has apparently been introduced into 3.3.6, related to TCP sources. My environment is Solaris 10 U10 x86.

What I am seeing: The daemon starts fine, and starts collecting log data just fine. When I execute the standard /etc/init.d script to stop the daemon (using the kill of the PID):

- I now instantly get a core dump;
- I find that both of the syslog-ng processes are still running;
- I find that syslog-ng no longer appears to be processing any of the UDP input (as the files that should grow, because of UDP input, are no longer growing); and
- I find that the one file that I generate from TCP input is still continuing to grow.

Put simply, it appears that the "kill" does not cleanly shut down the TCP source, and so syslog-ng just continues to run, and continues to "eat" TCP packets, and feed them to their designated target file. I end up having to do a kill -9 on both of the syslog-ng daemons, in order to get them to stop.
The relevant statements are fairly trivial, and they work flawlessly in 3.3.5 (as well as all previous releases):

    source any_tcp {
        tcp(port(601) max-connections(40) flags("store-legacy-msghdr", "threaded") use_dns(no) log_fetch_limit(100) log_iw_size(250));
    };
    destination workstation_log {
        file("/var/adm/log/workstation.log" create_dirs(yes) flags("threaded"));
    };
    log { source(any_tcp); destination(workstation_log); flags(final); };

Anyway, I'm hoping that you will possibly have some clue as to a particular bit of code that might have been changed between 3.3.5 and 3.3.6, that might be causing this behavior(??). Let me know what all you might want/need from me, to assist in sorting this out.

SORRY for the bad news (really). I was really hoping that 3.3.6 would be "painless", and I'm sure that you were hoping for the same thing. As with the spoofing bug in 3.3.5, I am ALWAYS more than happy to try out a patch, to see if that fixes the problem. I know that it may not be easy to try out Solaris-related changes (and maybe this issue is just something in Solaris).

As always, THANKS for your time and help.

--
Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching all bug changes.
https://bugzilla.balabit.com/show_bug.cgi?id=190

--- Comment #1 from Marvin Nipper <marvin.nipper@stream.com> 2012-08-28 17:28:17 ---
Here are two globs of output from two of the Solaris dump analysis tools. Let me know if these help at all?

# pstack core_secdevrsn02_syslog-ng_0_0_1346165515_604
core 'core_secdevrsn02_syslog-ng_0_0_1346165515_604' of 604: /usr/local/sbin/syslog-ng -f /etc/syslog-ng/syslog-ng.conf -p /var/run
----------------- lwp# 1 / thread# 1 --------------------
 feb2c405 _lwp_kill (1, 6) + 15
 fead366f raise (6) + 1f
 feab2971 abort (8046fec, 400, fef5c9a8, 8047404, feba1798, 8047008) + cd
 fef3a1ab iv_set_fatal_msg_handler (fef5c9a8, 2, feb2eda2, fef3b33b)
 fef3b0a6 ???????? (8066200, 8078338, 8047458, fef3bff4, fef392aa, fef6cd6c)
 fef3b2ec ???????? (8066200, 8078338, 162014e, 8066200, fef6cd6c, fef6cd6c)
 fef397e4 iv_fd_unregister (8078338, fe590000, 8047528, 8066280, feaa614e, fef6cd6c) + 92
 fef0cf93 ???????? (80782a8, 61747300, 80474c8, fef1d74f, 8047490, 8047494)
 fef0d7b0 ???????? (80782a8, feda0ec4, 8047508, feca8b64, 80779d0, fe8f5c1c)
 fe8d47d0 ???????? (80782a8, 2, 37ebd9ca, 37ebd9ca, 0, 0)
 fe8d5155 ???????? (8078200, 8047568, 8047528, feb21a0c, feba068c, fef724e4)
 fe8d47d0 ???????? (8078200, feb9f000, 8047558, feb21a0c, fef724e4, 0)
 fe8d629b afsocket_sd_deinit (807da18, fef441e0, 80475c8, 80779d0, fef6cd6c, 806c348) + 72
 fef1cf54 ???????? (807da18, 806cd88, fef441e0, 1, 806d914, 8067018)
 fef1d376 ???????? (806d8e0, 1, 80653f8, 3, 7, feda0ec4)
 feef4bdd ???????? (806d8e0, 4, 1000000, 8072bb0, feba3098, feda0ec4)
 feef5d38 log_center_deinit (8076e38, fef6cd6c, 8047678, 8078a28, 1) + 3b
 feef6889 cfg_deinit (806c348, 8047cc4, 8047bf8, 8072bb0, 0, fef6cd6c) + 23
 fef15032 ???????? (0, fece329b, 0, fef23141, 0, 8062870)
 fef14f60 ???????? (fef15000, 8066398, 0, 0, 3, fef6cd6c)
 fef1532c ???????? (fef6cd6c)
 fef153ae ???????? (0, 804770c, 8047718, fef04e98, 809e5d0, 80477ac)
 fef3cc23 ???????? (fef71e80, 8047744, 400, feefb939, fef6cd6c, 8000000)
 fef3c4df ???????? (fef71ea4, 1, 806820c, 1, 0, 1)
 fef39f8b ???????? (8066200, 8047bb4, 8047bac, ffff, 0, fef6cd6c)
 fef3a0fb iv_main (806c348, fef42f2e, 80630b8, 0, 8047c1c, feffa910) + c6
 fef157d6 main_loop_run (80522b3, 8047c24, 8047c28, 8047c04, 0, 0) + 192
 0805190c main (1, 8047c3c, 8047c58) + 219
 08051404 _start (6, 8047d2c, 0, 0, 0, 0) + 80
----------------- lwp# 2 / thread# 2 --------------------
 feb2b327 _portfs (1b, fe54af0c, 400, fe54ef0c, fe54ef4c, 0) + 7
 fef3b143 ???????? (8076358, fe54ef54, fe54ef4c, fef3c3fe, 8076428, 8076440)
 fef3a0de iv_main (8098180, 0, feba4f80, feb9f000, fe54efa0, feb21a0c) + a9
 fef3d2c2 ???????? (8098170, 80767b0, fe54efd8, fe54efbc, 0, feb9f000)
 fef3e1ca ???????? (80767b0)
 feb28aab _thr_setup (fea03780) + 4e
 feb28db0 _lwp_start (fea03780, 0, 0, fe54eff8, feb28db0, fea03780)

# pmap core_secdevrsn02_syslog-ng_0_0_1346165515_604
core 'core_secdevrsn02_syslog-ng_0_0_1346165515_604' of 604: /usr/local/sbin/syslog-ng -f /etc/syslog-ng/syslog-ng.conf -p /var/run
08042000 24K rw--- [ stack ]
08050000 12K r-x-- /usr/local/sbin/syslog-ng
08062000 4K rwx-- /usr/local/sbin/syslog-ng
08063000 664K rwx-- [ heap ]
FE44B000 4K rw---
FE44E000 8K rw---
FE54A000 4K rw---
FE54D000 8K rw--- [ stack tid=2 ]
FE550000 64K rw---
FE570000 64K rw---
FE590000 64K rwx--
FE5B0000 32K r-x-- /usr/local/lib/syslog-ng/libafstreams.so
FE5C7000 4K rwx-- /usr/local/lib/syslog-ng/libafstreams.so
FE5D0000 16K r-x-- /usr/local/lib/syslog-ng/libsyslogformat.so
FE5E3000 4K rwx-- /usr/local/lib/syslog-ng/libsyslogformat.so
FE5F0000 84K r-x-- /usr/local/lib/syslog-ng/libdbparser.so
FE614000 4K rwx-- /usr/local/lib/syslog-ng/libdbparser.so
FE620000 32K r-x-- /usr/local/lib/syslog-ng/libcsvparser.so
FE637000 8K rwx-- /usr/local/lib/syslog-ng/libcsvparser.so
FE640000 8K r-x-- /usr/local/lib/syslog-ng/libbasicfuncs.so
FE651000 4K rwx-- /usr/local/lib/syslog-ng/libbasicfuncs.so
FE660000 28K r-x-- /usr/local/lib/syslog-ng/libafuser.so
FE676000 8K rwx-- /usr/local/lib/syslog-ng/libafuser.so
FE680000 88K r-x-- /usr/local/lib/libnet.so.1.7.0
FE6A5000 8K rwx-- /usr/local/lib/libnet.so.1.7.0
FE6A7000 40K rwx-- /usr/local/lib/libnet.so.1.7.0
FE6C0000 1368K r-x-- /usr/local/ssl/lib/libcrypto.so.1.0.0
FE825000 88K rwx-- /usr/local/ssl/lib/libcrypto.so.1.0.0
FE83B000 8K rwx-- /usr/local/ssl/lib/libcrypto.so.1.0.0
FE840000 320K r-x-- /usr/local/ssl/lib/libssl.so.1.0.0
FE89F000 20K rwx-- /usr/local/ssl/lib/libssl.so.1.0.0
FE8B0000 20K r-x-- /usr/local/lib/syslog-ng/libsyslog-ng-crypto.so
FE8C4000 4K rwx-- /usr/local/lib/syslog-ng/libsyslog-ng-crypto.so
FE8D0000 88K r-x-- /usr/local/lib/syslog-ng/libafsocket.so
FE8F5000 8K rwx-- /usr/local/lib/syslog-ng/libafsocket.so
FE900000 40K r-x-- /usr/local/lib/syslog-ng/libafprog.so
FE910000 16K rw---
FE919000 8K rwx-- /usr/local/lib/syslog-ng/libafprog.so
FE920000 60K r-x-- /usr/local/lib/syslog-ng/libaffile.so
FE930000 4K rwx--
FE93E000 4K rwx-- /usr/local/lib/syslog-ng/libaffile.so
FE93F000 4K rwx-- /usr/local/lib/syslog-ng/libaffile.so
FE950000 24K r-x-- /lib/libgen.so.1
FE960000 4K rwx--
FE966000 4K rw--- /lib/libgen.so.1
FE970000 24K r-x-- /lib/libuutil.so.1
FE980000 4K rwx--
FE986000 4K rw--- /lib/libuutil.so.1
FE990000 92K r-x-- /lib/libscf.so.1
FE9B0000 4K rwx--
FE9B7000 4K rw--- /lib/libscf.so.1
FE9C0000 12K r-x-- /lib/libmp.so.2
FE9D0000 4K rwx--
FE9D3000 4K rw--- /lib/libmp.so.2
FE9E0000 4K r-x-- /usr/lib/iconv/646%UTF-8.so
FE9F0000 8K rwx-- /usr/lib/iconv/646%UTF-8.so
FEA00000 64K rwx--
FEA20000 56K r-x-- /lib/libmd.so.1
FEA30000 4K rwx--
FEA3E000 4K rw--- /lib/libmd.so.1
FEA40000 32K r-x-- /lib/libaio.so.1
FEA50000 4K rwx--
FEA58000 4K rw--- /lib/libaio.so.1
FEA59000 4K rw--- /lib/libaio.so.1
FEA60000 44K r-x-- /usr/local/lib/libgcc_s.so.1
FEA70000 4K rwx--
FEA7A000 8K rwx-- /usr/local/lib/libgcc_s.so.1
FEA80000 1084K r-x-- /lib/libc.so.1
FEB90000 24K rwx--
FEB9F000 32K rwx-- /lib/libc.so.1
FEBA7000 8K rwx-- /lib/libc.so.1
FEBB0000 4K rwx--
FEBC0000 4K r-x-- /lib/libdl.so.1
FEBD1000 4K rw--- /lib/libdl.so.1
FEBE0000 216K r-x-- /lib/libresolv.so.2
FEC20000 4K rwx--
FEC26000 8K rw--- /lib/libresolv.so.2
FEC30000 12K r-x-- /usr/local/lib/libevtlog.so.0.0.0
FEC40000 4K r----
FEC42000 4K rwx-- /usr/local/lib/libevtlog.so.0.0.0
FEC50000 24K r-x-- /lib/librt.so.1
FEC60000 4K rwx--
FEC66000 4K rw--- /lib/librt.so.1
FEC70000 1156K r-x-- /usr/local/lib/libglib-2.0.so.0.2992.0
FEDA0000 12K rwx-- /usr/local/lib/libglib-2.0.so.0.2992.0
FEDB0000 12K r-x-- /lib/libpthread.so.1
FEDC0000 16K r-x-- /usr/local/lib/libgthread-2.0.so.0.2992.0
FEDD0000 4K rwx--
FEDD3000 4K rwx-- /usr/local/lib/libgthread-2.0.so.0.2992.0
FEDE0000 12K r-x-- /usr/local/lib/libgmodule-2.0.so.0.2992.0
FEDF0000 4K rwx--
FEDF2000 4K rwx-- /usr/local/lib/libgmodule-2.0.so.0.2992.0
FEE00000 516K r-x-- /lib/libnsl.so.1
FEE91000 20K rw--- /lib/libnsl.so.1
FEE96000 32K rw--- /lib/libnsl.so.1
FEEA0000 44K r-x-- /lib/libsocket.so.1
FEEB0000 16K r-x-- /lib/libthread.so.1
FEEBB000 4K rw--- /lib/libsocket.so.1
FEEC0000 8K r-x-- /lib/libdoor.so.1
FEED0000 4K rwx--
FEED2000 4K rw--- /lib/libdoor.so.1
FEEE0000 500K r-x--
FEF60000 4K rwx--
FEF6C000 20K rwx--
FEF71000 8K rwx--
FEF80000 4K rwx--
FEF90000 4K r----
FEFA0000 4K rw---
FEFB0000 4K rw---
FEFBE000 176K r-x-- /lib/ld.so.1
FEFF0000 4K rwx--
FEFF7000 4K rwx--
FEFFA000 8K rwx-- /lib/ld.so.1
FEFFC000 8K rwx-- /lib/ld.so.1
total 7832K
On Tue, Aug 28, 2012 at 05:28:18PM +0200, bugzilla@bugzilla.balabit.com wrote:
feb2c405 _lwp_kill (1, 6) + 15
fead366f raise (6) + 1f
feab2971 abort (8046fec, 400, fef5c9a8, 8047404, feba1798, 8047008) + cd
fef3a1ab iv_set_fatal_msg_handler (fef5c9a8, 2, feb2eda2, fef3b33b)
fef3b0a6 ???????? (8066200, 8078338, 8047458, fef3bff4, fef392aa, fef6cd6c)
fef3b2ec ???????? (8066200, 8078338, 162014e, 8066200, fef6cd6c, fef6cd6c)
fef397e4 iv_fd_unregister (8078338, fe590000, 8047528, 8066280, feaa614e, fef6cd6c) + 92
Are there any error messages printed to standard output when this happens?
https://bugzilla.balabit.com/show_bug.cgi?id=190

Gergely Nagy <algernon@balabit.hu> changed:

What            |Removed          |Added
----------------------------------------------------------------------------
Target Milestone|---              |3.3.7
CC              |                 |algernon@balabit.hu
AssignedTo      |bazsi@balabit.hu |algernon@balabit.hu
https://bugzilla.balabit.com/show_bug.cgi?id=190

Gergely Nagy <algernon@balabit.hu> changed:

What   |Removed |Added
----------------------------------------------------------------------------
Status |NEW     |ASSIGNED

--- Comment #2 from Gergely Nagy <algernon@balabit.hu> 2012-08-28 17:34:42 ---
The backtrace is useful, thanks. I have a suspicion what broke it, will investigate as soon as possible.
https://bugzilla.balabit.com/show_bug.cgi?id=190

Marvin Nipper <marvin.nipper@stream.com> changed:

What            |Removed |Added
----------------------------------------------------------------------------
Target Milestone|3.3.7   |---

--- Comment #3 from Marvin Nipper <marvin.nipper@stream.com> 2012-08-28 17:38:52 ---
Regarding Lennert's query about standard output when this happens. My reply: Duh!!! Thanks for the kick in the tail, I should have looked there, and recorded that. Brain now engaged....

This is what popped up in the /var/adm/messages file:

Aug 28 09:51:57 secdevrsn02 genunix: [ID 603404 kern.notice] NOTICE: core_log: syslog-ng[604] core dumped: /var/core/core_secdevrsn02_syslog-ng_0_0_1346165515_604
Aug 28 09:51:57 secdevrsn02 supervise/syslog-ng[603]: [ID 702911 daemon.crit] Daemon exited due to a deadlock/signal/failure, restarting; exitcode='134'
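A note on the exit code in that log line: 134 equals 128 + 6, and signal 6 is SIGABRT, which would match the raise (6) and abort frames in the pstack output from comment #1. The thread does not say whether the supervisor prints a raw wait status (where 128 would be the core-dump flag) or a shell-style "128 + signal" value, so the sketch below just masks the low seven bits, which gives the same signal number under either reading. The helper names are hypothetical and are not part of syslog-ng.

```c
#include <signal.h>

/* Hypothetical helper: extract a signal number from a value such as 134.
 * Masking the low 7 bits works whether the 128 is the shell's offset
 * or the core-dump flag of a raw wait status (an assumption either way). */
int signal_from_exit_code(int exit_code)
{
    return exit_code & 0x7f;
}

/* Hypothetical helper: does the code point at an abort()? */
int looks_like_abort(int exit_code)
{
    return signal_from_exit_code(exit_code) == SIGABRT;
}
```

Calling looks_like_abort(134) is true on systems where SIGABRT is 6, which would fit a daemon that called abort() from a fatal-error handler.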
On Tue, Aug 28, 2012 at 05:38:51PM +0200, bugzilla@bugzilla.balabit.com wrote:
This is what popped up in the /var/adm/messages file: Aug 28 09:51:57 secdevrsn02 genunix: [ID 603404 kern.notice] NOTICE: core_log: syslog-ng[604] core dumped: /var/core/core_secdevrsn02_syslog-ng_0_0_1346165515_604 Aug 28 09:51:57 secdevrsn02 supervise/syslog-ng[603]: [ID 702911 daemon.crit] Daemon exited due to a deadlock/signal/failure, restarting; exitcode='134'
Right, but does the syslog-ng daemon output anything to its controlling terminal? (Can you run it by hand from a terminal?)
https://bugzilla.balabit.com/show_bug.cgi?id=190

Gergely Nagy <algernon@balabit.hu> changed:

What            |Removed |Added
----------------------------------------------------------------------------
Target Milestone|---     |3.3.7

--- Comment #4 from Gergely Nagy <algernon@balabit.hu> 2012-08-28 17:53:43 ---
Hrm, tried to reproduce it under Linux, no luck. I'll try on Solaris at home.

My suspicion is that the problem is in ivykis, there's been a few patches on its stable branch that affect Solaris, and I forgot to pick them before releasing syslog-ng 3.3.6. If that's the case, then updating lib/ivykis with the contents of the stable-v0.30 branch of it (link below) should do the trick.

ivykis: https://github.com/buytenh/ivykis/tree/stable-v0.30

I'm not entirely sure I understand what exactly happens, especially not after the log message about the deadlock & restart.
https://bugzilla.balabit.com/show_bug.cgi?id=190

--- Comment #5 from Marvin Nipper <marvin.nipper@stream.com> 2012-08-28 19:47:12 ---
Regarding Lennert's last question, if I run syslog-ng interactively, I see this:

iv_port_upload_one: got error 2[No such file or directory]
Abort (core dumped)
https://bugzilla.balabit.com/show_bug.cgi?id=190

--- Comment #6 from Marvin Nipper <marvin.nipper@stream.com> 2012-08-28 20:37:07 ---
OK. I looked at those ivykis updates. If I understood it all correctly, I see updates to these four code components (that were missing):

ivykis/lib/iv_method_dev_poll.c
ivykis/lib/iv_method_port.c
ivykis/lib/iv_tls.c
ivykis/modules/iv_event.c

I applied the updates to all four, recompiled, and THE GOOD NEWS IS that this appears to have eliminated the failures. I've stopped/started one of my systems multiple times, and it appears to consistently work properly.

So..... not sure how you want to handle that? Re-issue 3.3.6? Or just declare this as stuff that will be correctly slurped into 3.3.7?

Anyway... Just wanted to let you know that this does appear to be the missing changes that were needed.

AS ALWAYS...... THANKS for your quick response!!!
I think you should release a 3.3.7 BETA for a 1 or 2 weeks of testing, and then release the actual 3.3.7.

There should never be an actual release that turns up a bug in 24 hours :-(

Just my $0.02
Evan Rempel <erempel@uvic.ca> writes:
I think you should release a 3.3.7 BETA for a 1 or 2 weeks of testing, and then release the actual 3.3.7.
Something along those lines is the plan. There are two other issues I wish to fix (syslog-ng -V -d segfaulting, and afsocket-notls not being notls), after which I'd call it a 3.3.7 RC. I did have an RC for 3.3.6 (called 3.3.5.90), but I don't think I advertised it much - I'll make sure to do that next time.
There should never be an actual release that turns up a bug in 24 hours :-(
I mostly agree, which is why 3.3.7 will be coming *much* sooner than 3.3.6 did. I won't explicitly release a beta, since the changes are so small, but since I have daily snapshot tarballs, I plan to call for testing one of those, once the stuff scheduled for 3.3.7 is fixed. Most likely tomorrow night, and I might end up calling it 3.3.6.1 instead, we'll see. -- |8]
https://bugzilla.balabit.com/show_bug.cgi?id=190 --- Comment #7 from Gergely Nagy <algernon@balabit.hu> 2012-08-28 21:06:57 --- (In reply to comment #6)
I applied the updates to all four, recompiled, and THE GOOD NEWS IS that this appears to have eliminated the failures. I've stopped/started one of my systems multiple times, and it appears to consistently work properly.
Awesome, thanks for the quick tests!
So..... not sure how you want to handle that? Re-issue 3.3.6? Or just declare this as stuff that will be correctly slurped into 3.3.7?
The problem was that I forgot to update the ivykis submodule. I just pushed out a commit that corrects this, so the fix will be in the next version of syslog-ng, be that 3.3.7 or 3.3.6.1. I don't want to re-release 3.3.6.

An updated tarball will be available at http://packages.madhouse-project.org/syslog-ng/3.3/3.3.6/syslog-ng-3.3.6-HEA... in a couple of hours, when the nightly cron job runs.
https://bugzilla.balabit.com/show_bug.cgi?id=190

Lennert Buytenhek <buytenh@wantstofly.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC   |        |buytenh@wantstofly.org

--- Comment #8 from Lennert Buytenhek <buytenh@wantstofly.org> 2012-08-29 05:35:22 ---
Marvin, would it be possible for you to produce and attach a truss log of this happening? (Preferably from the moment syslog-ng starts up until the moment it dies.) Thanks!
https://bugzilla.balabit.com/show_bug.cgi?id=190

--- Comment #9 from Marvin Nipper <marvin.nipper@stream.com> 2012-08-29 15:59:17 ---
Lennert: Actually, Gergely's previous comment about the missing ivykis components identified the root cause of this issue. I had already applied those missing patches yesterday (fixing the problem), and no longer have the "broken" version available to produce that truss data.

Gergely: I did pull down that latest HEAD version this morning, used it to rebuild another version of the components, and just pushed those to a server and tested them. As you would have expected, they did work just fine. So, at least for this issue, that HEAD version is a good start for a 3.3.6.1 release.

Thanks to BOTH OF YOU for jumping in quickly, and offering to help (and obviously, finding a resolution!). Sorry (again) to have brought bad news your way, in the first place!

Y'all have a great week.
https://bugzilla.balabit.com/show_bug.cgi?id=190

--- Comment #10 from Lennert Buytenhek <buytenh@wantstofly.org> 2012-08-29 16:10:25 ---
Marvin, I can't speak on behalf of balabit, but looking at the diffs between ivykis v0.30.1 and the current ivykis stable-v0.30 HEAD, I can't explain why these patches would fix the issue that was reported, and I think that the fact that applying those patches makes the issue go away for you just means that they are obscuring a deeper issue somewhere else.

I also have trouble reproducing the problem here, so the truss output and/or a reproduction recipe would still be much appreciated from my side.
https://bugzilla.balabit.com/show_bug.cgi?id=190

--- Comment #11 from Gergely Nagy <algernon@balabit.hu> 2012-08-29 16:38:53 ---
I can't reproduce it, either, and unless I understand the problem, I'm not comfortable marking it fixed, so if you could go back to the stock 3.3.6 for the duration of a quick test, and upload the truss output (or mail it privately to me or Lennert), that would be most appreciated.

Thanks!
https://bugzilla.balabit.com/show_bug.cgi?id=190

--- Comment #12 from Marvin Nipper <marvin.nipper@stream.com> 2012-08-29 17:13:21 ---
Alright... sort of depressing... as I thought I had this one behind me. I've trashed the old stuff, so I'll have to rebuild it, and redeploy it, but that's life, I suppose.

I only want to have to do this once, so I want to get this right (and certainly, get you what you need). Not actually being a Solaris developer, and also not having ever needed to use truss, I'll be somewhat in the dark here. This is what I'm _thinking_ that I need to do:

    truss -aef -o /var/opt/truss.out \
        /usr/local/sbin/syslog-ng -F -f /etc/syslog-ng/syslog-ng.conf \
        -p /var/run/syslog-ng.pid \
        --persist-file=/usr/local/syslog-ng/syslog-ng.persist

The intent being to exactly mirror my production options. I just tried that with my working version of the executable, and it seems to work (i.e. syslog-ng still works, and truss generates data in the output file). And I am _guessing_ that you only need this thru the point where the core file appears??

I just need to know if these options (for truss) suffice, or if you are looking for something else? Let me know. Thanks.
https://bugzilla.balabit.com/show_bug.cgi?id=190

--- Comment #13 from Marvin Nipper <marvin.nipper@stream.com> 2012-08-29 18:08:32 ---
Created an attachment (id=64) --> (https://bugzilla.balabit.com/attachment.cgi?id=64)
Output from truss-based execution

OK. I didn't wait for a reply to the previous post, and just went ahead and tried to do this, so that I could get back to other things.

At first, I would start it, and stop it, and I was not getting a core dump (whereas I was previously able to force one of those almost immediately). I'm guessing that the use of truss may have altered the timing, or something. Anyway, I then decided to just let it run for about 10 minutes, and then issued a stop command, and it generated the standard output error that I had seen the other day:

iv_port_upload_one: got error 2[No such file or directory]

.... and it also generated a dump, and then the truss execution terminated as well. At least under the context of truss, the syslog-ng process did not continue to run (which is what they do without truss involved).

So... I'm attaching the file here. Let me know if this is what you need, or not. I've retained the core dump (if you need something from that), and also have the original 3.3.6 executables still in place, so that I can swap between those, and the "working" version, if you need me to do something different with truss.
https://bugzilla.balabit.com/show_bug.cgi?id=190

--- Comment #14 from Lennert Buytenhek <buytenh@wantstofly.org> 2012-08-29 20:29:01 ---
Marvin, thanks a lot for the truss output, this is quite helpful.

The ivykis stable-v0.30 branch has a patch "port: Properly handle ETIME returns from port_getn()." on it which makes ivykis deal with the fact that port_getn() on Solaris, contrary to the available documentation, can simultaneously return events and claim that a timeout occurred. The symptom that this fixes is lost events, but since that wasn't mentioned in this bug report, I somehow implicitly assumed that the issue you were seeing couldn't be the same issue.

However, looking at your truss output, this issue does actually trigger in your situation (even if you may not be aware of it) -- this is port_getn() both returning an event (1) and reporting a timeout ([62], 62 is ETIME):

8740/1: port_getn(3, 0x08043B3C, 1024, 1, 0x08047B7C) = 1 [62]

When port_getn() returns an event for a file descriptor, that file descriptor is unregistered ('dissociated' in port parlance) from the port. So, missing the return event means not only losing notification that the file descriptor is active, but also, losing notification that the file descriptor is now no longer associated with the port.

This bites us further down the line, when the file descriptor is unregistered in the end (by calling iv_fd_unregister()). ivykis has not received an event for the file descriptor, and so it thinks that the file descriptor is still associated with the port, and that it must call port_dissociate() on it to dissociate it from the port. However, the kernel _has_ delivered an event, and has already dissociated the file descriptor from the port, and thus will return -ENOENT when we ask it to dissociate the file descriptor again.

So, even though it did not seem that way at first, the issue you were seeing is actually the exact same issue that the port_getn() ETIME patch solves.

I'm sorry for wasting your time on this, I just got confused by the apparent absence of the primary symptom that the port_getn() ETIME patch was meant to solve.
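The sequence Lennert describes in comment #14 can be illustrated with a small toy model. This is only a sketch under stated assumptions: it does not call the real Solaris port_getn() or port_dissociate(), it is not ivykis code, and every name in it (toy_state, toy_port_getn, toy_poll, and so on) is hypothetical. The -1/ETIME return convention of the stand-in is also an assumption of the sketch. It tracks one file descriptor with two views of its state, the kernel's and the library's, and shows how dropping an event that arrives together with a timeout leads to ENOENT at unregister time.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model, not Solaris code: one file descriptor, two views of its state. */
struct toy_state {
    bool kernel_associated;   /* what the (simulated) kernel knows */
    bool library_associated;  /* what the event library believes   */
};

/* Stand-in for port_getn(): delivers one event, which dissociates the fd,
 * but reports a timeout as well. The -1/ETIME return convention is an
 * assumption of this sketch. */
static int toy_port_getn(struct toy_state *s, unsigned int *nget)
{
    s->kernel_associated = false;
    *nget = 1;
    errno = ETIME;
    return -1;
}

/* Stand-in for port_dissociate(): fails with ENOENT if not associated. */
static int toy_port_dissociate(struct toy_state *s)
{
    if (!s->kernel_associated) {
        errno = ENOENT;
        return -1;
    }
    s->kernel_associated = false;
    return 0;
}

/* One poll round. With honor_events_on_etime == false, a timeout is taken
 * to mean "no events", so the delivered event is lost. */
static void toy_poll(struct toy_state *s, bool honor_events_on_etime)
{
    unsigned int nget = 0;
    int ret = toy_port_getn(s, &nget);

    if (ret < 0 && errno == ETIME && !honor_events_on_etime)
        return;
    if (nget > 0)
        s->library_associated = false;  /* event seen: fd is dissociated */
}

/* Unregister the fd. Returns 0 on success or the errno of the failure. */
static int toy_unregister(struct toy_state *s)
{
    if (s->library_associated && toy_port_dissociate(s) < 0)
        return errno;
    s->library_associated = false;
    return 0;
}

/* Run poll-then-unregister and report the outcome of the unregister step. */
int simulate_shutdown(bool honor_events_on_etime)
{
    struct toy_state s = { true, true };

    toy_poll(&s, honor_events_on_etime);
    return toy_unregister(&s);
}
```

In this model simulate_shutdown(false) returns ENOENT, the same errno that appears in the "got error 2[No such file or directory]" message from comment #5, while simulate_shutdown(true) returns 0 because the library's view stays in step with the kernel's.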
https://bugzilla.balabit.com/show_bug.cgi?id=190

--- Comment #15 from Marvin Nipper <marvin.nipper@stream.com> 2012-08-29 20:36:35 ---
You guys NEVER waste my time. (Sorry that I was "whining" earlier!)

I'm obviously "tickled" that the truss data provided a meaningful explanation for why the ivykis changes seemed to resolve the problem, such that you don't have to track down some "new and different" root cause for this scenario.

Anyway... THANKS again for the quick responses, and suggested fixes. In the end, I'm just glad that it appears to be running reliably. So, again, I hope y'all have a great week, and do not hesitate to ping me, if you need anything else.
https://bugzilla.balabit.com/show_bug.cgi?id=190

Gergely Nagy <algernon@balabit.hu> changed:

What       |Removed  |Added
----------------------------------------------------------------------------
Resolution |         |FIXED
Status     |ASSIGNED |RESOLVED

--- Comment #16 from Gergely Nagy <algernon@balabit.hu> 2012-09-01 12:15:50 ---
This only affects Solaris, and is fixed on 3.3's master branch, will be part of the next release. As such, I'm marking it fixed.
https://bugzilla.balabit.com/show_bug.cgi?id=190

Jose Oliveira <jpo@di.uminho.pt> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC   |        |jpo@di.uminho.pt

--- Comment #17 from Jose Oliveira <jpo@di.uminho.pt> 2012-09-01 15:33:57 ---
Gergely (In reply to comment #16)
This only affects Solaris, and is fixed on 3.3's master branch, will be part of the next release. As such, I'm marking it fixed.
Shouldn't the minimal ivykis version be bumped to 0.30.2?

diff --git a/configure.in b/configure.in
index a663386..5cc3b5d 100644
--- a/configure.in
+++ b/configure.in
@@ -24,7 +24,7 @@ GLIB_MIN_VERSION="2.10.1"
 EVTLOG_MIN_VERSION="0.2.12"
 OPENSSL_MIN_VERSION="0.9.8"
 LIBDBI_MIN_VERSION="0.8.0"
-IVYKIS_MIN_VERSION="0.30.1"
+IVYKIS_MIN_VERSION="0.30.2"
 JSON_C_MIN_VERSION="0.7"
 JSON_GLIB_MIN_VERSION="0.12"
 PCRE_MIN_VERSION="6.1"

/jpo
https://bugzilla.balabit.com/show_bug.cgi?id=190 --- Comment #18 from Lennert Buytenhek <buytenh@wantstofly.org> 2012-09-01 16:31:33 --- (In reply to comment #17)
Shouldn't the minimal ivykis version be bumped to 0.30.2?
I screwed up on the ivykis 0.30.2 release, and failed to bump the version number in configure.ac, and so it claims to be 0.30.1. Since syslog-ng 3.3.7 is scheduled for release in about two months from now, there's a good chance that there will be a 0.30.3 ivykis out by then, and a requirements bump should probably be done then.
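To make the consequence of that mislabelled release concrete: if the 0.30.2 tarball reports itself as 0.30.1, then a minimum-version check raised to 0.30.2 would reject it even though it contains the fix. The sketch below is a generic dotted-version comparison written for illustration only. It is not how syslog-ng's configure script performs the check, and the function names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: compare two "major.minor.patch" strings.
 * Returns -1, 0 or 1; missing components are treated as 0. */
int version_cmp(const char *a, const char *b)
{
    int av[3] = { 0, 0, 0 };
    int bv[3] = { 0, 0, 0 };

    sscanf(a, "%d.%d.%d", &av[0], &av[1], &av[2]);
    sscanf(b, "%d.%d.%d", &bv[0], &bv[1], &bv[2]);

    for (int i = 0; i < 3; i++) {
        if (av[i] != bv[i])
            return av[i] < bv[i] ? -1 : 1;
    }
    return 0;
}

/* Hypothetical helper: would a library reporting `reported` pass a check
 * that requires at least `minimum`? */
int satisfies_minimum(const char *reported, const char *minimum)
{
    return version_cmp(reported, minimum) >= 0;
}
```

With these helpers, satisfies_minimum("0.30.1", "0.30.2") is false, which is the situation a bumped requirement would create for the mislabelled release, while satisfies_minimum("0.30.3", "0.30.2") is true, which fits the suggestion to defer the bump until a later ivykis release.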
https://bugzilla.balabit.com/show_bug.cgi?id=190 --- Comment #19 from Gergely Nagy <algernon@balabit.hu> 2012-09-01 20:15:07 --- (In reply to comment #17)
Gergely(In reply to comment #16)
This only affects Solaris, and is fixed on 3.3's master branch, will be part of the next release. As such, I'm marking it fixed.
Shouldn't the minimal ivykis version be bumped to 0.30.2?
What Lennert said. Strictly speaking, 0.30.1 is still the minimum required: it works on all platforms but Solaris, and on all platforms where ivykis has a chance to be installed system-wide, 0.30.1 is perfectly fine.

I believe that bumping the bundled version is enough; the required version does not need to be increased in this case.
participants (4)
- bugzilla@bugzilla.balabit.com
- Evan Rempel
- Gergely Nagy
- Lennert Buytenhek