[Bug 84] New: Program destination blocks if program stops reading input
https://bugzilla.balabit.com/show_bug.cgi?id=84 Summary: Program destination blocks if program stops reading input Product: syslog-ng Version: 3.0.x Platform: PC OS/Version: Linux Status: NEW Severity: major Priority: unspecified Component: syslog-ng AssignedTo: bazsi@balabit.hu ReportedBy: erempel@uvic.ca Type of the Report: --- Estimated Hours: 0.0 Syslog-ng OSE 3.0.5 and 3.0.6 - binary download from balabit and self compiled all behave the same. If the program in a destination stops reading its input, syslog-ng buffers a small number of messages and then stops reading the source (in my case a pipe). I would expect syslog-ng to read from its input until the log_fifo_size limit was reached and then drop messages. an strace is attached. The configuration file is -------------- options { log_fifo_size(1000000); use_fqdn(yes); keep_hostname(yes); chain_hostnames(yes); time_reap(60); time_reopen(5); flush_lines(1000); flush_timeout(1000); }; template t_application { template("$MESSAGE\n"); template_escape(no); }; template t_localsyslog { template("<$PRI>$S_ISODATE $HOST erempel: $MSGONLY\n"); template_escape(no); }; source local { pipe("/var/log/syslog.pipes/erempel" flags(no-parse) ); }; source int { internal(); }; destination localsyslog { tcp("localhost" port(1514) localip(localhost) template(t_localsyslog) log_fifo_size(5000) ); }; destination erempel { program("/home1l/erempel/read-noread.pl" template(t_application) log_fifo_size(8000000) ); }; log { source(local); destination(erempel); }; log { source(int); destination(localsyslog); }; ---------------------- -- Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching all bug changes.
https://bugzilla.balabit.com/show_bug.cgi?id=84 --- Comment #1 from Evan Rempel <erempel@uvic.ca> 2010-06-02 17:54:13 --- strace can be acquired at http://web.uvic.ca/~erempel/syslog.trc.gz -- Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching all bug changes.
https://bugzilla.balabit.com/show_bug.cgi?id=84 --- Comment #2 from Evan Rempel <erempel@uvic.ca> 2010-06-02 17:55:19 --- lsof during the strace is localhost% sudo /usr/sbin/lsof -p 18279 COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME syslog-ng 18279 root cwd DIR 253,6 4096 32793 /usr/local/etc/syslog-ng syslog-ng 18279 root rtd DIR 253,0 1024 2 / syslog-ng 18279 root txt REG 253,6 321731 233449 /usr/local/sbin/syslog-ng syslog-ng 18279 root mem REG 253,0 95464 49455 /lib64/libselinux.so.1 syslog-ng 18279 root mem REG 253,0 247496 49454 /lib64/libsepol.so.1 syslog-ng 18279 root mem REG 253,5 85928 394112 /usr/lib64/libz.so.1.2.3 syslog-ng 18279 root mem REG 253,0 647608 49451 /lib64/libglib-2.0.so.0.1200.3 syslog-ng 18279 root mem REG 253,0 1366176 49458 /lib64/libcrypto.so.0.9.8e syslog-ng 18279 root mem REG 253,5 190976 397026 /usr/lib64/libgssapi_krb5.so.2.2 syslog-ng 18279 root mem REG 253,0 10000 49465 /lib64/libcom_err.so.2.1 syslog-ng 18279 root mem REG 253,5 153464 397024 /usr/lib64/libk5crypto.so.3.1 syslog-ng 18279 root mem REG 253,5 613896 397025 /usr/lib64/libkrb5.so.3.3 syslog-ng 18279 root mem REG 253,5 35728 397023 /usr/lib64/libkrb5support.so.0.1 syslog-ng 18279 root mem REG 253,0 9728 49474 /lib64/libkeyutils-1.2.so syslog-ng 18279 root mem REG 253,0 137256 53855 /lib64/ld-2.5.so syslog-ng 18279 root mem REG 253,5 25464 393467 /usr/lib64/gconv/gconv-modules.cache syslog-ng 18279 root mem REG 253,4 493726 327686 /var/db/nscd/hosts syslog-ng 18279 root mem REG 253,0 50288 49190 /lib64/librt-2.5.so syslog-ng 18279 root mem REG 253,0 111480 49172 /lib64/libnsl-2.5.so syslog-ng 18279 root mem REG 253,6 21478 136301 /usr/local/lib/libevtlog.so.0.0.0 syslog-ng 18279 root mem REG 253,0 306632 49331 /lib64/libssl.so.0.9.8e syslog-ng 18279 root mem REG 253,0 20424 49168 /lib64/libdl-2.5.so syslog-ng 18279 root mem REG 253,0 1712216 49160 /lib64/libc-2.5.so syslog-ng 18279 root mem REG 253,0 142696 49186 /lib64/libpthread-2.5.so syslog-ng 18279 root mem REG 253,0 89800 49188 /lib64/libresolv-2.5.so syslog-ng 18279 root 0u CHR 136,1 3 /dev/pts/1 syslog-ng 18279 root 1u CHR 136,1 3 /dev/pts/1 syslog-ng 18279 root 2u CHR 136,1 3 /dev/pts/1 syslog-ng 18279 root 3u FIFO 253,4 491523 /var/log/syslog.pipes/erempel syslog-ng 18279 root 4u IPv4 121995943 TCP localhost:34513->localhost.localdomain:1514 (ESTABLISHED) syslog-ng 18279 root 5w FIFO 0,6 121995942 pipe syslog-ng 18279 root 6u unix 0xffff81043fd3f100 121995946 /var/syslog-ng.ctl -- Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching all bug changes.
https://bugzilla.balabit.com/show_bug.cgi?id=84 --- Comment #3 from Balazs Scheidler <bazsi@balabit.hu> 2010-06-03 20:15:28 --- Thanks for the information. -- Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching all bug changes.
https://bugzilla.balabit.com/show_bug.cgi?id=84 --- Comment #4 from Balazs Scheidler <bazsi@balabit.hu> 2010-06-04 20:46:12 --- just wanted to report back that the strace/lsof/config is very useful. I've started to analyze the problem, and it is not related to flow-control, but rather it seems to be related to the facts that: 1) writer side of syslog-ng has a higher priority 2) for some reason syslog-ng assumes that the writer fd is writable, even though it doesn't validate this claim using poll. In the program() destination I cannot see how that could happen. This is where I stand right now. I can hopefully close this issue on the coming weekend. Thanks for your patience. -- Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching all bug changes.
https://bugzilla.balabit.com/show_bug.cgi?id=84 Balazs Scheidler <bazsi@balabit.hu> changed: What |Removed |Added ---------------------------------------------------------------------------- Resolution| |FIXED Status|NEW |RESOLVED --- Comment #5 from Balazs Scheidler <bazsi@balabit.hu> 2010-06-06 13:47:49 --- I could successfully fix this issue. As I wrote in the previous comment, this has nothing to do with flow-control. Rather it was a flush-timeout() handling issue, it didn't poll the destination, but rather assumed that it became writable after flush_timeout elapsed. Others probably didn't notice this problem, because the problem "fixes" itself after the destination became writable. However in this time periods (between flush_timeout and the time the destination became writable) syslog-ng was spinning the CPU. So this problem definitely should account for some CPU usage, even in environments where it didn't cause critical problems. Again, thanks for noticing and taking the time to track it down. This patch seems to fix the issue for me: Author: Balazs Scheidler <bazsi@balabit.hu> 2010-06-06 13:24:48 Committer: Balazs Scheidler <bazsi@balabit.hu> 2010-06-06 13:24:48 Parent: 5f82516db500209df040215c39da1b5adfc4d143 (main: stay in foreground if syntax-check or debug is requested) Branches: master, remotes/balabit/master, remotes/origin/master Follows: v3.0.6 Precedes: logwriter: fixed flush_timeout handling on non-file destinations Non-file destinations may not always be willing to accept messages, which means that even if the flush_timeout elapsed, the destination may not be ready to accept messages. However log_writer_fd_prepare() basically assumed that once the timeout elapses we can immediately start writing messages (which is true for file destinations but false for others). This patch fixes this case. Kudos to Evan who helped me to track down this issue. Reported-By: Evan Rempel -- Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching all bug changes.
https://bugzilla.balabit.com/show_bug.cgi?id=84 Balazs Scheidler <bazsi@balabit.hu> changed: What |Removed |Added ---------------------------------------------------------------------------- Target Milestone|--- |3.0.7 -- Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching all bug changes.
https://bugzilla.balabit.com/show_bug.cgi?id=84 Oguz Yilmaz <oguzyilmazlist@gmail.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |oguzyilmazlist@gmail.com -- Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching all bug changes.
https://bugzilla.balabit.com/show_bug.cgi?id=84 --- Comment #6 from Balazs Scheidler <bazsi@balabit.hu> 2010-08-02 16:22:48 --- This change is also related to this, so this issue was finally fixed for 3.0.8: Author: Balazs Scheidler <bazsi@balabit.hu> 2010-06-21 20:04:59 Committer: Balazs Scheidler <bazsi@balabit.hu> 2010-06-21 20:04:59 Parent: d85c0159fc6cd1dcab2ba19439406e29042b6cf7 (preparations for 3.0.7) Child: b6ff55b0955cfe285ac7afe2063f832daed078a9 (csv-parser: fixed greedy + drop-invalid combination) Branches: master, remotes/balabit/master, remotes/origin/master Follows: v3.0.7 Precedes: v3.0.8 affile: fixed flush_timeout() processing for destination files The patch "logwriter: fixed flush_timeout handling on non-file destinations" broke flush_timeout() processing for file destination, this patch fixes that. -- Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching all bug changes.
participants (1)
-
bugzilla@bugzilla.balabit.com