[Bug 43] New: syslog-ng 2.1.4 consumes all CPU when processing "source file "
https://bugzilla.balabit.com/show_bug.cgi?id=43 Summary: syslog-ng 2.1.4 consumes all CPU when processing "source file" Product: syslog-ng Version: 2.1.x Platform: Macintosh OS/Version: Mac OS Status: NEW Severity: major Priority: unspecified Component: syslog-ng AssignedTo: bazsi@balabit.hu ReportedBy: denis.laplante@ubc.ca Type of the Report: bug Estimated Hours: 0.0 Summary: syslog-ng version 2.1.4 consumes all CPU when processing "source file" This happens with MacOsX-10.5 (via MacPorts compilation environment), and SunOS-5.8 (manual compilation). Version 2.0.9 on another MacOsX-10.5 compiled earlier behaves normally: lines are copied to destination file, and CPU usage remains low. Test Sequence: rm testlog.txt syslog-ng -f ./syslog-ng.conf -d -F [start without daemonizing] [ 2 lines appended to syslogng.log : ]
Apr 15 09:31:16 s_syslogng@greygum syslog-ng[28695]: Follow-mode file source not found, deferring open; filename='.../testlog.txt' Apr 15 09:31:16 s_syslogng@greygum syslog-ng[28695]: syslog-ng starting up; version='2.1.4'
touch testlog.txt [file gets opened, CPU is quiet] echo "hello" > testlog.txt [CPU immediately jumps to max, but no lines copied to destination] [ While CPU at max, process unkillable except "kill -9" ] [ While CPU at max, use GUI "Activity Monitor" option "Sample", results below: ]
Sampling process 28695 for 3 seconds with 1 millisecond of run time between samples Sampling completed, processing symbols... Analysis of sampling syslog-ng (pid 28695) every 1 millisecond Call graph: 1373 Thread_2503 1373 start 1373 main 1373 main_loop_run 1373 g_main_context_iteration 1373 g_main_context_iterate 1373 g_main_context_dispatch 1373 log_reader_fd_dispatch 1301 fd_do_read 1287 read$UNIX2003 1287 read$UNIX2003 13 fd_do_read 1 dyld_stub_read$UNIX2003 1 dyld_stub_read$UNIX2003 58 log_reader_iterate_buf 48 log_reader_iterate_buf 8 __memcpy 8 __memcpy 1 log_reader_find_eom 1 log_reader_find_eom 1 memmove 1 memmove 13 log_reader_fd_dispatch 1 g_sockaddr_unref 1 g_sockaddr_unref [ number probably counts when program counter was found in routine ] On SunOs, truss showed very fast cycling between alarm() and read().
[ syslog-ng.conf ] source s_testfile{ file(".../testlog.txt" follow_freq(10) flags(no-parse)); }; destination d_test_local{ file(".../testlog-copy.txt" perm(0644) ); }; log{ source(s_testfile); destination(d_test_local); }; source s_syslogng { internal(); }; # Messages internal to syslog-ng destination d_syslogng { file(".../syslogng.log" perm(0644) ); }; log{ source(s_syslogng); destination(d_syslogng); }; -- Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching all bug changes.
https://bugzilla.balabit.com/show_bug.cgi?id=43 --- Comment #1 from Denis Laplante <denis.laplante@ubc.ca> 2009-04-17 02:02:39 --- Compiled and successfully ran 2.0.10 on SunOs-5.8 system, so I have a workaround to this problem. -- Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching all bug changes.
https://bugzilla.balabit.com/show_bug.cgi?id=43 Balazs Scheidler <bazsi@balabit.hu> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |NEEDINFO --- Comment #2 from Balazs Scheidler <bazsi@balabit.hu> 2009-04-29 17:40:44 --- can you try 3.0 if it fixes your problem? I don't run syslog-ng on MAC myself, but the 3.0 code is much better tested in the area. -- Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching all bug changes.
https://bugzilla.balabit.com/show_bug.cgi?id=43 --- Comment #3 from Denis Laplante <denis.laplante@ubc.ca> 2009-04-30 20:47:10 --- (In reply to comment #2)
can you try 3.0 if it fixes your problem? Yes, 3.0.1 on Solaris-8 fixes this problem.
Looking at "truss" output, version 2.0.10 used "poll" to sleep, while malfunctioning version 2.1.4 used "alarm".
5.2506 read(3, 0x00087898, 8192) = 0 5.2508 alarm(0) = 10 5.2509 alarm(10) = 0 5.2511 read(3, 0x00087898, 8192) = 0 5.2513 alarm(0) = 10 5.2514 alarm(10) = 0 5.2516 read(3, 0x00087898, 8192) = 0
I will read MacPorts.org docs for creating and testing a new "portfile" for version 3.0.1 on MacOsX. Because of difficulties in killing syslog-ng-2.1.4, here is my procedure to stop runaway processes after 2 seconds: $ bash $ ulimit -t 2 $ syslog-ng-3.0.1/src/syslog-ng -f ./syslog-ng.local.3.conf --pidfile ${PWD}/syslog-ng-test.pid -d -F How do I suppress pidfile and/or control socket?
Error opening control socket, bind() failed; error='Permission denied (13)', socket='/local/var/run/syslog-ng.ctl'
-- Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching all bug changes.
https://bugzilla.balabit.com/show_bug.cgi?id=43 Balazs Scheidler <bazsi@balabit.hu> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEEDINFO |REOPENED --- Comment #4 from Balazs Scheidler <bazsi@balabit.hu> 2009-05-04 10:51:21 --- (In reply to comment #3)
(In reply to comment #2)
can you try 3.0 if it fixes your problem? Yes, 3.0.1 on Solaris-8 fixes this problem.
Looking at "truss" output, version 2.0.10 used "poll" to sleep, while malfunctioning version 2.1.4 used "alarm".
5.2506 read(3, 0x00087898, 8192) = 0 5.2508 alarm(0) = 10 5.2509 alarm(10) = 0 5.2511 read(3, 0x00087898, 8192) = 0 5.2513 alarm(0) = 10 5.2514 alarm(10) = 0 5.2516 read(3, 0x00087898, 8192) = 0
Hmm.. both syslog-ng versions are poll based, alarm() is only used to guard blocking reads on /proc/kmsg in Linux. (this is a workaround) But as it seems syslog-ng is at the end-of-file, but still tries to read again for some reason. Can you please post a larger strace with more context? The more I think of it, this should fix this: diff --git a/src/logreader.c b/src/logreader.c index fcf5ce3..a5177e5 100644 --- a/src/logreader.c +++ b/src/logreader.c @@ -439,7 +439,7 @@ log_reader_fetch_log(LogReader *self, FDRead *fd) g_sockaddr_unref(sa); return FALSE; } - else + else if (rc > 0) { if (self->options->padding && rc != self->options->padding) { @@ -457,7 +457,7 @@ log_reader_fetch_log(LogReader *self, FDRead *fd) } g_sockaddr_unref(sa); - if (self->flags & LR_NOMREAD) + if (rc == 0 || (self->flags & LR_NOMREAD)) break; } return TRUE;
I will read MacPorts.org docs for creating and testing a new "portfile" for version 3.0.1 on MacOsX.
Please wait for 3.0.2, 3.0.1 has some childish bugs.
Because of difficulties in killing syslog-ng-2.1.4, here is my procedure to stop runaway processes after 2 seconds: $ bash $ ulimit -t 2 $ syslog-ng-3.0.1/src/syslog-ng -f ./syslog-ng.local.3.conf --pidfile ${PWD}/syslog-ng-test.pid -d -F
How do I suppress pidfile and/or control socket?
Error opening control socket, bind() failed; error='Permission denied (13)', socket='/local/var/run/syslog-ng.ctl'
This is a different issue, you should probably create /local/var/run/ directory with appropriate permissions. I'm reopening this ticket, can you please verify that the above patch fixes your issue with 100 CPU usage? Thanks -- Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching all bug changes.
https://bugzilla.balabit.com/show_bug.cgi?id=43 --- Comment #5 from Denis Laplante <denis.laplante@ubc.ca> 2009-05-04 20:02:14 --- Thanks, your May 4 patch to logreader.c appears to fix the problem on SunOS-5.8 and MacOsX-10.5.6, though I don't trust it enough yet to put into production. The diff format was not recognized by my version of patch, not a big problem. I can email you full strace privately. It's too big (26 thousand lines) for me to sanitize before posting in public. -- Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching all bug changes.
https://bugzilla.balabit.com/show_bug.cgi?id=43 Balazs Scheidler <bazsi@balabit.hu> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|REOPENED |RESOLVED Resolution| |FIXED --- Comment #6 from Balazs Scheidler <bazsi@balabit.hu> 2009-05-05 14:45:28 --- (In reply to comment #5)
Thanks, your May 4 patch to logreader.c appears to fix the problem on SunOS-5.8 and MacOsX-10.5.6, though I don't trust it enough yet to put into production.
The diff format was not recognized by my version of patch, not a big problem.
I can email you full strace privately. It's too big (26 thousand lines) for me to sanitize before posting in public.
If the patch above fixes your problem, then no need to post the strace. I'll just commit this patch. commit 2a2c8e8c1fabd0a7515fdd728c0bb921381c1b37 Author: Balazs Scheidler <bazsi@balabit.hu> Date: Tue May 5 14:43:52 2009 +0200 [affile] fixed 100% CPU usage for file sources (fixes: pub#43) The patch 8f7098563a1cc53220cd35c9dc415a11be502feb enables multi-read for regular files but as it seems this broke the EOF handling logic in LogReader, which is fixed by this patch. Reported-By: Denis Laplante Tested-By: Denis Laplante -- Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching all bug changes.
https://bugzilla.balabit.com/show_bug.cgi?id=43 Balazs Scheidler <bazsi@balabit.hu> changed: What |Removed |Added ---------------------------------------------------------------------------- Target Milestone|--- |2.1.5 -- Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching all bug changes.
participants (1)
-
bugzilla@bugzilla.balabit.com