[Bug 258] New: syslog-ng 3.4.2 stalls on Ubuntu 12.04 Precise with /var/log full
https://bugzilla.balabit.com/show_bug.cgi?id=258

           Summary: syslog-ng 3.4.2 stalls on Ubuntu 12.04 Precise with /var/log full
           Product: syslog-ng
           Version: 3.4.x
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: major
          Priority: unspecified
         Component: syslog-ng
        AssignedTo: bazsi@balabit.hu
        ReportedBy: carl.chenet-ext@cloudwatt.com
Type of the Report: bug
   Estimated Hours: 0.0

Hi,

We had several cases of frozen KVM virtual machines running Ubuntu 12.04 Precise with syslog-ng 3.4.2 taken from the madhouse Debian repository.

Issue: the servers are no longer reachable. You can initiate an SSH or console connection, but the connection freezes before the banner is displayed. Each time, the /var/log LVM file system is full. The following errors occur in /var/log/system.log:

    Oct 30 07:05:18 s-metcld-0002 syslog-ng[8467]: I/O error occurred while writing; fd='16', error='No space left on device (28)'
    Oct 30 07:06:18 s-metcld-0002 syslog-ng[8467]: I/O error occurred while writing; fd='16', error='No space left on device (28)'
    Oct 30 07:07:18 s-metcld-0002 syslog-ng[8467]: internal() messages are looping back, preventing loop by suppressing all internal messages until the current message is processed; trigger-msg='', first-suppressed-msg='I/O error occurred while writing; fd=\'16\', error=\'No space left on device (28)\''
    Oct 30 07:08:18 s-metcld-0002 syslog-ng[8467]: I/O error occurred while writing; fd='16', error='No space left on device (28)'
    Oct 30 07:09:18 s-metcld-0002 syslog-ng[8467]: I/O error occurred while writing; fd='16', error='No space left on device (28)'
    Oct 30 07:10:18 s-metcld-0002 syslog-ng[8467]: internal() messages are looping back, preventing loop by suppressing all internal messages until the current message is processed; trigger-msg='', first-suppressed-msg='I/O error occurred while writing; fd=\'16\', error=\'No space left on device (28)\''
    Oct 30 07:11:18 s-metcld-0002 syslog-ng[8467]: I/O error occurred while writing; fd='16', error='No space left on device (28)'
    Oct 30 07:12:18 s-metcld-0002 syslog-ng[8467]: I/O error occurred while writing; fd='16', error='No space left on device (28)'
    Oct 30 07:13:19 s-metcld-0002 syslog-ng[8467]: internal() messages are looping back, preventing loop by suppressing all internal messages until the current message is processed; trigger-msg='', first-suppressed-msg='I/O error occurred while writing; fd=\'16\', error=\'No space left on device (28)\''
    Oct 30 07:14:19 s-metcld-0002 syslog-ng[8467]: I/O error occurred while writing; fd='16', error='No space left on device (28)'
    Oct 30 07:15:19 s-metcld-0002 syslog-ng[8467]: I/O error occurred while writing; fd='16', error='No space left on device (28)'
    Oct 30 07:16:19 s-metcld-0002 syslog-ng[8467]: internal() messages are looping back, preventing loop by suppressing all internal messages until the current message is processed; trigger-msg='', first-suppressed-msg='I/O error occurred while writing; fd=\'16\', error=\'No space left on device (28)\''
    Oct 30 07:17:19 s-metcld-0002 syslog-ng[8467]: I/O error occurred while writing; fd='16', error='No space left on device (28)'

Workaround: We have the mcollective client installed on each server, so we were able to kill the syslog-ng process through mcollective. When syslog-ng is killed, the system stops freezing immediately and the SSH connection comes back.

I don't fully understand how syslog-ng freezes the system, but I think it is worth reporting.

Regards,
Carl Chenet

--
Configure bugmail: https://bugzilla.balabit.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching all bug changes.
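Since every stall in the report coincides with a full /var/log file system, a small free-space check can help confirm that condition before the host becomes unreachable. The following is a minimal sketch, not part of the original report; the function names and the 5% threshold are assumptions chosen for illustration.

```python
import shutil


def free_fraction(path: str) -> float:
    """Return the fraction (0.0 to 1.0) of the file system holding `path` that is free."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo file systems report a zero size; treat them as having no free space.
        return 0.0
    return usage.free / usage.total


def is_nearly_full(path: str, min_free: float = 0.05) -> bool:
    """Return True when less than `min_free` of the file system is free.

    The 5% default is an arbitrary illustrative threshold.
    """
    return free_fraction(path) < min_free


if __name__ == "__main__":
    # /var/log is the file system named in the report; adjust as needed.
    path = "/var/log"
    print(f"{path}: {free_fraction(path):.1%} free, nearly full: {is_nearly_full(path)}")
```

Run from cron or a monitoring agent, a check like this could raise an alert before syslog-ng starts logging "No space left on device (28)".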
https://bugzilla.balabit.com/show_bug.cgi?id=258

--- Comment #1 from Balazs Scheidler <bazsi@balabit.hu> 2013-10-31 15:52:19 ---
Is flow control enabled? If syslog-ng throttles /dev/log because of flow control, exactly this stuff happens.
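To illustrate the question in comment #1, here is a toy model of why flow control can stall whatever writes to a source when the destination cannot accept data. This is only a sketch of the general backpressure idea, not syslog-ng's implementation; the class name, the `window` parameter and the `offer` method are hypothetical.

```python
from collections import deque


class ToyLogPath:
    """Toy model of a source -> queue -> destination log path (not syslog-ng code)."""

    def __init__(self, window: int, flow_control: bool):
        self.window = window              # how many messages may wait in the queue
        self.flow_control = flow_control  # stall the source instead of dropping
        self.queue = deque()
        self.dropped = 0
        self.destination_suspended = False

    def offer(self, msg: str) -> bool:
        """Return True if the source may keep reading, False if it must stall."""
        if not self.destination_suspended:
            return True  # written straight away, nothing queues up
        if len(self.queue) < self.window:
            self.queue.append(msg)
            return True
        if self.flow_control:
            return False  # window exhausted: stop reading the source
        self.dropped += 1  # no flow control: lose the message, keep reading
        return True


if __name__ == "__main__":
    path = ToyLogPath(window=3, flow_control=True)
    path.destination_suspended = True  # e.g. the disk is full
    for i in range(5):
        print(i, "keep reading" if path.offer(f"msg {i}") else "source stalls")
```

In this model, once the window is exhausted the source is no longer read, so anything that blocks while writing to that source (here standing in for /dev/log) would hang, which matches the symptom described in the report.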
With the threaded(yes) option, I understand that a pseudo throttle mechanism is in effect. Can that result in this behaviour as well?

On 10/31/2013 07:52 AM, bugzilla@wwwold.balabit.com wrote:
> https://bugzilla.balabit.com/show_bug.cgi?id=258
>
> --- Comment #1 from Balazs Scheidler <bazsi@balabit.hu> 2013-10-31 15:52:19 ---
> Is flow control enabled?
> If syslog-ng throttles /dev/log because of flow control exactly this stuff happens.
No, syslog-ng disables that pseudo flow control when the destination is suspended.

On Oct 31, 2013 4:14 PM, "Evan Rempel" <erempel@uvic.ca> wrote:
> With the threaded(yes) option, I understand that a pseudo throttle mechanism is in effect. Can that result in this behaviour as well?
>
> On 10/31/2013 07:52 AM, bugzilla@wwwold.balabit.com wrote:
>> https://bugzilla.balabit.com/show_bug.cgi?id=258
>>
>> --- Comment #1 from Balazs Scheidler <bazsi@balabit.hu> 2013-10-31 15:52:19 ---
>> Is flow control enabled?
>> If syslog-ng throttles /dev/log because of flow control exactly this stuff happens.
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq
https://bugzilla.balabit.com/show_bug.cgi?id=258

Carl Chenet <carl.chenet-ext@cloudwatt.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |carl.chenet-ext@cloudwatt.com

--- Comment #2 from Carl Chenet <carl.chenet-ext@cloudwatt.com> 2013-12-19 15:42:39 ---
Hi,

I tried to implement flow control, but I still have some hosts that stall when the file system is full. I have the following setup on my syslog-ng clients (only relevant parts):

    log_msg_size(16384);
    log_fifo_size(60000);

    source s_local {
        system();
        internal();
        syslog(ip("127.0.0.1") transport(tcp) port(514) host_override("host1") log_iw_size(30000) max-connections(300));
        syslog(ip("127.0.0.1") transport(udp) port(514) host_override("host1") log_iw_size(30000) max-connections(300));
    };

    log {
        source(s_local);
        destination(d_system);
        flags(flow-control);
    };

But some hosts still stall when /var/log is full. It seems to happen at the exact time when the system's logrotate reloads syslog-ng. My logrotate configuration is:

    /var/log/auth.log
    /var/log/cron.log
    /var/log/mail.log
    /var/log/kernel.log
    /var/log/system.log
    {
        rotate 7
        daily
        missingok
        compress
        delaycompress
        sharedscripts
        postrotate
            service syslog-ng reload > /dev/null
        endscript
    }

The log follows:

    Dec 14 06:15:56 p-wkrcbx-0004.adm.prd1.val.cloudwatt.net chef-client[4824]: INFO: ruby_block[generate-mcollective-classes-delayed] sending run action to ruby_block[generate-mcollective-classes] (delayed)
    Dec 14 06:15:56 p-wkrcbx-0004.adm.prd1.val.cloudwatt.net chef-client[4824]: INFO: ruby_block[generate-mcollective-classes] called
    Dec 14 06:15:56 p-wkrcbx-0004.adm.prd1.val.cloudwatt.net chef-client[4824]: INFO: Chef Run complete in 10.500146744 seconds
    Dec 14 06:15:56 p-wkrcbx-0004.adm.prd1.val.cloudwatt.net chef-client[4824]: INFO: Running report handlers
    Dec 14 06:15:56 p-wkrcbx-0004.adm.prd1.val.cloudwatt.net chef-client[4824]: INFO: Report handlers complete
    Dec 14 06:25:15 p-wkrcbx-0004.adm.prd1.val.cloudwatt.net syslog-ng[26524]: I/O error occurred while writing; fd='47', error='No space left on device (28)'
    Dec 14 06:25:15 p-wkrcbx-0004.adm.prd1.val.cloudwatt.net syslog-ng[26524]: Suspending write operation because of an I/O error; fd='47', time_reopen='60'
    Dec 14 06:26:15 p-wkrcbx-0004.adm.prd1.val.cloudwatt.net syslog-ng[26524]: Error suspend timeout has elapsed, attempting to write again; fd='47'
    Dec 14 06:26:15 p-wkrcbx-0004.adm.prd1.val.cloudwatt.net syslog-ng[26524]: I/O error occurred while writing; fd='47', error='No space left on device (28)'

And 06:25:15 is the time of the daily logrotate:

    # m h dom mon dow user command
    17 * * * * root cd / && run-parts --report /etc/cron.hourly
    25 6 * * * root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )

Don't hesitate to contact me for more information about this issue.

Regards,
Carl Chenet
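The correlation drawn in comment #2 between the first "No space left on device" error and the 06:25 cron.daily run can be checked mechanically across many hosts. The sketch below is an illustration only: the regular expression is written against the message format shown in this thread, and the function names are hypothetical.

```python
import re

# Matches the syslog-ng ENOSPC write error as it appears in the log excerpts above.
ENOSPC_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+syslog-ng\[(?P<pid>\d+)\]:\s+"
    r"I/O error occurred while writing; fd='(?P<fd>\d+)', "
    r"error='No space left on device \(28\)'"
)


def enospc_events(lines):
    """Return (time, host, fd) tuples for every ENOSPC write error in `lines`."""
    events = []
    for line in lines:
        match = ENOSPC_RE.match(line)
        if match:
            events.append((match["time"], match["host"], int(match["fd"])))
    return events


def first_error_matches_schedule(lines, hour: int, minute: int) -> bool:
    """Return True if the first ENOSPC error falls in the given hour and minute."""
    events = enospc_events(lines)
    if not events:
        return False
    return events[0][0].startswith(f"{hour:02d}:{minute:02d}:")


if __name__ == "__main__":
    sample = [
        "Dec 14 06:25:15 p-wkrcbx-0004.adm.prd1.val.cloudwatt.net syslog-ng[26524]: "
        "I/O error occurred while writing; fd='47', error='No space left on device (28)'",
        "Dec 14 06:25:15 p-wkrcbx-0004.adm.prd1.val.cloudwatt.net syslog-ng[26524]: "
        "Suspending write operation because of an I/O error; fd='47', time_reopen='60'",
    ]
    print(enospc_events(sample))
    print(first_error_matches_schedule(sample, 6, 25))
```

Here `6, 25` corresponds to the `25 6 * * *` crontab entry quoted above; a match only shows that the times coincide, not that the reload causes the stall.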
participants (3)
- Balazs Scheidler
- bugzilla@bugzilla.balabit.com
- Evan Rempel