[syslog-ng] Problems with recovering from 'disk full' I/O error

Balazs Scheidler bazsi77 at gmail.com
Mon Jun 17 13:46:01 CEST 2013


Hi,

I have pushed the fix on the hotfix/3.4/full-disk-crash-fix branch on the
github repository. Hopefully Algernon picks it up soon enough to master,
but I'm not sure when he plans a release in 3.4



On Sat, Jun 15, 2013 at 7:06 PM, Balazs Scheidler <bazsi77 at gmail.com> wrote:

> Thanks for the detailed report. Seems to be a use after free or double
> free in the error path. I'll have a closer look.
> On Jun 15, 2013 2:15 AM, "Johnson, Chris (HP TippingPoint Roseville)" <
> chris.johnson3 at hp.com> wrote:
>
>>  Hi all,****
>>
>> I've come across a situation where syslog-ng (3.3.3 and 3.3.9) aborts
>> after trying to write to a disk that has no space and then is cleaned up.
>> ****
>>
>> ** **
>>
>> I've been able to reproduce the 'error' condition with the following Set
>> up:****
>>
>> In the config file:****
>>
>> ###########################################################****
>>
>> # test log destination****
>>
>> #****
>>
>> filter f_test{program("LOGID_99-*" type("glob"));};****
>>
>> destination d_test { file("/var/testpartition/test.log" perm(0644)
>> flags(no-multi-line)); };****
>>
>> log {****
>>
>>         source(s_local);****
>>
>>         filter(f_test);****
>>
>>         destination(d_test);****
>>
>>         flags(final);****
>>
>> };****
>>
>> ** **
>>
>> **1)      **Fill up the '/var/testpartition' disk:****
>>
>> root at device:/var/testpartition# dd if=/dev/zero
>> of=/var/testpartition/foo bs=1M****
>>
>> dd: writing '/var/testpartition/foo': No space left on device****
>>
>> 3531+0 records in****
>>
>> 3529+1 records out****
>>
>> **2)      **Send messages (*'TEST: pre I/O error'*)to 'test.log' until
>> syslog-ng complains:****
>>
>> 2013-06-14 22:41:09.623 [device] [syslog-ng-ERROR:] "I/O error occurred
>> while writing; fd='22', error='No space left on device (28)'"****
>>
>> 2013-06-14 22:41:09.623 [device] [syslog-ng-NOTICE:] "Suspending write
>> operation because of an I/O error; fd='22', time_reopen='60'"****
>>
>> **3)      **Restart syslog-ng with a SIGHUP****
>>
>> root at device:/var/testpartition# kill -s HUP $(cat /var/run/syslog-ng.pid)
>> ****
>>
>> ** **
>>
>> Program received signal SIGHUP, Hangup.****
>>
>> 0x00007f38ee8dad63 in __epoll_wait_nocancel () from
>> /xxx/xxx/xxx/xxx/xxx/xxx/lib/libc.so.6****
>>
>> (gdb) continue****
>>
>> Continuing.****
>>
>> **4)      **Send another message (*'TEST: post I/O error; post SIGHUP'*)to
>> syslog-ng (it complains):****
>>
>> 2013-06-14 22:41:37.147 [device] [syslog-ng-NOTICE:] "Configuration
>> reload request received, reloading configuration;"****
>>
>> 2013-06-14 22:41:59.610 [device] [syslog-ng-ERROR:] "I/O error occurred
>> while writing; fd='23', error='No space left on device (28)'"****
>>
>> 2013-06-14 22:41:59.610 [device] [syslog-ng-NOTICE:] "Suspending write
>> operation because of an I/O error; fd='23', time_reopen='60'"****
>>
>> **5)      **Clean up '/var/testpartition' disk:****
>>
>> root at device:/var/testpartition# rm foo****
>>
>> **6)      **Send another message (*'TEST: post I/O error; post SIGHUP 2'*
>> ); syslog-ng aborts:****
>>
>> Program received signal SIGABRT, Aborted.****
>>
>> 0x00007f38ee83eda5 in raise (sig=<value optimized out>) at
>> ../nptl/sysdeps/unix/sysv/linux/raise.c:64****
>>
>> 64      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or
>> directory.****
>>
>>         in ../nptl/sysdeps/unix/sysv/linux/raise.c****
>>
>> (gdb) bt****
>>
>> #0  0x00007f38ee83eda5 in raise (sig=<value optimized out>) at
>> ../nptl/sysdeps/unix/sysv/linux/raise.c:64****
>>
>> #1  0x00007f38ee8402c3 in abort () at abort.c:88****
>>
>> #2  0x00007f38ee87af18 in __libc_message (do_abort=2, fmt=0x7f38ee927cc8
>> "*** glibc detected *** %s: %s: 0x%s ***\n")****
>>
>>     at ../sysdeps/unix/sysv/linux/libc_fatal.c:170****
>>
>> #3  0x00007f38ee8804b8 in malloc_printerr (action=2, str=0x7f38ee927dd0
>> "double free or corruption (!prev)",****
>>
>>     ptr=<value optimized out>) at malloc.c:5891****
>>
>> #4  0x00007f38ee8825c6 in __libc_free (mem=0x7f38ee91e820) at
>> malloc.c:3626****
>>
>> #5  0x00007f38ef7d989a in ?? ()****
>>
>> #6  0x0000000000401800 in ?? () at elf-init.c:99****
>>
>> #7  0x00007f38efcf0257 in log_proto_file_writer_flush (s=0x674de0) at
>> logproto.c:306****
>>
>> #8  0x00007f38efcf04a8 in log_proto_file_writer_post (s=0x674de0,****
>>
>>     msg=0x6558a0 "2013-06-14T22:42:30.476+00:00 device LOGID_99-INFO:
>> TEST: post I/O error; post SIGHUP 2\n", msg_len=89,****
>>
>>     consumed=0x7fffa53e570c) at logproto.c:380****
>>
>> #9  0x00007f38efcfc274 in log_proto_post (s=0x674de0,****
>>
>>     msg=0x6558a0 "2013-06-14T22:42:30.476+00:00 device LOGID_99-INFO:
>> TEST: post I/O error; post SIGHUP 2\n", msg_len=89,****
>>
>>     consumed=0x7fffa53e570c) at logproto.h:95****
>>
>> #10 0x00007f38efcfc095 in log_writer_flush (self=0x655a30,
>> flush_mode=LW_FLUSH_NORMAL) at logwriter.c:983****
>>
>> #11 0x00007f38efcf9be3 in log_writer_work_perform (s=0x655a30) at
>> logwriter.c:129****
>>
>> #12 0x00007f38efcf9efe in log_writer_io_flush_output (s=0x655a30) at
>> logwriter.c:198****
>>
>> #13 0x00007f38efd238b5 in iv_run_tasks (st=0x6036a0) at iv_task.c:46****
>>
>> #14 0x00007f38efd22a86 in iv_main () at iv_main.c:266****
>>
>> #15 0x00007f38efcfedf9 in main_loop_run () at mainloop.c:736****
>>
>> #16 0x00000000004017c9 in main (argc=1, argv=0x7fffa53e5968) at main.c:263
>> ****
>>
>> (gdb)****
>>
>> ** **
>>
>> The last two test messages *are* being written (run together with the
>> last partial message that generated the 'disk full' I/O error):****
>>
>> root at device:/var/testpartition# tail test.log****
>>
>> 2013-06-14T22:40:58.140+00:00 device LOGID_99-INFO: TEST: pre I/O error**
>> **
>>
>> 2013-06-14T22:40:59.150+00:00 device LOGID_99-INFO: TEST: pre I/O error**
>> **
>>
>> 2013-06-14T22:41:00.180+00:00 device LOGID_99-INFO: TEST: pre I/O error**
>> **
>>
>> 2013-06-14T22:41:01.176+00:00 device LOGID_99-INFO: TEST: pre I/O error**
>> **
>>
>> 2013-06-14T22:41:02.170+00:00 device LOGID_99-INFO: TEST: pre I/O error**
>> **
>>
>> 2013-06-14T22:41:03.010+00:00 device LOGID_99-INFO: TEST: pre I/O error**
>> **
>>
>> 2013-06-14T22:41:03.930+00:00 device LOGID_99-INFO: TEST: pre I/O error**
>> **
>>
>> 2013-06-14T22:41:05.040+00:00 device LOGID_99-INFO: TEST: pre I/O error**
>> **
>>
>> 2013-06-14T22:41:06.180+00:00 device LOGID_99-INFO: TEST: pre I/O error**
>> **
>>
>> 2013-06-14T22:41:07.140+00:00 device LOGID_99-INFO:
>> T2013-06-14T22:42:30.476+00:00 device LOGID_99-INFO: TEST: post I/O error;
>> post SIGHUP 2013-06-14T22:42:30.476+00:00 device LOGID_99-INFO: TEST: post
>> I/O error; post SIGHUP 2****
>>
>> root at device:/var/testpartition#****
>>
>> ** **
>>
>> Thanks,****
>>
>> Chris****
>>
>> ----------------------------------------****
>>
>> Christopher Johnson****
>>
>> chris.johnson3 at hp.com****
>>
>> HP Software - Security Product Group****
>>
>> (916) 785-2817****
>>
>> ----------------------------------------****
>>
>> ** **
>>
>>
>> ______________________________________________________________________________
>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
>> Documentation:
>> http://www.balabit.com/support/documentation/?product=syslog-ng
>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
>>
>>
>>


-- 
Bazsi
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.balabit.hu/pipermail/syslog-ng/attachments/20130617/ba8c9c08/attachment-0001.htm 


More information about the syslog-ng mailing list