race condition in destination driver deinit?
Me again....

Hi guys,

I've been running 3.3.3 on Solaris 10 x86 for quite a bit now. I've got two boxes, both running the same OS release and the same release of syslog-ng. One of them (box A) has a destination configured that doesn't really exist so I've been getting

Mar 13 14:46:15 flowmon-sys syslog-ng[20118]: I/O error occurred while writing; fd='24', error='Connection refused (146)'

which is perfectly fine and I just ignored it. The other Solaris box (box B) does not have this destination configured. It now happens that after a random number of HUP signals syslog-ng on box A crashes with a segmentation fault and the following backtrace:

(gdb) bt
#0 0xfef2a7a8 in log_dest_driver_release_queue_method (self=0x80d2958, q=0x83e58955, user_data=0x0) at driver.c:80
#1 0xfef2aa5c in log_dest_driver_deinit_method (s=0x80d2958) at driver.c:80
#2 0xfe914a14 in afsocket_dd_deinit (s=0x80d2958) at afsocket.c:109
#3 0xfef29cd6 in log_dest_group_deinit (s=0x8074198) at dgroup.c:59
#4 0xfef24a21 in log_center_deinit (self=0x8090cc0) at center.c:67
#5 0xfef25493 in cfg_deinit (cfg=0x80d2848) at cfg.c:90
#6 0xfef406b2 in main_loop_reload_config_apply () at mainloop.c:364
#7 0xfef40a42 in main_loop_io_worker_sync_call (func=<value optimized out>) at mainloop.c:364
#8 0x08047900 in ?? ()

I was able to reproduce this over and over again. Sometimes it happens after 11 HUPs, sometimes after 35, but it eventually does crash. Since the only difference between box A and box B is the one additional destination I suspect that this is the cause for this segfault. Maybe the HUP signal came at a time when syslog-ng was trying to send something to the non-existing destination?

I'll try to collect some more data. If anybody could give me some direction on where exactly I should look into, I'll be happy to do that.

- Michael
Hi,

I think I fixed this in a patch a couple of days ago:

commit 9064e909e8aef518ec3c073bccc1bf09da9a2c06
Author: Balazs Scheidler <bazsi@balabit.hu>
Date:   Sun Apr 1 09:42:58 2012 +0200

    driver: fixed possible leak and use-after-free

    log_dest_driver_release_queue() was possibly leaking the LogQueue
    instances for file destinations when they got reaped. This was caused
    by an earlier patch that fixed a crash in reloads, more specifically
    this one: c7070e2a6f1c3a312260bcecf49d62028fef27ce

    This patch should fix both cases properly, the leak in the file
    destination driver and the original crash in the afsocket destination.

    Also, this patch fixes a use-after-free condition, the next member of
    a GList structure was referenced after it was removed from the list.

    Kudos to Jakub for the detailed bug report and Algernon for the
    original fix.

    Reported-By: Jakub Jankowski <shasta@toxcorp.com>
    Signed-off-by: Gergely Nagy <algernon@balabit.hu>
    Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>

On Tue, 2012-03-13 at 15:00 -0400, Michael Hocke wrote:
Me again....
Hi guys,
I've been running 3.3.3 on Solaris 10 x86 for quite a bit now. I've got two boxes, both running the same OS release and the same release of syslog-ng. One of them (box A) has a destination configured that doesn't really exist so I've been getting
Mar 13 14:46:15 flowmon-sys syslog-ng[20118]: I/O error occurred while writing; fd='24', error='Connection refused (146)'
which is perfectly fine and I just ignored it. The other Solaris box (box B) does not have this destination configured. It now happens that after a random number of HUP signals syslog-ng on box A crashes with a segmentation fault and the following backtrace:
(gdb) bt
#0 0xfef2a7a8 in log_dest_driver_release_queue_method (self=0x80d2958, q=0x83e58955, user_data=0x0) at driver.c:80
#1 0xfef2aa5c in log_dest_driver_deinit_method (s=0x80d2958) at driver.c:80
#2 0xfe914a14 in afsocket_dd_deinit (s=0x80d2958) at afsocket.c:109
#3 0xfef29cd6 in log_dest_group_deinit (s=0x8074198) at dgroup.c:59
#4 0xfef24a21 in log_center_deinit (self=0x8090cc0) at center.c:67
#5 0xfef25493 in cfg_deinit (cfg=0x80d2848) at cfg.c:90
#6 0xfef406b2 in main_loop_reload_config_apply () at mainloop.c:364
#7 0xfef40a42 in main_loop_io_worker_sync_call (func=<value optimized out>) at mainloop.c:364
#8 0x08047900 in ?? ()
I was able to reproduce this over and over again. Sometimes it happens after 11 HUPs, sometimes after 35, but it eventually does crash. Since the only difference between box A and box B is the one additional destination I suspect that this is the cause for this segfault. Maybe the HUP signal came at a time when syslog-ng was trying to send something to the non-existing destination?
I'll try to collect some more data. If anybody could give me some direction on where exactly I should look into, I'll be happy to do that.
- Michael
-- Bazsi
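For anyone reading this thread later: the use-after-free mentioned in the commit message is the familiar pattern of reading a list node's next pointer after that node has already been removed and freed. Below is a minimal C sketch of the safe way to walk a list while releasing its nodes. It is only an illustration under stated assumptions, not the actual syslog-ng fix: Node, list_prepend, list_length and release_all are hypothetical names, and a hand-rolled singly linked list stands in for GLib's GList.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node type for illustration; not syslog-ng's or GLib's. */
typedef struct Node {
    struct Node *next;
    int          queue_id;   /* stand-in for a queue owned by a driver */
} Node;

/* Prepend a node; returns the new head, or the old head if malloc fails. */
Node *list_prepend(Node *head, int queue_id)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return head;
    n->queue_id = queue_id;
    n->next = head;
    return n;
}

/* Count the nodes currently in the list. */
size_t list_length(const Node *head)
{
    size_t len = 0;
    for (; head != NULL; head = head->next)
        len++;
    return len;
}

/*
 * Release every node and reset *head to NULL.
 *
 * BUGGY variant, shown only as a comment:
 *     for (l = *head; l != NULL; l = l->next)
 *         free(l);          // l->next is read after l was freed
 *
 * The safe variant saves the successor BEFORE freeing the current node.
 * Returns the number of nodes released.
 */
size_t release_all(Node **head)
{
    assert(head != NULL);

    size_t released = 0;
    Node *l = *head;
    while (l != NULL) {
        Node *next = l->next;   /* save the successor first */
        free(l);                /* l must not be touched after this */
        l = next;
        released++;
    }
    *head = NULL;
    return released;
}
```

A caller would build a list with list_prepend() and tear it down with release_all(&head) during deinit. The buggy variant may appear to work for many iterations, since a freed node often still holds its old contents until the allocator reuses that memory.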