syslog-ng 3.3.1 quits at reload
Hi there,
I am having a problem with syslog-ng 3.3.1. Once in a while syslog-ng quits with a QUIT signal after it was asked to reload the configuration through a HUP signal. I have two instances of it running in separate Solaris 10 zones. Both are extremely busy and both crashed once within the last two weeks. By looking at the code in lib/mainloop.c I found the place where syslog-ng quits. It is within main_loop_reload_config_apply caused by a corrupt old configuration. Here is the backtrace with dbx:
------< SNIP >-------
(dbx) where
current thread: t@1
=>[1] _kill(0x347f, 0x3, 0x0, 0xfef557d2, 0x80ef41c, 0xfed6217c), at 0xfeb7af85
  [2] main_loop_reload_config_apply(0x8047920, 0xfefad000, 0x8047908, 0xfef707f9, 0x80ef368, 0xfef92934, 0x0, 0x0, 0xfef706c9), at 0xfef70595
  [3] main_loop_io_worker_sync_call(0x0, 0x8047920, 0x0, 0x0, 0x0, 0x0), at 0xfef70742
  [4] iv_signal_event(0xfefad000, 0x8047960, 0x400, 0x796c746e, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfef8d958
  [5] iv_event_raw_got_event(0xfefad018, 0x8047db0, 0xfe9, 0xfefaa468, 0x2, 0x1), at 0xfef8d54a
  [6] iv_main(0xfefad180, 0xfef9291e, 0x804c2e8, 0x0, 0x804c7b0, 0x8047e1c), at 0xfef8ac82
  [7] main_loop_run(0x8049cd7, 0x8047e44, 0x8047e48, 0x8047e2c, 0x8047e30, 0x8049390), at 0xfef71259
  [8] main(0x1, 0x8047e60, 0x8047e70), at 0x80490fa
------< SNIP >-------
I used to run 3.0.6 before without a problem.
The configuration of the server that crashed last night is pretty vanilla:
------< SNIP >-------
@version: 3.3
@include "scl.conf"
options {
    time_reopen (10);
    chain_hostnames (no);
    use_dns (no);
    use_fqdn (no);
    create_dirs (no);
    keep_hostname (yes);
    perm (0644);
};
# syslog-ng internal logging
source s_local { internal (); };
# local syslogging only
source s_sys { system (); };
destination d_sysmsg { file ("/dev/sysmsg"); };
destination d_local { file ("/var/log/syslog-ng.log"); };
destination d_messages { file ("/var/adm/messages"); };
destination d_mail { file ("/var/log/mail.log"); };
destination d_daemon { file ("/var/log/daemon.log"); };
destination d_auth { file ("/var/log/auth.log"); };
destination d_cron { file ("/var/log/cron.log"); };
destination d_user { file ("/var/log/user.log"); };
destination d_lpr { file ("/var/log/lpr.log"); };
destination d_news { file ("/var/log/news.log"); };
destination d_uucp { file ("/var/log/uucp.log"); };
destination d_local0 { file ("/var/log/ipfilter.log"); };
destination d_local1 { file ("/var/log/tcpwrapper.log"); };
destination d_local2 { file ("/var/log/tripwire.log"); };
destination d_local3 { file ("/var/log/local3.log"); };
destination d_local4 { file ("/var/log/local4.log"); };
destination d_local5 { file ("/var/log/local5.log"); };
destination d_local6 { file ("/var/log/local6.log"); };
destination d_local7 { file ("/var/log/local7.log"); };
destination d_snmptraps { file ("/usr/local/var/log/snmptrapd.${R_YEAR}${R_MONTH}${R_DAY}${R_HOUR}.log" group ("snmptrap") perm (416)); };
# this syslog config send a copy of every message to a Splunk server
destination d_archive { udp ("x.x.x.x"); };
filter f_sysmsg { level (notice) and facility (kern); };
filter f_messages { facility (kern); };
filter f_mail { facility (mail); };
filter f_daemon { facility (daemon); };
filter f_auth { facility (auth); };
filter f_cron { facility (cron); };
filter f_user { facility (user); };
filter f_lpr { facility (lpr); };
filter f_news { facility (news); };
filter f_uucp { facility (uucp); };
filter f_local0 { facility (local0); };
filter f_local1 { facility (local1); };
filter f_local2 { facility (local2); };
filter f_local3 { facility (local3); };
filter f_local4 { facility (local4); };
filter f_local5 { facility (local5); };
filter f_local6 { facility (local6); };
filter f_local7 { facility (local7); };
log { source (s_local); destination (d_local); };
log { source (s_sys); filter (f_sysmsg); destination (d_sysmsg); destination (d_archive); };
log { source (s_sys); filter (f_messages); destination (d_messages); destination (d_archive); };
log { source (s_sys); filter (f_mail); destination (d_mail); destination (d_archive); };
log { source (s_sys); filter (f_daemon); destination (d_daemon); destination (d_archive); };
log { source (s_sys); filter (f_auth); destination (d_auth); destination (d_archive); };
log { source (s_sys); filter (f_cron); destination (d_cron); destination (d_archive); };
log { source (s_sys); filter (f_user); destination (d_snmptraps); };
log { source (s_sys); filter (f_lpr); destination (d_lpr); destination (d_archive); };
log { source (s_sys); filter (f_news); destination (d_news); destination (d_archive); };
log { source (s_sys); filter (f_uucp); destination (d_uucp); destination (d_archive); };
log { source (s_sys); filter (f_user); destination (d_snmptraps); };
log { source (s_sys); filter (f_lpr); destination (d_lpr); destination (d_archive); };
log { source (s_sys); filter (f_news); destination (d_news); destination (d_archive); };
log { source (s_sys); filter (f_uucp); destination (d_uucp); destination (d_archive); };
log { source (s_sys); filter (f_local0); destination (d_local0); destination (d_archive); };
log { source (s_sys); filter (f_local1); destination (d_local1); destination (d_archive); };
log { source (s_sys); filter (f_local2); destination (d_local2); destination (d_archive); };
log { source (s_sys); filter (f_local3); destination (d_local3); destination (d_archive); };
log { source (s_sys); filter (f_local4); destination (d_local4); destination (d_archive); };
log { source (s_sys); filter (f_local5); destination (d_local5); destination (d_archive); };
log { source (s_sys); filter (f_local6); destination (d_local6); destination (d_archive); };
log { source (s_sys); filter (f_local7); destination (d_local7); destination (d_archive); };
------< SNIP >-------
Anybody else having this problem? Any hope we can nail this down?
- Michael
On Tue, 2011-11-08 at 16:08 -0500, Michael Hocke wrote:
Hi there,
I am having a problem with syslog-ng 3.3.1. Once in a while syslog-ng quits with a QUIT signal after it was asked to reload the configuration through a HUP signal. I have two instances of it running in separate Solaris 10 zones. Both are extremely busy and both crashed once within the last two weeks. By looking at the code in lib/mainloop.c I found the place where syslog-ng quits. It is within main_loop_reload_config_apply caused by a corrupt old configuration. Here is the backtrace with dbx:
[ ... ]
Hmm... syslog-ng only sends itself a SIGQUIT if the reload goes really bad. At reload time, syslog-ng tries to:
* deinit the old configuration (close stuff, etc.)
* init the new configuration (open stuff, etc.)
If the init of the new configuration fails, it tries to revert to the old one by initializing that again. In this case that fails too, so syslog-ng has no configuration to operate on (i.e. it has no clue where to listen for messages and where to write them) and kills itself.
What makes the issue worse is that, when running in the background, syslog-ng has no console at this point and therefore no place to write this information as a troubleshooting aid. Perhaps you could run syslog-ng in the foreground, redirect all internal logs to stderr using the -e switch and see what causes trouble?
I don't see why the config can't be initialized, it seems to be pretty basic. Perhaps one of the device files you operate on (/dev/sysmsg) can't be opened in some cases. Maybe a bug causes syslog-ng to keep that file open and then try to open it again? (I've checked this case, and syslog-ng seems to be closing the file descriptor just fine.) Maybe it's related to /dev/log, which again may not be opened multiple times.
-- Bazsi
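The reload sequence described above can be sketched in C roughly as follows. This is a minimal illustration, not syslog-ng's actual code: Config, ReloadResult, reload_config(), apply_reload() and the three callbacks are hypothetical names invented for the example, and the real logic in lib/mainloop.c is more involved.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical stand-in for a loaded configuration; not syslog-ng's real type. */
typedef struct Config {
    bool (*init)(struct Config *self);    /* open sources and destinations */
    void (*deinit)(struct Config *self);  /* close them again */
    bool active;
} Config;

typedef enum { RELOAD_OK, RELOAD_REVERTED, RELOAD_FATAL } ReloadResult;

/* Switch from old_cfg to new_cfg; fall back to old_cfg if new_cfg cannot start. */
ReloadResult reload_config(Config *old_cfg, Config *new_cfg)
{
    assert(old_cfg->active);
    old_cfg->deinit(old_cfg);
    old_cfg->active = false;

    if (new_cfg->init(new_cfg)) {
        new_cfg->active = true;
        return RELOAD_OK;
    }
    if (old_cfg->init(old_cfg)) {          /* revert to the old configuration */
        old_cfg->active = true;
        return RELOAD_REVERTED;
    }
    return RELOAD_FATAL;                   /* nothing left to run with */
}

/* With no usable configuration the process signals itself, as described above. */
void apply_reload(Config *old_cfg, Config *new_cfg)
{
    if (reload_config(old_cfg, new_cfg) == RELOAD_FATAL)
        kill(getpid(), SIGQUIT);
}

/* Trivial callbacks for experimenting with the three outcomes. */
bool init_ok(Config *self)     { (void)self; return true; }
bool init_fail(Config *self)   { (void)self; return false; }
void deinit_noop(Config *self) { (void)self; }
```

The RELOAD_FATAL branch is the one of interest here: it is only reached when the old configuration, which was working a moment ago, can no longer be initialized, for example because a device it needs cannot be opened a second time.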
On Nov 9, 2011, at 3:52 PM, Balazs Scheidler wrote:
On Tue, 2011-11-08 at 16:08 -0500, Michael Hocke wrote:
Hi there,
I am having a problem with syslog-ng 3.3.1. Once in a while syslog-ng quits with a QUIT signal after it was asked to reload the configuration through a HUP signal. I have two instances of it running in separate Solaris 10 zones. Both are extremely busy and both crashed once within the last two weeks. By looking at the code in lib/mainloop.c I found the place where syslog-ng quits. It is within main_loop_reload_config_apply caused by a corrupt old configuration. [ ... ]
[ ... ] The issue is worse as it has no place to write this information as a troubleshooting aid if running in the background, as it has no console at this point. Perhaps you could run syslog-ng in the foreground, redirect all internal logs to stderr using the -e switch and see what causes trouble?
I figured out how to reproduce this error. It happens after sending 16 HUPs to the process. This seems to be the magic number. This is the error output I am getting:
------> SNIP <------
Internal error, duplicate configuration elements refer to the same persistent config; name='dd_queue(d_rad_ap,d_rad_ap#0)'
Internal error, duplicate configuration elements refer to the same persistent config; name='dd_queue(d_dhcpd,d_dhcpd#0)'
Internal error, duplicate configuration elements refer to the same persistent config; name='dd_queue(d_dhcpd,d_dhcpd#0)'
Internal error, duplicate configuration elements refer to the same persistent config; name='dd_queue(d_dhcpd,d_dhcpd#0)'
Internal error, duplicate configuration elements refer to the same persistent config; name='dd_queue(d_dhcpd,d_dhcpd#0)'
Internal error, duplicate configuration elements refer to the same persistent config; name='dd_queue(d_dhcpd,d_dhcpd#0)'
Internal error, duplicate configuration elements refer to the same persistent config; name='dd_queue(d_dhcpd,d_dhcpd#0)'
Internal error, duplicate configuration elements refer to the same persistent config; name='dd_queue(d_rad_proxy,d_rad_proxy#0)'
Internal error, duplicate configuration elements refer to the same persistent config; name='dd_queue(d_radius,d_radius#0)'
Error opening syslog device; filename='/dev/log', error='No such device or address (6)'
Error initializing source driver; source='s_sys', id='s_sys#0'
Error initializing message pipeline;
Error initializing new configuration, reverting to old config;
Multiple internal() sources were detected, this is not possible;
Error initializing source driver; source='s_local', id='s_local#0'
Error initializing message pipeline;
-------> SNIP <-------
Could this be a side effect of the (now fixed) bug concerning the "duplicate configuration elements"?
- Michael
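For anyone wanting to repeat the experiment, the reproduction step (sending a fixed number of HUPs to a process) could be scripted with a small C helper along these lines. This is only a sketch: send_hups() and its parameters are hypothetical names, and the delay between signals is an assumption meant to give each reload time to finish.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Send `count` SIGHUPs to `pid`, pausing `delay_ms` milliseconds after each one.
 * Returns how many signals kill() accepted; this stops early once the
 * target can no longer be signalled (for example because it has exited). */
int send_hups(pid_t pid, int count, long delay_ms)
{
    struct timespec pause = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };
    int sent = 0;

    assert(count >= 0 && delay_ms >= 0);
    for (int i = 0; i < count; i++) {
        if (kill(pid, SIGHUP) != 0)
            break;
        sent++;
        nanosleep(&pause, NULL);
    }
    return sent;
}
```

Calling send_hups(pid, 16, 500) with the PID of a syslog-ng instance started in the foreground would correspond to the test described above; the 500 ms pause is an arbitrary choice.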
On 11/28/2011 12:48 PM, Michael Hocke wrote:
I figured out how to reproduce this error. It happens after sending 16 HUPs to the process. This seems to be the magic number. This is the error output I am getting:
------> SNIP <------ Internal error, duplicate configuration elements refer to the same persistent config; name='dd_queue(d_rad_ap,d_rad_ap#0)'
Could this be a side effect of the (now fixed) bug concerning the "duplicate configuration elements"?
- Michael
This bug has been identified and fixed as of the current 3.3.3 version of syslog-ng. -Dave
On Nov 28, 2011, at 4:08 PM, Dave Rawks wrote:
This bug has been identified and fixed as of the current 3.3.3 version of syslog-ng.
I just upgraded to 3.3.3 and I am still experiencing syslog-ng quitting on me after 16 HUPs:
--------> SNIP <---------
# /usr/local/sbin/syslog-ng --foreground -e
Syslog connection established; fd='9', server='AF_INET(128.122.253.15:514)', local='AF_INET(0.0.0.0:0)'
syslog-ng starting up; version='3.3.3'
Configuration reload request received, reloading configuration;
Configuration reload request received, reloading configuration;
Configuration reload request received, reloading configuration;
Configuration reload request received, reloading configuration;
Configuration reload request received, reloading configuration;
Configuration reload request received, reloading configuration;
Configuration reload request received, reloading configuration;
Configuration reload request received, reloading configuration;
Configuration reload request received, reloading configuration;
Configuration reload request received, reloading configuration;
Configuration reload request received, reloading configuration;
Configuration reload request received, reloading configuration;
Configuration reload request received, reloading configuration;
Configuration reload request received, reloading configuration;
Configuration reload request received, reloading configuration;
Error opening syslog device; filename='/dev/log', error='No such device or address (6)'
Error initializing source driver; source='s_sys', id='s_sys#0'
Error initializing message pipeline;
Error initializing new configuration, reverting to old config;
Multiple internal() sources were detected, this is not possible;
Error initializing source driver; source='s_local', id='s_local#0'
Error initializing message pipeline;
Quit (core dumped)
--------> SNIP <---------
I tried running it in verbose, debug, and trace mode but I couldn't make syslog-ng tell me more about why it is having problems. Should I file a bug report?
- Michael
On Nov 29, 2011, at 12:47 PM, Michael Hocke wrote:
I just upgraded to 3.3.3 and I am still experiencing syslog-ng quitting on me after 16 HUPs: [ ... ]
Sorry for following up on my own posting. I just checked the OpenSolaris sources and it is definitely something specific to Solaris. The /dev/log device can only be cloned up to LOG_NUMCLONES times, which is defined as 16 in <sys/log.h>. Every open call on /dev/log clones the device, and since it seems that /dev/log is not closed when a HUP is received, the clones accumulate. After the 16th HUP signal syslog-ng tries to execute open64("/dev/log", O_RDONLY|O_NONBLOCK|O_NOCTTY), which results in an ENXIO error. Can this be fixed easily?
- Michael
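The accumulation described above can be illustrated with a small simulation. It does not touch the real /dev/log: FakeLogDev, fake_open(), fake_close() and first_failing_reload() are hypothetical names, and only the LOG_NUMCLONES value of 16 is taken from the message. The model also ignores any other process that might hold a clone, so the exact count on a real system may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define LOG_NUMCLONES 16   /* value quoted above from <sys/log.h> */

/* Simulated clone device: every open takes one free slot. */
typedef struct { bool in_use[LOG_NUMCLONES]; } FakeLogDev;

/* Returns a slot index, or -1 with errno = ENXIO when all clones are taken. */
int fake_open(FakeLogDev *dev)
{
    for (int i = 0; i < LOG_NUMCLONES; i++) {
        if (!dev->in_use[i]) {
            dev->in_use[i] = true;
            return i;
        }
    }
    errno = ENXIO;
    return -1;
}

void fake_close(FakeLogDev *dev, int slot)
{
    assert(slot >= 0 && slot < LOG_NUMCLONES);
    dev->in_use[slot] = false;
}

/* Open once at startup, then reload up to max_reloads times.
 * Returns the number of the first reload whose open fails, or 0 if none does. */
int first_failing_reload(bool close_on_deinit, int max_reloads)
{
    FakeLogDev dev = { { false } };
    int fd = fake_open(&dev);              /* initial startup */

    for (int n = 1; n <= max_reloads; n++) {
        if (close_on_deinit)
            fake_close(&dev, fd);          /* deinit releases the clone */
        fd = fake_open(&dev);              /* init of the new configuration */
        if (fd < 0)
            return n;
    }
    return 0;
}
```

With close_on_deinit set to false the simulated open fails once the 16 slots are used up; with it set to true the same slot is reused on every reload and no failure occurs.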
On Nov 29, 2011, at 1:10 PM, Michael Hocke wrote:
Sorry for following up on my own posting. I just checked the OpenSolaris sources and it is definitely something specific to Solaris. The /dev/log device can only be cloned up to LOG_NUMCLONES times which is defined as 16 in <sys/log.h>. Every open call on /dev/log clones the device and since it seems that /dev/log is not closed when a HUP is received the number of clones accumulate until after the 16th HUP signal it tries to execute open64("/dev/log", O_RDONLY|O_NONBLOCK|O_NOCTTY) which results in an ENXIO error.
I put in a quick and dirty fix for the problem I am seeing. I made sure that the /dev/log device is being closed in afstreams_sd_deinit(). I am very sure this is not the right place and it should probably be closer to LogTransport but that would probably require some extra flags and methods since there is no log_transport_deinit() and a closing of the fd for all kinds of transports is probably not desired. Anyway, here is the "fix" I put in place:
# diff afstreams.c afstreams.c.orig
37d36
< gint log_fd;
166a166
> gint fd;
173,174c173,174
< self->log_fd = open(self->dev_filename->str, O_RDONLY | O_NOCTTY | O_NONBLOCK);
< if (self->log_fd != -1)
---
> fd = open(self->dev_filename->str, O_RDONLY | O_NOCTTY | O_NONBLOCK);
> if (fd != -1)
178c178
< g_fd_set_cloexec(self->log_fd, TRUE);
---
> g_fd_set_cloexec(fd, TRUE);
181c181
< if (ioctl(self->log_fd, I_STR, &ioc) < 0)
---
> if (ioctl(fd, I_STR, &ioc) < 0)
187c187
< close(self->log_fd);
---
> close(fd);
190,191c190,191
< g_fd_set_nonblock(self->log_fd, TRUE);
< self->reader = log_reader_new(log_proto_dgram_server_new(log_transport_streams_new(self->log_fd), self->reader_options.msg_size, 0));
---
> g_fd_set_nonblock(fd, TRUE);
> self->reader = log_reader_new(log_proto_dgram_server_new(log_transport_streams_new(fd), self->reader_options.msg_size, 0));
207c207
< evt_tag_int("fd", self->log_fd),
---
> evt_tag_int("fd", fd),
211c211
< close(self->log_fd);
---
> close(fd);
239,240d238
< if (self->log_fd != -1)
< close (self->log_fd);
I pretty much store the fd of the log device in AFStreamsSourceDriver and use that in afstreams_sd_deinit().
- Michael
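Stripped of the syslog-ng specifics, the idea behind the diff (remember the descriptor in the driver and close it on deinit) looks roughly like this. StreamsSource and the streams_source_*() functions are hypothetical, simplified stand-ins for AFStreamsSourceDriver and its methods, and the sketch assumes the driver is the sole owner of the descriptor, which is not the case in the real code, where the fd is handed on to the transport.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical, simplified stand-in for AFStreamsSourceDriver. */
typedef struct {
    const char *dev_filename;
    int log_fd;                 /* -1 while the device is not open */
} StreamsSource;

void streams_source_setup(StreamsSource *self, const char *dev_filename)
{
    self->dev_filename = dev_filename;
    self->log_fd = -1;
}

/* Open the device and remember the descriptor in the driver itself. */
bool streams_source_init(StreamsSource *self)
{
    assert(self->log_fd == -1); /* a second init without deinit would leak */
    self->log_fd = open(self->dev_filename, O_RDONLY | O_NOCTTY | O_NONBLOCK);
    return self->log_fd != -1;
}

/* Close the remembered descriptor so the next init starts from a clean state. */
void streams_source_deinit(StreamsSource *self)
{
    if (self->log_fd != -1) {
        close(self->log_fd);
        self->log_fd = -1;      /* guards against a double close */
    }
}
```

Resetting log_fd to -1 after close() matters: without it a second deinit, or a deinit after a failed init, would close a descriptor number that may by then belong to something else.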
On Wed, 2011-11-30 at 15:21 -0500, Michael Hocke wrote:
On Nov 29, 2011, at 1:10 PM, Michael Hocke wrote:
Sorry for following up on my own posting. I just checked the OpenSolaris sources and it is definitely something specific to Solaris. The /dev/log device can only be cloned up to LOG_NUMCLONES times which is defined as 16 in <sys/log.h>. Every open call on /dev/log clones the device and since it seems that /dev/log is not closed when a HUP is received the number of clones accumulate until after the 16th HUP signal it tries to execute open64("/dev/log", O_RDONLY|O_NONBLOCK|O_NOCTTY) which results in an ENXIO error.
I put in a quick and dirty fix for the problem I am seeing. I made sure that the /dev/log device is being closed in afstreams_sd_deinit(). I am very sure this is not the right place and it should probably be closer to LogTransport but that would probably require some extra flags and methods since there is no log_transport_deinit() and a closing of the fd for all kinds of transports is probably not desired. Anyway, here is the "fix" I put in place:
[ ... ]
I pretty much store the fd of the log device in AFStreamsSourceDriver and use that in afstreams_sd_deinit().
Hi,
Checking out the code, the fd leak can only happen if the LogTransport instance doesn't get freed. LogTransport is freed by LogProto, that by LogReader, and that by AFStreamsSourceDriver, i.e. the structure is:
AFStreamsSourceDriver->reader->proto->transport
So in addition to the fd leak, this seems to be a memory leak too.
Just by the looks of it, the reader instance is deinited properly in afstreams_sd_deinit() and then unrefed in afstreams_sd_free(). Can you perhaps check whether the log_pipe_unref() call in afstreams_sd_free() is invoked? And once there, can you also check whether the ref count of the reader actually goes down to zero and log_reader_free() is also invoked?
Looking at the code further, the culprit seems to be the LogReader->control member, which holds a reference to the source driver, i.e. there's a circular reference between the source driver and the reader. This causes neither the source driver nor the reader to be freed, which should explain the fd leak.
I have to go now, but I'll think a bit more about the issue.
-- Bazsi
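The circular reference described above can be reproduced in a few lines of reference-counted C. The Driver and Reader types, the live_objects counter and all functions below are hypothetical and only mirror the shape of the problem (a driver owning a reader whose control member points back at the driver). Breaking the cycle in a deinit step is an assumption about one possible remedy, not necessarily the fix that syslog-ng adopted.

```c
#include <assert.h>
#include <stdlib.h>

static int live_objects = 0;    /* allocations that have not been freed yet */

typedef struct Reader Reader;
typedef struct Driver Driver;

struct Driver { int ref_cnt; Reader *reader; };
struct Reader { int ref_cnt; Driver *control; };   /* back-reference */

void driver_unref(Driver *d);

void reader_unref(Reader *r)
{
    if (r && --r->ref_cnt == 0) {
        driver_unref(r->control);   /* release the back-reference, if any */
        free(r);
        live_objects--;
    }
}

void driver_unref(Driver *d)
{
    if (d && --d->ref_cnt == 0) {
        reader_unref(d->reader);
        free(d);
        live_objects--;
    }
}

/* Create a driver owning a reader that refers back to the driver.
 * The caller receives one reference on the driver. */
Driver *driver_new_with_reader(void)
{
    Driver *d = calloc(1, sizeof *d);
    Reader *r = calloc(1, sizeof *r);

    assert(d && r);
    live_objects += 2;
    d->ref_cnt = 1;                 /* the caller's reference */
    r->ref_cnt = 1;                 /* held through d->reader */
    d->reader = r;
    r->control = d;
    d->ref_cnt++;                   /* held through r->control: the cycle */
    return d;
}

/* Break the cycle by dropping the reader's back-reference. */
void driver_deinit(Driver *d)
{
    if (d->reader && d->reader->control) {
        d->reader->control = NULL;
        driver_unref(d);
    }
}
```

If the caller only drops its own reference with driver_unref(), the driver's count stops at one because the reader still points at it, so neither object is freed. Calling driver_deinit() first removes the back-reference, after which the final unref releases both objects.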
participants (3)
- Balazs Scheidler
- Dave Rawks
- Michael Hocke