lsof shows open files which should be closed
Hello, Scenario: this is a log server which receives logs from variuos hosts via TCP. We use time macros, which create a new log file in every hour for each remote host. The problem is, that 'lsof -c syslog-ng' shows, that the logfiles of the previous hours are still open, which exhausts the number of file descriptors after a while. kill -HUP syslog-ng corrects this as a workaround. System: RHEL4 Syslog-ng: 2.0.5 compiled from source Relevant config parts: options { flush_lines(16); flush_timeout(1000); time_reopen (10); log_fifo_size (1000); use_dns (yes); use_fqdn (no); create_dirs (no); keep_hostname (yes); stats_freq(0); time_sleep(20); time_reap(10); }; source s_net { tcp(max-connections(512)); }; destination d_freemail_pop3{ file("/data/syslog/FP-$YEAR$MONTH$DAY-$HOUR0000-$HOST.log" template ("$YEAR.$MONTH.$DAY $HOUR:$MIN:$SEC $TZOFFSET $HOST $MSG\n") template_escape(no) group("bar") perm(0660)); }; For example 'lsof -c syslog-ng' at 12:20 shows these entries as open (it shows more, there are examples only): syslog-ng 8314 root 10w REG 253,0 1840830 3965325 /data/syslog/TM-20071219-110000-mx08a.log syslog-ng 8314 root 249w REG 253,0 5521445 3965131 /data/bar/data/target/mail/TM-20071219-100000-mx07a.log (deleted) The second is also interesting. After some minutes of an hour change, a script moves the log to an other directory. That's why the path of the second line is changed. But is it still open by syslog-ng? Thanks, Daniel
On Wed, 2007-12-19 at 12:31 +0100, Nagy Daniel wrote:
Hello,
Scenario: this is a log server which receives logs from variuos hosts via TCP. We use time macros, which create a new log file in every hour for each remote host. The problem is, that 'lsof -c syslog-ng' shows, that the logfiles of the previous hours are still open, which exhausts the number of file descriptors after a while. kill -HUP syslog-ng corrects this as a workaround.
System: RHEL4 Syslog-ng: 2.0.5 compiled from source
Relevant config parts: options { flush_lines(16); flush_timeout(1000); time_reopen (10); log_fifo_size (1000); use_dns (yes); use_fqdn (no); create_dirs (no); keep_hostname (yes); stats_freq(0); time_sleep(20); time_reap(10); };
source s_net { tcp(max-connections(512)); };
destination d_freemail_pop3{ file("/data/syslog/FP-$YEAR$MONTH$DAY-$HOUR0000-$HOST.log" template ("$YEAR.$MONTH.$DAY $HOUR:$MIN:$SEC $TZOFFSET $HOST $MSG\n") template_escape(no) group("bar") perm(0660)); };
For example 'lsof -c syslog-ng' at 12:20 shows these entries as open (it shows more, there are examples only): syslog-ng 8314 root 10w REG 253,0 1840830 3965325 /data/syslog/TM-20071219-110000-mx08a.log syslog-ng 8314 root 249w REG 253,0 5521445 3965131 /data/bar/data/target/mail/TM-20071219-100000-mx07a.log (deleted)
The second is also interesting. After some minutes of an hour change, a script moves the log to an other directory. That's why the path of the second line is changed. But is it still open by syslog-ng?
syslog-ng keeps the files opened for a little while after the last message is written to it to avoid open/reopen for each message. This interval can be controlled using the time_reap() option, which defaults to 60sec, if I remember correctly. Decrease time_reap(), or increase the number of file descriptors using "ulimit -n". -- Bazsi
Hello, The current option is time_reap(10), and according to lsof the files are still open after several minutes. Bye, Daniel Balazs Scheidler wrote:
On Wed, 2007-12-19 at 12:31 +0100, Nagy Daniel wrote:
Hello,
Scenario: this is a log server which receives logs from variuos hosts via TCP. We use time macros, which create a new log file in every hour for each remote host. The problem is, that 'lsof -c syslog-ng' shows, that the logfiles of the previous hours are still open, which exhausts the number of file descriptors after a while. kill -HUP syslog-ng corrects this as a workaround.
System: RHEL4 Syslog-ng: 2.0.5 compiled from source
Relevant config parts: options { flush_lines(16); flush_timeout(1000); time_reopen (10); log_fifo_size (1000); use_dns (yes); use_fqdn (no); create_dirs (no); keep_hostname (yes); stats_freq(0); time_sleep(20); time_reap(10); };
source s_net { tcp(max-connections(512)); };
destination d_freemail_pop3{ file("/data/syslog/FP-$YEAR$MONTH$DAY-$HOUR0000-$HOST.log" template ("$YEAR.$MONTH.$DAY $HOUR:$MIN:$SEC $TZOFFSET $HOST $MSG\n") template_escape(no) group("bar") perm(0660)); };
For example 'lsof -c syslog-ng' at 12:20 shows these entries as open (it shows more, there are examples only): syslog-ng 8314 root 10w REG 253,0 1840830 3965325 /data/syslog/TM-20071219-110000-mx08a.log syslog-ng 8314 root 249w REG 253,0 5521445 3965131 /data/bar/data/target/mail/TM-20071219-100000-mx07a.log (deleted)
The second is also interesting. After some minutes of an hour change, a script moves the log to an other directory. That's why the path of the second line is changed. But is it still open by syslog-ng?
syslog-ng keeps the files opened for a little while after the last message is written to it to avoid open/reopen for each message.
This interval can be controlled using the time_reap() option, which defaults to 60sec, if I remember correctly.
Decrease time_reap(), or increase the number of file descriptors using "ulimit -n".
On Thu, 2007-12-20 at 10:38 +0100, Nagy Daniel wrote:
Hello,
The current option is time_reap(10), and according to lsof the files are still open after several minutes.
hm... can you enable "verbose" logging using the -v option? it should display a message like this: msg_verbose("Destination timed out, reaping", evt_tag_str("template", self->filename_template->template->str), evt_tag_str("filename", dw->filename->str), NULL); -- Bazsi
On Thu, 2007-12-20 at 11:57 +0100, Balazs Scheidler wrote:
On Thu, 2007-12-20 at 10:38 +0100, Nagy Daniel wrote:
Hello,
The current option is time_reap(10), and according to lsof the files are still open after several minutes.
hm... can you enable "verbose" logging using the -v option? it should display a message like this:
msg_verbose("Destination timed out, reaping", evt_tag_str("template", self->filename_template->template->str), evt_tag_str("filename", dw->filename->str), NULL);
One additional note: * are your clocks set properly? because some messages may arrive several minutes later which keep the given destination alive. Which macros are you using in the name of the destination file? The ones with an R_ prefix contain the "received" time, the S_ prefix contain the time as received in the message. -- Bazsi
Hi, At 15:00 there are many "Initializing destination file writer..." messages, which is good: Dec 20 15:00:01 s_sys@barapp1 syslog-ng[18883]: Initializing destination file writer; template='/data/syslog/FM-$YEAR$MONTH$DAY-$HOU R0000-$HOST.log', filename='/data/syslog/FM-20071220-150000-fm12.log' But from 15:00 to 15:19 there are no "Destination timed out, reaping" messages at all, and all files opened at 14:00 are still open according to lsof. Every server share a common NTP server, time is good. Bye, Daniel Balazs Scheidler wrote:
On Thu, 2007-12-20 at 11:57 +0100, Balazs Scheidler wrote:
On Thu, 2007-12-20 at 10:38 +0100, Nagy Daniel wrote:
Hello,
The current option is time_reap(10), and according to lsof the files are still open after several minutes.
hm... can you enable "verbose" logging using the -v option? it should display a message like this:
msg_verbose("Destination timed out, reaping", evt_tag_str("template", self->filename_template->template->str), evt_tag_str("filename", dw->filename->str), NULL);
One additional note: * are your clocks set properly? because some messages may arrive several minutes later which keep the given destination alive.
Which macros are you using in the name of the destination file? The ones with an R_ prefix contain the "received" time, the S_ prefix contain the time as received in the message.
One more thing: the man page declares that -v is: "Enable verbose mode. Process will not become a daemon..." This is not true, 2.0.6 runs happily as a daemon with the -v parameter. Dani Nagy Daniel wrote: > Hi, > > At 15:00 there are many "Initializing destination file writer..." > messages, which is good: > > Dec 20 15:00:01 s_sys@barapp1 syslog-ng[18883]: Initializing destination > file writer; template='/data/syslog/FM-$YEAR$MONTH$DAY-$HOU > R0000-$HOST.log', filename='/data/syslog/FM-20071220-150000-fm12.log' > > But from 15:00 to 15:19 there are no "Destination timed out, reaping" > messages at all, and all files opened at 14:00 are still open according > to lsof. > > Every server share a common NTP server, time is good. > > Bye, > > Daniel > > > > Balazs Scheidler wrote: >> On Thu, 2007-12-20 at 11:57 +0100, Balazs Scheidler wrote: >>> On Thu, 2007-12-20 at 10:38 +0100, Nagy Daniel wrote: >>>> Hello, >>>> >>>> The current option is time_reap(10), and according to lsof the files >>>> are still open after several minutes. >>>> >>> hm... can you enable "verbose" logging using the -v option? it should >>> display a message like this: >>> >>> >>> msg_verbose("Destination timed out, reaping", >>> evt_tag_str("template", self->filename_template->template->str), >>> evt_tag_str("filename", dw->filename->str), >>> NULL); >> One additional note: >> * are your clocks set properly? because some messages may arrive >> several minutes later which keep the given destination alive. >> >> Which macros are you using in the name of the destination file? The ones >> with an R_ prefix contain the "received" time, the S_ prefix contain the >> time as received in the message. >> > _______________________________________________ > syslog-ng maillist - syslog-ng@lists.balabit.hu > https://lists.balabit.hu/mailman/listinfo/syslog-ng > Frequently asked questions at http://www.campin.net/syslog-ng/faq.html >
On Thu, 2007-12-20 at 15:22 +0100, Nagy Daniel wrote:
Hi,
At 15:00 there are many "Initializing destination file writer..." messages, which is good:
Dec 20 15:00:01 s_sys@barapp1 syslog-ng[18883]: Initializing destination file writer; template='/data/syslog/FM-$YEAR$MONTH$DAY-$HOU R0000-$HOST.log', filename='/data/syslog/FM-20071220-150000-fm12.log'
But from 15:00 to 15:19 there are no "Destination timed out, reaping" messages at all, and all files opened at 14:00 are still open according to lsof.
Every server share a common NTP server, time is good.
I don't see how syslog-ng could start the timer which reaps files, the only reason that could explain this is the continous stream of incoming messages, which makes this timer starve. The main loop of syslog-ng uses priorities to prioritize incoming events, the "reaper" timer uses a low priority, incoming messages use a higher value. If there's a higher priority event the lower priority is not checked. Can you verify that this patch fixes this issue? Thanks in advance. diff --git a/src/affile.c b/src/affile.c index a0e7cc8..ccc6388 100644 --- a/src/affile.c +++ b/src/affile.c @@ -652,7 +652,7 @@ affile_dd_init(LogPipe *s, GlobalConfig *cfg, PersistentConfig *persist) if ((self->flags & AFFILE_NO_EXPAND) == 0) { - self->reap_timer = g_timeout_add_full(G_PRIORITY_LOW, self->time_reap * 1000 / 2, affile_dd_reap, self, NULL); + self->reap_timer = g_timeout_add_full(G_PRIORITY_DEFAULT, self->time_reap * 1000 / 2, affile_dd_reap, self, NULL); self->writer_hash = persist_config_fetch(persist, affile_dd_format_persist_name(self)); if (self->writer_hash) g_hash_table_foreach(self->writer_hash, affile_dd_reuse_writer, self); -- Bazsi
On Thu, 2007-12-20 at 19:42 +0100, Balazs Scheidler wrote:
On Thu, 2007-12-20 at 15:22 +0100, Nagy Daniel wrote:
Hi,
At 15:00 there are many "Initializing destination file writer..." messages, which is good:
Dec 20 15:00:01 s_sys@barapp1 syslog-ng[18883]: Initializing destination file writer; template='/data/syslog/FM-$YEAR$MONTH$DAY-$HOU R0000-$HOST.log', filename='/data/syslog/FM-20071220-150000-fm12.log'
But from 15:00 to 15:19 there are no "Destination timed out, reaping" messages at all, and all files opened at 14:00 are still open according to lsof.
Every server share a common NTP server, time is good.
I don't see how syslog-ng could start the timer which reaps files, the only reason that could explain this is the continous stream of incoming messages, which makes this timer starve.
I mean I could not see how it would not start. E.g. it is always started if there's at least one '$' character in the filename. -- Bazsi
I know nothing of the timer architecture used within syslog-ng, but just to create discussion, could it be that a completed timer is missed? Something like one timer completes and while it is being processed (the file is being closed/reaped) another timer completes which is missed. This would probably only show up on systems that have been configured to log to a large number of files. Evan. Balazs Scheidler wrote:
On Thu, 2007-12-20 at 19:42 +0100, Balazs Scheidler wrote:
On Thu, 2007-12-20 at 15:22 +0100, Nagy Daniel wrote:
Hi,
At 15:00 there are many "Initializing destination file writer..." messages, which is good:
Dec 20 15:00:01 s_sys@barapp1 syslog-ng[18883]: Initializing destination file writer; template='/data/syslog/FM-$YEAR$MONTH$DAY-$HOU R0000-$HOST.log', filename='/data/syslog/FM-20071220-150000-fm12.log'
But from 15:00 to 15:19 there are no "Destination timed out, reaping" messages at all, and all files opened at 14:00 are still open according to lsof.
Every server share a common NTP server, time is good. I don't see how syslog-ng could start the timer which reaps files, the only reason that could explain this is the continous stream of incoming messages, which makes this timer starve.
I mean I could not see how it would not start. E.g. it is always started if there's at least one '$' character in the filename.
-- Evan Rempel erempel@uvic.ca Senior Programmer Analyst 250.721.7691 Computing Services University of Victoria
On Thu, 2007-12-20 at 12:31 -0800, Evan Rempel wrote:
I know nothing of the timer architecture used within syslog-ng, but just to create discussion, could it be that a completed timer is missed? Something like one timer completes and while it is being processed (the file is being closed/reaped) another timer completes which is missed. This would probably only show up on systems that have been configured to log to a large number of files.
No, this is not possible. syslog-ng is single threaded and the timer in glib fires every now and then. The reap timeout specified in the configuration is 10 seconds. I still think that my patch is correct, that's the only possible explanation. But let's wait for the results of the test. -- Bazsi
Hi, Which version this patch is for? It doesn't seem to fit for 2.0.5, 2.0.6 or the latest snapshot either. Daniel Balazs Scheidler wrote:
On Thu, 2007-12-20 at 15:22 +0100, Nagy Daniel wrote:
Hi,
At 15:00 there are many "Initializing destination file writer..." messages, which is good:
Dec 20 15:00:01 s_sys@barapp1 syslog-ng[18883]: Initializing destination file writer; template='/data/syslog/FM-$YEAR$MONTH$DAY-$HOU R0000-$HOST.log', filename='/data/syslog/FM-20071220-150000-fm12.log'
But from 15:00 to 15:19 there are no "Destination timed out, reaping" messages at all, and all files opened at 14:00 are still open according to lsof.
Every server share a common NTP server, time is good.
I don't see how syslog-ng could start the timer which reaps files, the only reason that could explain this is the continous stream of incoming messages, which makes this timer starve.
The main loop of syslog-ng uses priorities to prioritize incoming events, the "reaper" timer uses a low priority, incoming messages use a higher value. If there's a higher priority event the lower priority is not checked.
Can you verify that this patch fixes this issue? Thanks in advance.
diff --git a/src/affile.c b/src/affile.c index a0e7cc8..ccc6388 100644 --- a/src/affile.c +++ b/src/affile.c @@ -652,7 +652,7 @@ affile_dd_init(LogPipe *s, GlobalConfig *cfg, PersistentConfig *persist)
if ((self->flags & AFFILE_NO_EXPAND) == 0) { - self->reap_timer = g_timeout_add_full(G_PRIORITY_LOW, self->time_reap * 1000 / 2, affile_dd_reap, self, NULL); + self->reap_timer = g_timeout_add_full(G_PRIORITY_DEFAULT, self->time_reap * 1000 / 2, affile_dd_reap, self, NULL); self->writer_hash = persist_config_fetch(persist, affile_dd_format_persist_name(self)); if (self->writer_hash) g_hash_table_foreach(self->writer_hash, affile_dd_reuse_writer, self);
Sorry, got it in the latest snapshot. I'll test it now. Daniel Nagy Daniel wrote:
Hi,
Which version this patch is for? It doesn't seem to fit for 2.0.5, 2.0.6 or the latest snapshot either.
Daniel
Balazs Scheidler wrote:
On Thu, 2007-12-20 at 15:22 +0100, Nagy Daniel wrote:
Hi,
At 15:00 there are many "Initializing destination file writer..." messages, which is good:
Dec 20 15:00:01 s_sys@barapp1 syslog-ng[18883]: Initializing destination file writer; template='/data/syslog/FM-$YEAR$MONTH$DAY-$HOU R0000-$HOST.log', filename='/data/syslog/FM-20071220-150000-fm12.log'
But from 15:00 to 15:19 there are no "Destination timed out, reaping" messages at all, and all files opened at 14:00 are still open according to lsof.
Every server share a common NTP server, time is good. I don't see how syslog-ng could start the timer which reaps files, the only reason that could explain this is the continous stream of incoming messages, which makes this timer starve.
The main loop of syslog-ng uses priorities to prioritize incoming events, the "reaper" timer uses a low priority, incoming messages use a higher value. If there's a higher priority event the lower priority is not checked.
Can you verify that this patch fixes this issue? Thanks in advance.
diff --git a/src/affile.c b/src/affile.c index a0e7cc8..ccc6388 100644 --- a/src/affile.c +++ b/src/affile.c @@ -652,7 +652,7 @@ affile_dd_init(LogPipe *s, GlobalConfig *cfg, PersistentConfig *persist)
if ((self->flags & AFFILE_NO_EXPAND) == 0) { - self->reap_timer = g_timeout_add_full(G_PRIORITY_LOW, self->time_reap * 1000 / 2, affile_dd_reap, self, NULL); + self->reap_timer = g_timeout_add_full(G_PRIORITY_DEFAULT, self->time_reap * 1000 / 2, affile_dd_reap, self, NULL); self->writer_hash = persist_config_fetch(persist, affile_dd_format_persist_name(self)); if (self->writer_hash) g_hash_table_foreach(self->writer_hash, affile_dd_reuse_writer, self);
_______________________________________________ syslog-ng maillist - syslog-ng@lists.balabit.hu https://lists.balabit.hu/mailman/listinfo/syslog-ng Frequently asked questions at http://www.campin.net/syslog-ng/faq.html
Hi, Great, the patch solved it. Now all older files have disappeared from the lsof list at the hour change. I hope this patch will be added in the next release. Koszi! Daniel Balazs Scheidler wrote:
On Thu, 2007-12-20 at 15:22 +0100, Nagy Daniel wrote:
Hi,
At 15:00 there are many "Initializing destination file writer..." messages, which is good:
Dec 20 15:00:01 s_sys@barapp1 syslog-ng[18883]: Initializing destination file writer; template='/data/syslog/FM-$YEAR$MONTH$DAY-$HOU R0000-$HOST.log', filename='/data/syslog/FM-20071220-150000-fm12.log'
But from 15:00 to 15:19 there are no "Destination timed out, reaping" messages at all, and all files opened at 14:00 are still open according to lsof.
Every server share a common NTP server, time is good.
I don't see how syslog-ng could start the timer which reaps files, the only reason that could explain this is the continous stream of incoming messages, which makes this timer starve.
The main loop of syslog-ng uses priorities to prioritize incoming events, the "reaper" timer uses a low priority, incoming messages use a higher value. If there's a higher priority event the lower priority is not checked.
Can you verify that this patch fixes this issue? Thanks in advance.
diff --git a/src/affile.c b/src/affile.c index a0e7cc8..ccc6388 100644 --- a/src/affile.c +++ b/src/affile.c @@ -652,7 +652,7 @@ affile_dd_init(LogPipe *s, GlobalConfig *cfg, PersistentConfig *persist)
if ((self->flags & AFFILE_NO_EXPAND) == 0) { - self->reap_timer = g_timeout_add_full(G_PRIORITY_LOW, self->time_reap * 1000 / 2, affile_dd_reap, self, NULL); + self->reap_timer = g_timeout_add_full(G_PRIORITY_DEFAULT, self->time_reap * 1000 / 2, affile_dd_reap, self, NULL); self->writer_hash = persist_config_fetch(persist, affile_dd_format_persist_name(self)); if (self->writer_hash) g_hash_table_foreach(self->writer_hash, affile_dd_reuse_writer, self);
participants (3)
-
Balazs Scheidler
-
Evan Rempel
-
Nagy Daniel