lost messages with follow_freq()?
Hi,

I'm trying to use the follow_freq() option to tail a growing log file, but not all of the messages are making it from the source end to the destination end. However, according to the statistics, no messages are dropped. I am using syslog-ng 2.0.9.

The file on the source side looks like this:

    -rw-r--r-- 1 jshaw ita 251M Aug 19 15:07 dump-file

and is growing rather rapidly. (It is basically being created by replaying another stored log file.) At this same point on the destination side, this is the corresponding file:

    -rw-r--r-- 1 root root 30M Aug 19 15:07 syslog-messages

And doing a diff on those files does show large missing chunks from this file.

The source's statistics say that nothing has been dropped:

    Aug 19 15:07:45 source-host syslog-ng[18574]: Log statistics; dropped='tcp(AF_INET(10.1.73.18:2000))=0', processed='center(queued)=303967', processed='center(received)=304017', processed='destination(d_file)=6', processed='destination(d_remote)=303961', processed='source(s_sys)=56', processed='source(s_internal)=6', processed='source(s_file)=303955'

And on the destination side:

    Aug 19 15:08:04 dest-host syslog-ng[21023]: Log statistics; processed='center(queued)=318524', processed='center(received)=318521', processed='destination(d_file)=318521', processed='destination(d_stats)=3', processed='source(s_tcp)=318518', processed='source(s_internal)=3'

The source syslog-ng.conf file looks like this:

    options {
        sync(0);
        time_reopen(10);
        log_fifo_size(1000);
        long_hostnames(off);
        use_dns(yes);
        dns_cache(yes);
        use_fqdn(no);
        keep_hostname(yes);
        use_time_recvd(no);
        log_msg_size(65536);
        stats_freq(180);
    };

    source s_internal { internal(); };

    source s_sys {
        file ("/proc/kmsg" log_prefix("kernel: "));
        unix-stream ("/dev/log");
    };

    source s_file { file("/ita/dump-file" follow_freq(1) flags(no-parse)); };

    # Local destination for statistics
    destination d_file { file("/ita/syslog-messages" perm(0644)); };

    # Remote destination
    destination d_remote { tcp("dest-host" port(2000)); };

    # Send stats locally
    log { source(s_internal); destination(d_file); };

    # Send everything remotely
    log { source(s_internal); source(s_file); destination(d_remote); };

(Yes, the s_sys source is unused.)

On the destination side:

    options {
        sync(0);
        time_reopen(10);
        log_fifo_size(1000);
        long_hostnames(off);
        use_dns(yes);
        dns_cache(yes);
        use_fqdn(no);
        keep_hostname(yes);
        use_time_recvd(no);
        log_msg_size(65536);
        stats_freq(180);
    };

    # Remote source
    source s_tcp { tcp(port(2000) log-fetch-limit(128) max-connections(1000)); };

    # syslog-ng statistics
    source s_internal { internal(); };

    destination d_file { file("/ita/syslog-messages" perm(0644) log_fifo_size(100)); };

    destination d_stats { file("/ita/syslog-stats" perm(0644)); };

    # Save stats separately
    log { source(s_internal); destination(d_stats); };

    # Take all remote data and save it locally
    log { source(s_tcp); source(s_internal); destination(d_file); };

Any ideas what might be going on, or how to analyze this further?

Thanks,
Joe
Hi again,

A quick followup to my last post:

On Tue, Aug 19, 2008 at 3:21 PM, Joe Shaw <joe@joeshaw.org> wrote:
> I'm trying to use the follow_freq() option to tail a growing log file, but not all of the messages are making it from the source end to the destination end. However, according to the statistics, no messages are dropped. I am using syslog-ng 2.0.9.

I changed follow_freq() in the source to be measured in milliseconds rather than full seconds and set the frequency to 50ms. The outcome is considerably better, but still missing some data:

    [jshaw@source-host /ita]$ wc -l dump-file
    9982973 dump-file

versus:

    [jshaw@dest-host /ita]$ wc -l syslog-messages
    9263853 syslog-messages

syslog-messages is actually bigger than dump-file now, but that's because of the syslog headers prepended to the lines. The number of lines above is still short.

I'll try lowering the interval even further and seeing if it gets closer.

Joe
On Tue, 2008-08-19 at 16:02 -0400, Joe Shaw wrote:
> Hi again,
>
> A quick followup to my last post:
>
> On Tue, Aug 19, 2008 at 3:21 PM, Joe Shaw <joe@joeshaw.org> wrote:
> > I'm trying to use the follow_freq() option to tail a growing log file, but not all of the messages are making it from the source end to the destination end. However, according to the statistics, no messages are dropped. I am using syslog-ng 2.0.9.
>
> I changed follow_freq() in the source to be measured in milliseconds rather than full seconds and set the frequency to 50ms. The outcome is considerably better, but still missing some data:
>
>     [jshaw@source-host /ita]$ wc -l dump-file
>     9982973 dump-file
>
> versus:
>
>     [jshaw@dest-host /ita]$ wc -l syslog-messages
>     9263853 syslog-messages
>
> syslog-messages is actually bigger than dump-file now, but that's because of the syslog headers prepended to the lines. The number of lines above is still short.
>
> I'll try lowering the interval even further and seeing if it gets closer.
The problem is probably related to the FIFO size on the destination file. File-related statistics are not reported (as they'd clutter the log statistics message). So please increase log_fifo_size() on your destination file.

-- Bazsi
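As a rough illustration of why a small destination FIFO can lose messages under bursty input, here is a toy model in Python. It is not syslog-ng's implementation: the function name, the burst sizes and the drain rate are all invented for the example, and the only idea taken from this thread is that messages arriving while the queue is full are dropped. The capacity of 100 echoes the log_fifo_size(100) set on d_file in the destination config.

```python
from collections import deque


def simulate_fifo(capacity, bursts, drain_per_tick):
    """Toy model of a bounded output queue (hypothetical, for illustration).

    On each tick, bursts[i] messages arrive; any that do not fit into the
    queue are dropped. Then up to drain_per_tick messages are written out.
    Returns a (written, dropped) tuple.
    """
    queue = deque()
    written = 0
    dropped = 0
    for arriving in bursts:
        for _ in range(arriving):
            if len(queue) < capacity:
                queue.append(None)
            else:
                dropped += 1
        for _ in range(min(drain_per_tick, len(queue))):
            queue.popleft()
            written += 1
    # Flush whatever is still queued after the last burst.
    written += len(queue)
    return written, dropped


# Two bursts of 1000 messages with idle ticks in between (made-up numbers).
bursts = [1000, 0, 0, 1000, 0, 0]
print(simulate_fifo(100, bursts, 400))   # small queue: (200, 1800)
print(simulate_fifo(1000, bursts, 400))  # larger queue: (2000, 0)
```

In this model the average drain rate is more than enough for the total volume, yet the small queue still drops most of each burst. A larger queue only has to absorb the burst until the writer catches up.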
Hi,

On Wed, Aug 20, 2008 at 12:39 PM, Balazs Scheidler <bazsi@balabit.hu> wrote:
> The problem is probably related to the FIFO size on the destination file. File-related statistics are not reported (as they'd clutter the log statistics message).
>
> So please increase log_fifo_size() on your destination file.

Quickly poking at the code, in that case I should expect to see a "Destination queue full, dropping message" in the log file if debugging were enabled, correct? I don't see such a message there.

Joe
Hi again,

On Wed, Aug 20, 2008 at 12:39 PM, Balazs Scheidler <bazsi@balabit.hu> wrote:
> The problem is probably related to the FIFO size on the destination file. File-related statistics are not reported (as they'd clutter the log statistics message).
>
> So please increase log_fifo_size() on your destination file.

You were right, I was dropping messages on the destination side, and increasing the log_fifo_size() there stopped the "dropped messages" warnings.

However, it didn't solve the problem. The file on the destination side is still considerably smaller, and it doesn't explain why making the follow_freq() window smaller (from 1s to 50ms) improves the results considerably.

Thanks,
Joe
On Wed, 2008-08-20 at 15:14 -0400, Joe Shaw wrote:
> Hi again,
>
> On Wed, Aug 20, 2008 at 12:39 PM, Balazs Scheidler <bazsi@balabit.hu> wrote:
> > The problem is probably related to the FIFO size on the destination file. File-related statistics are not reported (as they'd clutter the log statistics message).
> >
> > So please increase log_fifo_size() on your destination file.
>
> You were right, I was dropping messages on the destination side, and increasing the log_fifo_size() there stopped the "dropped messages" warnings.
>
> However, it didn't solve the problem. The file on the destination side is still considerably smaller, and it doesn't explain why making the follow_freq() window smaller (from 1s to 50ms) improves the results considerably.
Hmm, are you sure that the given file destination is the only one where message loss can occur? What is your exact scenario? Something like this:

    source s_file { file("/var/log/file-to-be-tailed" follow-freq(1)); };

    destination d_file { file("/var/log/destination-file"); };

    log { source(s_file); destination(d_file); };

If this is similar to your exact scenario, then only the FIFO of the destination file can get full. Decreasing the follow frequency will only increase your load and not solve the problem. The source side cannot really drop messages, not even in cases when more time elapses.

-- Bazsi
Hi,

On Thu, Aug 21, 2008 at 8:28 AM, Balazs Scheidler <bazsi@balabit.hu> wrote:
> Hmm, are you sure that the given file destination is the only one where message loss can occur?

Yes, but you were right again. My problem this time is simply that I wasn't patient enough. My tool which replays the logs runs in about 20 minutes, but it takes considerably longer for that data to make it to the centralized server. This makes sense: with a follow_freq() of 1 and reading up to 1000 messages from the file on each poll, it'll take quite a while to chew through some of the very long log files.

Thanks for your help with this. I've attached a patch which enables the dropped messages counter for destination files if the verbose flag is set. I think this would be helpful if you are in a situation like mine and have no idea that messages are being dropped on the destination file end.

Joe
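The arithmetic behind that delay can be sketched in a few lines of Python. This takes the explanation above at face value, namely that the file source reads at most about 1000 messages per follow_freq() poll; that per-poll figure comes from this message and has not been checked against the 2.0.9 source. The function name is invented for the example, and the line count is the wc -l figure for dump-file reported earlier in the thread.

```python
import math


def min_drain_seconds(total_lines, lines_per_poll, poll_interval_s):
    """Lower bound on the time needed to read total_lines when at most
    lines_per_poll lines are consumed per poll (hypothetical helper)."""
    polls = math.ceil(total_lines / lines_per_poll)
    return polls * poll_interval_s


# Line count of dump-file as reported by wc -l earlier in the thread.
TOTAL_LINES = 9_982_973

# Compare the original 1 s interval with the patched 50 ms interval.
for interval in (1.0, 0.05):
    seconds = min_drain_seconds(TOTAL_LINES, 1000, interval)
    print(f"follow_freq={interval}s: at least {seconds / 60:.0f} minutes")
```

Under these assumptions the 1 s interval needs 9983 polls, or roughly 166 minutes, to get through a file that the replay tool writes in about 20 minutes, while a 50 ms interval brings the bound down to about 8 minutes.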
participants (2): Balazs Scheidler, Joe Shaw