Balazs Scheidler wrote:
On Wed, 2008-03-05 at 11:50 -0800, Evan Rempel wrote:
Please bear with me. This is a little involved.
We have a bunch of servers, but I am only focusing on one today. It logs everything to a local file and to two central syslog-ng servers, using a really simple config file:
---------
options {
    sync(0);
    log_fifo_size(50000);
    use_fqdn(yes);
    keep_hostname(no);
    chain_hostnames(yes);
    time_reap(60);
    time_reopen(5);
};

source local {
    unix-stream("/dev/log" max-connections(200));
    file("/proc/kmsg" log_prefix("kernel: "));
    internal();
};

template standard_file {
    template("$ISODATE $FULLHOST $FACILITY.$LEVEL $MESSAGE\n");
    template_escape(no);
};

destination syslog {
    file("/var/log/syslog.$YEAR$MONTH$DAY.000000"
         owner("root") group("syslogs") perm(0640)
         template(standard_file));
};

destination syslogServer1 { tcp("server1" log_fifo_size(50000) ); };
destination syslogServer2 { tcp("server2" log_fifo_size(50000) ); };

log {
    source(local);
    destination(syslog);
    destination(syslogServer1);
    destination(syslogServer2);
};
-----------------
Anyhow. On the two central servers, I see different numbers of records in the files, and the statistics on the sender show
2008-03-04T23:26:38-08:00 local@caribou.comp.uvic.ca syslog.info syslog-ng[3391]: Log statistics; dropped='tcp(AF_INET(server2:514))=0', dropped='tcp(AF_INET(server1:514))=14690649', processed='center(queued)=53993217', processed='center(received)=17997739', processed='destination(syslogServer2)=17997739', processed='destination(syslogServer1)=17997739', processed='destination(syslog)=17997739', processed='source(local)=17997739'
The problem is that the files on disk show
caribou  16257954
server1   1742054
server2    965475
and that just doesn't add up. Neither server shows any dropped messages. I know that the statistics miss some time at the beginning and at the end of the day, but the numbers don't even come close. The caribou stats show that no messages were dropped to server2 and lots were dropped to server1; however, server2 actually wrote more messages to disk. server1 is also about 30% faster than server2.
I am willing to listen to any explanation, but I am beginning to think that the statistics that are logged are wrong.
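To make the mismatch concrete, here is a minimal Python sketch that parses the statistics line quoted above and compares it with the on-disk counts. The helper names (parse_stats, unaccounted) are made up for illustration, and the sketch assumes every processed message was meant for every destination, which is what the config suggests. It also ignores messages still sitting in a fifo, so the gaps are rough upper bounds rather than exact losses.

```python
import re

# Statistics line as logged by the sender (copied from the message above).
STATS_LINE = (
    "Log statistics; "
    "dropped='tcp(AF_INET(server2:514))=0', "
    "dropped='tcp(AF_INET(server1:514))=14690649', "
    "processed='center(queued)=53993217', "
    "processed='center(received)=17997739', "
    "processed='destination(syslogServer2)=17997739', "
    "processed='destination(syslogServer1)=17997739', "
    "processed='destination(syslog)=17997739', "
    "processed='source(local)=17997739'"
)

# Record counts found in the files on each host (from the message above).
ON_DISK = {"caribou": 16257954, "server1": 1742054, "server2": 965475}


def parse_stats(line):
    """Return {(kind, name): count} for every kind='name=count' pair."""
    pairs = re.findall(r"(\w+)='(.+?)=(\d+)'", line)
    return {(kind, name): int(count) for kind, name, count in pairs}


def unaccounted(stats, on_disk):
    """Messages neither written to disk nor reported as dropped, per host.

    Assumption: each host should have seen every received message.
    The local file destination has no dropped counter, so it counts as 0.
    """
    sent = stats[("processed", "center(received)")]
    gaps = {}
    for host, written in on_disk.items():
        dropped = stats.get(("dropped", f"tcp(AF_INET({host}:514))"), 0)
        gaps[host] = sent - dropped - written
    return gaps


stats = parse_stats(STATS_LINE)
gaps = unaccounted(stats, ON_DISK)
for host, gap in gaps.items():
    print(f"{host}: {gap} messages neither written nor reported as dropped")
```

Note that center(queued) is exactly three times center(received), which is consistent with each message being queued once per destination. The gap for server2 is by far the largest, even though its dropped counter reads 0.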
I know about one possible bug that might explain this: as long as the _first_ connection to a TCP destination is not established, dropped messages are not counted.
That is, the dropped counter for a destination is only allocated when its first connection is established.
So if server2 was down when syslog-ng started and server1 was up, syslog-ng might not count dropped messages towards server2 in the initial period.
Does that sound possible?
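The counting behaviour described above can be illustrated with a toy model. This is not syslog-ng code: the class ToyTcpDestination and all of its attributes are hypothetical, and the only thing it borrows from the description is that the dropped counter does not exist until the first successful connection.

```python
class ToyTcpDestination:
    """Toy model of a TCP destination with a bounded fifo (hypothetical)."""

    def __init__(self, fifo_size):
        self.fifo_size = fifo_size
        self.fifo = []
        self.connected = False
        self.delivered = 0
        self.actually_dropped = 0   # ground truth, invisible to the stats
        self.dropped_counter = None  # allocated on the first connection only

    def connect(self):
        self.connected = True
        if self.dropped_counter is None:
            self.dropped_counter = 0
        # Flush whatever was waiting in the fifo.
        self.delivered += len(self.fifo)
        self.fifo.clear()

    def disconnect(self):
        self.connected = False

    def send(self, message):
        if self.connected:
            self.delivered += 1
        elif len(self.fifo) < self.fifo_size:
            self.fifo.append(message)
        else:
            # Fifo full: the message is lost either way, but it is only
            # counted if the counter has already been allocated.
            self.actually_dropped += 1
            if self.dropped_counter is not None:
                self.dropped_counter += 1

    def reported_dropped(self):
        """What a statistics line would show for this destination."""
        return 0 if self.dropped_counter is None else self.dropped_counter


# The peer is down at startup: drops happen but nothing is counted.
dest = ToyTcpDestination(fifo_size=3)
for i in range(10):
    dest.send(f"msg {i}")
before_first_connect = (dest.actually_dropped, dest.reported_dropped())

# After one successful connection, later drops are counted normally.
dest.connect()
dest.disconnect()
for i in range(10):
    dest.send(f"msg {i}")
after_first_connect = (dest.actually_dropped, dest.reported_dropped())
```

In this model a destination that was unreachable at startup reports dropped=0 however many messages it loses, which would match the server2 figure in the statistics line if server2 was down when syslog-ng started.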
It does sound possible, but I can't really confirm this. We have 17 sysadmins making changes to 25 different network segments, firewalls, etc., and on top of that we just moved one of our two syslog servers to a new data center, with all of the firewall issues that come with 30 new subnets being rolled out at once :-(

All I really wonder now is "in which version of syslog-ng was this issue addressed?"

Evan.
--
Evan Rempel  erempel@uvic.ca
Senior Programmer Analyst  250.721.7691
Computing Services
University of Victoria