remote logging not reliable

28 Feb 2006

      guys

I've been pulling my hair out here. I gonna go insane....

Here's the story.

I first setup syslog-ng-1.6.6 for remote syslogging a year ago.

I seemed to work out great  - all my stuff appeared to be logged remotely
and everything was dandy.
Then I decided to come up with some sort of check that would tell me if a
machine was not logging remotely anymore
cause of network problems or whatever. I have a perl logchecker which I
basically told to check for the cron.hourly log message
and if it finds it to grep for the source host and add this host to a file.
I was then gonna check this list against a pre-defined array
of hosts that should be logging blablaba....
This is how I discovered that not everything is really logged. All my
clients have this conf file:

options { chain_hostnames(0);
          time_reopen(10);
          time_reap(360);
          log_fifo_size(2048);
          sync(1);
          stats(0);
          create_dirs(yes);
          owner(root);
          group(root);
          perm(0644);
          dir_perm(0755);
          use_dns(yes);
          dns_cache(yes); };

source src { internal();
             unix-stream("/dev/log");
             file("/proc/kmsg"); };

destination messages { file("/var/log/messages"); };
destination auth { file("/var/log/auth"); };
destination cron { file("/var/log/cron"); };
destination mail { file("/var/log/mail"); };
destination loghost { tcp("**********" port(5000)); };

filter auth { facility(auth, authpriv); };
filter cron { facility(cron); };
filter mail { facility(mail); };
filter nagios { not ( match("Accepted publickey for nagios from ********")
or
                      match("COMMAND=/usr/local/nagios/home/check_duplex")
or
                      match("session opened for user nagios") or
                      match("session closed for user nagios")); };

log { source(src); filter(cron); destination(cron); destination(loghost);
flags(final); };
log { source(src); filter(mail); destination(mail); destination(loghost);
flags(final); };
log { source(src); filter(auth); filter(nagios); destination(auth);
destination(loghost); flags(final); };
log { source(src); filter(nagios); destination(messages);
destination(loghost); };

If I check the cronlog locally on a client I can see that all the hourly
logs are there:

Feb 27 12:01:01 sbeta crond[31964]: (root) CMD (run-parts /etc/cron.hourly)
Feb 27 13:01:01 sbeta crond[9232]: (root) CMD (run-parts /etc/cron.hourly)
Feb 27 14:01:01 sbeta crond[18969]: (root) CMD (run-parts /etc/cron.hourly)
Feb 27 15:01:01 sbeta crond[28630]: (root) CMD (run-parts /etc/cron.hourly)
Feb 27 16:01:01 sbeta crond[5311]: (root) CMD (run-parts /etc/cron.hourly)
Feb 27 17:01:01 sbeta crond[14977]: (root) CMD (run-parts /etc/cron.hourly)
Feb 27 18:01:01 sbeta crond[22335]: (root) CMD (run-parts /etc/cron.hourly)
Feb 27 19:01:01 sbeta crond[31995]: (root) CMD (run-parts /etc/cron.hourly)
Feb 27 20:01:01 sbeta crond[9336]: (root) CMD (run-parts /etc/cron.hourly)
Feb 27 21:01:01 sbeta crond[19002]: (root) CMD (run-parts /etc/cron.hourly)
Feb 27 22:01:01 sbeta crond[28663]: (root) CMD (run-parts /etc/cron.hourly)
Feb 27 23:01:01 sbeta crond[5928]: (root) CMD (run-parts /etc/cron.hourly)
Feb 28 00:01:01 sbeta crond[15495]: (root) CMD (run-parts /etc/cron.hourly)
Feb 28 01:01:01 sbeta crond[25160]: (root) CMD (run-parts /etc/cron.hourly)
Feb 28 02:01:01 sbeta crond[2394]: (root) CMD (run-parts /etc/cron.hourly)
Feb 28 03:01:01 sbeta crond[12087]: (root) CMD (run-parts /etc/cron.hourly)
Feb 28 04:01:01 sbeta crond[21753]: (root) CMD (run-parts /etc/cron.hourly)
Feb 28 05:01:01 sbeta crond[31573]: (root) CMD (run-parts /etc/cron.hourly)
Feb 28 06:01:01 sbeta crond[8840]: (root) CMD (run-parts /etc/cron.hourly)
Feb 28 07:01:01 sbeta crond[18506]: (root) CMD (run-parts /etc/cron.hourly)
Feb 28 08:01:01 sbeta crond[28166]: (root) CMD (run-parts /etc/cron.hourly)
Feb 28 09:01:01 sbeta crond[5433]: (root) CMD (run-parts /etc/cron.hourly)

However, checking the remote cron logs for that machine for that whole of
February shows
this:

Feb 21 12:01:01 sbeta crond[23754]: (root) CMD (run-parts /etc/cron.hourly)
Feb 24 05:01:01 sbeta crond[5012]: (root) CMD (run-parts /etc/cron.hourly)
Feb 24 17:01:01 sbeta crond[26525]: (root) CMD (run-parts /etc/cron.hourly)
Feb 25 03:01:01 sbeta crond[28212]: (root) CMD (run-parts /etc/cron.hourly)
Feb 25 13:01:01 sbeta crond[30890]: (root) CMD (run-parts /etc/cron.hourly)
Feb 26 01:01:01 sbeta crond[20620]: (root) CMD (run-parts /etc/cron.hourly)
Feb 16 19:01:01 sbeta crond[6299]: (root) CMD (run-parts /etc/cron.hourly)
Feb 17 00:01:01 sbeta crond[22138]: (root) CMD (run-parts /etc/cron.hourly)
Feb 17 04:01:01 sbeta crond[28292]: (root) CMD (run-parts /etc/cron.hourly)
Feb 17 08:01:01 sbeta crond[1917]: (root) CMD (run-parts /etc/cron.hourly)
Feb 17 12:01:01 sbeta crond[7812]: (root) CMD (run-parts /etc/cron.hourly)
Feb 17 16:01:01 sbeta crond[14136]: (root) CMD (run-parts /etc/cron.hourly)
Feb 17 19:01:01 sbeta crond[10669]: (root) CMD (run-parts /etc/cron.hourly)
Feb 18 00:01:01 sbeta crond[26489]: (root) CMD (run-parts /etc/cron.hourly)
Feb 18 04:01:01 sbeta crond[32742]: (root) CMD (run-parts /etc/cron.hourly)
Feb 18 08:01:01 sbeta crond[6852]: (root) CMD (run-parts /etc/cron.hourly)
Feb 18 12:01:01 sbeta crond[13106]: (root) CMD (run-parts /etc/cron.hourly)
Feb 18 15:01:01 sbeta crond[9768]: (root) CMD (run-parts /etc/cron.hourly)
Feb 18 20:01:01 sbeta crond[25588]: (root) CMD (run-parts /etc/cron.hourly)
Feb 12 08:01:01 sbeta crond[22554]: (root) CMD (run-parts /etc/cron.hourly)
Feb 12 12:01:01 sbeta crond[28808]: (root) CMD (run-parts /etc/cron.hourly)
Feb 12 16:01:01 sbeta crond[2735]: (root) CMD (run-parts /etc/cron.hourly)
Feb 12 20:01:01 sbeta crond[9063]: (root) CMD (run-parts /etc/cron.hourly)
Feb 13 00:01:01 sbeta crond[15268]: (root) CMD (run-parts /etc/cron.hourly)
Feb 14 17:01:01 sbeta crond[22845]: (root) CMD (run-parts /etc/cron.hourly)
Feb 14 21:01:02 sbeta crond[29177]: (root) CMD (run-parts /etc/cron.hourly)
Feb 15 22:01:01 sbeta crond[12945]: (root) CMD (run-parts /etc/cron.hourly)
Feb 16 05:01:01 sbeta crond[17917]: (root) CMD (run-parts /etc/cron.hourly)
Feb 16 16:01:01 sbeta crond[29758]: (root) CMD (run-parts /etc/cron.hourly)
Feb 16 18:01:01 sbeta crond[16687]: (root) CMD (run-parts /etc/cron.hourly)

As you can see it seemingly randomly logs this stuff now and then. So this
machine hasn't been
sending logs remotely since Feb 21. ? As I said all clients run the same
version of syslog-ng and the
exact same config - yet some of then seem to log cron stuff every hour and
others don't.
I get a log ot those 'Error connecting to remote host AF_INET' messages BUT
i get those for EVERY host so
why is there no consistency ?
I know what everbody is gonna this - your network is hosed but here is the
problem with that theory.

I can log on to that box and I will immediately see this on the loghost in
/var/log/auth

Feb 28 10:43:06 sbeta sshd[21815]: Accepted publickey for stucky from
******** port 47427 ssh2
Feb 28 10:43:06 sbeta sshd(pam_unix)[21819]: session opened for user stucky
by (uid=0)

This will work every time for all other machines as well. I'd understand if
it didn't log at all or with a slight delay
like say 1 minute. Besides it appears cron stuff really gets lost but then
why doesn't any of the auth/authpriv
stuff get lost ? I can't help feeling it has something to do with logging
cron but looking at the config I clearly told
it to log to the local file first and then again to the remote host just
like auth/authpriv stuff.

Here is my logservers config:

options { chain_hostnames(0);
          time_reopen(10);
          time_reap(360);
          log_fifo_size(2048);
          use_fqdn(no);
          use_dns(yes);
          dns_cache(yes);
          keep_hostname(yes);
          long_hostnames(off);
          sync(1);
          stats(0);
          create_dirs(yes);
          perm(0640);
          dir_perm(0750);
        };

source src { internal();
             unix-stream("/dev/log");
             file("/proc/kmsg");
             udp (ip("*********"));
             tcp (ip("*********")
                  port(5000)
                  max-connections(1000)
                  keep-alive(yes)); };

destination messages { file("/var/log/messages"); };
destination auth     { file("/var/log/auth"); };
destination cron     { file("/var/log/cron"); };
destination mail     { file("/var/log/mail"); };
destination arch     {
file("/usr/local/var/log_archive/$HOST/$YEAR/$MONTH/$DAY/archive"); };

filter auth { facility(auth, authpriv); };
filter cron { facility(cron); };
filter mail { facility(mail); };
filter messages { not facility(auth, authpriv, cron, mail); };
filter nagios { not ( match("Accepted publickey for nagios from *******") or
                      match("COMMAND=/usr/local/nagios/home/check_duplex")
or
                      match("session opened for user nagios") or
                      match("session closed for user nagios")); };
filter junk { not ( match("Accepted publickey for oracle from ***********")
or
                    match("Accepted publickey for oracle from ***********")
or
                    match("Accepted publickey for stucky from ***********")
or
                    match("COMMAND=/usr/local/nagios/home/check_duplex") or
                    match(".+reconnecting to LDAP server.+sleeping") or
                    match("session opened for user root") or
                    match(".+AF_INET client connected from .+") or
                    match(".+AF_INET client dropped connection from.+") or
                    match(".+Connection broken to AF_INET.+")); };

log { source(src); filter(cron); destination(cron); destination(arch);
flags(final); };
log { source(src); filter(mail); destination(mail); destination(arch);
flags(final); };
log { source(src); filter(auth); filter(nagios); destination(auth);
destination(arch); flags(final); };
log { source(src); filter(junk); filter(nagios); destination(messages);
destination(arch); };

And here's the really messed up thing. EVERY time I set up a tcpdump on the
loghost at the full hour
to see whether the box actually sends stuff to it - it does ! Yet, if I wait
again for a while and check the logs
I can see again that a log of hourly cronlogs are skipped.
Can anybody please take a look at my configs and at least tell me that they
are not completely wrong ?
I'm running out of ideas.
--
stucky

stucky

Nate Campi

stucky

Balazs Scheidler

stucky

John Morrissey

Balazs Scheidler

John Morrissey

tags

participants (4)