[syslog-ng] Message loss (probably) within syslog-ng

Vincent Haverlant vincent at haverlant.org
Sun Mar 5 19:45:56 CET 2006


Hi,

I am seeing message loss similar to what was described in an earlier thread 
with the subject "remote logging not reliable". In that case, however, the 
remote logging was done over tcp, which was pointed out as the probable cause 
of the loss. In my case only udp is involved.

Description of the infrastructure:
About 2500 unix hosts send logs via their original syslog daemon 
(Solaris or RedHat).
They are set up as follows:
=====
kern.info                   @logserver
*.error;user.none      @logserver
auth.info                  @logserver
=====

The central syslog-ng (1.9.9) server is a Solaris 8 host configured as 
follows:
=====

options {
  time_reopen (1);
  time_reap(600);
  stats_freq(60);
  log_fifo_size (25000000);
  keep_hostname (yes);
  long_hostnames (no);
  use_dns (yes);
  dns_cache (yes);
  dns_cache_size(3000);
  use_fqdn (no); # use the short host name
  owner("root"); # Logs owner
  group("sys"); # Logs group owner
  perm(0755);
  dir_owner("sysexplo"); # Directory Owner
  dir_group("sys"); # Directory Group
  dir_perm(0775); # Directory Perm
  create_dirs (yes);
  use_time_recvd(yes);
#  gc_idle_threshold(1000);
#  gc_busy_threshold(100000);
};

#
# Configuration directives for remote logs
#
source lan {
  udp (port(514));
};


destination hostfiles {

file("/projets/SYS/sysexplo/syslogdata/$YEAR$MONTH$DAY.logremote/$HOST"
                owner("sysexplo")
                group("sys")
                perm(0755)
                template("$ISODATE $HOST $MSG\n")
                );
};

log {
        source(lan);
        destination(hostfiles);
        flags(final);
};

#
#local logs
#
source localmsg {
        sun-stream("/dev/log" door("/etc/.syslog_door"));
        internal();
};
destination syslog {
        file("/var/log/syslog");
};
destination authlog {
        file("/var/adm/authlog");
};
destination messages {
        file("/var/adm/messages");
};

# filters to mimic traditional Solaris logging
filter f_mail {
        facility(mail);
};
filter f_auth {
        level(info) and facility(auth, authpriv);
};
filter f_not_mail {
        not facility(mail);
};
log {
        source(localmsg);
        filter(f_auth);
        destination(authlog);
};
log {
        source(localmsg);
        filter(f_mail);
        destination(syslog);
};
log {
        source(localmsg);
        filter(f_not_mail);
        destination(messages);
};

log {
        source(localmsg);
        destination(hostfiles);
        flags(final);
}; # Also save logs from local host
#
=====

This generates between 3 and 8 GiB of logs per day on the logserver.

Everything went fine until we wanted to check a specific event on a 
specific host. The event generates about 10 lines of auth.info logs, but 
only 5 of them were saved on the log server. It appeared that some packets 
were randomly dropped. Since we could not tell whether this was a network 
issue (udp) or a syslog-ng issue, we wrote the following tool to dump what 
arrives on the udp socket:

====

syslog-dump.c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <strings.h>
#include <unistd.h>

#define BUFFLEN 2047

int main (int argc, char **argv) {
  int syslog_socket;
  int syslog_port=514;
  struct sockaddr_in syslog_addr;
  char syslog_buffer[BUFFLEN+1];
  int len,ret;

  /* create a UDP socket */
  syslog_socket=socket(PF_INET, SOCK_DGRAM, 0);
  if (syslog_socket==-1) {
    perror("Error creating socket");
    exit(1);
  }
  /* bind to the syslog port on all local addresses */
  bzero(&syslog_addr,sizeof(syslog_addr));
  syslog_addr.sin_family = AF_INET;
  syslog_addr.sin_addr.s_addr = htonl(INADDR_ANY);
  syslog_addr.sin_port = htons(syslog_port);
  ret=bind(syslog_socket, (struct sockaddr *) &syslog_addr,
           sizeof(syslog_addr));
  if (ret==-1) {
    perror("Error binding to socket");
    exit(1);
  }

  /* print each datagram on its own line; the buffer is zeroed before
     every read, so it is always NUL-terminated */
  bzero(syslog_buffer, BUFFLEN+1);
  while ((len=read(syslog_socket, syslog_buffer, BUFFLEN)) > 0) {
    printf("%s\n",syslog_buffer);
    bzero(syslog_buffer, BUFFLEN+1);
  }
  if (len==-1) {
    perror("Error reading from socket");
    exit(1);
  }
  return 0;
}
====
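
When reading the output of a dump tool like the one above, it may help to know that each raw BSD syslog datagram starts with a priority field of the form <PRI>, where PRI = facility * 8 + severity. An auth.info message, for example, should start with <38> (facility 4, severity 6). The helper below is only a sketch for decoding that prefix; the function name decode_pri is made up for illustration and is not part of syslog-ng or of the original tool.

```c
#include <stdlib.h>

/* Split the leading "<PRI>" of a raw syslog datagram into its facility
 * and severity numbers (PRI = facility * 8 + severity).
 * Returns 0 on success, -1 if the prefix is missing or malformed.
 * decode_pri is a hypothetical helper name used for illustration. */
int decode_pri(const char *msg, int *facility, int *severity) {
  char *end;
  long pri;

  if (msg == NULL || msg[0] != '<')
    return -1;
  pri = strtol(msg + 1, &end, 10);
  /* need at least one digit, a closing '>', and a value in 0..191 */
  if (end == msg + 1 || *end != '>' || pri < 0 || pri > 191)
    return -1;
  *facility = (int)(pri / 8);
  *severity = (int)(pri % 8);
  return 0;
}
```

Calling decode_pri on each dumped line makes it easy to pick out only the auth.info traffic (facility 4, severity 6) when comparing against what the log server saved.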

We ran this instead of syslog-ng for a few minutes and produced some 20000 lines of log on 2 hosts in 20 seconds using logger. During the tests I also kept receiving the normal messages from my 2500 hosts in addition to my test messages.

I tested it for more than half an hour and never lost any message using my syslog-dump program (something like 20 consecutive tests). The next half hour I repeated the test with syslog-ng. Unfortunately I lost around 10% to 20% of my test messages every time.
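
One way to put a number on such a loss is to tag every test message with a sequence number and count which numbers never show up on the receiving side. The sketch below assumes each test line contains a marker of the form seq=<n>, with n running from 0 to expected-1; that marker and the function name count_lost are assumptions made for illustration, not something the tests described here necessarily used.

```c
#include <stdlib.h>
#include <string.h>

/* Return how many of the sequence numbers 0..expected-1 never appear in
 * lines[], or -1 on bad input or allocation failure.
 * Assumption: each test line carries a "seq=<n>" marker (hypothetical
 * format). Lines without the marker (normal traffic from other hosts),
 * duplicates, and out-of-range numbers are ignored. */
long count_lost(const char **lines, size_t nlines, long expected) {
  unsigned char *seen;
  long lost = 0, i;
  size_t k;

  if (lines == NULL || expected <= 0)
    return -1;
  seen = calloc((size_t)expected, 1);
  if (seen == NULL)
    return -1;

  for (k = 0; k < nlines; k++) {
    const char *p = strstr(lines[k], "seq=");
    char *end;
    long n;

    if (p == NULL)
      continue;
    n = strtol(p + 4, &end, 10);
    if (end == p + 4 || n < 0 || n >= expected)
      continue;
    seen[n] = 1;
  }

  /* every number that was never marked counts as a lost message */
  for (i = 0; i < expected; i++)
    if (!seen[i])
      lost++;
  free(seen);
  return lost;
}
```

Dividing the result by expected gives the loss ratio, which can then be compared between a run with the dump tool and a run with syslog-ng.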

I started with syslog-ng 1.6.9 but upgraded to 1.9.9 to be sure I had all the improvements. Does anybody have an idea of where my packets get lost? Is there a tuning-only solution, or should I start looking at the code to find my lost packets? I'd appreciate a short explanation of syslog-ng internals if that is the case :)

Aside from that I would like to say that I'm quite happy with syslog-ng features.

Regards,
Vincent.






