[syslog-ng]performance test questions

Roberto Nibali ratz@tac.ch
Thu, 30 Jan 2003 17:12:56 +0100


This is a multi-part message in MIME format.
--------------090206010404090800090204
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

Hi,

I'm conducting syslog-ng performance tests. I've deployed a central loghost 
environment and three clients trying to send messages which should end up on the 
central loghost with as few losses as possible. The whole thing is currently 
done over UDP and the clients have either syslogd or syslog-ng as daemons.

If I deploy syslog-ng as a client to send messages from /dev/log via UDP to the 
central loghost I experience a rather high message loss starting on a threshold 
of about 1000 messages sent in the least amount of time possible. If I do the 
same thing with syslogd the threshold is up to 35000 from the same machine. That 
means I can send 35000 messages to /dev/log as fast as possible and they still 
get reliably transferred to the central loghost and written to disk there.

A little stunned over this strange occurrance I've started to narrow the problem 
down and I ended up with following question: Why does syslog-ng not send a UDP 
packet per message sent via syslog() and an ending '\n'? And is there a mean to 
configure this odd behaviour? To illustrate what I mean I provide you with two 
sample dumps from a tcpdump session where I sent 3 messages. The first one is 
the tcpdump snipped of syslogd processing the messages and the second one is the 
result when syslog-ng processes the logs:

16:01:09.210645 x.y.z.6.514 > x.y.z.117.514: udp 71
0x0000	 4500 0063 ca2c 0000 4011 87aa ac1b e806	E..c.,..@.......
0x0010	 ac1b e875 0202 0202 004f 3a51 3c31 313e	...u.....O:Q<11>
0x0020	 6c6f 6774 6573 7420 3a20 5b31 5d20 4920	logtest.:.[1].I.
0x0030	 6c69 6b65 2074 6f20 6c6f 6720 6661 7374	like.to.log.fast
0x0040	 2062 7574 206d 6f73 7420 696d 706f 7274	.but.most.import
0x0050	 616e 746c 7920 6163 6375 7261 7465 2121	antly.accurate!!
0x0060	 2120 0a                                	!..
16:01:09.210820 x.y.z6.514 > x.y.z.117.514: udp 71
0x0000	 4500 0063 ca2d 0000 4011 87a9 ac1b e806	E..c.-..@.......
0x0010	 ac1b e875 0202 0202 004f 3a50 3c31 313e	...u.....O:P<11>
0x0020	 6c6f 6774 6573 7420 3a20 5b32 5d20 4920	logtest.:.[2].I.
0x0030	 6c69 6b65 2074 6f20 6c6f 6720 6661 7374	like.to.log.fast
0x0040	 2062 7574 206d 6f73 7420 696d 706f 7274	.but.most.import
0x0050	 616e 746c 7920 6163 6375 7261 7465 2121	antly.accurate!!
0x0060	 2120 0a                                	!..
16:01:09.211064 x.y.z.6.514 > x.y.z.117.514: udp 71
0x0000	 4500 0063 ca2e 0000 4011 87a8 ac1b e806	E..c....@.......
0x0010	 ac1b e875 0202 0202 004f 3a4f 3c31 313e	...u.....O:O<11>
0x0020	 6c6f 6774 6573 7420 3a20 5b33 5d20 4920	logtest.:.[3].I.
0x0030	 6c69 6b65 2074 6f20 6c6f 6720 6661 7374	like.to.log.fast
0x0040	 2062 7574 206d 6f73 7420 696d 706f 7274	.but.most.import
0x0050	 616e 746c 7920 6163 6375 7261 7465 2121	antly.accurate!!
0x0060	 2120 0a                                	!..
------
16:01:53.458412 x.y.z.6.1031 > x.y.z.117.514: udp 306
0x0000	 4500 014e ca6c 0000 4011 867f ac1b e806	E..N.l..@.......
0x0010	 ac1b e875 0407 0202 013a b8e8 3c31 313e	...u.....:..<11>
0x0020	 4a61 6e20 3330 2031 373a 3137 3a31 3920	Jan.30.17:17:19.
0x0030	 735f 696e 7440 6d69 6c6b 2d6e 6574 5f6f	s_int@milk-net_o
0x0040	 7665 7266 6c6f 7720 3a20 5b31 5d20 4920	verflow.:.[1].I.
0x0050	 6c69 6b65 2074 6f20 6c6f 6720 6661 7374	like.to.log.fast
0x0060	 2062 7574 206d 6f73 7420 696d 706f 7274	.but.most.import
0x0070	 616e 746c 7920 6163 6375 7261 7465 2121	antly.accurate!!
0x0080	 210a 3c31 313e 4a61 6e20 3330 2031 373a	!.<11>Jan.30.17:
0x0090	 3137 3a31 3920 735f 696e 7440 6d69 6c6b	17:19.s_int@milk
0x00a0	 2d6e 6574 5f6f 7665 7266 6c6f 7720 3a20	-net_overflow.:.
0x00b0	 5b32 5d20 4920 6c69 6b65 2074 6f20 6c6f	[2].I.like.to.lo
0x00c0	 6720 6661 7374 2062 7574 206d 6f73 7420	g.fast.but.most.
0x00d0	 696d 706f 7274 616e 746c 7920 6163 6375	importantly.accu
0x00e0	 7261 7465 2121 210a 3c31 313e 4a61 6e20	rate!!!.<11>Jan.
0x00f0	 3330 2031 373a 3137 3a31 3920 735f 696e	30.17:17:19.s_in
0x0100	 7440 6d69 6c6b 2d6e 6574 5f6f 7665 7266	t@milk-net_overf
0x0110	 6c6f 7720 3a20 5b33 5d20 4920 6c69 6b65	low.:.[3].I.like
0x0120	 2074 6f20 6c6f 6720 6661 7374 2062 7574	.to.log.fast.but
0x0130	 206d 6f73 7420 696d 706f 7274 616e 746c	.most.importantl
0x0140	 7920 6163 6375 7261 7465 2121 210a     	y.accurate!!!.
------

The configuration file for syslogd is:
---------------------------------------------------------
user.err	@x.y.z.117
---------------------------------------------------------

The configuration file for syslog-ng is:
---------------------------------------------------------
options {
         sync (0);
         time_reopen (10);
         log_fifo_size (1024);
         long_hostnames (on);
         use_dns (no);
         use_fqdn (no);
         create_dirs (yes);
         keep_hostname (yes);
};

source s_int {
         internal();
         unix-stream("/dev/log" max-connections(256));
};

filter f_test {
         facility(user) and level(err);
};

destination d_test {
         udp("x.y.z.117" port(514));
};

log {
         source(s_int);
         filter(f_test);
         destination(d_test);
         flags(final);
};
---------------------------------------------------------

Please enlighten me with an explanation to this problem. As an example take 
following verified numbers for a syslog() storm of 50000 messages where I do a 
50ms sleep every 250 messages sent. All in all this results in a 14332ms log 
burst (usleep() syscall overhead).

Message lines lost with syslog-ng (average): ~6300
Message lines lost with syslogd   (average): 0 !!!!

I've also attached the test tool which I use to conduct my tests. It's extremely 
simple so I don't expect any problems with this. It has sequence numbers so one 
is able to track down the losses (something which should be in syslog-ng too IMHO).

I currently have no explanation for this observation because I haven't read the 
whole source code yet. I suspect you have a read buffer on sources (/dev/log in 
this example) which overrides the '\n' -> fflush() semantics to be able to send 
less UDP packets with more content per packet. So I hope this is tunable 
somewhere or one can even disable it to mimic the old syslogd behavior.

Another question: Why is there no message sent when a buffer reaches its maximum 
size? This would help debugging and tuning a lot IMHO. Sorry if this has been 
discussed before.

Best regards,
Roberto Nibali, ratz
-- 
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc

--------------090206010404090800090204
Content-Type: text/plain;
 name="logtest.c"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="logtest.c"

#include <stdio.h>
#include <stdlib.h>
#include <syslog.h>
#include <unistd.h>
#include <sys/time.h>

/*
	logtest.c: RTSL!
	
	Copyright 2003, Roberto Nibali, terreActive AG
	Published under the GPLv2 license.
*/

#define DEF_STRING	"I like to log fast but most importantly accurate!!!"
#define DEF_IDENT	"logtest "
#define DEF_FACILITY	LOG_KERN
#define DEF_PRIO	LOG_ERR
#define DEF_NUM		10
#define DEF_INT		-1
#define DEF_MOD		1

void flood_log(const char *message, 
               unsigned int num,
               unsigned int seq,
	       long int interval,
	       unsigned int mod) {
	int i;

	for (i=1; i<=num; i++, seq++) {
		syslog(DEF_PRIO, "[%d] %s\n", seq, message);
		if (interval != DEF_INT && ((seq % mod) == 0)) {
			usleep(interval*1000);
		}
	}
}

int main(int argc, char *argv[]) {
	unsigned int seq = 1;
	unsigned int num = DEF_NUM;
	long int interval= DEF_INT;
	unsigned int mod = DEF_MOD;
	struct timeval start;
	struct timeval end;
	unsigned long diff;

	if (argc < 2) {
		printf("\n");
		printf("Usage: %s [<num>][<int>{ms}][<mod>]\n\n", argv[0]);
		printf("<num>: Amount of log messages\n");
		printf("<int>: Sleep time in miliseconds\n");
		printf("<mod>: Every <mod> do sleep for <int> miliseconds\n\n");
		printf("Example: %s 1000 700 250\n", argv[0]);
		printf("--------\n");
		printf("This will send 1000 messages and do a 0.7s delay ");
		printf("for every 250 messages sent.\n");
		printf("It should approximately take 2.8sec plus syscall ");
		printf("overhead which depends on the\n");
		printf("CPU, thus it effectively takes roughly 3s.\n");
		printf("\n");
		exit(1);
	} else if (argc == 2) {
		num=atoi(argv[1]);
	} else if (argc == 3) {
		num=atoi(argv[1]);
		interval=strtoul(argv[2], NULL, 10);
	} else if (argc == 4) {
		num=atoi(argv[1]);
		interval=strtoul(argv[2], NULL, 10);
		mod=atoi(argv[3]);
	}

	printf("Estimated time: %ld msecs\n",num*(interval==-1?1:interval)/mod);
	openlog(DEF_IDENT, 0, DEF_FACILITY);
	gettimeofday(&start, NULL);
	flood_log(DEF_STRING, num, seq, interval, mod);
	gettimeofday(&end, NULL);
	closelog();
	diff = (end.tv_sec - start.tv_sec) * 1000000
             + (end.tv_usec - start.tv_usec);
	printf("Message string: %s\n", DEF_STRING); 
	printf("Sent messages : %d\n", num); 
	printf("Elapsed time  : %ld msecs\n", diff/1000);
	printf("Bytes written : %d Bytes\n", num*sizeof(DEF_STRING));
	printf("Interval      : %ld msecs\n", interval == -1 ? 0 : interval);
	printf("Modulus       : %d\n", mod);
	return(0);
}

--------------090206010404090800090204--