Hi,

I'm conducting syslog-ng performance tests. I've deployed a central loghost environment and three clients trying to send messages which should end up on the central loghost with as few losses as possible. The whole thing is currently done over UDP and the clients have either syslogd or syslog-ng as daemons.

If I deploy syslog-ng as a client to send messages from /dev/log via UDP to the central loghost, I experience rather high message loss starting at a threshold of about 1000 messages sent in the least amount of time possible. If I do the same thing with syslogd the threshold is up to 35000 from the same machine. That means I can send 35000 messages to /dev/log as fast as possible and they still get reliably transferred to the central loghost and written to disk there.

A little stunned by this strange occurrence, I started to narrow the problem down and ended up with the following question: why does syslog-ng not send one UDP packet per message sent via syslog() with a terminating '\n'? And is there a way to configure this odd behaviour? To illustrate what I mean I provide you with two sample dumps from a tcpdump session where I sent 3 messages. The first one is the tcpdump snippet of syslogd processing the messages and the second one is the result when syslog-ng processes the logs:

16:01:09.210645 x.y.z.6.514 > x.y.z.117.514: udp 71
0x0000  4500 0063 ca2c 0000 4011 87aa ac1b e806  E..c.,..@.......
0x0010  ac1b e875 0202 0202 004f 3a51 3c31 313e  ...u.....O:Q<11>
0x0020  6c6f 6774 6573 7420 3a20 5b31 5d20 4920  logtest.:.[1].I.
0x0030  6c69 6b65 2074 6f20 6c6f 6720 6661 7374  like.to.log.fast
0x0040  2062 7574 206d 6f73 7420 696d 706f 7274  .but.most.import
0x0050  616e 746c 7920 6163 6375 7261 7465 2121  antly.accurate!!
0x0060  2120 0a                                  !..
16:01:09.210820 x.y.z.6.514 > x.y.z.117.514: udp 71
0x0000  4500 0063 ca2d 0000 4011 87a9 ac1b e806  E..c.-..@.......
0x0010  ac1b e875 0202 0202 004f 3a50 3c31 313e  ...u.....O:P<11>
0x0020  6c6f 6774 6573 7420 3a20 5b32 5d20 4920  logtest.:.[2].I.
0x0030  6c69 6b65 2074 6f20 6c6f 6720 6661 7374  like.to.log.fast
0x0040  2062 7574 206d 6f73 7420 696d 706f 7274  .but.most.import
0x0050  616e 746c 7920 6163 6375 7261 7465 2121  antly.accurate!!
0x0060  2120 0a                                  !..
16:01:09.211064 x.y.z.6.514 > x.y.z.117.514: udp 71
0x0000  4500 0063 ca2e 0000 4011 87a8 ac1b e806  E..c....@.......
0x0010  ac1b e875 0202 0202 004f 3a4f 3c31 313e  ...u.....O:O<11>
0x0020  6c6f 6774 6573 7420 3a20 5b33 5d20 4920  logtest.:.[3].I.
0x0030  6c69 6b65 2074 6f20 6c6f 6720 6661 7374  like.to.log.fast
0x0040  2062 7574 206d 6f73 7420 696d 706f 7274  .but.most.import
0x0050  616e 746c 7920 6163 6375 7261 7465 2121  antly.accurate!!
0x0060  2120 0a                                  !..

------

16:01:53.458412 x.y.z.6.1031 > x.y.z.117.514: udp 306
0x0000  4500 014e ca6c 0000 4011 867f ac1b e806  E..N.l..@.......
0x0010  ac1b e875 0407 0202 013a b8e8 3c31 313e  ...u.....:..<11>
0x0020  4a61 6e20 3330 2031 373a 3137 3a31 3920  Jan.30.17:17:19.
0x0030  735f 696e 7440 6d69 6c6b 2d6e 6574 5f6f  s_int@milk-net_o
0x0040  7665 7266 6c6f 7720 3a20 5b31 5d20 4920  verflow.:.[1].I.
0x0050  6c69 6b65 2074 6f20 6c6f 6720 6661 7374  like.to.log.fast
0x0060  2062 7574 206d 6f73 7420 696d 706f 7274  .but.most.import
0x0070  616e 746c 7920 6163 6375 7261 7465 2121  antly.accurate!!
0x0080  210a 3c31 313e 4a61 6e20 3330 2031 373a  !.<11>Jan.30.17:
0x0090  3137 3a31 3920 735f 696e 7440 6d69 6c6b  17:19.s_int@milk
0x00a0  2d6e 6574 5f6f 7665 7266 6c6f 7720 3a20  -net_overflow.:.
0x00b0  5b32 5d20 4920 6c69 6b65 2074 6f20 6c6f  [2].I.like.to.lo
0x00c0  6720 6661 7374 2062 7574 206d 6f73 7420  g.fast.but.most.
0x00d0  696d 706f 7274 616e 746c 7920 6163 6375  importantly.accu
0x00e0  7261 7465 2121 210a 3c31 313e 4a61 6e20  rate!!!.<11>Jan.
0x00f0  3330 2031 373a 3137 3a31 3920 735f 696e  30.17:17:19.s_in
0x0100  7440 6d69 6c6b 2d6e 6574 5f6f 7665 7266  t@milk-net_overf
0x0110  6c6f 7720 3a20 5b33 5d20 4920 6c69 6b65  low.:.[3].I.like
0x0120  2074 6f20 6c6f 6720 6661 7374 2062 7574  .to.log.fast.but
0x0130  206d 6f73 7420 696d 706f 7274 616e 746c  .most.importantl
0x0140  7920 6163 6375 7261 7465 2121 210a       y.accurate!!!.

------

The configuration file for syslogd is:
---------------------------------------------------------
user.err        @x.y.z.117
---------------------------------------------------------

The configuration file for syslog-ng is:
---------------------------------------------------------
options {
        sync(0);
        time_reopen(10);
        log_fifo_size(1024);
        long_hostnames(on);
        use_dns(no);
        use_fqdn(no);
        create_dirs(yes);
        keep_hostname(yes);
};

source s_int {
        internal();
        unix-stream("/dev/log" max-connections(256));
};

filter f_test { facility(user) and level(err); };

destination d_test { udp("x.y.z.117" port(514)); };

log { source(s_int); filter(f_test); destination(d_test); flags(final); };
---------------------------------------------------------

Please enlighten me with an explanation of this problem. As an example, take the following verified numbers for a syslog() storm of 50000 messages where I do a 50ms sleep every 250 messages sent. All in all this results in a 14332ms log burst (usleep() syscall overhead):

Message lines lost with syslog-ng (average): ~6300
Message lines lost with syslogd  (average): 0 !!!!

I've also attached the test tool which I use to conduct my tests. It's extremely simple, so I don't expect any problems with it. It has sequence numbers, so one is able to track down the losses (something which should be in syslog-ng too, IMHO). I currently have no explanation for this observation because I haven't read the whole source code yet.
I suspect you have a read buffer on sources (/dev/log in this example) which overrides the '\n' -> fflush() semantics to be able to send fewer UDP packets with more content per packet. So I hope this is tunable somewhere, or one can even disable it to mimic the old syslogd behavior.

Another question: why is there no message sent when a buffer reaches its maximum size? This would help debugging and tuning a lot, IMHO.

Sorry if this has been discussed before.

Best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc

#include <stdio.h>
#include <stdlib.h>
#include <syslog.h>
#include <unistd.h>
#include <sys/time.h>

/*
 * logtest.c: RTSL!
 * Copyright 2003, Roberto Nibali, terreActive AG
 * Published under the GPLv2 license.
 */

#define DEF_STRING   "I like to log fast but most importantly accurate!!!"
#define DEF_IDENT    "logtest "
#define DEF_FACILITY LOG_KERN
#define DEF_PRIO     LOG_ERR
#define DEF_NUM      10
#define DEF_INT      -1
#define DEF_MOD      1

void flood_log(const char *message, unsigned int num, unsigned int seq,
               long int interval, unsigned int mod)
{
        unsigned int i;

        for (i = 1; i <= num; i++, seq++) {
                syslog(DEF_PRIO, "[%d] %s\n", seq, message);
                if (interval != DEF_INT && ((seq % mod) == 0)) {
                        usleep(interval * 1000);
                }
        }
}

int main(int argc, char *argv[])
{
        unsigned int seq      = 1;
        unsigned int num      = DEF_NUM;
        long int     interval = DEF_INT;
        unsigned int mod      = DEF_MOD;
        struct timeval start;
        struct timeval end;
        unsigned long diff;

        if (argc < 2) {
                printf("\n");
                printf("Usage: %s [<num>][<int>{ms}][<mod>]\n\n", argv[0]);
                printf("<num>: Amount of log messages\n");
                printf("<int>: Sleep time in milliseconds\n");
                printf("<mod>: Every <mod> do sleep for <int> milliseconds\n\n");
                printf("Example: %s 1000 700 250\n", argv[0]);
                printf("--------\n");
                printf("This will send 1000 messages and do a 0.7s delay ");
                printf("for every 250 messages sent.\n");
                printf("It should approximately take 2.8sec plus syscall ");
                printf("overhead which depends on the\n");
                printf("CPU, thus it effectively takes roughly 3s.\n");
                printf("\n");
                exit(1);
        } else if (argc == 2) {
                num = atoi(argv[1]);
        } else if (argc == 3) {
                num = atoi(argv[1]);
                interval = strtoul(argv[2], NULL, 10);
        } else if (argc == 4) {
                num = atoi(argv[1]);
                interval = strtoul(argv[2], NULL, 10);
                mod = atoi(argv[3]);
        }
        printf("Estimated time: %ld msecs\n",
               num * (interval == -1 ? 1 : interval) / mod);
        openlog(DEF_IDENT, 0, DEF_FACILITY);
        gettimeofday(&start, NULL);
        flood_log(DEF_STRING, num, seq, interval, mod);
        gettimeofday(&end, NULL);
        closelog();
        diff = (end.tv_sec - start.tv_sec) * 1000000
             + (end.tv_usec - start.tv_usec);
        printf("Message string: %s\n", DEF_STRING);
        printf("Sent messages : %d\n", num);
        printf("Elapsed time  : %ld msecs\n", diff / 1000);
        printf("Bytes written : %lu Bytes\n",
               (unsigned long)num * (sizeof(DEF_STRING) - 1));
        printf("Interval      : %ld msecs\n", interval == -1 ? 0 : interval);
        printf("Modulus       : %d\n", mod);
        return 0;
}
On Thu, Jan 30, 2003 at 05:12:56PM +0100, Roberto Nibali wrote:
Hi,
I'm conducting syslog-ng performance tests. I've deployed a central loghost environment and three clients trying to send messages which should end up on the central loghost with as few losses as possible. The whole thing is currently done over UDP and the clients have either syslogd or syslog-ng as daemons.
If I deploy syslog-ng as a client to send messages from /dev/log via UDP to the central loghost, I experience rather high message loss starting at a threshold of about 1000 messages sent in the least amount of time possible. If I do the same thing with syslogd the threshold is up to 35000 from the same machine. That means I can send 35000 messages to /dev/log as fast as possible and they still get reliably transferred to the central loghost and written to disk there.

A little stunned by this strange occurrence, I started to narrow the problem down and ended up with the following question: why does syslog-ng not send one UDP packet per message sent via syslog() with a terminating '\n'? And is there a way to configure this odd behaviour? To illustrate what I mean I provide you with two sample dumps from a tcpdump session where I sent 3 messages. The first one is the tcpdump snippet of syslogd processing the messages and the second one is the result when syslog-ng processes the logs:
Both TCP and UDP use the same code to send messages in 1.5.x, and it seems syslog-ng piggybacks messages together when writing them. This is correct for TCP but incorrect for UDP. The culprit is libol/src/pkt_buffer.c, which uses stream-based buffering in the 0.3 branch while it used a packet-based one in 0.2. As there can be more than one message in a single UDP packet, the packet can grow bigger than your MTU, so packet fragmentation is done. Both effects can increase message loss (more than one message in a single packet, and fragmentation), thus this is definitely a bug. I'll look into fixing this.
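The effect of the two buffering policies can be sketched with a toy model (a hypothetical illustration, not syslog-ng's actual code; the function names and buffer size are invented). Packet-based buffering emits one datagram per message, while stream-based buffering appends messages to a shared buffer and flushes it as a whole, so several messages ride in one datagram, and a flushed buffer larger than the MTU gets fragmented:

```c
#include <stddef.h>

/* Toy model of the two buffering policies discussed above. */

/* 0.2-style packet buffering: one sendto() per log message. */
size_t datagrams_packet_based(const size_t *msg_len, size_t n)
{
    (void)msg_len;
    return n;
}

/* 0.3-style stream buffering: messages share a buffer which is only
 * flushed when the next message would not fit, so one datagram can
 * carry many messages (and exceed the MTU if buf_size > MTU). */
size_t datagrams_stream_based(const size_t *msg_len, size_t n, size_t buf_size)
{
    size_t datagrams = 0, fill = 0;

    for (size_t i = 0; i < n; i++) {
        if (fill > 0 && fill + msg_len[i] > buf_size) {
            datagrams++;        /* flush the shared buffer */
            fill = 0;
        }
        fill += msg_len[i];
    }
    if (fill > 0)
        datagrams++;            /* final flush */
    return datagrams;
}
```

With the three small test messages from the captures above, the packet-based model sends three datagrams (as syslogd did), while the stream-based one sends a single datagram carrying all three messages (as the single 306-byte syslog-ng packet in the capture shows).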
Another question: Why is there no message sent when a buffer reaches its maximum size? This would help debugging and tuning a lot IMHO. Sorry if this has been discussed before.
Because this message itself would fill your buffers. Instead, syslog-ng reports the number of dropped messages in a separate message sent once every ten minutes (default).

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
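For reference, the reporting interval is controlled by the stats() option in the options block (visible in the configurations later in this thread); a minimal sketch, where 600 seconds matches the ten-minute default mentioned above:

```
options {
        stats(600);   # emit the "STATS: dropped <n>" message every 600 seconds
};
```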
Both TCP and UDP use the same code to send messages in 1.5.x and as it seems syslog-ng piggybacks messages together when writing messages. This is correct for TCP but incorrect for UDP.
Yes.
The culprit is libol/src/pkt_buffer.c which uses stream based buffering in the 0.3 branch while it used a packet based one in 0.2
I see.
As there are more than one message in a single UDP packet the packet can grow bigger than your MTU, thus packet fragmentation is done.
Exactly, although this shouldn't influence the loss too much, IMHO.
Both can increase message loss (more than one message in a single packet, fragmentation) thus this is definitely a bug. I'll look into fixing this.
Thanks.
Another question: Why is there no message sent when a buffer reaches its maximum size? This would help debugging and tuning a lot IMHO. Sorry if this has been discussed before.
Because this message itself would fill your buffers. Instead syslog-ng
Doh (slaps forehead), Of course.
reports the number of dropped messages in a separate message sent once every ten minutes (default).
Call me stupid, but (assuming you mean the STATS part):

ratz@zar:~/down/log > find . -name "*.[ch]" -exec grep -n dropped_pkts /dev/null {} \;
./libol-0.3.8/src/pkt_buffer.h:30:int pktbuf_dropped_pkts;
./syslog-ng-1.5.25/src/main.c:138:      notice("STATS: dropped %i\n", pktbuf_dropped_pkts);
./syslog-ng-1.5.25/src/main.c:139:      pktbuf_dropped_pkts = 0;
ratz@zar:~/down/log >

What else than 0 should this code print? At least I've found the hardcoded timeout in libol/src/io.c. :)

Thanks for your time,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc
On Thu, Jan 30, 2003 at 07:09:57PM +0100, Roberto Nibali wrote:
reports the number of dropped messages in a separate message sent once every ten minutes (default).
Call me stupid but (assuming you mean the STATS part):
ratz@zar:~/down/log > find . -name "*.[ch]" -exec grep -n dropped_pkts /dev/null {} \;
./libol-0.3.8/src/pkt_buffer.h:30:int pktbuf_dropped_pkts;
./syslog-ng-1.5.25/src/main.c:138:      notice("STATS: dropped %i\n", pktbuf_dropped_pkts);
./syslog-ng-1.5.25/src/main.c:139:      pktbuf_dropped_pkts = 0;
ratz@zar:~/down/log >
What else than 0 should this code print? At least I've found the hardcoded timeout in libol/src/io.c. :)
hmm.. the code to increment that variable _was_ present. ... I checked my CVS tree: it was removed between revisions 1.11 and 1.12 of pkt_buffer.c.

PACKET_STATS was implemented on 26 August 2001, but the code to increment the dropped counter was removed on 11 October 2001. My ChangeLog shows that this was the time when libol was branched into the 0.3 and 0.2 branches. I must have made a mistake.

I even received reports on the mailing list which show the STATS message with drops > 0. I've found such a mail dated "Fri, 2 Nov 2001 10:00:58 +0100", so the feature must have been there at that time.

I wonder how that could happen. ;(

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
hmm.. the code to increment that variable _was_ present. ... checked my CVS tree, it was removed between 1.11 and 1.12 revisions of pkt_buffer.c.
PACKET_STATS was implemented on 26 August 2001, but the code to increment the dropped counter was removed on 11 October 2001. My ChangeLog shows that this was the time when libol was branched into the 0.3 and 0.2 branches. I must have made a mistake.
Such is life.
I even received reports in the mailing list which shows the STATS message with drops > 0. I've found this mail dated "Date: Fri, 2 Nov 2001 10:00:58 +0100" So the feature must have been there at that time.
I wonder how that could happen. ;(
So to sum things up: is it possible for 1.5.26 and 0.3.9 to see the following things fixed (speak up if you need help with something, provided I can help)?

o remove the old dependency on klogctl from the src tree
o add the full template patch done by Achim (it is waiting for approval from your mailing list software because sizeof(patch) > 40kb)
o add back the STATS counter to libol
o fix the wrong UDP message handling (maybe change back to packet-based buffering)

So long,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc
On Fri, Jan 31, 2003 at 11:28:23AM +0100, Roberto Nibali wrote:
So to sum things up. Is it possible for 1.5.26 and 0.3.9 to see following things get fixed (speak up if you need help with something, provided I can help)?
o remove the old dependency on klogctl from the src tree
o add the full template patch done by Achim (it is waiting for approval from your mailing list software because sizeof(patch) > 40kb)
o add back the STATS counter to libol
o fix the wrong UDP message handling (maybe change back to packet-based buffering)
Too late, I've already released 1.5.26 and 0.3.9, though they have not been announced yet. Everything you mention above except for the template patch was done. (And I also updated the docs a bit.)

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
Were these also added/corrected to the 1.9 tree? Richard
-----Original Message-----
From: syslog-ng-admin@lists.balabit.hu [mailto:syslog-ng-admin@lists.balabit.hu] On Behalf Of Balazs Scheidler
Sent: Friday, January 31, 2003 5:50 AM
To: syslog-ng@lists.balabit.hu
Subject: Re: [syslog-ng] performance test questions
On Fri, Jan 31, 2003 at 11:28:23AM +0100, Roberto Nibali wrote:
So to sum things up. Is it possible for 1.5.26 and 0.3.9 to see following things get fixed (speak up if you need help with something, provided I can help)?
o remove the old dependency on klogctl from the src tree
o add the full template patch done by Achim (it is waiting for approval from your mailing list software because sizeof(patch) > 40kb)
o add back the STATS counter to libol
o fix the wrong UDP message handling (maybe change back to packet-based buffering)
Too late, I've already released 1.5.26 and 0.3.9 though it was not announced yet.
Everything you mention above except from the template patch was done.
(and I also updated the docs a bit)
--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
_______________________________________________
syslog-ng maillist - syslog-ng@lists.balabit.hu
https://lists.balabit.hu/mailman/listinfo/syslog-ng
Frequently asked questions at http://www.campin.net/syslog-ng/faq.html
On Fri, Jan 31, 2003 at 11:24:20AM -0800, Richard E. Perlotto II wrote:
Were these also added/corrected to the 1.9 tree?
The 1.9 tree was not affected by this problem.

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
First off: I realize that I'm currently a major bandwidth user on this mailing list. That is mainly because I need to do intensive testing, which in the past revealed some problems I wasn't able to fix by myself. Please tell me if I'm getting too annoying.
o remove old dependancy of klogctl from src-tree
Verified.
o add back STATS counter to libol
I have a hard time triggering it; what's the best and fastest way to trigger such a message? I do get message loss at a certain point, but it doesn't get accounted for by syslog-ng's STATS counter. I assume this must then be an incoming buffer problem and not a destination buffer problem.
o fix wrong UDP message handling (maybe change back to packet based buffering)
Yes, this is tested and verified. It now works better than with syslogd as a client. Thanks for addressing it.
Too late, I've already released 1.5.26 and 0.3.9 though it was not announced yet.
Everything you mention above except from the template patch was done.
Sorry for being such a pusher but is it possible to have it in 1.5.27? It has been updated and tested extensively by Achim.
(and I also updated the docs a bit)
I've seen it.

Best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc
On Wed, Feb 12, 2003 at 03:02:33PM +0100, Roberto Nibali wrote:
First off: I realize that I'm currently a major bandwidth user on this mailing list. That is mainly because I need to do intensive testing, which in the past revealed some problems I wasn't able to fix by myself. Please tell me if I'm getting too annoying.
no, your feedback is appreciated.
o add back STATS counter to libol
I have a hard time triggering it; what's the best and fastest way to trigger such a message? I do get message loss at a certain point, but it doesn't get accounted for by syslog-ng's STATS counter. I assume this must then be an incoming buffer problem and not a destination buffer problem.
If you are using syslog-ng under Linux, try sending messages to a terminal, /dev/tty12 for example, and press Ctrl+S on that terminal. This will block writes and should make syslog-ng drop messages.
Sorry for being such a pusher but is it possible to have it in 1.5.27? It has been updated and tested extensively by Achim.
I've also added this to my CVS tree since then. I have created scripts to generate automatic daily snapshots. They are available at:

http://www.balabit.com/downloads/syslog-ng/1.5/src-snapshot/

Can you check if it works for you? I am trying to push 1.6.0 out but I don't want to release a broken version.

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
Hi,
no, your feedback is appreciated.
Ok, meanwhile I've conducted some tests with syslog-ng-1.5.26 and libol-0.3.9 plus a hand-applied templates patch. I've written a little test tool to send log messages to /dev/log, where they are fetched by syslog[d|-ng] and transported via UDP, or in the case of syslog-ng via UDP or TCP, to a central syslog-ng server. The tests were all done over a 10Mbit/s switched network with no other traffic. The hosts are all running a 2.2.x kernel and the tools have been compiled with glibc-2.1.3.

I was interested in how many log messages (burst mode, meaning that there is no usleep between sending messages off via syslog()) one can reliably send over the net and store on disk, depending on the level of enabled MACRO expansion. The results during the tests were quite unstable, so I had to do a lot of repetitions to get a useful mean per test case. I do not claim full correctness of these results, but they sure give you an impression of the impact of using UDP vs. TCP or enabling MACRO expansion on different levels.
Using 1 client (5000 messages total):
-------------------------------------
syslogd as a client (UDP transfer)
  no MACRO           : ~5000 messages (100%)   #loss encountered but minimal
  file MACRO         : ~4250 messages (83%)
  file+template MACRO: ~3200 messages (64%)

syslog-ng as a client (UDP transfer)
  no MACRO           : ~3500 messages (100% / 70%)
  file MACRO         : ~2800 messages (80% / 66%)
  file+template MACRO: ~2250 messages (64% / 70%)

syslog-ng as a client (TCP transfer)
  no MACRO           : ~5000 messages (100%)
  file MACRO         : ~4300 messages (83%)    #between 3400 and 5000 :(
  file+template MACRO: ~3600 messages (72%)

Using 3 clients (5000 messages each, 15000 messages total):
-----------------------------------------------------------
syslogd as a client (UDP transfer)
  no MACRO           : ~5800 messages (39%)
  file MACRO         : ~4000 messages (27%)
  file+template MACRO: ~3200 messages (21%)

syslog-ng as a client (UDP transfer)
  no MACRO           : ~7200 messages (48% / 124%)
  file MACRO         : ~5200 messages (35% / 130%)
  file+template MACRO: ~3700 messages (25% / 116%)

syslog-ng as a client (TCP transfer)
  no MACRO           : ~15000 messages (100%)  #always 15000!
  file MACRO         : ~11000 messages (73%)
  file+template MACRO: ~8800 messages (59%)

I can provide you with more information, but I do not want to unnecessarily clutter this list with useless or even offtopic information. The options I've used on the server side were (actually I tried a lot of different options, but all in all the global picture of message loss didn't vary a lot):

options {
        keep_hostname(no);
        long_hostnames(off);
        #check_hostname(yes);
        use_time_recvd(yes);
        use_dns(no);
        dns_cache_expire(600);
        dns_cache_size(10000);
        dns_cache(yes);
        use_fqdn(no);
        create_dirs(yes);
        sync(0);
        stats(10);
        time_reopen(5);
        log_fifo_size(1000);
};

There is a lot to say about those numbers, but I'd rather not interpret too much at this point. It seems, though, that increasing the number of clients and using UDP as a transfer mechanism definitely worsens the performance, which to a certain degree is logical.
Another thing is the rather high performance loss with templates enabled. I understand that with templates more of a message has to be inspected, but I'm asking myself whether it would be possible to add some kind of caching mechanism? I'm very open to giving you more details on the test, to discussing the test setup, or even to repeating it with different settings (provided time permits).
If you are using syslog-ng under Linux, try sending messages to a terminal, /dev/tty12 for example, and press Ctrl+S on that terminal.
This will block writes and should make syslog-ng drop messages.
Hmm, so only outgoing buffers on blocking destinations get accounted for. Seems logical. So for incoming buffers one would need to implement sequence numbers. Have you thought about implementing that?
I've also added this to my CVS tree since then. I have created scripts to generate automatic daily snapshots. They are available at:
http://www.balabit.com/downloads/syslog-ng/1.5/src-snapshot/
Nice, except for the hindering Shockwave and HTML stuff around it on the page. Would it be possible to export it to a txt-only directory (for the links/lynx users)? ;)
Can you check if it works for you?
I'll compile and beat it tonight.
I am trying to push 1.6.0 out but I don't want to release a broken version.
As an OSS developer myself I completely understand that.

Best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc
On Thu, Feb 20, 2003 at 04:47:53PM +0100, Roberto Nibali wrote:
Hi,
no, your feedback is appreciated.
Another thing is the rather high performance loss with enabled templates. I understand that in case of templates, more of a message has to be inspected but I'm asking myself if there would be a possibility to add some kind of a caching mechanism?
I'm not really surprised that the macro expansion has performance problems. When I wrote that code only 6-8 macros existed; I've checked my table and it now contains 51 entries, all looked up sequentially. Very bad performance-wise. Attached you'll find a patch which changes this to a faster algorithm. I have also tested it.
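The faster lookup can be sketched as follows (a minimal illustration of the binary-search approach, assuming a table kept sorted by macro name; the short table below is an invented subset for demonstration, not syslog-ng's real 51-entry table):

```c
#include <string.h>
#include <stdlib.h>

struct macro {
    const char *name;
    int id;
};

/* Illustrative subset of macro names; must stay sorted by name so
 * that bsearch() can replace the old sequential scan. */
static struct macro macros[] = {
    { "DATE", 1 }, { "FACILITY", 2 }, { "HOST", 3 },
    { "MSG", 4 }, { "PRIORITY", 5 }, { "PROGRAM", 6 },
};

static int macro_cmp(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const struct macro *)elem)->name);
}

/* Return the macro id for a name, or -1 if unknown. */
int macro_lookup(const char *name)
{
    struct macro *m = bsearch(name, macros,
                              sizeof(macros) / sizeof(macros[0]),
                              sizeof(macros[0]), macro_cmp);
    return m ? m->id : -1;
}
```

Keeping the table sorted turns the O(n) scan into O(log n) per lookup; the gperf-generated perfect hash mentioned later in the thread goes one step further toward O(1).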
http://www.balabit.com/downloads/syslog-ng/1.5/src-snapshot/
Nice, except the hindering shockwave and html stuff around that on the page. Would it be possible to export it to a txt-only directory (for the links/lynx users)? ;)
Our new webmaster specifically wanted to remove the old 'apache directory index' pages. You can grab the tree with wget, however.
Can you check if it works for you?
I'll compile and beat it tonight.
The new snapshot to be generated tomorrow at 6AM CET will contain the patch I've attached now.

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
Hello,
I'm not really surprised the macro expansion has performance problems. When I wrote that code only 6-8 macros existed, I've checked my table and it contains 51 entries. All looked up sequentially. Very bad performance wise.
Indeed.
Attached you'll find a patch which changes this to a faster algorithm. I have also tested it.
I'll give it a spin.
Our new webmaster wanted to remove the old 'apache directory index' pages specifically. You can grab the tree with wget however.
I'll add it to our mirror script.
Can you check if it works for you?
I'll compile and beat it tonight.
I didn't get to test it out yet but I just started now. I hope you haven't released 1.6 yet :)
The new snapshot to be generated tomorrow at 6AM CET will contain the patch I've attached now.
I'm working with syslog-ng-1.6.0rc1+20030224 and libol-0.3.9 (nothing important seems to have changed there). I will report back on Wednesday. So far it looks pretty good:

syslog-test# date
Mon Feb 24 12:59:47 CET 2003
syslog-test# cat /proc/$(pgrep syslog)/status
Name:   syslogp
State:  S (sleeping)
Pid:    30398
PPid:   1
Uid:    0       0       0       0
Gid:    100     100     100     100
Groups:
VmSize:      676 kB
VmLck:         0 kB
VmRSS:       524 kB
VmData:      216 kB
VmStk:         8 kB
VmExe:       436 kB
VmLib:         0 kB
SigPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 8000000000001000
SigCgt: 0000000000014001
CapInh: 0000000000000000
CapPrm: 00000000fffffeff
CapEff: 00000000fffffeff
syslog-test#

My first tests indicate a template speedup of 13% for 3 clients using UDP. I will do more in-depth tests on Wednesday morning.

Thanks and best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc
Hi On Monday 24 February 2003 13:03, Roberto Nibali wrote:
I'm not really surprised the macro expansion has performance problems. When I wrote that code only 6-8 macros existed, I've checked my table and it contains 51 entries. All looked up sequentially. Very bad performance wise.
Indeed.
Attached you'll find a patch which changes this to a faster algorithm. I have also tested it.
I'll give it a spin.
You can also test & try the attached "macro.c", which uses a hash function generated with gperf. In my tests the binary search made the macro expansion (4 macros in the template) about 2.5 times faster, and the hash function (with some other minor changes) about 3 times faster.

Achim
On Tue, Feb 25, 2003 at 11:24:38PM +0100, Achim Gsell wrote:
On Monday 24 February 2003 13:03, Roberto Nibali wrote: You can also test & try the attached "macro.c" which uses a hash function generated with gperf. In my tests the binary search made the macro expansion (4 macros in the template) about 2.5 times faster and the hash function (and some other minor changes) about 3 times.
The real solution for further improvements would be to preprocess the template description and cache the results. A template is a sequence of:

constant_string macro constant_string macro ...

e.g. a single pass would be enough to convert the string to an internal format representing the sequence above; then a much faster algorithm could be used to format the results, as no lookups would be needed (macros could be stored as looked-up macro ids). I plan something like this in syslog-ng 2, as the basic infrastructure is there; it would however be a bit more difficult for 1.5.x.

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
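The preprocessing idea described above can be sketched roughly like this (an illustrative one-pass compiler for '$MACRO'-style templates; the struct layout, buffer sizes, and function names are invented, and a real version would store resolved macro ids instead of name strings):

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_SEGS 16

/* One compiled segment: a literal chunk followed by an optional
 * macro reference ("" means no macro follows the literal). */
struct tpl_seg {
    char literal[64];
    char macro[16];
};

/* Single pass over the template: split it into (literal, macro)
 * pairs so expansion later needs no per-message string scanning. */
int tpl_compile(const char *tpl, struct tpl_seg *segs, int max)
{
    int n = 0;
    const char *p = tpl;

    while (*p && n < max) {
        size_t lit = strcspn(p, "$");   /* literal runs up to next '$' */
        snprintf(segs[n].literal, sizeof(segs[n].literal),
                 "%.*s", (int)lit, p);
        segs[n].macro[0] = '\0';
        p += lit;
        if (*p == '$') {                /* collect the macro name */
            size_t m = 0;
            p++;
            while (isalnum((unsigned char)p[m]) || p[m] == '_')
                m++;
            snprintf(segs[n].macro, sizeof(segs[n].macro),
                     "%.*s", (int)m, p);
            p += m;
        }
        n++;
    }
    return n;
}
```

Compiling once at config-parse time means expansion later is just alternating literal copies and macro-value lookups, with no lookups or string scanning per message.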
On Tuesday 25 February 2003 23:36, Balazs Scheidler wrote:
On Tue, Feb 25, 2003 at 11:24:38PM +0100, Achim Gsell wrote:
On Monday 24 February 2003 13:03, Roberto Nibali wrote: You can also test & try the attached "macro.c" which uses a hash function generated with gperf. In my tests the binary search made the macro expansion (4 macros in the template) about 2.5 times faster and the hash function (and some other minor changes) about 3 times.
The real solution for further improvements would be to preprocess the template description and cache the results. A template is a sequence of:
constant_string macro constant_string macro ...
e.g. a single pass would be enough to convert the string to an internal format representing the sequence above, then a much faster algorithm could be used to format the results as no lookups would be needed. (macros could be stored as looked up macro ids)
Yes, of course ...
I plan something like this in syslog-ng 2, as the basic infrastructure is there, it would however be a bit more difficult for 1.5.x.
I had the same idea, and this was the reason I didn't implement it yet! But maybe we should use one of the faster algorithms as a first shot.

Achim
Reply-ing to myself:
I'm working on with syslog-ng-1.6.0rc1+20030224 and libol-0.3.9 (nothing important seemed to have changed there). I will report back on Wednesday. So far it looks pretty good:
It looks pretty good to me so far. I've stress-tested it a bit by sending 100000000 messages with a total amount of 2015098112 Bytes, using a 100ms delay every 500 packets sent.
syslog-test# date
Mon Feb 24 12:59:47 CET 2003
syslog-test# cat /proc/$(pgrep syslog)/status
Name:   syslogp
State:  S (sleeping)
Pid:    30398
PPid:   1
Uid:    0       0       0       0
Gid:    100     100     100     100
Groups:
VmSize:      676 kB
VmLck:         0 kB
VmRSS:       524 kB
VmData:      216 kB
VmStk:         8 kB
VmExe:       436 kB
VmLib:         0 kB
SigPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 8000000000001000
SigCgt: 0000000000014001
CapInh: 0000000000000000
CapPrm: 00000000fffffeff
CapEff: 00000000fffffeff
syslog-test#

After two days of logging and running I get:

syslog-test# date
Wed Feb 26 10:24:01 CET 2003
syslog-test# cat /proc/$(pgrep syslog)/status
Name:   syslogp
State:  S (sleeping)
Pid:    30398
PPid:   1
Uid:    0       0       0       0
Gid:    100     100     100     100
Groups:
VmSize:     1456 kB
VmLck:         0 kB
VmRSS:      1308 kB
VmData:      992 kB
VmStk:        12 kB
VmExe:       436 kB
VmLib:         0 kB
SigPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 8000000000001000
SigCgt: 0000000000014001
CapInh: 0000000000000000
CapPrm: 00000000fffffeff
CapEff: 00000000fffffeff
syslog-test#
My first tests indicate a template speedup of 13% for 3 clients using UDP. I will do more indepth tests on Wednesday morning.
This has to wait a little longer because of higher-priority work. But from my side, with my limited tests, syslog-ng-1.6.0rc1 looks pretty good. I think including Achim's gperf hash function speedup is not a bad idea after all (provided there is a speedup), since it is straightforward, and the other speedup would require a significant change which is probably not what you want for a 1.6.0 release.

Thanks and best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc
Hi, I also faced a problem with UDP message loss and performed some tests of syslog-ng with attempts to reduce it. So here are several suggestions.

First, I noticed that at high UDP message rates a number of UDP packets were dropped due to full socket buffers. This can be checked with netstat's per-protocol statistics (netstat -s). Increasing the socket rx buffer to 3 MB with the setsockopt SO_RCVBUF option seems to help, though it does not completely stop packet drops.

Second, if resolving is required, either the syslog-ng DNS cache should be enabled or a local caching name server should be installed to speed up resolving, as slow resolving leads to UDP message loss.

I also turned off all usertty destinations because they seemed to slow down syslog-ng, constantly reading /var/run/utmp.

Unfortunately I could not get rid of UDP packet drops completely; when peak message rates jump above 4000 msgs/sec a number of packets are dropped :( Does anybody have ideas about how to reduce the loss further?

On Thu, Feb 20, 2003 at 04:47:53PM +0100, Roberto Nibali wrote:
Hi,
no, your feedback is appreciated.
Ok, meanwhile I've conducted some tests with syslog-ng-1.5.26 and libol-0.3.9 plus a hand-applied templates patch. I've written a little test tool that sends log messages to /dev/log, where they are fetched by syslog[d|-ng] and transported via UDP (or, in the case of syslog-ng, via UDP or TCP) to a central syslog-ng server. The tests were all done over a 10 Mbit/s switched network with no other traffic. The hosts all run a 2.2.x kernel and the tools were compiled with glibc-2.1.3.
I was interested in how many log messages (burst mode, meaning there is no usleep between messages sent via syslog()) one can reliably send over the net and store on disk, depending on the level of enabled MACRO expansion. The results were quite unstable, so I had to do a lot of repetitions to get a useful mean per test case.
I do not claim full correctness of these results, but they surely give you an impression of the impact of using UDP vs. TCP or of enabling MACRO expansion at different levels.
-- Dmitry Frolov, Zenon N.S.P. (095) 250-4629, http://www.zenon.net/
Hello,
I also faced a problem with UDP message loss and performed some tests of syslog-ng with attempts to reduce it. So here are several suggestions.
Thanks.
First, I noticed that at high UDP message rates a number of UDP packets were dropped due to full socket buffers.
Sometimes, as my tests have revealed so far.
This can be checked with netstat's per-protocol statistics (netstat -s). Increasing the socket rx buffer to 3 MB with the setsockopt SO_RCVBUF option seems to help.
I'll try.
Though it does not completely stop packet drops.
I think this is impossible anyway ;).
Second, if resolving is required, either the syslog-ng DNS cache should be enabled or a local caching name server should be installed to speed up resolving, as slow resolving leads to UDP message loss.
Yes, absolutely and I've done that too.
I also turned off all usertty destinations because they seemed to slow down syslog-ng, constantly reading /var/run/utmp.
They are off for my tests.
Unfortunately I could not get rid of UDP packet drops completely; when peak message rates jump above 4000 msgs/sec a number of packets are dropped :(
That's roughly what I'm seeing, but it is more dependent on the machine's state than almost anything I have seen before. And simple variations in configuration or test procedure sometimes have a tremendous impact on the results (almost chaotic behaviour :))
Does anybody have ideas about how to reduce loss further?
Under Linux I tried tweaking some values in /proc/sys/net/core/* but to no avail so far. It also occurs to me that most of those entries only affect TCP sockets. I'm reading up on the kernel code to see if there is anything else that might help. Another thing, of course, could be fair scheduling and rate limiting, though I'm not entirely sure whether such changes would have a significant impact on the actual problem of lost messages. Thanks and best regards, Roberto Nibali, ratz -- echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc
New round of testing. This time I've tested: syslog-ng-1.6.0rc1+20030306, libol-0.3.9, the macros.c generated by gperf from Achim, and Dmitry's suggestion of increasing the SO_RCVBUF. I've simply set the rmem entries in the proc-fs to 512M and this already helped a lot. It looks a _lot_ nicer now:
Using 1 client (5000 messages total):
--------------
syslogd as a client (UDP transfer)
no MACRO           : ~5000 messages (100%)  #loss encountered but minimal
file MACRO         : ~4250 messages (83%)
file+template MACRO: ~3200 messages (64%)
syslog-ng as a client (UDP transfer)
no MACRO           : ~3500 messages (100% / 70%)
file MACRO         : ~2800 messages (80% / 66%)
file+template MACRO: ~2250 messages (64% / 70%)
syslog-ng as a client (TCP transfer)
no MACRO           : ~5000 messages (100%)
file MACRO         : ~4300 messages (83%)  #between 3400 and 5000 :(
file+template MACRO: ~3600 messages (72%)
Using 1 client (5000 messages total):
--------------
syslogd as a client (UDP transfer)
no MACRO           :  5000 messages (100%)  #no loss at all
file MACRO         : ~4979 messages (99%)   #loss: packet receive errors
file+template MACRO: ~4408 messages (88%)   #loss: packet receive errors

syslog-ng as a client (UDP transfer)
no MACRO           : ~3905 messages (78%)  \
file MACRO         : ~3400 messages (68%)   > #loss: packet receive errors!!
file+template MACRO: ~3362 messages (67%)  /

syslog-ng as a client (TCP transfer)
no MACRO           :  5000 messages (100%)
file MACRO         :  5000 messages (100%)
file+template MACRO:  5000 messages (100%)

What strikes me as particularly odd is that syslog-ng as a client in UDP mode still performs worse than syslogd in burst mode.

In case you're interested: the 5000 messages are 142 bytes in length and sent in 0.9 s without any sleeping between sending. This results in almost exactly 1 MByte/s, which is close to link saturation (no IRQ or RX queue starvation noticed from the NIC driver code) on my 10 Mbit/s test network.

If I find time I'll run the tests with 3 clients too. But meanwhile I'm all for the inclusion of the gperf'd code, since rewriting doesn't look like a feasible option in the current state of development (shortly before a stable release). Best regards, Roberto Nibali, ratz -- echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc
On Thu, Mar 06, 2003 at 04:33:46PM +0100, Roberto Nibali wrote:
Using 1 client (5000 messages total):
--------------
syslogd as a client (UDP transfer)
no MACRO           :  5000 messages (100%)  #no loss at all
file MACRO         : ~4979 messages (99%)   #loss: packet receive errors
file+template MACRO: ~4408 messages (88%)   #loss: packet receive errors
syslog-ng as a client (UDP transfer)
no MACRO           : ~3905 messages (78%)  \
file MACRO         : ~3400 messages (68%)   > #loss: packet receive errors!!
file+template MACRO: ~3362 messages (67%)  /
syslog-ng as a client (TCP transfer)
no MACRO           :  5000 messages (100%)
file MACRO         :  5000 messages (100%)
file+template MACRO:  5000 messages (100%)
What strikes me as particularly odd is that syslog-ng as a client in UDP mode still performs worse than syslogd in burst mode.
What do you use for fetching local messages? unix-stream or unix-dgram? sysklogd uses unix-dgram while syslog-ng is usually used with unix-stream sources.
In case you're interested: the 5000 messages are 142 bytes in length and sent in 0.9 s without any sleeping between sending. This results in almost exactly 1 MByte/s, which is close to link saturation (no IRQ or RX queue starvation noticed from the NIC driver code) on my 10 Mbit/s test network.
Your results are really appreciated. No one has done such extensive performance testing on syslog-ng before.
If I find time I'll run the tests with 3 clients too. But meanwhile I'm all for the inclusion of the gperf'd code, since rewriting doesn't look like a feasible option in the current state of development (shortly before a stable release).
I'll go with the gperf based code for now. -- Bazsi PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
What do you use for fetching local messages? unix-stream or unix-dgram?
unix-stream
sysklogd uses unix-dgram while syslog-ng is usually used with unix-stream sources.
Ahhhh, bingo, that's the culprit! Thanks for the pointer. I should rerun the tests with unix-dgram to get a more accurate comparison and possibly avoid the glibc bug with unix-stream. Let's see if I can shovel together enough spare time next week for a fresh round of tests.
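For anyone rerunning this comparison, switching the local source driver is a one-line change in syslog-ng.conf. A sketch (the source name s_local is made up; /dev/log is the usual Linux path, as in the thread):

```
source s_local {
    # datagram socket, as sysklogd uses, instead of unix-stream("/dev/log")
    unix-dgram("/dev/log");
    internal();
};
```

With unix-dgram the local transport matches sysklogd's, so client-side loss figures become directly comparable between the two daemons.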
Your results are really appreciated. No one has done so extensive performance testing on syslog-ng before.
Thanks. I'll see if I can do more in-depth testing in the direction of using pipes, tunnels and redirection (actually sending back filtered event messages from a pipe of a jailed syslog-ng to the originating process [don't ask :)]) to find out if there are other problems.
If I find time I'll run the tests with 3 clients too. But meanwhile I'm all for the inclusion of the gperf'd code, since rewriting doesn't look like a feasible option in the current state of development (shortly before a stable release).
I'll go with the gperf based code for now.
Cool. Cheers, Roberto Nibali, ratz -- echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc
It would also be nice to see comparison data with the standard Syslog that comes with Linux/Solaris. These are good benchmarks to make available on the web page. Richard
-----Original Message----- From: syslog-ng-admin@lists.balabit.hu [mailto:syslog-ng-admin@lists.balabit.hu] On Behalf Of Roberto Nibali Sent: Friday, March 07, 2003 7:55 AM To: syslog-ng@lists.balabit.hu Subject: Re: [syslog-ng]performance test questions
What do you use for fetching local messages? unix-stream or unix-dgram?
unix-stream
sysklogd uses unix-dgram while syslog-ng is usually used with unix-stream sources.
Ahhhh, bingo, that's the culprit! Thanks for the pointer. I should rerun the tests with unix-dgram to get a more accurate comparison and possibly avoid the glibc bug with unix-stream.
Let's see if I can shovel together enough spare time next week for a fresh round of tests.
Your results are really appreciated. No one has done such extensive performance testing on syslog-ng before.
Thanks. I'll see if I can do more in-depth testing in the direction of using pipes, tunnels and redirection (actually sending back filtered event messages from a pipe of a jailed syslog-ng to the originating process [don't ask :)]) to find out if there are other problems.
If I find time I'll run the tests with 3 clients too. But meanwhile I'm all for the inclusion of the gperf'd code, since rewriting doesn't look like a feasible option in the current state of development (shortly before a stable release).
I'll go with the gperf based code for now.
Cool.
Cheers, Roberto Nibali, ratz -- echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc
_______________________________________________ syslog-ng maillist - syslog-ng@lists.balabit.hu https://lists.balabit.hu/mailman/listinfo/syslog-ng
Frequently asked questions at http://www.campin.net/syslog-ng/faq.html
Hello,
It would also be nice to see comparison data with the standard Syslog that comes with Linux/Solaris. These are good benchmarks to make available on the web page.
I do agree but what's the standard syslog for Linux? Best regards, Roberto Nibali, ratz -- echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc
It does not matter much. Pick the Linux you are using for the test as the comparison. If you package up your tests, some of us could duplicate them on other variants. Richard
-----Original Message----- From: syslog-ng-admin@lists.balabit.hu [mailto:syslog-ng-admin@lists.balabit.hu] On Behalf Of Roberto Nibali Sent: Monday, March 10, 2003 12:49 AM To: syslog-ng@lists.balabit.hu Subject: Re: [syslog-ng]performance test questions
Hello,
It would also be nice to see comparison data with the standard Syslog that comes with Linux/Solaris. These are good benchmarks to make available on the web page.
I do agree but what's the standard syslog for Linux?
Best regards, Roberto Nibali, ratz -- echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc
On Thu, Jan 30, 2003 at 05:12:56PM +0100, Roberto Nibali wrote:
I'm conducting syslog-ng performance tests. I've deployed a central loghost environment and three clients trying to send messages which should end up on the central loghost with as few losses as possible. The whole thing is currently done over UDP and the clients have either syslogd or syslog-ng as daemons.
Balazs has found some problems he's looking into, but there's something left for you to do. You need to establish a baseline for the maximum throughput your OS and network can send over UDP. If you set up a listener on a UDP port that simply discards all input (instead of syslog-ng) and send UDP packets with little or no payload, you may still see all kinds of packet loss. The reason is that it's possible to overflow your UDP send buffer before anything is even written to the network. Your message loss may be primarily on the sending system and have little to do with syslog-ng. <URL:http://cert.uni-stuttgart.de/archive/loganalysis/2002/01/msg00049.html> -- Nate Campi http://www.campin.net "Computers are useless. They can only give you answers." - Picasso, Pablo (1881-1973)
Balazs has found some problems he's looking into, but there's something left for you to do. You need to establish a baseline for the maximum throughput your OS and network can send over UDP. If you setup a
Well, I haven't explained my whole test setup in my first email. I do tests like that too, although there is little point in such tests given the current results. As long as the percentage of packet loss differs so significantly between the daemons, it is an exercise in futility and thus a waste of time. Once those initial problems are sorted out, I will conduct raw UDP throughput tests to compare numbers. Another thing is that if you add the metric of UDP client-to-server throughput performance, you add another dozen or so tunable parameters. To keep your test runs controlled and to make sound statements based on observation, fewer parameters yield more accuracy.
listener on a UDP port that simply discarded all input (instead of syslog-ng) and send UDP packets with little or no payload, you may still get all kinds of packet loss.
I do count packet loss with a kernel patch I've made to the iproute2 framework, I vary the buffer sizes of the TX/RX queues, I have a fallback counter implemented with ipchains/iptables/pf rules, and I permanently have an xterm with tcpdump running. The problem is that tcpdump doesn't get all packets, so the dropped-packet counts differ between what tcpdump and iproute2 see.
The reason for this is because it's possible to overflow your UDP send buffer before anything is even written to the network. Your message loss may primarily be on the sending system, and have little to do with syslog-ng.
Absolutely correct. With Linux, however, you are provided with very useful network-related statistics under /proc/net/*. Using the latest iproute2 and iputils packages from Alexey you even get human-readable output in realtime :).
<URL:http://cert.uni-stuttgart.de/archive/loganalysis/2002/01/msg00049.html>
Very interesting reading, thanks. Best regards, Roberto Nibali, ratz -- echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc
participants (6)
- Achim Gsell
- Balazs Scheidler
- Dmitry Frolov
- Nate Campi
- Richard E. Perlotto II
- Roberto Nibali