Solaris syslog-ng tuning
Vincent!

Thanks much for confirming the issue and repeating the link to me.

Well, the only _intended_ UDP traffic to the system is syslog. Currently, the system is logging from a PIX on one GigE interface, and from a few servers plus a less active PIX on another GigE.

We send the PIX logs, separately, to pipes. And log everything to file.

# vmstat 5 4
kthr       memory             page             disk         faults        cpu
r b  w    swap   free re mf  pi po fr de sr s0 s1 s3 --   in    sy   cs us sy id
0 0  4 2793888 737144 34  5 232  2  1  0  0  0  3  0  0  124    18  107 11  9 80
0 0 36 2680456 673440  5  6   0  0  0  0  0  3  5  0  0 1897 15728 2516  9 15 76
0 0 36 2680456 673440  5  5   0  0  0  0  0  0  2  0  0 1612 13349 2268 10 10 80
0 0 36 2680456 673440  5  5   0  0  0  0  0  0  3  0  0 1854 15740 2520 13 15 73

# iostat 5 4
   tty        sd0          sd1          sd30         nfs1        cpu
tin tout kps tps serv kps tps serv kps tps serv kps tps serv us sy wt id
  0   37   5   0   11 421   3   22   0   0    0   0   0    0 11  9  0 80
  0   47   0   0    0 259   2   29   0   0    0   0   0    0 11 14  0 75
  0   16  28   4   17 334   5   25   0   0    0   0   0    0 15 16  1 69
  0   16   2   0   10 293   2   40   0   0    0   0   0    0 11 11  0 78

At: ndd /dev/udp udp_max_buf 33554432 (32 MB!)

We have these time/counter readings for udpInOverflows:
00 - 645628929
33 - 645630391
96 - 645632008

Or, about 1924 packets/minute lost.

At udp_max_buf 64 MB (!!!), 2713 packets/minute lost.

I am FAR from out of memory: 700 MB free.

1) Am I reading that loss right??
2) Any tips from Solaris/syslog-ng tuners would be appreciated!

Kim

On Mar 6, 2006, at 8:49 AM, syslog-ng-request@lists.balabit.hu wrote:
On Mon Mar 6 07:45:39 2006, Cary, Kim wrote:
| Syslog-ng 1.6.4 on Solaris 9:
|
| IPv4
|         udpInOverflows      =640473547
|
| UDP
|         udpInDatagrams      =409687632   udpInErrors         =     0
|         udpOutDatagrams     =   466811   udpOutErrors        =     0
|
| Does the udpInOverflows indicate I'm losing packets?
Yes. As explained at http://www.29west.com/docs/THPM/udp-buffer-sizing.html (the link Mike gave today), it means that some UDP packets could not be inserted into the socket buffers.
Be careful: it means you are losing UDP packets in general, not only syslog packets...
Vincent.
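The loss rate implied by the udpInOverflows readings in Kim's message can be checked with a few lines of Python. This is only a sketch: it assumes the first column of each reading is elapsed seconds (an assumption, though it reproduces the "about 1924 packets/minute" figure), and the function name is made up for illustration.

```python
def overflow_rate_per_minute(readings):
    """Return overflowed datagrams per minute between first and last reading.

    readings: list of (elapsed_seconds, udpInOverflows) pairs, oldest first.
    The time unit is an assumption; the thread does not state it.
    """
    (t0, c0), (t1, c1) = readings[0], readings[-1]
    if t1 <= t0:
        raise ValueError("readings must span a positive time interval")
    return (c1 - c0) / (t1 - t0) * 60.0


# Counter readings quoted in the message; seconds are assumed for column one.
readings = [(0, 645628929), (33, 645630391), (96, 645632008)]
rate = overflow_rate_per_minute(readings)
print(f"{rate:.0f} datagrams/minute overflowed")  # prints "1924 datagrams/minute overflowed"
```

Because udpInOverflows is a cumulative counter, only the difference between two readings matters, not the absolute value.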
On Tue Mar 7 09:34:23 2006, Cary, Kim wrote:
| Vincent!
|
| Thanks much for confirming the issue and repeating the link to me.
|
| Well, the only _intended_ UDP traffic to the system is syslog.
| Currently, the system is logging from a PIX on one GigE interface,
| and from a few servers plus a less active PIX on another GigE.
|
| We send the PIX logs, separately, to pipes. And log everything to file.
|
| # vmstat 5 4
| kthr       memory             page             disk         faults        cpu
| r b  w    swap   free re mf  pi po fr de sr s0 s1 s3 --   in    sy   cs us sy id
| 0 0  4 2793888 737144 34  5 232  2  1  0  0  0  3  0  0  124    18  107 11  9 80
| 0 0 36 2680456 673440  5  6   0  0  0  0  0  3  5  0  0 1897 15728 2516  9 15 76
| 0 0 36 2680456 673440  5  5   0  0  0  0  0  0  2  0  0 1612 13349 2268 10 10 80
| 0 0 36 2680456 673440  5  5   0  0  0  0  0  0  3  0  0 1854 15740 2520 13 15 73
|
| # iostat 5 4
|    tty        sd0          sd1          sd30         nfs1        cpu
| tin tout kps tps serv kps tps serv kps tps serv kps tps serv us sy wt id
|   0   37   5   0   11 421   3   22   0   0    0   0   0    0 11  9  0 80
|   0   47   0   0    0 259   2   29   0   0    0   0   0    0 11 14  0 75
|   0   16  28   4   17 334   5   25   0   0    0   0   0    0 15 16  1 69
|   0   16   2   0   10 293   2   40   0   0    0   0   0    0 11 11  0 78
|
| At: ndd /dev/udp udp_max_buf 33554432 (32 MB!)
|
| We have these time/counter readings for udpInOverflows:
| 00 - 645628929
| 33 - 645630391
| 96 - 645632008
|
| Or, about 1924 packets/minute lost.
|
| At udp_max_buf 64 MB (!!!), 2713 packets/minute lost.
|
| I am FAR from out of memory: 700 MB free.
|
| 1) Am I reading that loss right??

Probably. You might however want to snoop on the interface to see what kind of UDP packets arrive on it.

| 2) Any tips from Solaris/syslog-ng tuners would be appreciated!

udp_max_buf does not set the queue length of the UDP socket, which by the way can have a different value for each socket...

You could have a look at:
http://sunsolve.sun.com/search/document.do?assetkey=1-30-3218-1

Basically: increasing udp_max_buf without increasing udp_recv_hiwat has no meaning. Furthermore, you can increase your socket buffer that way only up to 64k (Solaris 8 & 9); if you want to increase it further you must use the setsockopt call (up to udp_max_buf, which has a maximum value of 1 GB).

Here is the official SUN documentation regarding this:
http://docs.sun.com/app/docs/doc/817-0404/6mg74vsb5?a=view#gbtag

Now regarding your packet loss issue. I would increase:
udp_recv_hiwat -> 65536
udp_max_buf -> 1073741824 (you will never get here anyway)

Then I would try to play with the syslog-ng config: log_fifo_size, log_iw_size and log_fetch_limit. But here I'd appreciate a syslog-ng expert stepping in and telling us more precisely what to do.

Vincent

-- 
 .~.    Vincent Haverlant -- Galadril -- #ICQ: 35695155
 /V\    MSN: vincent_msn@haverlant.org -- http://www.haverlant.org/
/( )\   Parinux member: http://www.parinux.org/
^^-^^   GPG: 8FEA 52C2 5C54 A201 2375 0FA5 AF2E 1881 92D0 EE84
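The setsockopt call mentioned above can be illustrated as follows. This is a minimal Python sketch, not syslog-ng code: the function name and the 65536-byte request are chosen only for illustration, and the size the kernel actually grants depends on the operating system and its tunables, which is why the value is read back.

```python
import socket


def open_udp_socket_with_rcvbuf(requested_bytes):
    """Create a UDP socket and ask the kernel for a given receive buffer.

    Returns (sock, granted_bytes). The kernel may grant a different size
    than requested, so the value is read back with getsockopt().
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
        granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    except OSError:
        # Some systems reject requests above their configured maximum.
        sock.close()
        raise
    return sock, granted


if __name__ == "__main__":
    # 65536 is an illustrative request, not a recommendation.
    sock, granted = open_udp_socket_with_rcvbuf(65536)
    print("requested 65536 bytes, kernel reports", granted)
    sock.close()
```

Comparing the requested and reported sizes is a quick way to see whether a system-wide limit is capping the per-socket buffer.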
On Tue, 2006-03-07 at 22:45 +0100, Vincent Haverlant wrote:
> On Tue Mar 7 09:34:23 2006, Cary, Kim wrote:
> basically: increasing udp_max_buf without increasing udp_recv_hiwat has no meaning. Furthermore, you can increase your socket buffer that way only up to 64k (Solaris 8 & 9); if you want to increase it further you must use the setsockopt call (up to udp_max_buf, which has a maximum value of 1 GB). Here is the official SUN documentation regarding this: http://docs.sun.com/app/docs/doc/817-0404/6mg74vsb5?a=view#gbtag
I've raised the priority of the SO_RCVBUF option in my mind and will probably add a patch soon.
> Now regarding your packet loss issue. I would increase udp_recv_hiwat -> 65536 udp_max_buf -> 1073741824 (you will never get here anyway)
> Then I would try to play with the syslog-ng config: log_fifo_size, log_iw_size and log_fetch_limit. But here I'd appreciate a syslog-ng expert stepping in and telling us more precisely what to do.
It also depends on which syslog-ng version you are using. 1.9.x is a complete replacement and has a different queueing model than 1.6.x, so tuning is also different.

You can read a short chapter in the documentation for 1.6.x:
http://www.balabit.com/products/syslog-ng/reference-1.6/syslog-ng.html/index...

This is however not completely updated for 1.9.x yet. The following are the most important differences:
* there is no garbage collector, so its parameters are ignored
* syslog-ng supports message flow-control, i.e. messages will not be read if they cannot be sent (this requires flow-controlled channels like TCP on both receiving/sending sides); this makes the application slow down instead of losing messages

If you are not using flow control, the recipe is the same:
* make sure your input socket buffer _AND_ your output queue size (log_fifo_size) are able to hold your potential message bursts

-- 
Bazsi
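The advice about holding message bursts can be made concrete with a toy model. The sketch below is an assumption-laden illustration in Python, not syslog-ng's actual queueing code: it treats a buffer as a bounded FIFO that accepts arrivals each tick, drops what does not fit, and then drains at a fixed rate. All numbers and names are hypothetical.

```python
def simulate_fifo(arrivals_per_tick, drain_per_tick, capacity):
    """Count messages dropped by a bounded FIFO (toy model).

    arrivals_per_tick: messages arriving in each time step.
    drain_per_tick:    messages the consumer removes per time step.
    capacity:          maximum number of queued messages.
    """
    queued = 0
    dropped = 0
    for arriving in arrivals_per_tick:
        accepted = min(arriving, capacity - queued)  # what still fits
        dropped += arriving - accepted               # the rest is lost
        queued += accepted
        queued -= min(queued, drain_per_tick)        # consumer drains
    return dropped


# Hypothetical burst: 500 messages per tick for three ticks, then silence.
burst = [500, 500, 500, 0, 0, 0]
for capacity in (1000, 2000):
    lost = simulate_fifo(burst, drain_per_tick=200, capacity=capacity)
    print(f"capacity={capacity}: dropped={lost}")
# capacity=1000: dropped=100
# capacity=2000: dropped=0
```

The same burst is lossless once the queue is large enough to absorb the difference between arrival and drain rates for the length of the burst, which is the sizing rule stated above for both the socket buffer and log_fifo_size.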
On Mon Mar 13 18:14:24 2006, Balazs Scheidler wrote:
| On Tue, 2006-03-07 at 22:45 +0100, Vincent Haverlant wrote:
| > On Tue Mar 7 09:34:23 2006, Cary, Kim wrote:
|
| > basically: increasing udp_max_buf without increasing udp_recv_hiwat has
| > no meaning. Furthermore, you can increase your socket buffer that way
| > only up to 64k (Solaris 8 & 9); if you want to increase it
| > further you must use the setsockopt call (up to udp_max_buf, which
| > has a maximum value of 1 GB).
| > Here is the official SUN documentation regarding this:
| > http://docs.sun.com/app/docs/doc/817-0404/6mg74vsb5?a=view#gbtag
|
| I've raised the priority of the SO_RCVBUF option in my mind and
| will probably add a patch soon.

Oh thank you very much. :)

| > Now regarding your packet loss issue. I would increase
| > udp_recv_hiwat -> 65536
| > udp_max_buf -> 1073741824 (you will never get here anyway)
| >
| > Then I would try to play with the syslog-ng config: log_fifo_size,
| > log_iw_size and log_fetch_limit. But here I'd appreciate
| > a syslog-ng expert stepping in and telling us more precisely what to do.
|
| It also depends on which syslog-ng version you are using. 1.9.x is a
| complete replacement and has a different queueing model than 1.6.x, so
| tuning is also different.
|
| You can read a short chapter in the documentation for 1.6.x
| http://www.balabit.com/products/syslog-ng/reference-1.6/syslog-ng.html/index...
|
| This is however not completely updated for 1.9.x yet. The following are
| the most important differences:
| * there is no garbage collector, so its parameters are ignored
| * syslog-ng supports message flow-control, i.e. messages will not be
| read if they cannot be sent (this requires flow-controlled channels like
| TCP on both receiving/sending sides); this makes the application slow
| down instead of losing messages
|
| If you are not using flow control, the recipe is the same:
| * make sure your input socket buffer _AND_ your output queue size
| (log_fifo_size) are able to hold your potential message bursts

I'm using 1.9.9, after observing the same issue on 1.6.x, and I have indeed played with log_iw_size, log_fetch_limit and log_fifo_size, setting them so that 100 messages per host can be handled at the same time. I also read the code for the flow-control mechanism, but never saw any "dropped" messages in the syslog-ng statistics. However much I increased the internal buffers, I still lost information until I increased the socket buffer as described in my previous email.

But still, thanks for this clarification.

Vincent.

-- 
 .~.    Vincent Haverlant -- Galadril -- #ICQ: 35695155
 /V\    MSN: vincent_msn@haverlant.org -- http://www.haverlant.org/
/( )\   Parinux member: http://www.parinux.org/
^^-^^   GPG: 8FEA 52C2 5C54 A201 2375 0FA5 AF2E 1881 92D0 EE84
I am running syslog-ng 1.6.8 and when my logging space fills up (that's a different story) syslog-ng closes the log file with messages like

io.c: do_write: write() failed (errno 28), No space left on device
pkt_buffer::do_flush(): Error flushing data

and starts queueing messages according to the log_fifo_size option. My concern is that when disk space became available again, the logging was not resumed, and I lost all of the messages for the closed log files until the macro expansion changed the file name.

Shouldn't the "time_reopen" apply to files as well? I have it set to 5 seconds.

Is there another option that would allow me to handle this situation?

How would this be handled in the 1.9 (2.0) code path?

Evan Rempel
Did anyone want to comment on this post? I expected to hear back from someone.

On Wed, 15 Mar 2006, Evan Rempel wrote:
> Date: Wed, 15 Mar 2006 22:21:31 -0800 (PST)
> From: Evan Rempel <erempel@uvic.ca>
> Reply-To: Syslog-ng users' and developers' mailing list <syslog-ng@lists.balabit.hu>
> To: Syslog-ng users' and developers' mailing list <syslog-ng@lists.balabit.hu>
> Subject: [syslog-ng] syslog-ng 1.6.8 does not reopen logs when out of disk space
>
> I am running syslog-ng 1.6.8 and when my logging space fills up (that's a different story) syslog-ng closes the log file with messages like
>
> io.c: do_write: write() failed (errno 28), No space left on device
> pkt_buffer::do_flush(): Error flushing data
>
> and starts queueing messages according to the log_fifo_size option. My concern is that when disk space became available again, the logging was not resumed, and I lost all of the messages for the closed log files until the macro expansion changed the file name.
>
> Shouldn't the "time_reopen" apply to files as well? I have it set to 5 seconds.
>
> Is there another option that would allow me to handle this situation?
>
> How would this be handled in the 1.9 (2.0) code path?
Evan Rempel
participants (4)
- Balazs Scheidler
- Cary, Kim
- Evan Rempel
- Vincent Haverlant