Dropping UDP logs - need ideas
Syslog-ng'ers,

I have a Solaris 8 Syslog-ng logger in production which runs and tests out fine. Now I've built another Sparc/Solaris 8 system on an Ultra 10, and I am dropping UDP logs at about a 5-15% rate. I am testing just as before, with Kiwi's Syslog generator. When I send a burst of 100 logs, I see the loss of logs.

I had this problem with the first server, but I simply maxed out udp_recv_hiwat to 65536 and all was well, even with 500-log bursts. This new server has been configured the same way. The only difference is the hardware. I even tried the previous version of Syslog-ng that I used - no help.

Here's what I have:

  Latest Syslog-ng, compiled with sun-door and tcp-wrappers
  400MHz UltraSparcII CPU
  500M memory
  2 x 72G SCSI drives (10k rpm)
  10/100M ether interface running at 100fdx

Here's the sad part: I tested with the old syslogd, and it logged only 73% of the UDP logs. (Had one entry, plus a "repeated 72 times" line.) So it looks like a system problem, just not sure where it is.

I tried upping the sync(), and it didn't seem to help either. I have the /var partition on its own 10K SCSI drive, so I doubt it's a disk I/O issue - iostat looks OK too. I tried turning use_dns() on and off too. It doesn't even log with it off, I guess because I have the dns_cache on, which must conflict.(?) I have plenty of memory and swap. Actually, I've got the system JASS'ed out, so I'm not running much of anything process-wise except Syslog-ng.

Any debugging suggestions?

Oh, yeah. I've upped the log_fifo_size, and - no help. Using TCP is not an option now, so that won't be a solution for me.

Help! While I wait for replies, I think I'll count bytes seen on the interface with netstat... or snoop...

Thanks gang.

Wayne Sweatt
Sr. UNIX System Administrator
Comforce Technical Services
LANL SCC Team
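One way to make the burst test above easier to interpret is to give every test datagram a unique sequence number, so that a "repeated N times" line cannot hide individual messages and the loss rate can be counted directly. The following is only a rough Python sketch of that idea, not the Kiwi generator or anything from syslog-ng. The function names, the message format, and the loopback address are all assumptions made for illustration.

```python
import socket

def make_receiver(port=0, rcvbuf=65536):
    """Open a UDP socket on loopback and request a receive buffer of rcvbuf bytes.

    The OS may clamp or adjust the requested size; 65536 mirrors the
    udp_recv_hiwat value mentioned in the post.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    sock.bind(("127.0.0.1", port))  # port 0 lets the OS pick a free port
    return sock

def format_message(seq):
    """Build a syslog-style test line with a unique sequence number (hypothetical format)."""
    return f"<13>burst test seq={seq}".encode()

def parse_seq(data):
    """Extract the sequence number from a test line built by format_message."""
    return int(data.decode().rsplit("seq=", 1)[1])

def send_burst(addr, count):
    """Send `count` datagrams back to back, each with its own sequence number."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        for seq in range(count):
            s.sendto(format_message(seq), addr)

def collect(sock, quiet=0.5):
    """Read datagrams until the socket has been quiet for `quiet` seconds."""
    sock.settimeout(quiet)
    seen = set()
    while True:
        try:
            data, _ = sock.recvfrom(2048)
        except socket.timeout:
            return seen
        seen.add(parse_seq(data))

def loss_percent(seen, sent):
    """Percentage of the `sent` datagrams that never arrived."""
    return 100.0 * (sent - len(seen)) / sent
```

A run would look something like `rx = make_receiver()`, then `send_burst(rx.getsockname(), 100)`, then `loss_percent(collect(rx), 100)`. Pointing the sender at the real log host instead of loopback would exercise the network path as well.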
On Mon, Sep 22, 2003 at 10:37:13AM -0600, Wayne Sweatt wrote:
Here's the sad part: I tested with the old syslogd, and it logged only 73% of the UDP logs. (Had one entry, plus a "repeated 72 times" line.) So it looks like a system problem, just not sure where it is. I tried upping the sync(), and it didn't seem to help either. I have the /var partition on its own 10K SCSI drive, so I doubt it's a disk I/O issue - iostat looks OK too. I tried turning use_dns() on and off too. It doesn't even log with it off, I guess because I have the dns_cache on, which must conflict.(?) I have plenty of memory and swap. Actually, I've got the system JASS'ed out, so I'm not running much of anything process-wise except Syslog-ng. Any debugging suggestions?
Probably syslog-ng (i.e. your CPU) is not fast enough to fetch messages from the receive buffer. Increasing the UDP receive buffer could help you up to a point, but if the incoming rate is higher than the rate at which syslog-ng processes traffic, there's nothing you can do. Does your CPU have idle time in vmstat?

Another possibility is that syslog-ng is waiting for something, and this slows down processing. I don't know if truss is able to print timestamps for system calls; strace on Linux can do this, and it might help you debug what makes syslog-ng wait.

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
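The point about the receive buffer helping only up to a point can be illustrated with a toy model. This is a simplified Python sketch, not a description of how Solaris or syslog-ng actually behaves: it assumes fixed-size messages, counts the buffer in message slots rather than bytes, and advances time in discrete ticks. The function name and parameters are made up for the illustration.

```python
def simulate(arrivals, drain_per_tick, buffer_slots):
    """Toy model of a bounded receive buffer.

    arrivals       -- list with the number of messages arriving in each tick
    drain_per_tick -- how many messages the reader can fetch per tick
    buffer_slots   -- buffer capacity, counted in messages (an assumption;
                      a real UDP buffer is sized in bytes)

    Returns (delivered, dropped).
    """
    queued = delivered = dropped = 0
    for n in arrivals:
        # Messages that do not fit in the remaining room are lost.
        accepted = min(n, buffer_slots - queued)
        dropped += n - accepted
        queued += accepted
        # The reader then fetches as much as it can in this tick.
        served = min(queued, drain_per_tick)
        queued -= served
        delivered += served
    # Once arrivals stop, whatever is still queued gets read eventually.
    delivered += queued
    return delivered, dropped

# A single burst of 100 messages, reader handles 10 per tick.
small_buffer = simulate([100], drain_per_tick=10, buffer_slots=50)
large_buffer = simulate([100], drain_per_tick=10, buffer_slots=100)

# Sustained overload: 20 messages per tick for 50 ticks, same reader.
sustained = simulate([20] * 50, drain_per_tick=10, buffer_slots=100)
```

In this model the larger buffer absorbs the single burst completely, while under sustained overload the buffer fills after a few ticks and drops continue no matter how large it is, which matches the reasoning in the reply.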
Participants (2): Balazs Scheidler, Wayne Sweatt