On Tue, Apr 06, 2004 at 11:50:09AM +0200, KOVACS Krisztian wrote:
Hi,
Hi,
Our production firewall/transproxy suddenly became _really_ unresponsive a few days ago after ~120 days of uptime, from what seems to have been a memory leak.
The machine has 1GB of memory, but the page cache was really small, about 30 processes were waiting for disk I/O, and while reading an average of 4MB/sec the disk was seeking all over the place. Also, bash would sometimes say something along the lines of 'resource unavailable' when starting subprocesses.
Temporarily disabling the transproxy function by deleting the redirect rule caused the disk to go idle (our transproxy-using app intercepts HTTP requests and answers the requests locally for static data, fileset is about 5GB but ~200MB of it accounts for 90% of the hits.) But every little thing that I tried, even an ls or a grep in /etc caused massive disk I/O.
I hastily rebooted the machine and thus have no way of debugging this anymore. But I was wondering, is anything known about memory leaks in the 1.1.3 version of the tproxy code? I checked the changelogs, but couldn't find anything related to memleaks.
Should I schedule maintenance to move the machine to 1.9.1? Any other advice? (It's not a problem for me to schedule the machine to be rebooted every 90 days or something like that.)
We've also found signs suggesting a possible memory leak in tproxy (it seemed to leak conntrack entries). However, I have no idea what the problem might be, so it's not fixed yet.
Could you provide a bit more information? Is this system SMP or UP? I'd be happy to receive the contents of /proc/slabinfo, if possible, daily/hourly snapshots.
Uniprocessor system, P4 2.4GHz. Two e1000 fiber cards, 2.4.20-20.9 Red Hat kernel plus bridging firewall patch and tproxy patch 1.1.3. Right now about 200 SYNs per second, which is sort of a quiet time of day. /proc/net/ip_conntrack right now shows about 35k entries, 'netstat -tn' shows about 7k. I've installed a cron job to take regular snapshots of /proc/slabinfo and will get back to you soon.
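For anyone wanting to collect the same data, here is a minimal sketch of such a snapshot job in Python. It is not the script actually used on this machine. The directory name `slab-snapshots`, the file naming scheme, and the helper names `snapshot`, `parse_slabinfo` and `growth` are all made up for illustration, and the parser assumes each data line of /proc/slabinfo starts with the cache name followed by the active object count.

```python
import time
from pathlib import Path


def snapshot(src="/proc/slabinfo", dest_dir="slab-snapshots"):
    """Copy the current slabinfo contents to a timestamped file.

    dest_dir and the file name pattern are hypothetical choices.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    out = dest / time.strftime("slabinfo-%Y%m%d-%H%M%S.txt")
    out.write_text(Path(src).read_text())
    return out


def parse_slabinfo(text):
    """Return {cache_name: active_objs} from slabinfo text.

    Assumes the first field is the cache name and the second is the
    active object count; header and comment lines are skipped because
    their second field is not a number.
    """
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or line.startswith("#"):
            continue
        if not fields[1].isdigit():
            continue
        counts[fields[0]] = int(fields[1])
    return counts


def growth(old, new):
    """List (cache_name, delta) for caches that grew, largest first."""
    deltas = [(name, new[name] - old.get(name, 0)) for name in new]
    grown = [(name, d) for name, d in deltas if d > 0]
    return sorted(grown, key=lambda item: item[1], reverse=True)
```

Calling `snapshot()` from cron (as root, since /proc/slabinfo may not be readable otherwise) once an hour, then feeding the oldest and newest files through `parse_slabinfo` and `growth`, should show which slab caches keep growing over time.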
Moving to 1.9.1 is not recommended in your case: its ABI/API is incompatible (though that is trivial to fix in your sources), and it probably won't help.
OK. Thanks so far.

cheers,
Lennert