[tproxy] anything known about potential memory leaks in tproxy 1.1.3 ?
Lennert Buytenhek
buytenh@wantstofly.org
Tue, 6 Apr 2004 12:20:25 +0200
On Tue, Apr 06, 2004 at 11:50:09AM +0200, KOVACS Krisztian wrote:
> Hi,
Hi,
> > Our production firewall/transproxy suddenly became _really_ unresponsive
> > a few days ago after ~120 days of uptime, from what seems to have been a
> > memory leak.
> >
> > The machine has 1GB of memory, but the amount of page cache was really small,
> > the number of processes waiting for disk I/O was about 30, and at avg 4MB/sec
> > reading from disk it was seeking all over the place. Also, bash would
> > sometimes say something along the lines of 'resource unavailable' when
> > starting subprocesses.
> >
> > Temporarily disabling the transproxy function by deleting the redirect rule
> > caused the disk to go idle (our transproxy-using app intercepts HTTP requests
> > and answers the requests locally for static data, fileset is about 5GB but
> > ~200MB of it accounts for 90% of the hits.) But every little thing that I
> > tried, even an ls or a grep in /etc caused massive disk I/O.
> >
> > I hastily rebooted the machine and thus have no way of debugging this
> > anymore. But I was wondering, is anything known about memory leaks in the
> > 1.1.3 version of the tproxy code? I checked the changelogs, but couldn't
> > find anything related to memleaks.
> >
> > Should I schedule maintenance to move the machine to 1.9.1? Any other
> > advice? (It's not a problem for me to schedule the machine to be rebooted
> > every 90 days or something like that.)
>
> We've also found signs suggesting a possible memory leak in tproxy (it
> seemed to leak conntrack entries). However, I have no idea what the
> problem might be, so it's not fixed yet.
>
> Could you provide a bit more information? Is this system SMP or UP?
> I'd be happy to receive the contents of /proc/slabinfo, if possible as
> daily or hourly snapshots.
Uniprocessor system, P4 2.4GHz. Two e1000 fiber cards, 2.4.20-20.9 Red
Hat kernel plus bridging firewall patch and tproxy patch 1.1.3. Right
now about 200 SYNs per second, which is sort of a quiet time of day.
/proc/net/ip_conntrack right now shows about 35k entries, 'netstat -tn'
shows about 7k.
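To see where the gap between the conntrack count and the netstat count comes from, a per-state breakdown of the conntrack table may help. The following is only a rough sketch: the function name summarize_conntrack is hypothetical, and it assumes the 2.4-style line layout of /proc/net/ip_conntrack, where the first field is the protocol name and, for TCP, the fourth field is the connection state.

```python
from collections import Counter


def summarize_conntrack(lines):
    """Count conntrack entries, grouped by TCP state or by protocol name.

    Assumes the 2.4-style layout: "tcp 6 <timeout> <STATE> src=... dst=...".
    Non-TCP entries carry no state field, so they are grouped by protocol.
    """
    total = 0
    by_state = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        total += 1
        if fields[0] == "tcp" and len(fields) > 3:
            by_state[fields[3]] += 1
        else:
            by_state[fields[0]] += 1
    return total, by_state
```

On the firewall itself this could be fed directly from the proc file, for example `summarize_conntrack(open("/proc/net/ip_conntrack"))`, and the resulting counts compared over time.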
I've installed a cron job to take regular snapshots of /proc/slabinfo
and will get back to you soon.
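For reference, a snapshot job along these lines could look like the sketch below. It is not the actual cron job: the function names and the /var/log/slabinfo directory are hypothetical, and the parser only relies on each cache line starting with a name followed by two integers (active objects, total objects), skipping header lines that do not match.

```python
import os
import time


def take_snapshot(src="/proc/slabinfo", dest_dir="/var/log/slabinfo"):
    """Copy src into dest_dir under a timestamped name; return the new path."""
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, time.strftime("slabinfo-%Y%m%d-%H%M%S"))
    with open(src) as fin, open(dest, "w") as fout:
        fout.write(fin.read())
    return dest


def parse_slabinfo(text):
    """Map cache name -> (active_objs, total_objs); header lines are skipped."""
    caches = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue  # e.g. the "slabinfo - version: ..." header
        caches[fields[0]] = (int(fields[1]), int(fields[2]))
    return caches


def growth(old_text, new_text):
    """Return (cache, delta in active objects) pairs, largest growth first."""
    old = parse_slabinfo(old_text)
    new = parse_slabinfo(new_text)
    deltas = {name: new[name][0] - old[name][0] for name in new if name in old}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)
```

Running take_snapshot() from cron and then passing the contents of the oldest and newest snapshot to growth() should put a steadily growing cache, such as a leaking conntrack slab, at the top of the list.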
> Moving to 1.9.1 is not recommended in your case, its ABI/API is
> incompatible (though trivial to fix in your sources), and probably won't
> help.
OK. Thanks so far.
cheers,
Lennert