[tproxy] anything known about potential memory leaks in tproxy 1.1.3?

Lennert Buytenhek buytenh@wantstofly.org
Tue, 6 Apr 2004 12:20:25 +0200


On Tue, Apr 06, 2004 at 11:50:09AM +0200, KOVACS Krisztian wrote:

>   Hi,

Hi,


> > Our production firewall/transproxy suddenly became _really_ unresponsive
> > a few days ago after ~120 days of uptime, from what seems to have been a
> > memory leak.
> > 
> > The machine has 1GB of memory, but the amount of page cache was really small,
> > the number of processes waiting for disk I/O was about 30, and at avg 4MB/sec
> > reading from disk it was seeking all over the place.  Also, bash would
> > sometimes say something along the lines of 'resource unavailable' when
> > starting subprocesses.
> > 
> > Temporarily disabling the transproxy function by deleting the redirect rule
> > caused the disk to go idle (our transproxy-using app intercepts HTTP requests
> > and answers the requests locally for static data, fileset is about 5GB but
> > ~200MB of it accounts for 90% of the hits.)  But every little thing that I
> > tried, even an ls or a grep in /etc caused massive disk I/O.
> > 
> > I hastily rebooted the machine and thus have no way of debugging this
> > anymore.  But I was wondering, is anything known about memory leaks in the
> > 1.1.3 version of the tproxy code?  I checked the changelogs, but couldn't
> > find anything related to memleaks.
> > 
> > Should I schedule maintenance to move the machine to 1.9.1?  Any other
> > advice?  (It's not a problem for me to schedule the machine to be rebooted
> > every 90 days or something like that.)
> 
>   We've also found signs that suggest there may be a memory leak in
> tproxy (it seemed to leak conntrack entries). However, I have no idea
> what the problem might be, so it's not fixed yet.
> 
>   Could you provide a bit more information? Is this system SMP or UP?
> I'd be happy to receive the contents of /proc/slabinfo, if possible
> as daily or hourly snapshots.

Uniprocessor system, P4 2.4GHz.  Two e1000 fiber cards, 2.4.20-20.9 Red
Hat kernel plus bridging firewall patch and tproxy patch 1.1.3.  Right
now about 200 SYNs per second, which is sort of a quiet time of day.
/proc/net/ip_conntrack right now shows about 35k entries, 'netstat -tn'
shows about 7k.
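
To see where the gap between those two numbers comes from, the conntrack
table can be broken down by protocol and TCP state. A minimal sketch,
assuming the usual /proc/net/ip_conntrack line layout (protocol name,
protocol number, timeout, then a state name for TCP only); the function
name tally_conntrack is made up for illustration:

```python
import sys
from collections import Counter


def tally_conntrack(text):
    """Count conntrack entries per (protocol, state) pair.

    Assumes each line starts with: protocol name, protocol number,
    timeout, and (for TCP only) a state name such as ESTABLISHED.
    """
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or truncated line
        proto = fields[0]
        # For non-TCP entries the fourth field is already a key=value
        # pair (e.g. src=...), so there is no state to report.
        state = fields[3] if "=" not in fields[3] else "-"
        counts[(proto, state)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Path is passed explicitly, e.g. /proc/net/ip_conntrack
    with open(sys.argv[1]) as f:
        for (proto, state), n in tally_conntrack(f.read()).most_common():
            print(f"{proto:6} {state:12} {n}")
```

Run against a saved copy of /proc/net/ip_conntrack, this lists which
states hold the bulk of the entries, which may help tell ordinary
TIME_WAIT build-up apart from entries that never go away.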

I've installed a cron job to take regular snapshots of /proc/slabinfo,
will get back to you soon.
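
For comparing two of those snapshots, something along these lines could
point at the slab caches that keep growing. This is only a sketch: it
assumes that each data line of /proc/slabinfo starts with the cache name
followed by active objects, total objects and object size, and it skips
the version header and any comment lines. The names parse_slabinfo and
growth are made up for illustration.

```python
import sys


def parse_slabinfo(text):
    """Return {cache_name: (active_objs, num_objs, objsize)}.

    Assumes the first four columns are: name, active objects,
    total objects, object size in bytes.
    """
    caches = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue  # header, comment or blank line
        fields = line.split()
        if len(fields) < 4:
            continue
        try:
            active, total, size = (int(x) for x in fields[1:4])
        except ValueError:
            continue  # not a data line in the assumed layout
        caches[fields[0]] = (active, total, size)
    return caches


def growth(old_text, new_text):
    """List (cache_name, delta_bytes) for caches whose active memory
    grew between two snapshots, largest growth first."""
    old = parse_slabinfo(old_text)
    new = parse_slabinfo(new_text)
    deltas = []
    for name, (active, _total, size) in new.items():
        old_active, _old_total, old_size = old.get(name, (0, 0, 0))
        delta = active * size - old_active * old_size
        if delta > 0:
            deltas.append((name, delta))
    return sorted(deltas, key=lambda item: item[1], reverse=True)


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: script.py OLD_SNAPSHOT NEW_SNAPSHOT
    with open(sys.argv[1]) as a, open(sys.argv[2]) as b:
        for name, delta in growth(a.read(), b.read()):
            print(f"{name:24} +{delta} bytes")
```

A cache that shows up at the top of this list across many consecutive
snapshots, regardless of time of day, would be the first suspect.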


>   Moving to 1.9.1 is not recommended in your case: its ABI/API is
> incompatible (though trivial to fix in your sources), and probably won't
> help.

OK.  Thanks so far.


cheers,
Lennert