[tproxy] tproxy race condition? [RESEND]

KOVACS Krisztian hidden@balabit.hu
Fri, 17 Dec 2004 15:58:25 +0100


  Hi Jim,

On Fri, 2004-12-17 at 15:19, jim@minter.demon.co.uk wrote:
> Fair enough.  If I adjust the program such that one process asks
> tproxy to assign odd numbered foreign ports, and the other process
> even numbered foreign ports, the problem still happens just as quickly
> -- so it's not a simple collision fault!
> 
> As an aside, the Linux TCP/IP stack allows a single IP address to make
> >65,536 TCP connections at once.  It does this by allowing >1 sockets
> to share the same local port [in the auto-bind code called by TCP
> connect()], as long as they're connecting to different remote
> end-points.  The return packets are demultiplexed by remote end-point
> as well as the local one.  Additionally, some OSes even allow the user
> to pre-bind sockets to a local port of _their choice_ before making a
> connect(), easily allowing >1 connections at once per local port!

  Of course, this is clear. This is what SO_REUSEADDR was invented for,
and tproxy allows you to assign the same foreign address to multiple
sockets as well (of course with some restrictions). Unfortunately, in
the IP stack of the kernel these things are much simpler: if you set
SO_REUSEADDR, you're allowed to bind() to an address already taken.
However, you'll get an error when trying to connect() to the same
destination address and port. In the case of tproxy this is much more
difficult, since you won't be able to detect clashes before it's too
late (at least not without NAT reservations).

> > > What I believe is happening is as follows: There is evidence in dmesg that
> > > the first SYN packet of the connect() passes through the LOCAL_OUT iptables
> > > hooks (I see "ip_tproxy_fn(): new connection, hook=3" and "ip_tproxy_fn():
> > > new connection, hook=4", but for some reason the packet never actually
> > > makes it onto the wire.
> > 
> >   Don't you have any kind of errors in the kernel logs when this
> > happens? Tproxy could drop the packet, but you should get an error
> > message in that case.
> 
> No errors at all :o(.  The curious thing is that I added extra
> printk's to all the cases in the tproxy code where I could see "return
> NF_DROP" (or equivalent), and none of these printed -- so I presume
> the packet drop is elsewhere (I don't know where).

  Ok, do you have any DNAT/MASQUERADE rules in your iptables config? Or
what kind of NAT rules do you use?

  Another shortcoming of the NAT-based operation of tproxy is the
following: you have to make sure that you do not reuse the _local_
address before the conntrack entry of the previous connection from that
address times out. So, if you make a lot of connections from the same
IP, and the local autobind port range is not enough for you, you'll have
to use additional local IP addresses as well. (Note that these do not
need to be routable IP addresses.)

  For example, if you make 400 short-lived connections per second and
have configured the local port range to contain 50000 ports, it will
take 50000 / 400 = 125 seconds for the port range to turn over. The
timeout of conntrack entries in TIME_WAIT state is 120 seconds, which
leaves a margin of only 5 seconds, so with 400 cps you're already
likely to have problems.
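
  To make the arithmetic easy to repeat for other rates, here is a
rough sketch in Python. The function names are made up for this
illustration and are not part of tproxy or the kernel. It also assumes
ports are handed out strictly round-robin and ignores how long each
connection stays open, so treat the results as a lower bound.

```python
import math


def turnover_seconds(ports, conns_per_sec):
    """Seconds until the local port range wraps around and a port is reused.

    Assumes (hypothetically) that ports are consumed strictly in sequence.
    """
    return ports / conns_per_sec


def local_addresses_needed(ports, conns_per_sec, timeout_sec):
    """Minimum number of local IPs so that no port is reused before the
    conntrack entry of its previous connection has timed out.

    Every connection occupies one (address, port) pair for timeout_sec
    seconds, so conns_per_sec * timeout_sec pairs are busy at any time.
    """
    busy_pairs = conns_per_sec * timeout_sec
    return math.ceil(busy_pairs / ports)


# The numbers from the example: 50000 ports, 400 cps, 120 s timeout.
print(turnover_seconds(50000, 400))             # 125.0
print(local_addresses_needed(50000, 400, 120))  # 1
print(local_addresses_needed(50000, 500, 120))  # 2
```

  At 400 cps a single address is nominally just enough (125 s versus
120 s), which is why any burst above that rate is likely to cause
clashes; at 500 cps the same port range already needs a second local
address.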

> >   _This_ is strange... Could you send me a tcpdump capture of that
> > traffic and the matching tproxy debug output?
> 
> Will do, in a separate post.
> 
> >   I have a few recommendations:
> > 
> >       * Try to avoid explicitly specifying the foreign (fake) port
> >         number at all costs. If you assign a foreign port of zero,
> >         connection tracking will select a free port number when applying
> >         the NAT mapping. This way you won't have such weird problems.
> 
> I agree, I'd love to, but my app isn't able to choose the fake ports
> it uses -- my only option is detecting errors and dropping the
> connection if necessary.

  You're right, unfortunately there are cases when this is not an
option.

-- 
 Regards,
   Krisztian KOVACS