[tproxy] tproxy race condition? [RESEND]
Fri, 17 Dec 2004 14:19:41 +0000
Many thanks for your reply.
<reply snipped in places>
> Unfortunately handling errors is the most problematic part of tproxy.
> The difficulty lies in the fact that when the setsockopt() calls return,
> we have no way of knowing if the not-yet-established connection will
> clash with another connection in the conntrack hash or not. This is
> because the connection won't be created until the first packet leaves
> the machine, which is shortly after you call connect(). If the tproxy
> Netfilter hook detects that it cannot apply a NAT mapping, it just drops
> the packet (and probably the conntrack entry as well) since it has no
> way of notifying the user-space process.
Agreed. I take it that it's not possible to pre-add the mapping to the conntrack table at the setsockopt() stage?
I'm keen to move to NAT reservations, as they will evidently help me in the long run. However, since this bug shows up both with and without them, I'm not using them at this stage, for simplicity.
> If you start two client processes, you'll have a good chance of trying
> to assign "colliding" foreign addresses. If you set REUSEADDR, tproxy
> will allow you to assign the same foreign address more than once, since
> you've explicitly requested to do so by setting REUSEADDR (let's assume
> you've chosen port x). However, as soon as you try to use them, you'll
> experience problems, since the reply tuples of the connections would be
> the same. Of course connection tracking won't allow this, so trying to
> apply the NAT mapping will fail for one of the client processes. (I
> don't know yet why the packets leave the machine with an unmodified
> source IP, in theory they should be dropped, or at least NAT-ted to the
> wrong source port number...)
Fair enough. If I adjust the program so that one process asks tproxy to assign odd-numbered foreign ports and the other process even-numbered foreign ports, the problem still happens just as quickly -- so it's not a simple collision fault!
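To make the reasoning concrete, here is a toy model in Python of the collision you describe. It is only a sketch: the class and function names, the addresses, and the port ranges are all made up for illustration, and the "table" is a plain dictionary keyed by reply tuple, not the real conntrack hash. It shows that two processes using the same foreign ports clash on every attempt, while an odd/even split cannot clash at all, which is why I don't think a simple collision explains what I'm seeing.

```python
from typing import Dict, Iterable, NamedTuple


class Tuple4(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class ToyConntrack:
    """Toy stand-in for the conntrack hash: at most one entry per reply tuple."""

    def __init__(self) -> None:
        self.by_reply: Dict[Tuple4, str] = {}

    def add(self, owner: str, foreign_ip: str, foreign_port: int,
            server_ip: str, server_port: int) -> bool:
        # Reply packets travel from the server back to the foreign address.
        reply = Tuple4(server_ip, server_port, foreign_ip, foreign_port)
        if reply in self.by_reply:
            return False  # clash: the mapping cannot be applied
        self.by_reply[reply] = owner
        return True


def run(ports_a: Iterable[int], ports_b: Iterable[int]) -> int:
    """Count the clashes when processes A and B use the given foreign ports."""
    ct = ToyConntrack()
    clashes = 0
    for pa, pb in zip(ports_a, ports_b):
        for owner, port in (("A", pa), ("B", pb)):
            # Hypothetical foreign address and server, chosen for the example.
            if not ct.add(owner, "10.0.0.1", port, "192.0.2.1", 80):
                clashes += 1
    return clashes


if __name__ == "__main__":
    print(run(range(2000, 2010), range(2000, 2010)))        # same ports: 10
    print(run(range(2001, 2021, 2), range(2000, 2020, 2)))  # odd vs even: 0
```

In this model the odd/even split removes every clash, yet the real failure rate did not change, so the packet loss presumably has another cause.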
As an aside, the Linux TCP/IP stack allows a single IP address to hold more than 65,536 TCP connections at once. It does this by letting more than one socket share the same local port [in the auto-bind code called by TCP connect()], as long as the sockets connect to different remote end-points. Return packets are demultiplexed by the remote end-point as well as the local one. Some OSes even allow the user to pre-bind sockets to a local port of _their choice_ before calling connect(), easily allowing more than one connection at once per local port!
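The demultiplexing idea can be sketched as a lookup keyed on both end-points. This is a toy illustration in Python, not the kernel's data structure; the class name, socket labels, and addresses are invented for the example.

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


class ToyDemux:
    """Toy socket table keyed by (local end-point, remote end-point)."""

    def __init__(self) -> None:
        self.sockets: Dict[Tuple[Endpoint, Endpoint], str] = {}

    def connect(self, name: str, local: Endpoint, remote: Endpoint) -> bool:
        key = (local, remote)
        if key in self.sockets:
            return False  # identical end-point pair: not a distinct connection
        self.sockets[key] = name
        return True

    def deliver(self, pkt_src: Endpoint, pkt_dst: Endpoint) -> Optional[str]:
        # For an incoming packet the source is the remote end-point and the
        # destination is the local one.
        return self.sockets.get((pkt_dst, pkt_src))


if __name__ == "__main__":
    demux = ToyDemux()
    local = ("10.0.0.1", 40000)  # one shared local port
    demux.connect("s1", local, ("192.0.2.1", 80))
    demux.connect("s2", local, ("192.0.2.2", 80))
    print(demux.deliver(("192.0.2.2", 80), local))  # s2
```

Two sockets share local port 40000 here without ambiguity, because the remote end-point is part of the key; only a second connection with the same pair of end-points is refused.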
> > What I believe is happening is as follows: There is evidence in dmesg that
> > the first SYN packet of the connect() passes through the LOCAL_OUT iptables
> > hooks (I see "ip_tproxy_fn(): new connection, hook=3" and "ip_tproxy_fn():
> > new connection, hook=4"), but for some reason the packet never actually
> > makes it onto the wire.
> Don't you have any kind of errors in the kernel logs when this
> happens? Tproxy could drop the packet, but you should get an error
> message in that case.
No errors at all :o(. The curious thing is that I added extra printk()s to every place in the tproxy code where I could see "return NF_DROP" (or equivalent), and none of them printed -- so I presume the packet is dropped elsewhere (I don't know where).
> _This_ is strange... Could you send me a tcpdump capture of that
> traffic and the matching tproxy debug output?
Will do, in a separate post.
> I have a few recommendations:
> * Try to avoid explicitly specifying the foreign (fake) port
> number at all costs. If you assign a foreign port of zero,
> connection tracking will select a free port number when applying
> the NAT mapping. This way you won't have such weird problems.
I agree, I'd love to, but my app isn't able to choose the fake ports it uses -- my only option is detecting errors and dropping the connection if necessary.
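Roughly, the user-space fallback I have in mind looks like the sketch below, written in Python for brevity. The helper name and the timeout value are my own choices, and the tproxy setsockopt() calls are deliberately left out. Since a dropped SYN produces no error, a timeout is the only signal the process gets.

```python
import socket
from typing import Optional, Tuple


def connect_or_drop(addr: Tuple[str, int],
                    timeout: float = 3.0) -> Optional[socket.socket]:
    """Return a connected socket, or None if the attempt fails or times out.

    Hypothetical helper: a silently dropped SYN only ever shows up here as a
    timeout, so it cannot be told apart from an unreachable server.
    """
    try:
        return socket.create_connection(addr, timeout=timeout)
    except OSError:  # covers refusals, unreachable hosts, and timeouts
        return None


if __name__ == "__main__":
    sock = connect_or_drop(("127.0.0.1", 8080), timeout=1.0)
    if sock is None:
        print("connect failed, dropping this connection")
    else:
        sock.close()
```

This would not catch the case where packets leave the machine with an unmodified source IP, since from the socket's point of view nothing has gone wrong until the replies fail to arrive.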
> * Each and every connection _must_ have unique endpoints. When you
> run two instances of your client, you'll run into a theoretical
> problem as well: sometimes you try to establish two TCP
> connections with exactly the same endpoints. This is clearly
> invalid, and wouldn't be possible without using tproxy, of
Yes, you're right. It is possible to run into this case with the test programs I sent if you wait long enough, but I'm not too worried about this just now as it doesn't appear to result in any more non-NATted traffic.
> > One other curious thing here: MUST_BE_READ_LOCKED(&ip_tproxy_lock) in
> > ip_tproxy_relatedct_add() fails. Could this be related in any way?
> Not really, that call is completely bogus IMHO. We probably don't need
> that check there, I'll remove it.
Food for thought :o). I'll get back to you with some tcpdumps, etc.