[tproxy] tproxy race condition? [RESEND]

Fri, 17 Dec 2004 14:19:41 +0000

Hi Krisztian!

Many thanks for your reply.

hidden@balabit.hu wrote:

<reply snipped in places> 

>   Unfortunately handling errors is the most problematic part of tproxy.
> The difficulty lies in the fact that when the setsockopt() calls return,
> we have no way of knowing if the not-yet-established connection will
> clash with another connection in the conntrack hash or not. This is
> because the connection won't be created until the first packet leaves
> the machine, which is shortly after you call connect(). If the tproxy
> Netfilter hook detects that it cannot apply a NAT mapping, it just drops
> the packet (and probably the conntrack entry as well) since it has no
> way of notifying the user-space process.

Agreed.  It's not possible to pre-add the mapping to the conntrack table at the setsockopt() stage, I take it.

I'm keen to move to NAT reservations, as they will evidently help me in the long run. It's just that this bug shows up both with and without them, so for simplicity I'm not using them at this stage.

>   If you start two client processes, you'll have a good chance of trying
> to assign "colliding" foreign addresses. If you set REUSEADDR, tproxy
> will allow you to assign the same foreign address more than once, since
> you've explicitly requested to do so by setting REUSEADDR (let's assume
> you've chosen port x). However, as soon as you try to use them, you'll
> experience problems, since the reply tuples of the connections would be
> the same. Of course connection tracking won't allow this, so trying to
> apply the NAT mapping will fail for one of the client processes. (I
> don't know yet why the packets leave the machine with an unmodified
> source IP, in theory they should be dropped, or at least NAT-ted to the
> wrong source port number...)

Fair enough.  If I adjust the program so that one process asks tproxy to assign odd-numbered foreign ports and the other asks for even-numbered ones, the problem still happens just as quickly, so it's not a simple collision fault!

As an aside, the Linux TCP/IP stack allows a single IP address to have more than 65,536 TCP connections open at once.  It does this by letting more than one socket share the same local port (in the autobind code that TCP connect() calls), as long as they're connecting to different remote endpoints.  Return packets are demultiplexed by the remote endpoint as well as the local one.  Some OSes even let the user pre-bind sockets to a local port of _their choice_ before calling connect(), which makes it easy to have more than one connection per local port at once!

> > What I believe is happening is as follows: There is evidence in dmesg that
> > the first SYN packet of the connect() passes through the LOCAL_OUT and
> > POST_ROUTING hooks (I see "ip_tproxy_fn(): new connection, hook=3" and
> > "ip_tproxy_fn(): new connection, hook=4"), but for some reason the packet
> > never actually makes it onto the wire.
> 
>   Don't you have any kind of errors in the kernel logs when this
> happens? Tproxy could drop the packet, but you should get an error
> message in that case.

No errors at all :o(.  The curious thing is that I added extra printk()s to every place in the tproxy code where I could see "return NF_DROP" (or equivalent), and none of them fired, so I presume the packet is being dropped somewhere else (I don't know where).

>   _This_ is strange... Could you send me a tcpdump capture of that
> traffic and the matching tproxy debug output?

Will do, in a separate post.

>   I have a few recommendations:
> 
>       * Try to avoid explicitly specifying the foreign (fake) port
>         number at all costs. If you assign a foreign port of zero,
>         connection tracking will select a free port number when applying
>         the NAT mapping. This way you won't have such weird problems.

I agree, and I'd love to, but my app isn't able to choose the fake ports it uses; my only option is to detect errors and drop the connection if necessary.

>       * Each and every connection _must_ have unique endpoints. When you
>         run two instances of your client, you'll run into a theoretical
>         problem as well: sometimes you try to establish two TCP
>         connections with exactly the same endpoints. This is clearly
>         invalid, and wouldn't be possible without using tproxy, of
>         course.

Yes, you're right.  It is possible to run into this case with the test programs I sent if you wait long enough, but I'm not too worried about it just now, as it doesn't appear to add to the non-NATted traffic.

> > One other curious thing here: MUST_BE_READ_LOCKED(&ip_tproxy_lock) in
> > ip_tproxy_relatedct_add() fails.  Could this be related in any way?
> 
>   Not really, that call is completely bogus IMHO. We probably don't need
> that check there, I'll remove it.

OK.

Food for thought :o).  I'll get back to you with some tcpdumps, etc.

Cheers,

Jim