Hi Krisztian! Many thanks for your reply.

hidden@balabit.hu wrote: <reply snipped in places>
Unfortunately, handling errors is the most problematic part of tproxy. The difficulty is that when the setsockopt() calls return, we have no way of knowing whether the not-yet-established connection will clash with another connection in the conntrack hash. This is because the connection isn't created until the first packet leaves the machine, which happens shortly after you call connect(). If the tproxy Netfilter hook detects that it cannot apply a NAT mapping, it just drops the packet (and probably the conntrack entry as well), since it has no way of notifying the user-space process.
Agreed. I take it it's not possible to pre-add the mapping to the conntrack table at the setsockopt() stage. I'm keen to move to NAT reservations, since they'll evidently help me in the long run -- it's just that this bug shows up both with and without them, so at this stage I'm not using them, for simplicity.
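So, as I understand it, the sequence where things go wrong looks roughly like this (my reading of the above, not verified against the source):

    setsockopt(IP_TPROXY, ...)  -> succeeds; nothing checked against conntrack yet
    connect()                   -> first SYN enters LOCAL_OUT
    tproxy hook runs            -> tries to apply the NAT mapping
    mapping clashes             -> packet (and probably the conntrack entry) dropped
    user space                  -> never notified; connect() eventually times out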
If you start two client processes, you have a good chance of trying to assign "colliding" foreign addresses. If you set REUSEADDR, tproxy will let you assign the same foreign address more than once, since you've explicitly requested it (let's assume you've chosen port x). However, as soon as you try to use the sockets you'll run into problems, since the reply tuples of the two connections would be identical. Connection tracking of course won't allow this, so applying the NAT mapping will fail for one of the client processes. (I don't know yet why the packets leave the machine with an unmodified source IP; in theory they should be dropped, or at least NAT-ted to the wrong source port number...)
Fair enough. If I adjust the program so that one process asks tproxy to assign odd-numbered foreign ports, and the other process even-numbered foreign ports, the problem still happens just as quickly -- so it's not a simple collision fault!

As an aside, the Linux TCP/IP stack allows a single IP address to make more than 65,536 TCP connections at once. It does this by allowing more than one socket to share the same local port [in the auto-bind code called by TCP connect()], as long as they're connecting to different remote endpoints. The return packets are demultiplexed by the remote endpoint as well as the local one. Some OSes even allow the user to pre-bind sockets to a local port of _their choice_ before calling connect(), easily allowing more than one connection at once per local port!
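For the curious, the pre-bind trick looks roughly like this (a minimal sketch; addresses and ports are made up, both sockets need SO_REUSEADDR set before bind(), and the remote endpoints must differ or the second connect() fails):

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    /* Open a TCP socket bound to local port lport and connect it to
     * rip:rport. SO_REUSEADDR lets several such sockets share the same
     * local port, as long as each connects to a different remote end. */
    static int connect_from_port(unsigned short lport,
                                 const char *rip, unsigned short rport)
    {
        struct sockaddr_in la, ra;
        int one = 1;
        int fd = socket(AF_INET, SOCK_STREAM, 0);

        if (fd < 0)
            return -1;
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

        memset(&la, 0, sizeof(la));
        la.sin_family = AF_INET;
        la.sin_port = htons(lport);
        la.sin_addr.s_addr = htonl(INADDR_ANY);
        if (bind(fd, (struct sockaddr *) &la, sizeof(la)) < 0) {
            close(fd);
            return -1;          /* EADDRINUSE if the OS disallows the trick */
        }

        memset(&ra, 0, sizeof(ra));
        ra.sin_family = AF_INET;
        ra.sin_port = htons(rport);
        inet_pton(AF_INET, rip, &ra.sin_addr);
        if (connect(fd, (struct sockaddr *) &ra, sizeof(ra)) < 0) {
            close(fd);
            return -1;
        }
        return fd;
    }

    int main(void)
    {
        /* same local port, two different remote endpoints */
        int a = connect_from_port(40000, "192.0.2.1", 80);
        int b = connect_from_port(40000, "192.0.2.2", 80);

        printf("fd a=%d, fd b=%d\n", a, b);
        return 0;
    }

As far as I know, the second bind() only succeeds because both sockets set SO_REUSEADDR and neither is listening; connecting both to the same remote endpoint would fail at the second connect().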
What I believe is happening is as follows: there is evidence in dmesg that the first SYN packet of the connect() passes through the LOCAL_OUT and POST_ROUTING iptables hooks (I see "ip_tproxy_fn(): new connection, hook=3" and "ip_tproxy_fn(): new connection, hook=4"), but for some reason the packet never actually makes it onto the wire.
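(For reference, the hook numbers in those messages come from the five IPv4 netfilter hook points, so hook=3 and hook=4 mean the SYN reached both output-side hooks:

    /* from include/linux/netfilter_ipv4.h */
    #define NF_IP_PRE_ROUTING   0   /* after sanity checks, before routing */
    #define NF_IP_LOCAL_IN      1   /* packets addressed to this host      */
    #define NF_IP_FORWARD       2   /* packets routed through the box      */
    #define NF_IP_LOCAL_OUT     3   /* locally generated, before routing   */
    #define NF_IP_POST_ROUTING  4   /* last hook before the wire           */

so whatever drops the packet does so at or after POST_ROUTING.)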
Don't you get any errors in the kernel logs when this happens? Tproxy could be dropping the packet, but you should see an error message in that case.
No errors at all :o(. The curious thing is that I added extra printk()s to every place in the tproxy code where I could see "return NF_DROP" (or equivalent), and none of them printed -- so I presume the packet is being dropped elsewhere (I don't know where).
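For the record, the instrumentation was of this form (illustrative only -- the condition and message here are made up, the pattern is what matters):

    /* immediately before each drop site in the tproxy hook functions: */
    if (error_condition) {
        printk(KERN_DEBUG "ip_tproxy: dropping packet at %s:%d\n",
               __FUNCTION__, __LINE__);
        return NF_DROP;
    }

None of these ever fired, so the drop happens outside these paths.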
_This_ is strange... Could you send me a tcpdump capture of that traffic and the matching tproxy debug output?
Will do, in a separate post.
I have a few recommendations:
* Avoid explicitly specifying the foreign (fake) port number at all costs. If you assign a foreign port of zero, connection tracking will select a free port number when applying the NAT mapping. This way you won't run into such weird problems.
I agree, I'd love to, but my app isn't able to choose the fake ports it uses -- my only option is to detect errors and drop the connection if necessary. (For reference, I've put a sketch of what the port-zero assignment would look like after this list.)
* Each and every connection _must_ have unique endpoints. When you run two instances of your client, you'll run into a theoretical problem as well: sometimes you'll try to establish two TCP connections with exactly the same endpoints. This is clearly invalid, and of course wouldn't be possible without tproxy.
Yes, you're right. It's possible to hit this case with the test programs I sent if you wait long enough, but I'm not too worried about it just now, as it doesn't appear to result in any additional non-NATted traffic.
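For completeness, here's roughly what the port-zero assignment from the first recommendation would look like. I'm going from the tproxy 2.x patch headers from memory, so treat this as a sketch: the header path and the names (struct in_tproxy, TPROXY_ASSIGN, TPROXY_FLAGS, ITP_CONNECT) may differ between tproxy versions.

    #include <arpa/inet.h>
    #include <sys/socket.h>
    #include <linux/netfilter_ipv4/ip_tproxy.h>  /* from the tproxy patch */

    /* Assign fake source address faddr with foreign port 0 to socket fd,
     * and ask for the mapping to be applied at connect() time; with port
     * zero, conntrack picks a free port, avoiding collisions entirely. */
    static int tproxy_assign_auto_port(int fd, const char *faddr)
    {
        struct in_tproxy itp;

        itp.op = TPROXY_ASSIGN;
        inet_pton(AF_INET, faddr, &itp.v.addr.faddr);
        itp.v.addr.fport = 0;   /* zero = kernel chooses the port */
        if (setsockopt(fd, SOL_IP, IP_TPROXY, &itp, sizeof(itp)) < 0)
            return -1;

        itp.op = TPROXY_FLAGS;
        itp.v.flags = ITP_CONNECT;
        return setsockopt(fd, SOL_IP, IP_TPROXY, &itp, sizeof(itp));
    }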
One other curious thing here: the MUST_BE_READ_LOCKED(&ip_tproxy_lock) assertion in ip_tproxy_relatedct_add() fails. Could this be related in any way?
Not really; that call is completely bogus, IMHO. We probably don't need that check there, so I'll remove it.
OK. Food for thought :o). I'll get back to you with some tcpdumps, etc.

Cheers,
Jim