Re: [tproxy] tproxy race condition? [RESEND]

16 Dec 2004

      Hi Jim,

On Wed, 2004-12-15 at 12:09, jim@minter.demon.co.uk wrote:
...
> I'm trying to use tproxy to implement a fully transparent layer 7 proxy as
> follows: TCP connections arrive and are REDIRECTed to a single local port.
> A userspace process listen()s on that port, and makes ongoing (transparent)
> connections on new TCP sockets by calling bind(), tproxy setsockopt()s and
> connect().  In general it works well, but I'm having a few issues which I
> think are possibly SMP-related.  I believe I've reduced some of these to a
> simple test case, sources for which are attached.  I'm using Linux kernel
> 2.4.27 and all four patches in cttproxy-2.4.27-2.0.0 patch.  To run the
> test case, you need two machines; I think the 'client' must be SMP.
OK, so everything in my reply is pure theory; I have not tested the
samples (yet).
...
> The 'server', 10.0.3.2, listens on a single TCP port and has a simple loop
> which accept()s and close()s TCP connections that it receives.
> The SMP 'client', 10.0.3.3, has two processes each connecting to the
> server.  The clients loop through a port range 32768-49152.  They bind() on
> 10.0.3.3, receiving some port from the kernel.  They then assign a
> transparent port in the loop port range on unregistered IP 10.0.3.253, and
> connect() to the server.  (The server has a route set up so that it knows
> to return traffic on 10.0.3.253 to the client box).
> The problem: once in a while, one of the client processes takes 3s to
> connect() to the server.  Then, the resulting TCP connection is NOT
> TRANSPARENT (i.e. 10.0.3.3 is used, not 10.0.3.253).  This can be seen by
> running "tcpdump host 10.0.3.3" on either box.  However, none of the client
> process system calls fail at any point.
Unfortunately, error handling is the most problematic part of tproxy.
The difficulty is that when the setsockopt() calls return, we have no
way of knowing whether the not-yet-established connection will clash
with another connection in the conntrack hash. This is because the
conntrack entry is not created until the first packet leaves the
machine, which is shortly after you call connect(). If the tproxy
Netfilter hook detects that it cannot apply a NAT mapping, it just drops
the packet (and probably the conntrack entry as well), since it has no
way of notifying the user-space process.

  If you start two client processes, there is a good chance that they
will try to assign "colliding" foreign addresses. If you set SO_REUSEADDR,
tproxy will let you assign the same foreign address more than once, since
you have explicitly asked for that (let's assume you've chosen port x).
However, as soon as you try to use both, you'll run into problems, since
the reply tuples of the two connections would be identical. Connection
tracking won't allow this, so applying the NAT mapping will fail for one
of the client processes. (I don't know yet why the packets leave the
machine with an unmodified source IP; in theory they should be dropped,
or at least NAT-ted to the wrong source port number...)
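
To make the collision concrete, here is a toy model in Python (not kernel
code) of a connection table keyed by reply tuple. The class names, the
server port 9000 and the foreign port 40000 are illustrative assumptions;
only the IP addresses come from the test setup described above.

```python
from typing import Dict, NamedTuple


class ReplyTuple(NamedTuple):
    """Reply direction of a NATed connection: server -> foreign address."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class ToyConntrack:
    """Hypothetical stand-in for the conntrack hash; not the kernel's structure."""

    def __init__(self) -> None:
        self._owners: Dict[ReplyTuple, str] = {}

    def add(self, owner: str, foreign_ip: str, foreign_port: int,
            server_ip: str, server_port: int) -> bool:
        """Register a connection; return False if its reply tuple is taken."""
        key = ReplyTuple(server_ip, server_port, foreign_ip, foreign_port)
        if key in self._owners:
            return False
        self._owners[key] = owner
        return True


table = ToyConntrack()
# Both clients pick the same foreign address and port towards the same server.
first = table.add("client-1", "10.0.3.253", 40000, "10.0.3.2", 9000)
second = table.add("client-2", "10.0.3.253", 40000, "10.0.3.2", 9000)
# A different foreign port gives a distinct reply tuple.
third = table.add("client-2", "10.0.3.253", 40001, "10.0.3.2", 9000)
print(first, second, third)  # True False True
```

In this model the second registration is refused because its reply tuple
is identical to the first; what the real hook does with the packet at that
point is exactly the open question in this thread.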
...
> In the case that CONFIG_IP_NF_NAT_NRES is set, at the same time this
> happens, the _other process_ has a -EINVAL failure in
> ip_tproxy_setsockopt_flags(), with corresponding "failed to register NAT
> reservation" error in dmesg.  When CONFIG_IP_NF_NAT_NRES is unset, this
> failure doesn't happen.  But either way, on the _original process_, the
> non-transparent TCP connection happens.
NAT reservations make it possible for tproxy to fail early. If NAT
reservations are enabled, tproxy registers "reservations" for foreign
addresses to be used later. If such a registration fails, that means
that the foreign address is already reserved for some other connection.
This is why in that case even the setsockopt() call fails. This is good,
since it provides you a way of detecting the error.
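
From user space, this means the result of the tproxy setsockopt() call is
worth checking and acting on. A minimal Python sketch of a retry loop
follows. `try_assign` is a hypothetical callable standing in for the real
tproxy setsockopt() sequence (whose exact arguments are not shown here);
it is assumed to raise OSError with EINVAL when the reservation fails.

```python
import errno
from typing import Callable, Iterable


def assign_foreign_port(try_assign: Callable[[int], None],
                        candidates: Iterable[int]) -> int:
    """Return the first candidate port that try_assign accepts.

    try_assign(port) is a placeholder for the tproxy setsockopt() calls;
    it is assumed to raise OSError(EINVAL) when the NAT reservation fails.
    """
    for port in candidates:
        try:
            try_assign(port)
            return port
        except OSError as exc:
            if exc.errno != errno.EINVAL:
                raise  # unrelated failure: do not hide it
    raise OSError(errno.EADDRINUSE, "no free foreign port among candidates")


# Fake reservation table, used only to exercise the loop.
reserved = {40000}


def fake_assign(port: int) -> None:
    if port in reserved:
        raise OSError(errno.EINVAL, "failed to register NAT reservation")
    reserved.add(port)


chosen = assign_foreign_port(fake_assign, range(40000, 40004))
print(chosen)  # 40001
```

The loop only helps when NAT reservations are enabled; without them the
call succeeds and the clash surfaces later, when the first packet is sent.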
...
> What I believe is happening is as follows: There is evidence in dmesg that
> the first SYN packet of the connect() passes through the LOCAL_OUT iptables
> hooks (I see "ip_tproxy_fn(): new connection, hook=3" and "ip_tproxy_fn():
> new connection, hook=4"), but for some reason the packet never actually
> makes it onto the wire.
Do you see any errors in the kernel logs when this happens? Tproxy
could drop the packet, but you should get an error message in that
case.
...
> I can't see where it goes missing.  But anyway,
> connect() waits 3s and resends the SYN.  This time, as the second packet
> goes through the iptables, for some reason it's not translated.  It makes
> it onto the wire and the rest of the connection proceeds untranslated.
_This_ is strange... Could you send me a tcpdump capture of that
traffic and the matching tproxy debug output?
...
> I haven't been able to progress much further debugging this, and wondered
> if you had any ideas?  My principal concern is that the userspace processes
> don't receive an error and have no proper way of telling that the
> connection is going untransparent.  Am I making a stupid mistake somewhere?
I have a few recommendations:

      * Try to avoid explicitly specifying the foreign (fake) port
        number at all costs. If you assign a foreign port of zero,
        connection tracking will select a free port number when applying
        the NAT mapping. This way you won't have such weird problems.
      * Each and every connection _must_ have unique endpoints. When you
        run two instances of your client, you'll run into a theoretical
        problem as well: sometimes you try to establish two TCP
        connections with exactly the same endpoints. This is clearly
        invalid, and wouldn't be possible without using tproxy, of
        course.
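
If explicit foreign ports cannot be avoided, one way to keep endpoints
unique across client processes is to give each process a disjoint slice of
the port range. The Python sketch below is an illustration under that
assumption, not something the test case does; the function name and the
half-open reading of the 32768-49152 range are hypothetical choices.

```python
from typing import Iterator


def ports_for_worker(index: int, workers: int,
                     lo: int = 32768, hi: int = 49152) -> Iterator[int]:
    """Return an iterator over every workers-th port in [lo, hi),
    starting at lo + index, so that different indexes never overlap."""
    if not 0 <= index < workers:
        raise ValueError("index must be in range(workers)")
    return iter(range(lo + index, hi, workers))


a = set(ports_for_worker(0, 2))
b = set(ports_for_worker(1, 2))
print(len(a), len(b), a.isdisjoint(b))  # 8192 8192 True
```

This only prevents collisions between cooperating processes on the same
machine; assigning a foreign port of zero, as recommended above, remains
the simpler option.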
...
> One other curious thing here: MUST_BE_READ_LOCKED(&ip_tproxy_lock) in
> ip_tproxy_relatedct_add() fails.  Could this be related in any way?
Not really, that call is completely bogus IMHO. We probably don't need
that check there, I'll remove it.
...
> Finally, what is the purpose of the new CONFIG_IP_NF_NAT_NRES option?
See above. :)

-- 
 Regards,
   Krisztian KOVACS
