Hi Jim, On Wed, 2004-12-15 at 12:09, jim@minter.demon.co.uk wrote:
I'm trying to use tproxy to implement a fully transparent layer 7 proxy as follows: TCP connections arrive and are REDIRECTed to a single local port. A userspace process listen()s on that port, and makes ongoing (transparent) connections on new TCP sockets by calling bind(), tproxy setsockopt()s and connect(). In general it works well, but I'm having a few issues which I think are possibly SMP-related. I believe I've reduced some of these to a simple test case, sources for which are attached. I'm using Linux kernel 2.4.27 and all four patches in cttproxy-2.4.27-2.0.0 patch. To run the test case, you need two machines; I think the 'client' must be SMP.
OK, so everything in my reply is pure theory; I have not tested the samples (yet).
The 'server', 10.0.3.2, listens on a single TCP port and has a simple loop which accept()s and close()s TCP connections that it receives.
The SMP 'client', 10.0.3.3, has two processes each connecting to the server. The clients loop through a port range 32768-49152. They bind() on 10.0.3.3, receiving some port from the kernel. They then assign a transparent port in the loop port range on unregistered IP 10.0.3.253, and connect() to the server. (The server has a route set up so that it knows to return traffic on 10.0.3.253 to the client box).
The problem: once in a while, one of the client processes takes 3s to connect() to the server. Then, the resulting TCP connection is NOT TRANSPARENT (i.e. 10.0.3.3 is used, not 10.0.3.253). This can be seen by running "tcpdump host 10.0.3.3" on either box. However, none of the client process system calls fail at any point.
Unfortunately, error handling is the most problematic part of tproxy. The difficulty is that when the setsockopt() calls return, we have no way of knowing whether the not-yet-established connection will clash with another connection in the conntrack hash. This is because the connection is not created until the first packet leaves the machine, which is shortly after you call connect(). If the tproxy Netfilter hook detects that it cannot apply a NAT mapping, it just drops the packet (and probably the conntrack entry as well), since it has no way of notifying the user-space process.
If you start two client processes, there is a good chance that they will try to assign "colliding" foreign addresses. If you set REUSEADDR, tproxy will allow you to assign the same foreign address more than once, since you've explicitly requested this by setting REUSEADDR (let's assume you've chosen port x). However, as soon as you try to use them, you'll run into problems, since the reply tuples of the connections would be the same. Of course, connection tracking won't allow this, so applying the NAT mapping will fail for one of the client processes. (I don't know yet why the packets leave the machine with an unmodified source IP; in theory they should be dropped, or at least NATed to the wrong source port number...)
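To make the reply-tuple clash concrete, here is a small toy model in Python. It is only a sketch of the idea, not the conntrack code: the class ToyConntrack, its confirm() method and the server port 8000 are made up for illustration; only the IP addresses come from the test setup described above.

```python
from typing import NamedTuple, Set


class ReplyTuple(NamedTuple):
    """Addresses as they appear in packets coming back from the server."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class ToyConntrack:
    """Toy stand-in for a connection table keyed by reply tuple."""

    def __init__(self) -> None:
        self._replies: Set[ReplyTuple] = set()

    def confirm(self, foreign_ip: str, foreign_port: int,
                server_ip: str, server_port: int) -> bool:
        # Replies travel from the server back to the foreign (fake) address.
        reply = ReplyTuple(server_ip, server_port, foreign_ip, foreign_port)
        if reply in self._replies:
            return False  # clash: the mapping cannot be applied
        self._replies.add(reply)
        return True


table = ToyConntrack()
# Two client processes both pick foreign port 40000 (the "port x" case).
first = table.confirm("10.0.3.253", 40000, "10.0.3.2", 8000)
second = table.confirm("10.0.3.253", 40000, "10.0.3.2", 8000)
# A different foreign port gives a different reply tuple, so no clash.
third = table.confirm("10.0.3.253", 40001, "10.0.3.2", 8000)
print(first, second, third)
```

In this model only the second call is refused, which corresponds to the NAT mapping failing for one of the two client processes.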
In the case that CONFIG_IP_NF_NAT_NRES is set, at the same time this happens, the _other process_ has a -EINVAL failure in ip_tproxy_setsockopt_flags(), with corresponding "failed to register NAT reservation" error in dmesg. When CONFIG_IP_NF_NAT_NRES is unset, this failure doesn't happen. But either way, on the _original process_, the non-transparent TCP connection happens.
NAT reservations make it possible for tproxy to fail early. If NAT reservations are enabled, tproxy registers "reservations" for foreign addresses to be used later. If such a registration fails, that means that the foreign address is already reserved for some other connection. This is why in that case even the setsockopt() call fails. This is good, since it provides you a way of detecting the error.
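The fail-early behaviour can be sketched with another toy model. Again, this is not the tproxy implementation: ToyReservations and assign_foreign() are hypothetical names, and the retry over candidate ports is just one way a userspace process might react to the error. The EINVAL code and the message text are the ones you reported.

```python
import errno
from typing import Iterable, Set, Tuple


class ToyReservations:
    """Toy table of reserved foreign (ip, port) pairs."""

    def __init__(self) -> None:
        self._reserved: Set[Tuple[str, int]] = set()

    def reserve(self, ip: str, port: int) -> None:
        key = (ip, port)
        if key in self._reserved:
            # Fail at assignment time, before any packet is sent.
            raise OSError(errno.EINVAL, "failed to register NAT reservation")
        self._reserved.add(key)


def assign_foreign(table: ToyReservations, ip: str,
                   candidates: Iterable[int]) -> int:
    """Return the first candidate port that can be reserved."""
    for port in candidates:
        try:
            table.reserve(ip, port)
            return port
        except OSError as exc:
            if exc.errno != errno.EINVAL:
                raise
            # Already reserved by someone else: try the next candidate.
    raise OSError(errno.EADDRINUSE, "no free foreign port among candidates")


table = ToyReservations()
port_a = assign_foreign(table, "10.0.3.253", [40000, 40001])  # process A
port_b = assign_foreign(table, "10.0.3.253", [40000, 40001])  # process B
print(port_a, port_b)
```

The point is that the second process sees the error while it is still setting up the socket, so it can choose another port instead of ending up with a connection that silently misbehaves.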
What I believe is happening is as follows: There is evidence in dmesg that the first SYN packet of the connect() passes through the LOCAL_OUT iptables hooks (I see "ip_tproxy_fn(): new connection, hook=3" and "ip_tproxy_fn(): new connection, hook=4"), but for some reason the packet never actually makes it onto the wire.
Do you see any errors in the kernel logs when this happens? Tproxy could drop the packet, but you should get an error message in that case.
I can't see where it goes missing. But anyway, connect() waits 3s and resends the SYN. This time, as the second packet goes through the iptables, for some reason it's not translated. It makes it onto the wire and the rest of the connection proceeds untranslated.
_This_ is strange... Could you send me a tcpdump capture of that traffic and the matching tproxy debug output?
I haven't been able to progress much further debugging this, and wondered if you had any ideas? My principal concern is that the userspace processes don't receive an error and have no proper way of telling that the connection is going untransparent. Am I making a stupid mistake somewhere?
I have a few recommendations:
* Try to avoid explicitly specifying the foreign (fake) port number at all costs. If you assign a foreign port of zero, connection tracking will select a free port number when applying the NAT mapping. This way you won't have such weird problems.
* Each and every connection _must_ have unique endpoints. When you run two instances of your client, you'll run into a theoretical problem as well: sometimes you try to establish two TCP connections with exactly the same endpoints. This is clearly invalid, and wouldn't be possible without using tproxy, of course.
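The difference between an explicit foreign port and port zero can be illustrated like this. It is a sketch only: apply_mapping() is a hypothetical helper, and the lowest-free-port search is an assumption made for the example, not the way connection tracking actually chooses a port. The port range is the one your clients loop through.

```python
from typing import Optional, Set


def apply_mapping(in_use: Set[int], requested_port: int,
                  port_range: range = range(32768, 49153)) -> Optional[int]:
    """Return the foreign port used, or None if no mapping can be applied."""
    if requested_port != 0:
        # Explicit port: the mapping fails if the port is already taken.
        if requested_port in in_use:
            return None
        in_use.add(requested_port)
        return requested_port
    # Port zero: pick any free port, so two callers never collide.
    for port in port_range:
        if port not in in_use:
            in_use.add(port)
            return port
    return None


in_use: Set[int] = set()
auto_a = apply_mapping(in_use, 0)        # first process, port left to the model
auto_b = apply_mapping(in_use, 0)        # second process, gets a different port
explicit = apply_mapping(in_use, auto_a)  # explicit request for a taken port
print(auto_a, auto_b, explicit)
```

With port zero both callers get a usable, distinct port; the explicit request for an already-used port is the only one that fails.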
One other curious thing here: MUST_BE_READ_LOCKED(&ip_tproxy_lock) in ip_tproxy_relatedct_add() fails. Could this be related in any way?
Not really, that call is completely bogus IMHO. We probably don't need that check there, I'll remove it.
Finally, what is the purpose of the new CONFIG_IP_NF_NAT_NRES option?
See above. :)
-- 
Regards,
Krisztian KOVACS