Hi Jim, On Fri, 2004-12-17 at 15:19, jim@minter.demon.co.uk wrote:
Fair enough. If I adjust the program such that one process asks tproxy to assign odd numbered foreign ports, and the other process even numbered foreign ports, the problem still happens just as quickly -- so it's not a simple collision fault!
As an aside, the Linux TCP/IP stack allows a single IP address to hold more than 65,536 TCP connections at once. It does this by allowing more than one socket to share the same local port [in the auto-bind code called by TCP connect()], as long as they're connecting to different remote end-points. The return packets are demultiplexed by the remote end-point as well as the local one. Additionally, some OSes even allow the user to pre-bind sockets to a local port of _their choice_ before making a connect(), easily allowing more than one connection at once per local port!
Of course, this is clear. This is what SO_REUSEADDR was invented for, and tproxy allows you to assign the same foreign address to multiple sockets as well (with some restrictions, of course). Unfortunately, in the kernel's IP stack these things are much simpler: if you set SO_REUSEADDR, you're allowed to bind() to an address that is already taken, but you'll get an error when trying to connect() to the same destination host. In the case of tproxy this is much more difficult, since you won't be able to detect clashes before it's too late. (Not without NAT reservations.)
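Just to make those semantics concrete, here is an untested sketch (placeholder addresses, most error handling omitted): both bind() calls succeed thanks to SO_REUSEADDR, connect() to distinct remote end-points succeeds, and only the connect() that would duplicate a 4-tuple fails:

/* Two sockets pre-bound to the same local (IP, port) with SO_REUSEADDR.
 * Connecting them to different remote end-points works; connecting a
 * third socket to an already-used end-point fails at connect() time
 * (EADDRNOTAVAIL on Linux), because the 4-tuple would not be unique. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

static int bound_connect(const char *raddr, uint16_t rport)
{
    struct sockaddr_in local, remote;
    int one = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = inet_addr("192.0.2.1");  /* placeholder local IP */
    local.sin_port = htons(32000);                   /* shared local port */
    if (bind(fd, (struct sockaddr *) &local, sizeof(local)) < 0)
        return -1;

    memset(&remote, 0, sizeof(remote));
    remote.sin_family = AF_INET;
    remote.sin_addr.s_addr = inet_addr(raddr);
    remote.sin_port = htons(rport);
    return connect(fd, (struct sockaddr *) &remote, sizeof(remote));
}

int main(void)
{
    bound_connect("198.51.100.1", 80);           /* ok */
    bound_connect("198.51.100.2", 80);           /* ok: same port, new peer */
    if (bound_connect("198.51.100.1", 80) < 0)   /* duplicate 4-tuple */
        perror("third connect");                 /* e.g. EADDRNOTAVAIL */
    return 0;
}

Note that even in the plain stack the clash only shows up at connect() time; with tproxy's NAT-based mapping there is no equivalent early check, which is exactly the problem described above.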
What I believe is happening is as follows: there is evidence in dmesg that the first SYN packet of the connect() passes through the LOCAL_OUT and POST_ROUTING netfilter hooks (I see "ip_tproxy_fn(): new connection, hook=3" and "ip_tproxy_fn(): new connection, hook=4"), but for some reason the packet never actually makes it onto the wire.
Don't you get any errors in the kernel logs when this happens? Tproxy could be dropping the packet, but you should get an error message in that case.
No errors at all :o(. The curious thing is that I added extra printk's to all the cases in the tproxy code where I could see "return NF_DROP" (or equivalent), and none of these printed -- so I presume the packet drop is elsewhere (I don't know where).
Ok, do you have any DNAT/MASQUERADE rules in your iptables config? What kind of NAT rules do you use? Another shortcoming of tproxy's NAT-based operation is the following: you have to make sure that you do not reuse the _local_ address before the conntrack entry of the previous connection from that address times out. So if you make a lot of connections from the same IP and the local autobind port range is not enough for you, you'll have to use additional local IP addresses as well. (Note that these do not need to be routable IP addresses.) For example, if you make 400 short-lived connections per second and have configured the local port range to contain 50,000 ports, it will take 125 seconds for the port range to turn over. The timeout of conntrack entries in TIME_WAIT state is 120 seconds, so at 400 cps you're already likely to have problems.
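To put numbers on it, a back-of-envelope helper (the 120 second figure is the usual conntrack TIME_WAIT timeout; the rate, port range, and IP count are placeholders for whatever your setup uses):

/* Estimate how long it takes a proxy to cycle through its pool of
 * local (IP, port) pairs, and compare that against the conntrack
 * TIME_WAIT timeout.  If the pool turns over faster than entries
 * expire, local addresses get reused while conntrack still holds
 * the old connection, and the NAT mapping fails. */
#include <stdio.h>

int main(void)
{
    double cps       = 400.0;   /* short-lived connections per second */
    double ports     = 50000.0; /* size of local autobind port range */
    double local_ips = 1.0;     /* local IPs (need not be routable) */
    double time_wait = 120.0;   /* conntrack TIME_WAIT timeout, seconds */

    double turnover = ports * local_ips / cps;
    printf("pool turns over every %.0f s, TIME_WAIT lasts %.0f s\n",
           turnover, time_wait);
    if (turnover < time_wait)
        printf("expect clashes: add local IPs or widen the port range\n");
    else
        printf("margin is only %.0f s\n", turnover - time_wait);
    return 0;
}

With the placeholder numbers the pool turns over every 125 seconds against a 120 second timeout, which is exactly the borderline situation described above.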
_This_ is strange... Could you send me a tcpdump capture of that traffic and the matching tproxy debug output?
Will do, in a separate post.
I have a few recommendations:
* Try to avoid explicitly specifying the foreign (fake) port number at all costs. If you assign a foreign port of zero, connection tracking will select a free port number when applying the NAT mapping. This way you won't have such weird problems.
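Roughly like this, assuming the tproxy 2.x setsockopt() interface -- the names come from <linux/netfilter_ipv4/ip_tproxy.h> as I remember them, so double-check against the header shipped with your patch version:

/* Sketch of assigning a foreign address with fport == 0 so that
 * conntrack picks a free port when the NAT mapping is applied.
 * Error handling trimmed for brevity. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <linux/netfilter_ipv4/ip_tproxy.h>

static int assign_foreign(int fd, const char *faddr)
{
    struct in_tproxy itp;

    /* assign the fake source address; port 0 means "pick one for me" */
    itp.op = TPROXY_ASSIGN;
    itp.v.addr.faddr.s_addr = inet_addr(faddr);
    itp.v.addr.fport = 0;
    if (setsockopt(fd, SOL_IP, IP_TPROXY, &itp, sizeof(itp)) < 0)
        return -1;

    /* apply the mapping to the outgoing connect() */
    itp.op = TPROXY_FLAGS;
    itp.v.flags = ITP_CONNECT | ITP_ONCE;
    return setsockopt(fd, SOL_IP, IP_TPROXY, &itp, sizeof(itp));
}

With fport left at zero, the free-port search happens inside connection tracking at NAT setup time, which is the one place that actually knows which tuples are free.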
I agree, and I'd love to, but my app has no say in the fake ports it uses -- my only option is to detect errors and drop the connection when necessary.
You're right, unfortunately there are cases when this is not an option. -- Regards, Krisztian KOVACS