[tproxy] Unprotected critical section in tproxy-patched ip_nat_core.c
wckwon
wckwon at torinet.co.kr
Fri Dec 1 04:04:25 CET 2006
While load-testing for MAX-OPEN-SESSION, I found a bug: a critical section is not protected by a lock.
---------------------------------------------------------------------------
Some history
I have been using TPROXY to change the source IP address.
To increase MAX-OPEN-SESSION, I assign several IP addresses to the interface and manage a port number pool per IP.
Whenever the application calls bind(), before the 'setsockopt(IPT_ASSIGN)' call, it takes an unused IP:PORT pair from the pool.
That way I can open more sessions to a single server than "/proc/sys/net/ipv4/ip_local_port_range" would normally allow; I managed to make 200000 connections to one server.
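For what it's worth, here is a minimal sketch of the per-IP port pool idea. The names port_pool and bind_from_pool are only illustrative, and the naive "next port" counter stands in for my real allocator; the tproxy assignment itself still happens with the setsockopt(IPT_ASSIGN) call mentioned above, after bind() succeeds.

        #include <arpa/inet.h>
        #include <netinet/in.h>
        #include <string.h>
        #include <sys/socket.h>

        struct port_pool {
                struct in_addr ip;    /* one of the local IPs assigned to the interface */
                unsigned short next;  /* naive allocator: next port to hand out */
        };

        /* Pick an unused IP:PORT pair from one pool and bind the socket to it. */
        static int bind_from_pool(int fd, struct port_pool *pool)
        {
                struct sockaddr_in local;

                memset(&local, 0, sizeof(local));
                local.sin_family = AF_INET;
                local.sin_addr   = pool->ip;
                local.sin_port   = htons(pool->next++); /* real code tracks freed ports */

                /* bind() to the chosen pair *before* the tproxy assign setsockopt,
                 * so the kernel never has to pick a port from ip_local_port_range. */
                return bind(fd, (struct sockaddr *)&local, sizeof(local));
        }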
But when I closed all of the sessions at the same moment, I got the kernel BUG message below.
---------------------------------------------------------------------------
TEST Environment
CPU : Xeon 3.0 x 2 (64-bit)
OS : kernel-2.6.18-1.2679.fc6.src.rpm + cttproxy-2.6.18-2.0.5.tar.gz
ETC : bridge interface (br0) in use
---------------------------------------------------------------------------
Kernel BUG message
Kernel BUG at include/linux/list.h:167
invalid opcode: 0000 [3] SMP
last sysfs file: /class/net/br0/bridge/topology_change_detected
CPU 2
Modules linked in: ipt_REDIRECT(U) xt_tcpudp(U) iptable_nat(U) iptable_filter(U) iptable_tproxy(U) ip_nat(U) ip_tables(U) ip_conntrack(U) nfnetlink(U) ipt_TPROXY(U) x_tables(U) ehci_hcd(U) piix(U) usbcore(U)
Pid: 1802, comm: heimdall Not tainted 2.6.15-prep #3
RIP: 0010:[<ffffffff8804e1fd>] <ffffffff8804e1fd>{:ip_nat:ip_nat_used_tuple+110}
RSP: 0018:ffff81012ba09808 EFLAGS: 00010206
RAX: 000000000000159e RBX: ffff8101153b9aa0 RCX: ffff8101153b9b60
RDX: ffff81011a492ca8 RSI: ffff8101153b9ba8 RDI: ffffffff88045e00
RBP: 0000000000000000 R08: 000000000001a63a R09: 000000003da86da6
R10: 0000000080000000 R11: ffffffff8803ac88 R12: ffff8101153b9be8
R13: ffffffff88059a00 R14: ffff81012ba098bc R15: 0000000000000000
FS: 0000000048294950(0063) GS:ffff81013fc6f940(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aab38305020 CR3: 0000000132497000 CR4: 00000000000006e0
Process heimdall (pid: 1802, threadinfo ffff81012ba08000, task ffff81013c9ce1c0)
Stack: ffff50002902a8c0 010626126b00a8c0 0000000000005000 0000000000000001
0000000000001226 ffffffff8804f5f8 ffff81010eebf4f8 ffff8101153b9be8
ffff81012ba098b8 ffffffff88052680
Call Trace: <ffffffff8804f5f8>{:ip_nat:tcp_unique_tuple+247}
<ffffffff8804e71d>{:ip_nat:ip_nat_setup_info+796}
<ffffffff8805587c>{:iptable_tproxy:ip_tproxy_setup_nat+223}
<ffffffff8804e1af>{:ip_nat:ip_nat_used_tuple+32}
<ffffffff88055183>{:iptable_tproxy:ip_tproxy_sockref_find_local+39}
<ffffffff88055cf0>{:iptable_tproxy:ip_tproxy_fn+575}
<ffffffff8804a7e6>{:ip_tables:ipt_do_table+751}
<ffffffff8032fe51>{nf_iterate+65}
<ffffffff80339b4d>{ip_finish_output+0}
<ffffffff803300a1>{nf_hook_slow+88}
<ffffffff80339b4d>{ip_finish_output+0}
<ffffffff8033b1b1>{ip_output+159}
<ffffffff8033aa4d>{ip_queue_xmit+1127}
<ffffffff803329cc>{__ip_route_output_key+2134}
<ffffffff801488ec>{__alloc_pages+87}
<ffffffff80369349>{xfrm_lookup+60}
<ffffffff80348e23>{tcp_transmit_skb+1552}
<ffffffff8034af19>{tcp_connect+699}
<ffffffff8034e291>{tcp_v4_connect+1343}
<ffffffff8031a6e7>{lock_sock+175}
<ffffffff80358bc9>{inet_stream_connect+148}
<ffffffff8033d152>{inet_bind_bucket_create+21}
<ffffffff8033ee68>{inet_csk_get_port+492}
<ffffffff8031a5a1>{release_sock+19}
<ffffffff80319a7e>{sys_connect+118}
<ffffffff8033fd43>{tcp_setsockopt+29}
<ffffffff80318e29>{sockfd_lookup+12}
<ffffffff8031922e>{sys_setsockopt+149}
<ffffffff8010a816>{system_call+126}
Code: 0f 0b 68 a2 f9 04 88 c2 a7 00 48 8b 46 b8 48 39 48 08 74 0a
rip <ffffffff8804e1fd>{:ip_nat:ip_nat_used_tuple+110} RSP <ffff81012ba09808>
---------------------------------------------------------------------------
I think this happens when the kernel is about to reuse a conntrack entry that is still in the TIME_WAIT state.
I found a suspicious spot in ip_nat_core.c, so I added another lock around it, like this:
        write_lock_bh(&__ip_nat_lock2);                          /*** ADD ***/
        h = ip_conntrack_tuple_taken(&reply, ignored_conntrack);
#if defined(CONFIG_IP_NF_TPROXY) || defined(CONFIG_IP_NF_TPROXY_MODULE)
        /* If that conntrack is marked MAY_DELETE, get rid of it... */
        if (h != NULL &&
            (ctrack = tuplehash_to_ctrack(h)) &&
            test_bit(IPS_MAY_DELETE_BIT, &ctrack->status)) {
                DEBUGP("Deleting old conntrack entry for NAT\n");
                __ip_nat_cleanup_conntrack(ctrack);
                ctrack->status &= ~IPS_NAT_DONE_MASK;
                /* fire the timeout handler now so the entry is destroyed */
                if (del_timer(&ctrack->timeout)) {
                        if (ctrack->timeout.function)
                                ctrack->timeout.function((unsigned long)ctrack);
                }
                h = NULL;
        }
#endif
        write_unlock_bh(&__ip_nat_lock2);                        /*** ADD ***/
I also take __ip_nat_lock2 in the other places that call "__ip_nat_cleanup_conntrack(ctrack)".
It has been working well so far.
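To show what I mean, here is a rough sketch of those other call sites; the DEFINE_RWLOCK line and the wrapper function name are only illustrative, the real code is whatever already calls __ip_nat_cleanup_conntrack() in the tproxy-patched ip_nat_core.c:

        /* New rwlock, declared next to ip_nat_lock in ip_nat_core.c. */
        static DEFINE_RWLOCK(__ip_nat_lock2);

        /* Every other path that calls __ip_nat_cleanup_conntrack() takes the
         * same lock, so its list manipulation cannot race with the
         * MAY_DELETE cleanup shown above. */
        static void ip_nat_cleanup_conntrack(struct ip_conntrack *conn)
        {
                write_lock_bh(&__ip_nat_lock2);
                __ip_nat_cleanup_conntrack(conn);
                write_unlock_bh(&__ip_nat_lock2);
        }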
---------------------------------------------------------------------------
Question: do you have any other tips for increasing MAX-OPEN-SESSION?