While load-testing for MAX-OPEN-SESSION, I found a bug: a critical
section is not protected by a lock.
---------------------------------------------------------------------------------------------------
Some history.
I use TPROXY to change the source IP address.
To increase MAX-OPEN-SESSION, I assign several IP addresses to the
interface and manage a pool of port numbers for each IP.
Whenever the program calls bind() before 'setsockopt(IPT_ASSIGN)', it
takes an unused IP:PORT pair from the pool.
This way I can open more sessions to one server than the range in
"/proc/sys/net/ipv4/ip_local_port_range" allows. I succeeded in making
200000 connections to one server.
But when I closed all the sessions at the same moment, I got a
kernel BUG message.
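
The per-IP port pool described above could look roughly like the sketch below. It is only an illustration, not the code from my program: the names (port_pool, pool_get, pool_put, pools_assign) and the port range 1024-65535 are assumptions made for the example, and the TPROXY setsockopt() step is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Assumed usable source-port range for each pooled IP (hypothetical). */
#define POOL_PORT_MIN 1024u
#define POOL_PORT_MAX 65535u
#define POOL_PORTS    (POOL_PORT_MAX - POOL_PORT_MIN + 1u)

/* One pool per source IP: a bitmap with one bit per port. */
struct port_pool {
    uint32_t ip;          /* IPv4 address, host byte order */
    uint32_t next;        /* index where the next search starts */
    uint32_t used_count;  /* number of ports currently handed out */
    uint8_t  used[(POOL_PORTS + 7u) / 8u];
};

void pool_init(struct port_pool *p, uint32_t ip)
{
    memset(p, 0, sizeof(*p));
    p->ip = ip;
}

/* Take a free port from one pool; returns 0 when the pool is exhausted. */
uint16_t pool_get(struct port_pool *p)
{
    uint32_t i;

    if (p->used_count == POOL_PORTS)
        return 0;
    for (i = 0; i < POOL_PORTS; i++) {
        uint32_t idx = (p->next + i) % POOL_PORTS;
        uint8_t bit = (uint8_t)(1u << (idx % 8u));

        if (!(p->used[idx / 8u] & bit)) {
            p->used[idx / 8u] |= bit;
            p->used_count++;
            p->next = (idx + 1u) % POOL_PORTS;
            return (uint16_t)(POOL_PORT_MIN + idx);
        }
    }
    return 0;
}

/* Give a port back once its session is closed. */
void pool_put(struct port_pool *p, uint16_t port)
{
    uint32_t idx;
    uint8_t bit;

    if (port < POOL_PORT_MIN)
        return;
    idx = port - POOL_PORT_MIN;
    bit = (uint8_t)(1u << (idx % 8u));
    if (p->used[idx / 8u] & bit) {
        p->used[idx / 8u] &= (uint8_t)~bit;
        p->used_count--;
    }
}

/* Pick an unused IP:PORT pair from the first pool that still has a free
 * port and fill in the address to pass to bind(). Returns 0 on success,
 * -1 if every pool is exhausted. */
int pools_assign(struct port_pool *pools, size_t n, struct sockaddr_in *out)
{
    size_t i;

    for (i = 0; i < n; i++) {
        uint16_t port = pool_get(&pools[i]);

        if (port != 0) {
            memset(out, 0, sizeof(*out));
            out->sin_family = AF_INET;
            out->sin_addr.s_addr = htonl(pools[i].ip);
            out->sin_port = htons(port);
            return 0;
        }
    }
    return -1;
}
```

The filled-in sockaddr_in would be passed to bind() before connect(), and pool_put() would be called when the session is closed.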
---------------------------------------------------------------------------------------------------
TEST Environment
CPU : XEON3.0 x 2 (64bit)
OS : kernel-2.6.18-1.2679.fc6.src.rpm + cttproxy-2.6.18-2.0.5.tar.gz
ETC : used Bridge interface.
---------------------------------------------------------------------------------------------------
Kernel BUG message
Kernel BUG at include/linux/list.h:167
invalid opcode: 0000 [3] SMP
last sysfs file: /class/net/br0/bridge/topology_change_detected
CPU 2
Modules linked in: ipt_REDIRECT(U) xt_tcpudp(U) iptable_nat(U) iptable_filter(U) iptable_tproxy(U) ip_nat(U) ip_tables(U) ip_conntrack(U) nfnetlink(U) ipt_TPROXY(U) x_tables(U) ehci_hcd(U) piix(U) usbcore(U)
Pid: 1802, comm: heimdall Not tainted 2.6.15-prep #3
RIP: 0010:[<ffffffff8804e1fd>] <ffffffff8804e1fd>{:ip_nat:ip_nat_used_tuple+110}
RSP: 0018:ffff81012ba09808 EFLAGS: 00010206
RAX: 000000000000159e RBX: ffff8101153b9aa0 RCX: ffff8101153b9b60
RDX: ffff81011a492ca8 RSI: ffff8101153b9ba8 RDI: ffffffff88045e00
RBP: 0000000000000000 R08: 000000000001a63a R09: 000000003da86da6
R10: 0000000080000000 R11: ffffffff8803ac88 R12: ffff8101153b9be8
R13: ffffffff88059a00 R14: ffff81012ba098bc R15: 0000000000000000
FS: 0000000048294950(0063) GS:ffff81013fc6f940(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aab38305020 CR3: 0000000132497000 CR4: 00000000000006e0
Process heimdall (pid: 1802, threadinfo ffff81012ba08000, task ffff81013c9ce1c0)
Stack: ffff50002902a8c0 010626126b00a8c0 0000000000005000 0000000000000001
       0000000000001226 ffffffff8804f5f8 ffff81010eebf4f8 ffff8101153b9be8
       ffff81012ba098b8 ffffffff88052680
Call Trace: <ffffffff8804f5f8>{:ip_nat:tcp_unique_tuple+247}
<ffffffff8804e71d>{:ip_nat:ip_nat_setup_info+796}
<ffffffff8805587c>{:iptable_tproxy:ip_tproxy_setup_nat+223}
<ffffffff8804e1af>{:ip_nat:ip_nat_used_tuple+32}
<ffffffff88055183>{:iptable_tproxy:ip_tproxy_sockref_find_local+39}
<ffffffff88055cf0>{:iptable_tproxy:ip_tproxy_fn+575}
<ffffffff8804a7e6>{:ip_tables:ipt_do_table+751}
<ffffffff8032fe51>{nf_iterate+65}
<ffffffff80339b4d>{ip_finish_output+0}
<ffffffff803300a1>{nf_hook_slow+88}
<ffffffff80339b4d>{ip_finish_output+0}
<ffffffff8033b1b1>{ip_output+159}
<ffffffff8033aa4d>{ip_queue_xmit+1127}
<ffffffff803329cc>{__ip_route_output_key+2134}
<ffffffff801488ec>{__alloc_pages+87}
<ffffffff80369349>{xfrm_lookup+60}
<ffffffff80348e23>{tcp_transmit_skb+1552}
<ffffffff8034af19>{tcp_connect+699}
<ffffffff8034e291>{tcp_v4_connect+1343}
<ffffffff8031a6e7>{lock_sock+175}
<ffffffff80358bc9>{inet_stream_connect+148}
<ffffffff8033d152>{inet_bind_bucket_create+21}
<ffffffff8033ee68>{inet_csk_get_port+492}
<ffffffff8031a5a1>{release_sock+19}
<ffffffff80319a7e>{sys_connect+118}
<ffffffff8033fd43>{tcp_setsockopt+29}
<ffffffff80318e29>{sockfd_lookup+12}
<ffffffff8031922e>{sys_setsockopt+149}
<ffffffff8010a816>{system_call+126}
Nov 21
Code: 0f 0b 68 a2 f9 04 88 c2 a7 00 48 8b 46 b8 48 39 48 08 74 0a
rip <ffffffff8804e1fd>{:ip_nat:ip_nat_used_tuple+110} RSP <ffff81012ba09808>
---------------------------------------------------------------------------------------------------
I think it happens when the kernel is about to reuse a conntrack entry
that is in the TIME_WAIT state.
I found a section in ip_nat_core.c that looks unprotected, so I added
another lock like this:
        write_lock_bh(&__ip_nat_lock2);         /*** ADD ****/
        h = ip_conntrack_tuple_taken(&reply, ignored_conntrack);
#if defined(CONFIG_IP_NF_TPROXY) || defined(CONFIG_IP_NF_TPROXY_MODULE)
        /* check if that conntrack is marked MAY_DELETE, if so, get rid of it... */
        if ((h != NULL) &&
            (ctrack = tuplehash_to_ctrack(h)) &&
            test_bit(IPS_MAY_DELETE_BIT, &ctrack->status)) {
                DEBUGP("Deleting old conntrack entry for NAT\n");
                __ip_nat_cleanup_conntrack(ctrack);
                ctrack->status &= ~IPS_NAT_DONE_MASK;
                if (del_timer(&ctrack->timeout)) {
                        if (ctrack->timeout.function) {
                                ctrack->timeout.function((unsigned long)ctrack);
                        }
                }
                h = NULL;
        }
        write_unlock_bh(&__ip_nat_lock2);       /*** ADD ****/
#endif
I also take __ip_nat_lock2 in the other places that call
"__ip_nat_cleanup_conntrack(ctrack)".
It has worked well so far.
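
To show why the extra lock helps, here is a user-space sketch of the same idea. It is not kernel code: a pthread mutex stands in for write_lock_bh(), the names (node, list_lock, list_del_locked) are made up for the example, and I am assuming that the BUG at include/linux/list.h:167 is a consistency check on the list pointers during list deletion. If two paths try to unlink the same entry without a common lock, the second one sees pointers that no longer match; with a common lock, the second caller can notice that the entry is already gone.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly-linked list, similar in shape to a kernel list. */
struct node {
    struct node *prev, *next;
};

/* Hypothetical stand-in for the extra lock: it guards every list change. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

void list_init(struct node *head)
{
    head->prev = head->next = head;
}

/* Insert n right after head while holding the lock. */
void list_add_locked(struct node *head, struct node *n)
{
    pthread_mutex_lock(&list_lock);
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
    pthread_mutex_unlock(&list_lock);
}

/* An entry is linked only if both neighbours point back at it. */
static bool node_is_linked(const struct node *n)
{
    return n->prev != NULL && n->next != NULL &&
           n->prev->next == n && n->next->prev == n;
}

/* Unlink n. The check and the unlink happen under the same lock, so two
 * callers racing on the same entry cannot both unlink it. Returns true
 * if this call removed the entry, false if it was already removed. */
bool list_del_locked(struct node *n)
{
    bool removed = false;

    pthread_mutex_lock(&list_lock);
    if (node_is_linked(n)) {
        n->prev->next = n->next;
        n->next->prev = n->prev;
        n->prev = n->next = NULL;
        removed = true;
    }
    pthread_mutex_unlock(&list_lock);
    return removed;
}
```

The important point is that every path that unlinks an entry has to take the same lock; locking only one of the paths would leave the race in place.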
---------------------------------------------------------------------------------------------------
Question: Do you have any other tips for increasing MAX-OPEN-SESSION?