[tproxy] Unprotected critical section in tproxy-patched ip_nat_core.c

1 Dec 2006

While load-testing for MAX-OPEN-SESSION, I found a bug: a critical section in the tproxy-patched ip_nat_core.c is not protected by a lock.

----------------------------------------------------------------------------

Some history.

I've been using TPROXY to change the source IP address of outgoing connections.

To increase MAX-OPEN-SESSION, I assign several IP addresses to the interface and manage a pool of port numbers for each IP.

Before each 'setsockopt(IPT_ASSIGN)', the application bind()s the socket to an unused IP:PORT pair taken from the pool.
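The pool described above can be sketched in userspace C. This is a minimal sketch, not the code used in this test: the struct, the names pool_init()/pool_get()/pool_put(), the four addresses and the 32768-61000 port range are all assumptions chosen for illustration. Allocations rotate across the configured IPs, and pool_get() returns -1 once every IP:PORT pair is in use; the caller would bind() the socket to the returned pair (after htonl()/htons()) before 'setsockopt(IPT_ASSIGN)'.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical pool dimensions, chosen only for illustration. */
#define POOL_IPS     4
#define PORT_LO      32768
#define PORT_HI      61000                      /* inclusive */
#define PORTS_PER_IP (PORT_HI - PORT_LO + 1)
#define POOL_SLOTS   (POOL_IPS * PORTS_PER_IP)

struct ip_port_pool {
    uint32_t ips[POOL_IPS];          /* local addresses, host byte order */
    unsigned char used[POOL_SLOTS];  /* 1 = IP:PORT pair currently handed out */
    unsigned cursor;                 /* next slot to try (round robin) */
};

static void pool_init(struct ip_port_pool *p, const uint32_t ips[POOL_IPS])
{
    memset(p, 0, sizeof *p);
    memcpy(p->ips, ips, sizeof p->ips);
}

/* Slot s maps to IP (s % POOL_IPS) and port PORT_LO + s / POOL_IPS,
 * so consecutive allocations rotate across the configured addresses. */
static int pool_get(struct ip_port_pool *p, uint32_t *ip, uint16_t *port)
{
    for (unsigned n = 0; n < POOL_SLOTS; n++) {
        unsigned s = (p->cursor + n) % POOL_SLOTS;
        if (!p->used[s]) {
            p->used[s] = 1;
            p->cursor = (s + 1) % POOL_SLOTS;
            *ip = p->ips[s % POOL_IPS];
            *port = (uint16_t)(PORT_LO + s / POOL_IPS);
            return 0;
        }
    }
    return -1;  /* every IP:PORT pair is in use */
}

/* Return a pair to the pool, e.g. after the connection is closed. */
static void pool_put(struct ip_port_pool *p, uint32_t ip, uint16_t port)
{
    assert(port >= PORT_LO && port <= PORT_HI);
    for (unsigned i = 0; i < POOL_IPS; i++) {
        if (p->ips[i] == ip) {
            unsigned s = (unsigned)(port - PORT_LO) * POOL_IPS + i;
            assert(p->used[s]);      /* catch a double release */
            p->used[s] = 0;
            return;
        }
    }
    assert(!"address not in pool");
}
```

This version is single-threaded; if several threads open connections concurrently, pool_get() and pool_put() would need their own lock.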

This lets me open more sessions to one server than the number of ports in "/proc/sys/net/ipv4/ip_local_port_range". I succeeded in making 200000 connections to one server.
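The reason extra addresses help is that every connection to the same server IP:PORT needs a distinct local IP:PORT pair, so capacity is roughly (number of local IPs) x (ports per IP). A small sketch of that arithmetic follows; the function names are hypothetical, and the 32768-61000 range is an assumption (a common 2.6-era default), not a value taken from this setup. With that range each IP offers 28233 ports, so 200000 connections need ceil(200000 / 28233) = 8 addresses.

```c
#include <assert.h>

/* Ports available per source IP for an inclusive local port range. */
static unsigned long ports_per_ip(unsigned lo, unsigned hi)
{
    assert(lo <= hi);
    return (unsigned long)hi - lo + 1;
}

/* Smallest number of source IPs with ips * ports_per_ip >= connections,
 * assuming all connections go to one destination IP:PORT, so each local
 * IP:PORT pair can be used by only one of them at a time. */
static unsigned long ips_needed(unsigned long connections, unsigned lo, unsigned hi)
{
    unsigned long per_ip = ports_per_ip(lo, hi);
    return (connections + per_ip - 1) / per_ip;   /* ceiling division */
}
```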

But when I closed all the sessions at the same moment, the kernel reported a BUG.

----------------------------------------------------------------------------

TEST Environment

CPU : Xeon 3.0 x 2 (64-bit)

OS  : kernel-2.6.18-1.2679.fc6.src.rpm + cttproxy-2.6.18-2.0.5.tar.gz

ETC : a bridge interface (br0) is used.

----------------------------------------------------------------------------

Kernel BUG message

Kernel BUG at include/linux/list.h:167
invalid opcode: 0000 [3] SMP
last sysfs file: /class/net/br0/bridge/topology_change_detected
CPU 2
Modules linked in: ipt_REDIRECT(U) xt_tcpudp(U) iptable_nat(U) iptable_filter(U) iptable_tproxy(U) ip_nat(U) ip_tables(U) ip_conntrack(U) nfnetlink(U) ipt_TPROXY(U) x_tables(U) ehci_hcd(U) piix(U) usbcore(U)
Pid: 1802, comm: heimdall Not tainted 2.6.15-prep #3
RIP: 0010:[<ffffffff8804e1fd>] <ffffffff8804e1fd>{:ip_nat:ip_nat_used_tuple+110}
RSP: 0018:ffff81012ba09808  EFLAGS: 00010206
RAX: 000000000000159e RBX: ffff8101153b9aa0 RCX: ffff8101153b9b60
RDX: ffff81011a492ca8 RSI: ffff8101153b9ba8 RDI: ffffffff88045e00
RBP: 0000000000000000 R08: 000000000001a63a R09: 000000003da86da6
R10: 0000000080000000 R11: ffffffff8803ac88 R12: ffff8101153b9be8
R13: ffffffff88059a00 R14: ffff81012ba098bc R15: 0000000000000000
FS:  0000000048294950(0063) GS:ffff81013fc6f940(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aab38305020 CR3: 0000000132497000 CR4: 00000000000006e0
Process heimdall (pid: 1802, threadinfo ffff81012ba08000, task ffff81013c9ce1c0)
Stack: ffff50002902a8c0 010626126b00a8c0 0000000000005000 0000000000000001
       0000000000001226 ffffffff8804f5f8 ffff81010eebf4f8 ffff8101153b9be8
       ffff81012ba098b8 ffffffff88052680
Call Trace: <ffffffff8804f5f8>{:ip_nat:tcp_unique_tuple+247}
       <ffffffff8804e71d>{:ip_nat:ip_nat_setup_info+796}
       <ffffffff8805587c>{:iptable_tproxy:ip_tproxy_setup_nat+223}
       <ffffffff8804e1af>{:ip_nat:ip_nat_used_tuple+32}
       <ffffffff88055183>{:iptable_tproxy:ip_tproxy_sockref_find_local+39}
       <ffffffff88055cf0>{:iptable_tproxy:ip_tproxy_fn+575}
       <ffffffff8804a7e6>{:ip_tables:ipt_do_table+751}
       <ffffffff8032fe51>{nf_iterate+65}
       <ffffffff80339b4d>{ip_finish_output+0}
       <ffffffff803300a1>{nf_hook_slow+88}
       <ffffffff80339b4d>{ip_finish_output+0}
       <ffffffff8033b1b1>{ip_output+159}
       <ffffffff8033aa4d>{ip_queue_xmit+1127}
       <ffffffff803329cc>{__ip_route_output_key+2134}
       <ffffffff801488ec>{__alloc_pages+87}
       <ffffffff80369349>{xfrm_lookup+60}
       <ffffffff80348e23>{tcp_transmit_skb+1552}
       <ffffffff8034af19>{tcp_connect+699}
       <ffffffff8034e291>{tcp_v4_connect+1343}
       <ffffffff8031a6e7>{lock_sock+175}
       <ffffffff80358bc9>{inet_stream_connect+148}
       <ffffffff8033d152>{inet_bind_bucket_create+21}
       <ffffffff8033ee68>{inet_csk_get_port+492}
       <ffffffff8031a5a1>{release_sock+19}
       <ffffffff80319a7e>{sys_connect+118}
       <ffffffff8033fd43>{tcp_setsockopt+29}
       <ffffffff80318e29>{sockfd_lookup+12}
       <ffffffff8031922e>{sys_setsockopt+149}
       <ffffffff8010a816>{system_call+126}
Nov 21 20:29:21 is4 kernel:
Code: 0f 0b 68 a2 f9 04 88 c2 a7 00 48 8b 46 b8 48 39 48 08 74 0a
RIP <ffffffff8804e1fd>{:ip_nat:ip_nat_used_tuple+110} RSP <ffff81012ba09808>

----------------------------------------------------------------------------

I think it happens when the kernel tries to reuse a conntrack entry that is in the TIME_WAIT state.
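The oops is a list-consistency BUG. It is an assumption (not confirmed from the 2.6.18 source) that include/linux/list.h:167 is a check in list_del() that the neighbours still point back at the entry. The userspace sketch below, with hypothetical names, replays how two unlocked unlinks of adjacent entries (for example, two CPUs reclaiming neighbouring MAY_DELETE conntracks on one hash chain) can interleave and leave a list that such a check later rejects.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list in the style of <linux/list.h>. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Checked delete: return -1 instead of unlinking if the neighbours no longer
 * point back at the entry. Assumed to resemble the check behind
 * "Kernel BUG at include/linux/list.h:167"; not copied from the kernel. */
static int list_del_checked(struct list_head *e)
{
    if (e->prev->next != e || e->next->prev != e)
        return -1;
    e->prev->next = e->next;
    e->next->prev = e->prev;
    e->next = e->prev = NULL;
    return 0;
}

/* Replay two unlocked unlinks of adjacent entries a and b, interleaved the
 * way two CPUs without a common lock could execute them. */
static void racy_interleaving(struct list_head *a, struct list_head *b)
{
    /* CPU 2 starts deleting b and loads its neighbour pointers ... */
    struct list_head *b_prev = b->prev, *b_next = b->next;
    /* ... CPU 1 deletes a completely in the meantime ... */
    a->prev->next = a->next;
    a->next->prev = a->prev;
    /* ... then CPU 2 finishes using the stale pointers it loaded. */
    b_prev->next = b_next;
    b_next->prev = b_prev;
}
```

Run sequentially (as under a lock), both deletes succeed and the list ends up empty; after the racy interleaving, the head still points forward at b and backward at a, and a later checked delete of b fails.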

I found an unprotected section in ip_nat_core.c, so I added another lock around it like this:

        /* __ip_nat_lock2 is the new lock, e.g. static DEFINE_RWLOCK(__ip_nat_lock2); */
        write_lock_bh(&__ip_nat_lock2); /*** ADD ****/

        h = ip_conntrack_tuple_taken(&reply, ignored_conntrack);

#if defined(CONFIG_IP_NF_TPROXY) || defined (CONFIG_IP_NF_TPROXY_MODULE)
        /* check if that conntrack is marked MAY_DELETE, if so, get rid of it... */
        if ((h != NULL) &&
            (ctrack = tuplehash_to_ctrack(h)) &&
            test_bit(IPS_MAY_DELETE_BIT, &ctrack->status)) {
                DEBUGP("Deleting old conntrack entry for NAT\n");
                __ip_nat_cleanup_conntrack(ctrack);
                ctrack->status &= ~IPS_NAT_DONE_MASK;
                /* if the timer was still pending, run its handler now */
                if (del_timer(&ctrack->timeout)) {
                        if (ctrack->timeout.function)
                                ctrack->timeout.function((unsigned long)ctrack);
                }
                h = NULL;
        }
#endif

        /* unlock outside the #if, so the lock taken above is also
         * released when TPROXY is not configured */
        write_unlock_bh(&__ip_nat_lock2); /*** ADD ****/

I also took __ip_nat_lock2 at every other place that calls __ip_nat_cleanup_conntrack(ctrack).

So far it has worked well.

----------------------------------------------------------------------------

Question: Do you have any other tips for increasing MAX-OPEN-SESSION?


wckwon