While load-testing for MAX-OPEN-SESSION, I found a bug: a critical section is not protected by a lock.

----------------------------------------------------------------------------
Some Histories.

I use TPROXY to change the source IP address. To increase MAX-OPEN-SESSION, I assign several IP addresses to the interface and manage a pool of port numbers for each IP. Whenever my program calls bind() before 'setsockopt(IPT_ASSIGN)', it takes an unused IP:PORT pair from the pool. That way I can open more sessions to one server than the range in "/proc/sys/net/ipv4/ip_local_port_range" allows. I succeeded in making 200000 connections to one server. But when I closed all the sessions at the same moment, I got a kernel BUG message.

----------------------------------------------------------------------------
TEST Environment

CPU : XEON3.0 x 2 (64bit)
OS  : kernel-2.6.18-1.2679.fc6.src.rpm + cttproxy-2.6.18-2.0.5.tar.gz
ETC : used Bridge interface.

----------------------------------------------------------------------------
Kernel BUG message

Kernel BUG at include/linux/list.h:167
invalid opcode: 0000 [3] SMP
last sysfs file: /class/net/br0/bridge/topology_change_detected
CPU 2
Modules linked in: ipt_REDIRECT(U) xt_tcpudp(U) iptable_nat(U) iptable_filter(U) iptable_tproxy(U) ip_nat(U) ip_tables(U) ip_conntrack(U) nfnetlink(U) ipt_TPROXY(U) x_tables(U) ehci_hcd(U) piix(U) usbcore(U)
Pid: 1802, comm: heimdall Not tainted 2.6.15-prep #3
RIP: 0010:[<ffffffff8804e1fd>] <ffffffff8804e1fd>{:ip_nat:ip_nat_used_tuple+110}
RSP: 0018:ffff81012ba09808  EFLAGS: 00010206
RAX: 000000000000159e RBX: ffff8101153b9aa0 RCX: ffff8101153b9b60
RDX: ffff81011a492ca8 RSI: ffff8101153b9ba8 RDI: ffffffff88045e00
RBP: 0000000000000000 R08: 000000000001a63a R09: 000000003da86da6
R10: 0000000080000000 R11: ffffffff8803ac88 R12: ffff8101153b9be8
R13: ffffffff88059a00 R14: ffff81012ba098bc R15: 0000000000000000
FS:  0000000048294950(0063)
GS:ffff81013fc6f940(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aab38305020 CR3: 0000000132497000 CR4: 00000000000006e0
Process heimdall (pid: 1802, threadinfo ffff81012ba08000, task ffff81013c9ce1c0)
Stack: ffff50002902a8c0 010626126b00a8c0 0000000000005000 0000000000000001
       0000000000001226 ffffffff8804f5f8 ffff81010eebf4f8 ffff8101153b9be8
       ffff81012ba098b8 ffffffff88052680
Call Trace:
  <ffffffff8804f5f8>{:ip_nat:tcp_unique_tuple+247}
  <ffffffff8804e71d>{:ip_nat:ip_nat_setup_info+796}
  <ffffffff8805587c>{:iptable_tproxy:ip_tproxy_setup_nat+223}
  <ffffffff8804e1af>{:ip_nat:ip_nat_used_tuple+32}
  <ffffffff88055183>{:iptable_tproxy:ip_tproxy_sockref_find_local+39}
  <ffffffff88055cf0>{:iptable_tproxy:ip_tproxy_fn+575}
  <ffffffff8804a7e6>{:ip_tables:ipt_do_table+751}
  <ffffffff8032fe51>{nf_iterate+65}
  <ffffffff80339b4d>{ip_finish_output+0}
  <ffffffff803300a1>{nf_hook_slow+88}
  <ffffffff80339b4d>{ip_finish_output+0}
  <ffffffff8033b1b1>{ip_output+159}
  <ffffffff8033aa4d>{ip_queue_xmit+1127}
  <ffffffff803329cc>{__ip_route_output_key+2134}
  <ffffffff801488ec>{__alloc_pages+87}
  <ffffffff80369349>{xfrm_lookup+60}
  <ffffffff80348e23>{tcp_transmit_skb+1552}
  <ffffffff8034af19>{tcp_connect+699}
  <ffffffff8034e291>{tcp_v4_connect+1343}
  <ffffffff8031a6e7>{lock_sock+175}
  <ffffffff80358bc9>{inet_stream_connect+148}
  <ffffffff8033d152>{inet_bind_bucket_create+21}
  <ffffffff8033ee68>{inet_csk_get_port+492}
  <ffffffff8031a5a1>{release_sock+19}
  <ffffffff80319a7e>{sys_connect+118}
  <ffffffff8033fd43>{tcp_setsockopt+29}
  <ffffffff80318e29>{sockfd_lookup+12}
  <ffffffff8031922e>{sys_setsockopt+149}
  <ffffffff8010a816>{system_call+126}

Nov 21 20:29:21 is4 kernel: Code: 0f 0b 68 a2 f9 04 88 c2 a7 00 48 8b 46 b8 48 39 48 08 74 0a
rip <ffffffff8804e1fd>{:ip_nat:ip_nat_used_tuple+110} RSP <ffff81012ba09808>

----------------------------------------------------------------------------
I think it happens while the kernel is about to use a conntrack node that
is in the TIME_WAIT state. I found a problem in ip_nat_core.c, so I added another lock like this:

    write_lock_bh(&__ip_nat_lock2);   /*** ADD ****/
    h = ip_conntrack_tuple_taken(&reply, ignored_conntrack);
    #if defined(CONFIG_IP_NF_TPROXY) || defined (CONFIG_IP_NF_TPROXY_MODULE)
    /* check if that conntrack is marked MAY_DELETE, if so, get rid of it... */
    if ((h != NULL) &&
        (ctrack = tuplehash_to_ctrack(h)) &&
        test_bit(IPS_MAY_DELETE_BIT, &ctrack->status)) {
            DEBUGP("Deleting old conntrack entry for NAT\n");
            __ip_nat_cleanup_conntrack(ctrack);
            ctrack->status &= ~IPS_NAT_DONE_MASK;
            if (del_timer(&ctrack->timeout)) {
                    if (ctrack->timeout.function) {
                            ctrack->timeout.function((unsigned long)ctrack);
                    }
            }
            h = NULL;
    }
    write_unlock_bh(&__ip_nat_lock2);   /*** ADD ****/
    #endif

I also added __ip_nat_lock2 around the other places that call "__ip_nat_cleanup_conntrack(ctrack)". It has worked well so far.

----------------------------------------------------------------------------
Question : Do you have any other tips for increasing MAX-OPEN-SESSION?
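
----------------------------------------------------------------------------
Sketch : per-IP port pool

For readers who want to picture the port pool mentioned under "Some Histories", here is a minimal user-space sketch of the idea. It is an illustration only, not the code used in the test: the names (pp_pool, pp_alloc, pp_release), the pool size and the port range are all hypothetical, and a real program would hand the returned pair to bind() before the TPROXY setsockopt call. Each IP keeps one bit per port, and allocation returns the first unused IP:PORT pair.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical limits, chosen only for illustration. */
#define PP_MAX_IPS  8u
#define PP_PORT_MIN 1024u
#define PP_PORT_MAX 65535u
#define PP_PORTS    (PP_PORT_MAX - PP_PORT_MIN + 1u)

struct pp_ip {
    uint32_t addr;                      /* IPv4 address, host byte order */
    uint32_t next;                      /* port index where the next search starts */
    uint8_t  used[(PP_PORTS + 7u) / 8u]; /* one bit per port: 1 = in use */
};

struct pp_pool {
    struct pp_ip ips[PP_MAX_IPS];
    unsigned     n_ips;
};

/* Set up a pool over n addresses with every port free. */
void pp_init(struct pp_pool *p, const uint32_t *addrs, unsigned n)
{
    assert(n <= PP_MAX_IPS);
    memset(p, 0, sizeof(*p));
    p->n_ips = n;
    for (unsigned i = 0; i < n; i++)
        p->ips[i].addr = addrs[i];
}

/* Take one unused IP:PORT pair. Returns 0 on success, -1 if the pool is full. */
int pp_alloc(struct pp_pool *p, uint32_t *addr, uint16_t *port)
{
    for (unsigned i = 0; i < p->n_ips; i++) {
        struct pp_ip *ip = &p->ips[i];
        for (uint32_t k = 0; k < PP_PORTS; k++) {
            uint32_t idx = (ip->next + k) % PP_PORTS;
            uint8_t  bit = (uint8_t)(1u << (idx % 8u));
            if (!(ip->used[idx / 8u] & bit)) {
                ip->used[idx / 8u] |= bit;
                ip->next = (idx + 1u) % PP_PORTS;
                *addr = ip->addr;
                *port = (uint16_t)(PP_PORT_MIN + idx);
                return 0;
            }
        }
    }
    return -1;
}

/* Give a pair back. Returns 0 on success, -1 if it was not allocated. */
int pp_release(struct pp_pool *p, uint32_t addr, uint16_t port)
{
    if (port < PP_PORT_MIN)
        return -1;
    uint32_t idx = (uint32_t)port - PP_PORT_MIN;
    uint8_t  bit = (uint8_t)(1u << (idx % 8u));
    for (unsigned i = 0; i < p->n_ips; i++) {
        struct pp_ip *ip = &p->ips[i];
        if (ip->addr != addr)
            continue;
        if (!(ip->used[idx / 8u] & bit))
            return -1;
        ip->used[idx / 8u] &= (uint8_t)~bit;
        return 0;
    }
    return -1;
}
```

The sketch is single-threaded. If several threads share one pool, pp_alloc and pp_release need a lock around them as well, for the same reason as the kernel fix above. The linear scan is also slow once an IP is nearly full; a free list per IP would avoid that.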