[zorp] Zorp 2.1.5.5 can't handle load

Sheldon Hearn zorp@lists.balabit.hu
Thu, 20 May 2004 22:51:11 +0200


Hi folks,

I'm worried.  I'm in a situation where I've put a proxy cluster into
production without adequately testing the Zorp component under load.

I spent a lot of time testing load balancing, but didn't check that Zorp
could cope with a large number of concurrent connections.

We're running zorp-2.1.5.5 on a Linux 2.4.25 (Gentoo) kernel with
glibc-2.3.2 (Gentoo r9).

The http proxy dies with signal 11 (SIGSEGV; all registers are printed as
zero in the stack dump sent to syslog) once it has slightly more than 130
concurrent threads.

So we tried using just the TCP plug proxy, even for HTTP connections,
but can't get a single instance using more than 1020 threads.

We have 4 zorp boxes handling a 100Mbps uplink, load-balanced with LVS. 
LVS ipvsadm also shows that the Zorp boxes aren't handling more than
about 1000 concurrent connections.

The visible symptom of all this is that some connection attempts aren't
even accepted, while others are accepted but not serviced.
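
For anyone trying to reproduce the symptom, the two failure modes can be
told apart from a client machine. The following is a minimal sketch, not
part of Zorp: the function names probe and probe_many, the default
HEAD request and the 5-second timeout are all assumptions made for
illustration.

```python
import collections
import socket
from concurrent.futures import ThreadPoolExecutor


def probe(host, port, timeout=5.0, request=b"HEAD / HTTP/1.0\r\n\r\n"):
    """Classify a single connection attempt (hypothetical helper).

    "not accepted"            connect() failed or timed out
    "accepted, not serviced"  connected, but no reply within the timeout
    "serviced"                at least one byte of reply came back
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "not accepted"
    try:
        sock.sendall(request)
        data = sock.recv(1)
    except OSError:
        # Covers timeouts and resets after the connection was established.
        return "accepted, not serviced"
    finally:
        sock.close()
    return "serviced" if data else "accepted, not serviced"


def probe_many(host, port, count, timeout=5.0):
    """Run `count` probes concurrently and tally the outcomes."""
    with ThreadPoolExecutor(max_workers=count) as pool:
        outcomes = pool.map(lambda _: probe(host, port, timeout), range(count))
        return collections.Counter(outcomes)
```

Calling probe_many("proxy.example.org", 80, 200), with the host name
replaced by the address of one Zorp box, returns a tally of the three
outcomes. The client's own file descriptor limit has to be high enough
for the number of probes requested.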

I've done a lot of Googling, and all the stuff on how to increase the
number of threads allowed per process doesn't seem to apply;
PTHREAD_THREADS_MAX is already large in the glibc sources, and NR_TASKS
doesn't exist in the kernel source.

I've bumped up ulimits for file descriptors and processes per user, but
these don't help.
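
Resource limits are inherited per process, so a limit raised in a login
shell does not necessarily reach a daemon started some other way. One way
to check is to run a small script from the same environment that starts
the proxy. This is a minimal sketch using Python's standard resource
module; the helper names effective_limits and describe are made up for
illustration, and it only reports the limits of the process it runs in.

```python
import resource


def describe(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def effective_limits():
    """Return {label: (soft, hard)} for the limits mentioned above."""
    wanted = {
        "open files": resource.RLIMIT_NOFILE,
        "processes per user": resource.RLIMIT_NPROC,
    }
    return {label: resource.getrlimit(rl) for label, rl in wanted.items()}


for label, (soft, hard) in effective_limits().items():
    print("%-20s soft=%s hard=%s" % (label, describe(soft), describe(hard)))
```

The soft value is the one that is actually enforced; the hard value is
only the ceiling up to which the soft limit may be raised.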

Help.  I realise I went into production prematurely, but now that I'm
here, it's a horrible place and I'm worried that I overestimated Zorp's
ability to cope with load.  Am I expecting too much from Zorp, or is
this just something that more experienced Linux folks would know about?

Any ideas on how to get Zorp to handle the kind of concurrency other
people on the list must be getting[1] would be greatly appreciated.  

Either I need to get Zorp to service a larger number of concurrent
requests, or I need to know why it's not coping when it reaches the
limit on concurrent requests.  I tried lowering --threads to 200, but my
connection attempts still either aren't accepted or time out waiting for
a response.

Thanks,
Sheldon.

[1] I base this assumption on a posting in the archives, where the
poster said he needed about 4 zorp proxy hosts to handle 100Mbps.