[zorp] Stacking programs doesn’t work and how to modify POST parameters?

Mon Jul 7 09:58:55 CEST 2008

On Fri, 2008-07-04 at 19:58 +0200, thomas.wenz at gmx-topmail.de wrote:
> Hi,
> 
> I was using the kernel from the latest ZorpOS which worked for 3.14. 
> I've compiled a new kernel with kernel-patchtree-2.6.17-zorpos-4.1.4. This 
> loads up TProxy 4.0 and now it works! The old kernel had TProxy 3.0 so this
> was the problem.

Great. Zorp 3.3 should also support the old tproxy, but probably you'd
have to override the default using the "--tproxy tproxy30" command line
option. But tproxy4 is better anyway.

And as a plus you got kzorp as well.

> 
> > The "GET" request has no data payload, that's why it is not stacking
> > anything. For POST it should start the stacked program before connecting
> > to the destination.
> 
> From my limited point of view, it looks like it's after the connect (see log 
> below). Do you know a possibility how to move it up so that it's called before
> "Filtering request and headers;"? Or can I alternatively perform own 
> changes in the "Request postfilter header;"-section (meaning: moving the
> "Filtering request and headers;" section down)?

You are right indeed. The server side connection is established right
before the stacking starts. The reason is that by default the POST data
is not buffered in memory, it is copied in stream mode.

That is, whenever a new chunk of POST data comes in, it is copied
straight to the stacked proxy, which then copies it to the server without
keeping it in a buffer. (POST data can be quite large.)

It should be possible to delay the connection even further by moving the
http_connect_server() call from http_copy_request() to
http_transfer_dst_write_preamble(). Something like:

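GIOStatus
http_transfer_dst_write_preamble(HttpTransfer *self, ZStream *stream, GError **err)
{
  HttpProxy *owner = (HttpProxy *) self->super.owner;
  GIOStatus res = G_IO_STATUS_NORMAL;
  GError *local_error = NULL;
  gsize bw;

  if (self->preamble_ofs == 0)
    {
      /* connect only when the first byte of the preamble is about to be written */
      if (!http_connect_server(owner))
        {
          g_set_error(err, G_IO_CHANNEL_ERROR, G_IO_CHANNEL_ERROR_FAILED,
                      "Error connecting to server");
          return G_IO_STATUS_ERROR;
        }

      http_log_headers(owner, self->transfer_from, "postfilter");
    }
  /* ... rest of the function unchanged ... */
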

The only problem is that the data transfer subsystem assumes that 
the server side stream already exists when it is started. It is not 
impossible to lift this assumption though. (this can be seen in 
libproxy/transfer.c, z_transfer2_start, it binds callbacks to 
various I/O events).

Hmm.. Now that I think of it, there's another possibility, which is somewhat simpler.
The HTTP proxy is capable of reading the entire POST data before connecting to the
server at all, storing it in a blob (more about blobs later). In this case the
entire request together with the POST payload is buffered first, and it is sent
to the server only after filtering.

This code path is activated if you set "self.rerequest_attempts" to something greater
than zero. So if you set rerequest_attempts, your stacked proxy will start before
the server side connection is established.
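
The two orderings described above can be pictured with a toy model. This is
only a sketch of the sequence of events, not Zorp code: the function name
handle_request and the event strings are made up for illustration, and only
the rerequest_attempts name comes from the proxy itself.

```python
def handle_request(body_chunks, rerequest_attempts=0):
    """Return the order of events for one POST request in this toy model."""
    events = []
    if rerequest_attempts > 0:
        # Buffered path: the whole body is read first, the stacked proxy
        # filters it, and only then is the server connection opened.
        body = b"".join(body_chunks)
        events.append("start stacked proxy")
        events.append("filter %d bytes" % len(body))
        events.append("connect to server")
        events.append("send request")
    else:
        # Stream path: the server connection exists before stacking starts,
        # and every chunk is passed on as soon as it arrives.
        events.append("connect to server")
        events.append("start stacked proxy")
        for chunk in body_chunks:
            events.append("copy %d bytes" % len(chunk))
    return events


if __name__ == "__main__":
    chunks = [b"a=1", b"&b=2"]
    print(handle_request(chunks))
    print(handle_request(chunks, rerequest_attempts=1))
```

In the model, the only thing the setting changes is whether "connect to
server" comes before or after "start stacked proxy".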

So the promised explanation of blobs. A blob is a store for potentially large chunks 
of data with administrator controlled memory and disk use limits. When memory is still
available, this data is stored in memory. When memory constraints become tight, data 
is swapped out into files.
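
To make the idea concrete, here is a minimal sketch of such a store in Python.
It is not the Zorp blob implementation: the class name Blob, the per-object
mem_limit parameter and the method names are assumptions made for the
example, and it models only the memory limit, not the disk limit.

```python
import os
import tempfile


class Blob:
    """Toy blob: kept in memory up to mem_limit bytes, then spilled to a file."""

    def __init__(self, mem_limit):
        self.mem_limit = mem_limit
        self._mem = bytearray()
        self._file = None

    @property
    def on_disk(self):
        return self._file is not None

    def write(self, data):
        if self._file is None and len(self._mem) + len(data) > self.mem_limit:
            # Memory budget exceeded: move what we have into a temporary file.
            self._file = tempfile.TemporaryFile()
            self._file.write(self._mem)
            self._mem = bytearray()
        if self._file is not None:
            self._file.write(data)
        else:
            self._mem.extend(data)

    def read_all(self):
        if self._file is None:
            return bytes(self._mem)
        self._file.seek(0)
        data = self._file.read()
        # Leave the position at the end so later writes append.
        self._file.seek(0, os.SEEK_END)
        return data

    def close(self):
        if self._file is not None:
            self._file.close()
            self._file = None


if __name__ == "__main__":
    blob = Blob(mem_limit=8)
    blob.write(b"abcd")
    print(blob.on_disk)
    blob.write(b"efghij")
    print(blob.on_disk, blob.read_all())
    blob.close()
```

The caller sees the same interface whether the data sits in memory or in the
file, which is the point of the abstraction.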

Ok, then you'll need to enable stacking for GET requests. The function that decides
whether to stack a proxy is http_transfer_stack_proxy(), and here's the
condition you are looking for:

  /* we don't stack anything, if
     1) we suppress data
     2) data is not indicated by either the presence of some header fields nor do we expect data
  */
  if (self->suppress_data ||  
      !(self->expect_data || http_lookup_header(headers, "Transfer-Encoding", &hdr) || http_lookup_header(headers, "Content-Length", &hdr)))
    {
      *stacked = NULL;
      return TRUE;
    }

In the case of "GET", suppress_data is FALSE, expect_data is FALSE and there are no 
"Transfer-Encoding" or "Content-Length" headers in the input. As you want to send 
headers to the downstream proxy anyway, you enable "self.push_mime_headers", so if 
you change the above condition to:

  if (self->suppress_data ||  
      !(self->expect_data || self->push_mime_headers || http_lookup_header(headers, "Transfer-Encoding", &hdr) || http_lookup_header(headers, "Content-Length", &hdr)))
                             ^^^^^^^^^^^^^^^^^^^^^^^

This will trigger stacking and start your program as a child proxy, which also receives the
MIME headers (i.e. the HTTP headers with Content-Length/Transfer-Encoding/Connection removed).
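
The effect of the changed condition can be checked with a small truth-table
model in Python. This is a sketch of the boolean logic only: should_stack and
the patched flag are hypothetical names, headers are a plain dict here, and
matching header names case-insensitively is an assumption of the example.

```python
def should_stack(headers, suppress_data=False, expect_data=False,
                 push_mime_headers=False, patched=False):
    """Model of the stacking decision; patched=True applies the changed condition."""
    names = {name.lower() for name in headers}
    has_data_header = "transfer-encoding" in names or "content-length" in names
    wants_data = expect_data or has_data_header
    if patched:
        # The modified condition also stacks when MIME headers are pushed.
        wants_data = wants_data or push_mime_headers
    # Suppressed data always wins: nothing is stacked in that case.
    return not suppress_data and wants_data


if __name__ == "__main__":
    get_headers = {"Host": "example.com"}
    print(should_stack(get_headers, push_mime_headers=True))
    print(should_stack(get_headers, push_mime_headers=True, patched=True))
```

With the original condition a plain GET is never stacked, even with
push_mime_headers enabled. With the extra term it is stacked, unless
suppress_data is set.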

> Is there also a possibility to activate it for GET and empty requests (in fact,
> if I tamper with a POST so that it contains no payload, it's also not handed over)?
> I think I need to remove some checks in the C-code to achieve this, correct?

A proxy is stacked if there are headers to indicate that there's data.
The stacking feature was meant to be used on the payload of HTTP
requests/responses.

> 
> I actually need the above for the following scenario:
> 1. The whole request is handed over to an external program no matter what 
>     it contains. (I first tried it with an AnyPy in front but I couldn't stack http on 
>     it...)

Hmm.. you can side-stack HTTP beside an AnyPy, like this:

client -> AnyPy -> Http -> server

You cannot stack HTTP _into_ AnyPy.

> 2. The external program decides what needs to be changed (headers and data) and
>     logs the whole requests.

Hmm.. stacking was meant to be used for data inspection (virus & spam
checking) in the case of HTTP. Although the headers can be sent to the
child proxy (using the push_mime_headers option), if they are changed
those are not sent to the server.

Do you really want to change headers? If you do, isn't the power of the
Python binding enough?

E.g.

def config(self):
    ....
    self.request["GET"] = (HTTP_REQ_POLICY, self.checkHeaders)

def checkHeaders(self, method, url, version):
    # rewrite the header only if the client sent it
    if self.getRequestHeader("X-Interesting-Http-Header"):
        self.setRequestHeader("X-Interesting-Http-Header", "new-value")
    # a policy callback has to return a verdict
    return HTTP_REQ_ACCEPT

It might be difficult to change the request headers based on the data,
but otherwise I can hardly think of any manipulation that couldn't be
done by the stock HTTP proxy.

> 3. I've already managed to include some code in http.c so that a
>     python-function is called just before headers are modified through
>     Zorp. Based on the result of the external program I know how to
>     change the headers (not in the config-function like normally where
>     I have no information about the request). The link between the
>     external program and the python function is done by asking for an
>     ID in a database which returns the changes to be made.
> This also makes it possible to filter headers which are unknown before
> the request arrives. It's not really a performance solution but the
> normal rules are a little bit too static for me and security is
> considered higher as performance for me...

There are too many options. Let me see what you think about the contents
of this email, and then I can probably recommend a good/better
solution.

> 
> I surely could do that by running some kind of tcpdump but this doesn't
> work with SSL and it's difficult to synchronize (Zorp could have already
> sent before the changes are calculated!).

> 
-- 
Bazsi