Version 4 (modified by christian, 21 months ago)

--

  • Title: How to select the right HTTP proxy design
    • Author(s): Christian
    • Date: 9 Sept. 2011
    • Version(s): all
    • Zentyal profiles: Gateway

How to select the right HTTP proxy design?

Part one: concept

When a web browser on the LAN (hereafter “Intranet”) needs to access web servers, it can do so either directly or via an intermediate component called an HTTP proxy (hereafter “proxy”).

Thanks to the features it implements, a proxy server can serve multiple purposes, the main ones being caching, access control and filtering.

A proxy can be deployed either on the Intranet in front of internal servers or between the Intranet and the Internet. The latter is the main usage of the Zentyal Proxy component.

A proxy can be configured in either transparent or non-transparent mode. This document explains the difference between these two designs so that you understand the pros and cons of each and can make the right choice.

Before starting this discussion, let's clarify one point: because a proxy is one more component in the middle between client and server, it will not improve performance unless the cache is used and a significant number of users benefit from it. Latency will not be shorter except for pages served from the cache, and more and more pages carry “Pragma: no-cache” directives :-(

Let's assume the proxy is deployed on a Zentyal server with one internal connection (Intranet) and one external connection (Internet), as described in the “Perfect Zentyal Gateway setup” document.

Transparent proxy mode:

In this mode the firewall [2] intercepts [1] all HTTP requests sent to the Internet and redirects them to the proxy listening port (3128 by default in Zentyal).

Pros:

  • No need to configure anything on client machines at browser level.
  • Users may not even know there is a proxy in the middle.

Cons:

  • HTTPS traffic can NOT be handled by a transparent proxy. Firewall rules must be added to permit direct access from browsers to HTTPS servers.
  • As a result, filtering defined at proxy level doesn't apply to HTTPS traffic, which must be managed, by IP address, at firewall level.
  • The transparent proxy MUST be deployed on the subnet's default gateway, otherwise clients will never reach it.
  • Because the proxy is transparent, no authentication is possible, and therefore no profiling based on user name or group membership can apply. This also means no user-based access control.

Non transparent proxy mode:

In this mode, the browser “knows” there is a proxy to be used. Different mechanisms, explained later, can provide this information.

Pros:

  • The proxy can be deployed anywhere on the Intranet, no need to match the default gateway IP.
  • Authentication, and therefore access control and profiling, can be enabled.
  • HTTPS is handled by the proxy, so no extra firewall rules are needed. Content filtering doesn't work because the session between client and server is encrypted (TLS), but domain filtering works.
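
To see why domain filtering still works for HTTPS while content filtering does not: a browser configured with a proxy opens HTTPS sessions with an HTTP CONNECT request that names only the target host and port, and everything after that is encrypted. The sketch below is purely illustrative, not how the Zentyal proxy is implemented; the helper names and the blocked-domain list are hypothetical.

```javascript
// Illustrative sketch only: helper names and the blocked list are assumptions.

// Extract the only information a proxy can read from an HTTPS request:
// the host and port on the CONNECT request line.
function parseConnectRequestLine(line) {
  const m = /^CONNECT\s+([^\s:]+):(\d+)\s+HTTP\/\d\.\d$/.exec(line.trim());
  if (!m) return null; // not a CONNECT request line
  return { host: m[1].toLowerCase(), port: Number(m[2]) };
}

// Domain filtering: match the host itself or any of its subdomains.
function isDomainBlocked(host, blockedDomains) {
  return blockedDomains.some((d) => host === d || host.endsWith("." + d));
}

const target = parseConnectRequestLine("CONNECT www.example.com:443 HTTP/1.1");
console.log(target, isDomainBlocked(target.host, ["example.com"]));
```

The URL path and the page content never appear in that request line, which is why only a decision by domain is possible.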

Cons:

  • Browser configuration: if the browser is not configured to use the proxy, it doesn't work.
  • Users are aware that a proxy is used (and therefore that control and logging can be enabled).

In large environments, maintaining configuration on each and every client machine can be painful and time-consuming. This is the reason why:

  • DNS exists to avoid managing local /etc/hosts files.
  • DHCP exists to avoid configuring an IP address on each device.

In the same way, some mechanisms exist to help with proxy configuration in browsers.

If we look at Firefox (IE provides very similar settings ;-) ), we have 5 different options[3]:

  1. No proxy
  2. Manual proxy configuration
  3. Use system proxy settings
  4. Automatic proxy configuration URL
  5. Auto-detect proxy settings for this network.

Let's look at each of them:

  1. No proxy
    Not very interesting here, as the goal is to use a proxy :-)
  2. Manual proxy configuration
    This is the potentially painful approach. It has to be done on each and every machine. What you enter there is the proxy IP address and port number. If either changes, you have to update it everywhere :-( No admin wants to do that!
  3. Use system proxy settings
    Default setting (if I'm not wrong) for Firefox. Useful if the proxy is already defined at system level.
  4. Automatic proxy configuration URL
    Default setting (if I'm not wrong) for IE. This one is definitely better than manual configuration because the URL is less likely to change, even if the proxy IP address or port changes. The URL gives access to a special file, proxy.pac, whose rules describe how the browser must behave.
  5. Auto-detect proxy settings for this network
    Known as WPAD (Web Proxy Auto-Discovery); have a look at http://www.wrec.org/Drafts/draft-cooper-webi-wpad-00.txt

This is an extension of the previous mechanism, but the URL is not even stored at browser level; it is provided by (in this order):

  • DHCP: option 252
  • DNS: multiple mechanisms can be used here.
    • SLP (Service Location Protocol)
    • Well-known aliases: the browser searches for a DNS entry named “wpad.domain”. This requires the machine to be known as machine.domain, otherwise the domain is unknown.
    • Service: URLs (DNS TXT record)

DHCP and DNS “well-known aliases” are the only two mechanisms that web clients are required to support, as described in the draft RFC.
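
As an illustration of the “well-known aliases” lookup, here is a minimal sketch of the URLs a client could derive from its own fully qualified name. The function name is hypothetical, and stopping once two labels remain is an assumption made for this example; real browsers differ in how far up the domain they search.

```javascript
// Illustrative sketch: wpadCandidateUrls is a hypothetical helper, and the
// "stop at two labels" rule is an assumption, not a statement about browsers.
function wpadCandidateUrls(clientFqdn) {
  const labels = clientFqdn.toLowerCase().split(".").filter(Boolean);
  const urls = [];
  // Drop the host name, then walk up the domain one label at a time.
  for (let i = 1; i <= labels.length - 2; i++) {
    urls.push("http://wpad." + labels.slice(i).join(".") + "/wpad.dat");
  }
  return urls;
}

console.log(wpadCandidateUrls("pc1.office.domain.com"));
console.log(wpadCandidateUrls("pc1")); // no domain known: nothing to try
```

A machine that only knows its short name gets an empty list, which is the reason the client must be known as machine.domain for this mechanism to work.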

WPAD is very flexible and powerful but has one constraint (shared with the proxy configuration URL): a proxy.pac (or wpad.dat) file has to be written and stored on a web server. That's it for the concept part.

Part two: implementation

Here we will look at the “auto-detect proxy” mechanism, i.e. WPAD.

The first step is to set up a web server for wpad.domain.com. This can be done with the Zentyal web server module → Virtual host → wpad.domain.com. This server is mandatory to serve your wpad.dat file.

Then decide which method fits you best:

1 - DHCP is the one clients try first... but Zentyal doesn't let you easily configure new DHCP options. You can still do it manually in /usr/share/ebox/stubs or, better, using hooks: http://trac.zentyal.org/wiki/Documentation/Community/HowTo/CustomizeConfigFiles

2 – DNS, with the “well-known aliases” method, is easier: if your clients' FQDNs are, thanks to DHCP, derived from the domain name, then the browser will search for wpad.(whatever).domain. If you have set up such a name in your DNS pointing to the web server described above, you're done :-)

3 – Last step: create a wpad.dat file and store it at the root of your wpad.domain web server. That's it.

Generic proxy.pac or wpad.dat example:

function FindProxyForURL(url, host)
{
   // Hosts on the local subnet 192.168.0.0/24 are reached without proxy.
   if (isInNet(host, "192.168.0.0", "255.255.255.0")) {
      return "DIRECT";
   }
   // HTTP, HTTPS and FTP requests go through the Zentyal proxy.
   if (shExpMatch(url, "http:*") ||
       shExpMatch(url, "https:*") ||
       shExpMatch(url, "ftp:*")) {
      return "PROXY zentyal.domain.com:3128";
   }
   // Any other protocol bypasses the proxy.
   return "DIRECT";
}

The above example says:

  • for any host on subnet 192.168.0.0/24, no proxy is used
  • for anything else using the HTTP, HTTPS or FTP protocol, go to zentyal.domain.com on port 3128 (this is the proxy on Zentyal)
  • any other protocol is accessed directly

Should you need to test your PAC file, have a look at: http://code.google.com/p/pactester/
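
For a quick local check without any extra tool, the PAC logic can also be exercised under Node.js. The sketch below is only a rough approximation: isInNet and shExpMatch here are simplified stand-ins written for this example (isInNet handles dotted IPv4 literals only and does no DNS resolution), not the browser implementations, and the test URLs are made up.

```javascript
// Simplified stand-in for the browser's isInNet(): IPv4 literals only,
// a real browser would also resolve host names through DNS.
function ipToInt(ip) {
  const parts = ip.split(".").map(Number);
  if (parts.length !== 4 ||
      parts.some((p) => !Number.isInteger(p) || p < 0 || p > 255)) {
    return null;
  }
  return ((parts[0] << 24) | (parts[1] << 16) | (parts[2] << 8) | parts[3]) >>> 0;
}

function isInNet(host, pattern, mask) {
  const h = ipToInt(host), p = ipToInt(pattern), m = ipToInt(mask);
  if (h === null || p === null || m === null) return false;
  return ((h & m) >>> 0) === ((p & m) >>> 0);
}

// Simplified stand-in for shExpMatch(): supports only "*" and "?" wildcards.
function shExpMatch(str, shexp) {
  const re = shexp
    .replace(/[.+^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
    .replace(/\*/g, ".*")
    .replace(/\?/g, ".");
  return new RegExp("^" + re + "$").test(str);
}

// Same rules as the generic example: local subnet direct, web traffic proxied.
function FindProxyForURL(url, host) {
  if (isInNet(host, "192.168.0.0", "255.255.255.0")) {
    return "DIRECT";
  }
  if (shExpMatch(url, "http:*") || shExpMatch(url, "https:*") ||
      shExpMatch(url, "ftp:*")) {
    return "PROXY zentyal.domain.com:3128";
  }
  return "DIRECT";
}

console.log(FindProxyForURL("http://192.168.0.10/", "192.168.0.10"));
console.log(FindProxyForURL("https://www.example.com/", "www.example.com"));
console.log(FindProxyForURL("gopher://example.com/", "example.com"));
```

Because host names are not resolved here, an internal machine addressed by name would be sent to the proxy in this harness even though a browser might resolve it into the local subnet.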

Some hints: not exposing your wpad.dat file on the Internet is a good idea ;-) This requires either running the wpad.domain.com server internally or not binding this server (or virtual host, if run on Zentyal) to the external interface. In environments where security is highly critical, not using WPAD is safer, because its “auto-discovery” approach opens the door to attacks, especially if Dynamic DNS is enabled. The cost is more manual administration overhead.

[1] assuming the Zentyal Intranet address is defined as the default gateway for machines on the Intranet
[2] the firewall is used to intercept requests. If the firewall is stopped, no redirection occurs.
[3]  http://support.mozilla.com/en-US/kb/Options%20window%20-%20Advanced%20panel
