The HTTP(S) proxy architecture

Since release 5.0 of the UTM, a lighter but more powerful architecture for the HTTP proxy has been implemented and deployed.

The previous HTTP proxy architecture was based on so-called proxy chaining: whenever a client requested a remote resource that had not been cached before, a five-step process took place:

  1. The HTTP proxy, squid, sent a GET request to the server and received an HTML page in response.

  2. The whole HTML page was sent to the content filtering daemon, dansguardian, and analysed.

  3. Dansguardian then sent the page to the antivirus daemon, havp, where it was scanned for viruses and other malware.

  4. Finally, if no virus or malicious content was found, the whole HTML page was sent back to squid; otherwise, an HTML error message (an “error page”) replaced the original page.

  5. Squid saved the HTML page (or the error page) for future requests and delivered it to the client that originally requested it.
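The five steps above can be sketched as a toy Python model. It is not Endian code: the functions `fetch`, `content_filter`, `antivirus` and `deliver`, the keyword-matching "analysis", and the error page are hypothetical stand-ins for squid, dansguardian and havp. What it illustrates is the sequential hand-off: every stage receives the whole page and passes it on, even after an earlier stage has already decided to block it.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical error page; the real UTM error page is not described here.
ERROR_PAGE = "<html><body>Access denied</body></html>"


@dataclass
class Page:
    """A fetched page and the stages it has passed through (toy model)."""
    url: str
    body: str
    trace: list[str] = field(default_factory=list)
    blocked_by: str | None = None


def fetch(url: str) -> Page:
    # Step 1: squid sends a GET request; the origin server is faked here.
    page = Page(url=url, body=f"<html><body>content of {url}</body></html>")
    page.trace.append("squid:GET")
    return page


def content_filter(page: Page, banned_words: set[str]) -> Page:
    # Step 2: dansguardian-like analysis of the whole page.
    page.trace.append("content-filter")
    if page.blocked_by is None and any(w in page.body for w in banned_words):
        page.blocked_by = "content-filter"
    return page  # passed down the chain even if it is already blocked


def antivirus(page: Page, signatures: set[str]) -> Page:
    # Step 3: havp-like scan; it runs even when the filter already blocked the page.
    page.trace.append("antivirus")
    if page.blocked_by is None and any(s in page.body for s in signatures):
        page.blocked_by = "antivirus"
    return page


def deliver(page: Page, cache: dict[str, str]) -> str:
    # Steps 4-5: back at squid, swap in the error page if needed, cache, deliver.
    page.trace.append("squid:deliver")
    body = ERROR_PAGE if page.blocked_by else page.body
    cache[page.url] = body
    return body


def proxy_chain(url: str, cache: dict[str, str],
                banned_words: set[str], signatures: set[str]) -> tuple[str, list[str]]:
    if url in cache:  # cached resources are served without entering the chain
        return cache[url], ["squid:cache-hit"]
    page = antivirus(content_filter(fetch(url), banned_words), signatures)
    return deliver(page, cache), page.trace
```

For example, `proxy_chain("http://example.com/bad", {}, {"bad"}, {"EICAR"})` returns the error page, yet its trace still contains `"antivirus"`: the page was scanned for malware although the content filter had already blocked it, which is the wasted work that makes the chain a bottleneck.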

The major drawback, and bottleneck, of this architecture was its resource intensiveness: the whole HTML page moved sequentially through the entire chain, step by step, with no way to speed up the process. The page was received by squid and sent to dansguardian to be analysed for content. At this point, even if the content filter found malicious content, meaning that the page could not be served to the requesting client, the page still continued down the chain to havp and then back to squid. Only then did squid send an error page to the original client.

Therefore, the problem was tackled differently, with an entirely new approach that is more reliable and far less resource-intensive. The HTTP proxy is now backed by an ICAP server and, while this may at first sight look like a more complex architecture, it delivers a significant performance improvement.

In a nutshell, ICAP (Internet Content Adaptation Protocol) is a protocol, defined in RFC 3507, that allows the content of a web page to be inspected and modified before it is served back to the client. While this ability can be exploited in several ways, the UTM deploys it with c-icap to provide content filtering and antivirus scanning of remote resources: not only HTML pages, but also audio, video, text documents, and images.
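To make the protocol more concrete, here is a minimal Python sketch of the wire format of an ICAP RESPMOD request as defined in RFC 3507, the kind of message a proxy sends to an ICAP server to have an HTTP response inspected. The `Encapsulated` header gives the byte offset of each embedded section (request headers, response headers, response body) from the start of the encapsulated data, and the body is chunk-encoded. The function names and the service URI used below are hypothetical; the sketch only builds bytes and does not reproduce the exact requests squid sends to c-icap, which may carry further headers such as `Preview`.

```python
CRLF = b"\r\n"


def _header_block(lines: list[str]) -> bytes:
    """Join header lines and terminate the block with an empty line."""
    return "".join(line + "\r\n" for line in lines).encode("ascii") + CRLF


def build_respmod(icap_uri: str, icap_host: str, request_lines: list[str],
                  response_lines: list[str], body: bytes) -> bytes:
    """Build an RFC 3507 RESPMOD request wrapping an HTTP request/response pair."""
    req_hdr = _header_block(request_lines)
    res_hdr = _header_block(response_lines)
    body_offset = len(req_hdr) + len(res_hdr)  # offsets are relative to the encapsulated data
    if body:
        encapsulated = f"req-hdr=0, res-hdr={len(req_hdr)}, res-body={body_offset}"
        # One chunk holding the whole body, then the zero-length last chunk.
        payload = f"{len(body):x}".encode("ascii") + CRLF + body + CRLF + b"0" + CRLF + CRLF
    else:
        encapsulated = f"req-hdr=0, res-hdr={len(req_hdr)}, null-body={body_offset}"
        payload = b""
    icap_head = _header_block([
        f"RESPMOD {icap_uri} ICAP/1.0",
        f"Host: {icap_host}",
        "Allow: 204",  # client accepts "204 No Content" when nothing is modified
        f"Encapsulated: {encapsulated}",
    ])
    return icap_head + req_hdr + res_hdr + payload


def parse_encapsulated(message: bytes) -> dict[str, int]:
    """Read the section offsets back from the ICAP header block."""
    head, _, _ = message.partition(b"\r\n\r\n")
    for line in head.decode("ascii").split("\r\n")[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "encapsulated":
            pairs = (part.split("=") for part in value.split(","))
            return {key.strip(): int(offset) for key, offset in pairs}
    raise ValueError("no Encapsulated header")
```

For instance, `build_respmod("icap://127.0.0.1:1344/example_service", "127.0.0.1", ["GET /index.html HTTP/1.1", "Host: www.example.com"], ["HTTP/1.1 200 OK", "Content-Type: text/html"], b"<html>hi</html>")` produces a request whose `res-body` offset points at the chunked body; 1344 is the default ICAP port, while `example_service` is a placeholder.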

Thanks to c-icap, performance improved in two areas:

  1. From squid to c-icap:

    c-icap receives two parallel requests from the HTTP proxy.

  2. Between c-icap and the daemons.
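This document does not detail how c-icap dispatches work to the daemons, so the following is only an illustrative Python sketch of the general idea behind parallel scanning: the same content is handed to a content-filter check and an antivirus check at the same time, instead of one after the other as in the old chain. Both checks (`toy_content_filter`, `toy_antivirus`) and the helper names are hypothetical stand-ins that simulate their cost with `time.sleep`.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional

# A check inspects the content and returns a reason to block it, or None if clean.
Check = Callable[[bytes], Optional[str]]


def toy_content_filter(content: bytes) -> Optional[str]:
    time.sleep(0.3)  # stand-in for the cost of content analysis
    return "banned phrase" if b"forbidden" in content else None


def toy_antivirus(content: bytes) -> Optional[str]:
    time.sleep(0.3)  # stand-in for the cost of a virus scan
    return "test signature" if b"EICAR" in content else None


def scan_in_parallel(content: bytes, checks: Dict[str, Check]) -> Dict[str, Optional[str]]:
    """Submit the same content to every check at once and collect all verdicts."""
    with ThreadPoolExecutor(max_workers=len(checks)) as pool:
        futures = {name: pool.submit(check, content) for name, check in checks.items()}
        return {name: future.result() for name, future in futures.items()}


def is_blocked(verdicts: Dict[str, Optional[str]]) -> bool:
    """The content is refused if any check returned a reason."""
    return any(reason is not None for reason in verdicts.values())
```

With two checks of about 0.3 s each, `scan_in_parallel` returns after roughly 0.3 s rather than the 0.6 s a sequential chain would need, and `is_blocked` reports whether any check objected.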

See also

More information about ICAP, along with its specifications, can be found on the ICAP Forum web page.