Apr 3, 2011

Tweaking HTTP headers to improve performance

Lately, I have been doing some client-side optimization, and one of the techniques is caching. When a browser first requests a resource and gets the response, the browser's cache subsystem needs to decide whether to cache the resource. The decision is based on the Cache-Control and Expires response headers. When neither of them is present, browsers fall back on their own strategies. These strategies vary from browser to browser, so they are not predictable, and we should set these headers explicitly. To explicitly disable caching in the browser, we can use one of the following:

Pragma: no-cache 
or
Cache-Control: no-cache 
or
Cache-Control: no-store

The Pragma: no-cache header is included in HTTP 1.1 for backward compatibility with HTTP 1.0+. Technically it is valid and defined only for HTTP requests. However, it is widely used as an extension header in both HTTP 1.0 and 1.1 requests and responses. HTTP 1.1 applications should use Cache-Control: no-cache, except when dealing with HTTP 1.0 applications, which understand only Pragma: no-cache. According to RFC 2616, Cache-Control: no-cache is semantically different from Cache-Control: no-store. no-cache still allows the browser to cache a response, but the cache must revalidate it with the origin server before serving it. no-store asks the browser not to store the response at all. However, most browsers treat "no-cache" as "no-store".
To get the real "no-cache" semantics, we can use Cache-Control: max-age=0, which makes the response stale as soon as it arrives so it has to be revalidated before reuse. Adding must-revalidate forbids the cache from ever serving it stale. s-maxage=0 has the same effect, but only on shared caches such as proxies, not on the browser's own cache.
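To make the RFC 2616 distinction between no-store and no-cache concrete, here is a small sketch of how a cache could classify a Cache-Control response value. The class, enum and method names are my own hypothetical ones, and it models the spec's meaning, not what any particular browser actually does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: classifies a Cache-Control response value using the
// RFC 2616 meanings of no-store and no-cache (not actual browser behavior).
public static class CacheDirectives
{
    public enum StoragePolicy { MayReuseWhileFresh, MustRevalidate, MustNotStore }

    public static StoragePolicy Classify(string cacheControl)
    {
        // Turn "no-cache, max-age=0" into the set { "no-cache", "max-age" }.
        var names = new HashSet<string>(
            (cacheControl ?? "")
                .Split(',')
                .Select(d => d.Split('=')[0].Trim().ToLowerInvariant())
                .Where(d => d.Length > 0));

        if (names.Contains("no-store"))
            return StoragePolicy.MustNotStore;       // never write it to the cache
        if (names.Contains("no-cache"))
            return StoragePolicy.MustRevalidate;     // store it, but ask the server before each reuse
        return StoragePolicy.MayReuseWhileFresh;     // reuse while fresh (max-age / Expires decide)
    }
}
```

no-store wins when both directives appear, because it is the stricter of the two.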

To explicitly allow a resource to be cached, we can use one of the following:

Cache-Control: max-age=3600
//or
Cache-Control: s-maxage=3600
//or
Expires: Fri, 05 Jul 2002 05:00:00 GMT

The Cache-Control: max-age directive gives the number of seconds, counted from when the response left the server, during which the document can be considered fresh. There is also an s-maxage directive (note the absence of a hyphen in "maxage") that acts like max-age but applies only to shared (public) caches such as proxies. The older Expires header specifies an absolute expiration date instead of a number of seconds. The HTTP designers later decided that, because many servers have unsynchronized or incorrect clocks, it is better to express expiration in elapsed seconds rather than as an absolute time. An equivalent freshness lifetime can be computed as the difference, in seconds, between the Expires value and the Date value.
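As a rough sketch of that rule, the freshness lifetime a browser would compute looks like this. Per RFC 2616, max-age overrides Expires when both are present. s-maxage is left out because the browser is a private cache. The names are hypothetical, and the header values are assumed to be already parsed.

```csharp
using System;

// Simplified freshness-lifetime calculation for a private (browser) cache.
// Hypothetical helper; inputs are assumed to be already-parsed header values.
public static class Freshness
{
    // Returns null when the response carries no explicit freshness information.
    public static TimeSpan? Lifetime(int? maxAgeSeconds, DateTime? date, DateTime? expires)
    {
        // max-age takes priority over Expires.
        if (maxAgeSeconds.HasValue)
            return TimeSpan.FromSeconds(maxAgeSeconds.Value);

        // Otherwise use Expires - Date; an Expires in the past means "already stale".
        if (expires.HasValue && date.HasValue)
        {
            TimeSpan diff = expires.Value - date.Value;
            return diff > TimeSpan.Zero ? diff : TimeSpan.Zero;
        }

        return null;
    }
}
```

Because both Expires and Date come from the server's clock, subtracting them cancels out any clock skew between server and client, which is the same motivation behind max-age.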

When the browser requests the same URL again, it first asks the cache subsystem for a cached copy. If the copy is still fresh according to the Cache-Control or Expires header, it is returned and no request is sent to the server. If the copy has expired, the request the browser sends depends on which validator, if any, came with the previous response. If the validator was Last-Modified, the following applies:
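The decision described above can be sketched as follows. This is a simplified model under my own assumptions: the age is just the time the copy has been in the cache (RFC 2616 also accounts for network delay and the Age header), heuristic freshness is ignored, and the names are hypothetical.

```csharp
using System;

// Hypothetical sketch of what the cache does when the same URL is requested again.
public static class CacheLookup
{
    public enum CacheAction { ServeFromCache, SendConditionalRequest, SendFullRequest }

    // age: how long the copy has been cached (simplified).
    // lifetime: from max-age or Expires - Date; null if neither was sent.
    // hasValidator: whether Last-Modified or ETag came with the cached response.
    public static CacheAction Decide(TimeSpan age, TimeSpan? lifetime, bool hasValidator)
    {
        if (lifetime.HasValue && age < lifetime.Value)
            return CacheAction.ServeFromCache;          // still fresh: no request at all

        return hasValidator
            ? CacheAction.SendConditionalRequest        // stale, but can be revalidated
            : CacheAction.SendFullRequest;              // stale, nothing to revalidate with
    }
}
```

For example, a copy cached 10 seconds ago with max-age=3600 is served directly, while the same copy after 4000 seconds triggers a conditional request if a validator is available.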

//validator of previous response
Last-Modified: Sun, 03 Apr 2011 14:34:43 GMT

//validator of next request
If-Modified-Since: Sun, 03 Apr 2011 14:34:43 GMT

If the validator was ETag, the following applies:

//validator of previous response
ETag: 100

//validator of next request
If-None-Match: 100

Normally the server should send either "Last-Modified" or "ETag" in the response. If neither of them is present in the previous response, a full, unconditional request is made. Note that validators are independent of the "Cache-Control" and "Expires" headers, which means they can be used without those two headers.
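Here is a minimal sketch of how the stored validators turn into the conditional headers of the next request. The helper is hypothetical; it simply echoes each validator back verbatim, as in the Last-Modified and ETag examples above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: builds the conditional request headers from the
// validators that were stored with the cached response.
public static class ConditionalRequest
{
    public static Dictionary<string, string> Headers(string lastModified, string etag)
    {
        var headers = new Dictionary<string, string>();

        if (!string.IsNullOrEmpty(lastModified))
            headers["If-Modified-Since"] = lastModified;  // echo Last-Modified verbatim

        if (!string.IsNullOrEmpty(etag))
            headers["If-None-Match"] = etag;              // echo ETag verbatim

        // An empty dictionary means a plain, unconditional request.
        return headers;
    }
}
```

If the server answers 304 Not Modified, the browser reuses its cached copy; if it answers 200, the new response replaces the cached one.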

In ASP.NET, if you don't do anything, the following response headers are generated by default:

HTTP/1.1 200 OK
Server: ASP.NET Development Server/10.0.0.0
Date: Sun, 03 Apr 2011 14:47:30 GMT
X-AspNet-Version: 4.0.30319
Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Length: 509
Connection: Close

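Because Cache-Control specifies no max-age (and there is no Expires header), the response has no explicit freshness lifetime, and because there is no validator, the browser has nothing to revalidate it with. In practice the response is not reused, so it behaves much like "Cache-Control: no-store" (strictly speaking, private only means that shared caches such as proxies must not store it). So if we don't want the client to cache a response, we don't need to do anything, because that is the default. The following is the next request for the same page; note that it carries no If-Modified-Since or If-None-Match header.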

GET http://localhost:1491/WebSite1/Cache.aspx HTTP/1.1
Accept: text/html, application/xhtml+xml, */*
Accept-Language: en-US
User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Host: localhost:1491
Cookie: ASP.NET_SessionId=imdo5jrzxkpw2ohy24guaw0b

The following are typical headers to enable client-side caching:

Cache-Control: max-age=3600
Last-Modified: Sun, 03 Apr 2011 14:34:43 GMT

or 
Cache-Control: max-age=3600
ETag: 100
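As a sketch of producing these values in .NET, assuming the file's timestamp is available as a DateTime (the helper names are hypothetical): HTTP dates must be in GMT, and the standard "r" (RFC 1123) format string produces the right layout but does not convert time zones itself, so the value is converted to UTC first.

```csharp
using System;

// Hypothetical helpers producing the header values shown above.
public static class CacheHeaders
{
    public static string CacheControl(int seconds)
    {
        return "max-age=" + seconds;
    }

    // "r" formats as "ddd, dd MMM yyyy HH:mm:ss GMT" using the invariant culture,
    // but it does not shift the time, so convert to UTC first.
    public static string LastModified(DateTime fileTime)
    {
        return fileTime.ToUniversalTime().ToString("r");
    }
}
```

For example, `CacheHeaders.LastModified(new DateTime(2011, 4, 3, 14, 34, 43, DateTimeKind.Utc))` gives "Sun, 03 Apr 2011 14:34:43 GMT", matching the header above.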

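We should prefer "Cache-Control" over "Expires" because a relative max-age does not depend on the server and client clocks being synchronized. When the resource is static, we should configure the web server to add a Last-Modified header rather than an ETag, because it is simpler. ETags are also problematic in an IIS web farm, where each server may generate a different ETag for the same file. When the resource is dynamic, we should generate the ETag dynamically, for example as a hash of the content, so that it changes whenever the content changes. Here is some server-side code: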

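// Let the browser reuse the response for `seconds` seconds without asking the server.
private void AddCacheControlHeader(int seconds)
{
    Response.AddHeader("Cache-Control", string.Format("max-age={0}", seconds));
}

// Send an ETag validator with the response. ETag values are quoted strings.
private void AddETagValidator()
{
    string hash = "\"xyz\""; // placeholder: calculate a hash of the content being served
    Response.AddHeader("ETag", hash);
}

// If the browser's cached copy still matches, answer 304 Not Modified
// with no body instead of regenerating the page.
private void ProcessETagValidator()
{
    string hash = "\"xyz\""; // placeholder: must equal the value sent by AddETagValidator
    if (Request.Headers["If-None-Match"] == hash)
    {
        Response.StatusCode = 304;
        Response.End(); // stops page processing; nothing else is sent
    }
}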
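The "xyz" placeholder above could be filled in along these lines. This is only a sketch under my own assumptions: the ETag is a quoted SHA-256 hex digest of the response body, and the If-None-Match check accepts "*" or a comma-separated list but ignores weak (W/) ETags. The class and method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical sketch: an ETag computed as a hash of the response body,
// plus a simplified If-None-Match comparison.
public static class ETags
{
    // Quoted hex SHA-256 digest of the UTF-8 body, e.g. "\"2CF24D...\"".
    public static string Compute(string body)
    {
        using (var sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(body));
            string hex = BitConverter.ToString(digest).Replace("-", "");
            return "\"" + hex + "\"";
        }
    }

    // If-None-Match may contain "*" or a comma-separated list of ETags.
    // Simplified: weak validators (W/"...") are not handled.
    public static bool Matches(string ifNoneMatch, string etag)
    {
        if (string.IsNullOrEmpty(ifNoneMatch))
            return false;

        return ifNoneMatch
            .Split(',')
            .Select(t => t.Trim())
            .Any(t => t == "*" || t == etag);
    }
}
```

In ProcessETagValidator, the equality test could then become `ETags.Matches(Request.Headers["If-None-Match"], etag)`. The same content always hashes to the same ETag, which also avoids the web-farm problem mentioned earlier.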