Improving Django performance with better caching

The Django cache middleware is great, but it has one drawback. If you cache views (which can give a nice performance boost), Django uses only the path component of the URL to create a cache key. If you are an avid reader of RFC 3986, you may remember that a URI consists of several components, of which path and query are of special interest here. The problem is documented in ticket 4992 (Update: it is now in Django).

Given the following URL:

http://example.com/items/?order_by=name

…Django ignores the query component when determining the cache key, so the key for the above request is based only on:

/items/

Worse, any query parameter at all makes Django bypass the cache: it neither serves the cached page nor creates a new cache item for the order_by request. This hurts the performance of your site. I had expected Django to create three separate cache items for these URLs, instead of one cache item and two uncached requests:

  • http://example.com/items/
  • http://example.com/items/?order_by=name
  • http://example.com/items/?order_by=date
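A small standalone sketch makes the difference concrete. It is plain Python using only `urllib.parse`, not Django's own key-generation code, and the helper names `path_only_key` and `path_and_query_key` are made up for illustration. It derives a key from each of the three URLs above, once from the path alone and once from the path plus the query string:

```python
from urllib.parse import urlsplit

# The three URLs from the list above.
URLS = [
    "http://example.com/items/",
    "http://example.com/items/?order_by=name",
    "http://example.com/items/?order_by=date",
]


def path_only_key(url: str) -> str:
    """Hypothetical helper: key built from the path component only."""
    return urlsplit(url).path


def path_and_query_key(url: str) -> str:
    """Hypothetical helper: key built from the path plus the query, if any."""
    parts = urlsplit(url)
    return f"{parts.path}?{parts.query}" if parts.query else parts.path


if __name__ == "__main__":
    # Path only: all three URLs collapse into a single key.
    print(sorted({path_only_key(u) for u in URLS}))
    # Path plus query: each URL gets its own key.
    print(sorted({path_and_query_key(u) for u in URLS}))
```

Run as a script, the first line printed is `['/items/']`, a single key for all three URLs, while the second lists three distinct keys.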

The solution

Fortunately, Django’s source is very readable and easy to adapt to your own needs. A few minor changes to middleware/cache.py and utils/cache.py are all it takes. For details, see the patch I attached to ticket 4992.
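The patch itself lives on ticket 4992 and is not reproduced here. As a rough sketch of the idea only, assuming nothing about Django's internal function names, a hypothetical helper `make_view_cache_key` could fold a normalized query string into the key and hash the result (the `views.cache` prefix is also made up):

```python
import hashlib
from urllib.parse import parse_qsl, urlencode


def make_view_cache_key(path: str, query_string: str, prefix: str = "views.cache") -> str:
    """Hypothetical helper: build a cache key from the path AND the query.

    Parameters are sorted so that ?a=1&b=2 and ?b=2&a=1 map to the same key.
    Blank values (e.g. ?q=) are kept so they are not silently dropped.
    """
    params = sorted(parse_qsl(query_string, keep_blank_values=True))
    normalized = path + ("?" + urlencode(params) if params else "")
    # MD5 is used only to shorten the key, not for any security purpose.
    digest = hashlib.md5(normalized.encode("utf-8")).hexdigest()
    return f"{prefix}.{digest}"


if __name__ == "__main__":
    for qs in ("", "order_by=name", "order_by=date"):
        print(repr(qs), make_view_cache_key("/items/", qs))
```

With this helper the three example URLs produce three different keys. Sorting the parameters means `?a=1&b=2` and `?b=2&a=1` share one cache entry; that is a design choice for this sketch, not necessarily what the patch does. Hashing keeps keys short, which matters for backends such as memcached that limit keys to 250 bytes.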

I recently deployed a Django site that gives an overview of VPS hosting plans, and being able to cache pages based on the filtering parameters gives much better performance.

Comments:

  • Joe

    The answer for me is — don’t cache views.
    Cache the data returned from queries, before sorting/processing/etc.

    – joe

  • http://friendlybit.com Emil Stenström

    So you’re already hacking away on the Django source :) That’s clearly one success criterion of an open source project: getting smart people interested.

    In your specific case, you could have made your views not use GET parameters (/items/sort/name/ or something like that), but I agree with you that the expected behavior is that requests with GET params are cached too. With the sitewide cache middleware I’m not sure we want the same behavior, though; that would cache all searches (?q=ponies) too.

    Nice VPS site, will keep in mind when the play-around-hosting I use now expires.

  • http://friendlybit.com Emil Stenström

    Also, OpenID is very much broken here. I logged in and got redirected to your wp-admin with the message that a user was created but that I couldn’t log in. Message was lost :(

    The second time, my name was saved in the OpenID field (how?) and my comment got lost again.

    Very annoying…

  • http://www.peterkrantz.com Peter Krantz

    @Emil: It gets complicated to do slash-based URLs if you have many combinations of parameters. Also, for a site where content changes slowly, I would expect to receive the same result for ?q=ponies over a short period of time (e.g. an hour), so I would definitely want that cached.

    Thanks for notifying me about the openid-stuff. Seems to be thoroughly broken right now.

  • http://friendlybit.com Emil Stenström

    Yeah, I agree: with many parameters it gets complicated, and your patch works well for that case.

    For a typical site, most search queries will probably be new ones, so most people won’t benefit from caching them. But I guess it doesn’t hurt to cache them too.

  • http://tomsk.fm Evgeniy

    Has this patch been applied to trunk, or are GET requests with query strings still not cached?