While building a crawler that finds and checks PDF documents on a website, I discovered that a lot of sites don't return information for HEAD requests. A HEAD request should return the same set of HTTP headers as a normal GET request, only without the actual payload.

The typical response seems to be status 500 (internal server error) on a lot of IIS sites. So now is a good time to check your own sites and see what you get back from:

curl --head http://www.mysite.com
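
If you would rather run the same check from a script, for example inside a crawler, here is a minimal Python sketch that uses only the standard library. The function names head_status and describe, and the way the status codes are grouped, are my own assumptions for illustration and not part of any particular crawler.

```python
import urllib.error
import urllib.request


def head_status(url, timeout=10.0):
    """Send a HEAD request and return (status_code, headers_dict)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        # Note: urlopen follows redirects, so this is the final status.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, dict(response.headers.items())
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses raise HTTPError; the status is still useful.
        return error.code, dict(error.headers.items())


def describe(status):
    """Classify a HEAD status code (hypothetical grouping for a report)."""
    if 200 <= status < 400:
        return "ok"
    if status in (405, 501):
        return "HEAD not supported"
    if status >= 500:
        return "server error on HEAD"
    return "client error"
```

Calling describe(head_status("http://www.mysite.com")[0]) should then tell you whether your site answers HEAD properly or falls over with a 500.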