Find all PDF documents on a site

This example shows how SimpleCrawler can be used to find documents of a specific type on a website. The site given as the command-line argument is crawled, and the URI of every document served with the content type “application/pdf” is printed.

require 'simplecrawler'
# Set up a new crawler
sc = SimpleCrawler::Crawler.new(ARGV[0]) # Site to crawl, taken from the command line
sc.maxcount = 200 # Crawl at most 200 pages
sc.crawl { |document|
   if document.headers["content-type"] == "application/pdf"
      puts document.uri
   end
}
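
The exact string comparison above misses servers that send the header with different casing or with extra parameters, for example "application/pdf; charset=binary". A minimal sketch of a more tolerant check follows. The helper name pdf_content_type? is hypothetical and not part of SimpleCrawler, and the sketch assumes the header value is a String or nil.

```ruby
# Hypothetical helper (not part of SimpleCrawler): returns true when a
# Content-Type header value denotes a PDF, ignoring case and parameters.
def pdf_content_type?(value)
  return false if value.nil?
  # Keep only the media type, dropping parameters such as "; charset=binary".
  media_type = value.to_s.split(";").first.to_s.strip.downcase
  media_type == "application/pdf"
end

puts pdf_content_type?("application/pdf")                  # prints true
puts pdf_content_type?("Application/PDF; charset=binary")  # prints true
puts pdf_content_type?("text/html; charset=utf-8")         # prints false
```

Inside the crawl block, the condition could then be written as pdf_content_type?(document.headers["content-type"]), assuming the headers lookup behaves as in the example above.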