This example shows how SimpleCrawler can be used to find documents of a specific type on a website. The site given as the command-line argument is crawled, and the URI of every document served with the content type “application/pdf” is printed.
require 'simplecrawler'

# Set up a new crawler
sc = SimpleCrawler::Crawler.new(ARGV[0])
sc.maxcount = 200 # Only crawl 200 pages

sc.crawl do |document|
  if document.headers["content-type"] == "application/pdf"
    puts document.uri
  end
end
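The equality check on the content-type header matches only when the value is exactly "application/pdf". Some servers add parameters or vary the letter case (for example "Application/PDF; charset=binary"), and those documents would be skipped. The following is a minimal sketch of a more tolerant comparison. The helper name `media_type?` is hypothetical and not part of SimpleCrawler, and the sketch assumes the header value is a plain string or nil.

```ruby
# Hypothetical helper (not part of SimpleCrawler): returns true when a
# Content-Type header value names the expected media type, ignoring case,
# surrounding whitespace, and parameters such as "; charset=binary".
def media_type?(header_value, expected)
  return false if header_value.nil?

  # Keep only the part before the first ";" and normalize it.
  media_type = header_value.to_s.split(";", 2).first.to_s.strip.downcase
  media_type == expected.downcase
end

puts media_type?("application/pdf", "application/pdf")
puts media_type?("Application/PDF; charset=binary", "application/pdf")
puts media_type?("text/html; charset=utf-8", "application/pdf")
```

Inside the crawl block, the condition could then read `media_type?(document.headers["content-type"], "application/pdf")`, assuming the headers object behaves as it does in the example above.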