This example shows how SimpleCrawler can be used to find broken links on a website. It crawls the site given as the first command-line argument and reports every page whose HTTP status is not 200 (for example, 404).
    require 'simplecrawler'

    # Mute log messages
    module SimpleCrawler
      class Crawler
        def log(message)
        end
      end
    end

    # Set up a new crawler
    sc = SimpleCrawler::Crawler.new(ARGV[0])

    # Crawl first 100 links
    sc.maxcount = 100
    sc.crawl do |document|
      if document.http_status[0] != "200"
        puts "#{document.http_status[0]}: #{document.uri}"
      else
        puts "Ok : #{document.uri}"
      end
    end
Save this code in a file called find_broken_links.rb and run it with:
    ruby find_broken_links.rb http://www.example.com/
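The script above prints every status other than 200 as a problem, which also flags redirects such as 301. If a finer distinction is wanted, the status string could be mapped to a category first. The following is only a minimal sketch: it assumes the status code arrives as a string like "404" (as document.http_status[0] does in the example), and the method name classify_status is hypothetical, not part of SimpleCrawler.

```ruby
# Hypothetical helper, not part of SimpleCrawler: maps an HTTP status
# code string (e.g. "404") to a coarse category based on its class.
def classify_status(code)
  case code.to_i
  when 200..299 then :ok
  when 300..399 then :redirect
  when 400..499 then :broken
  when 500..599 then :server_error
  else :unknown # non-numeric or out-of-range input
  end
end

# Sample status strings standing in for document.http_status[0].
%w[200 301 404 500].each do |code|
  puts "#{code}: #{classify_status(code)}"
end
```

Inside the crawl block, the result of classify_status(document.http_status[0]) could then decide whether a URI is printed as broken, redirected, or fine.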