The broken state of EU legal information on the web

In my pet project eurlex.nu I find a lot of weird stuff when scraping documents from the official website eur-lex.europa.eu. The most recent specimen – Final adoption of amending budget No 4 of the European Union for the financial year 2008 – has the publish date 80/80/2200. That’s almost two hundred years into the future with an invalid day/month combo on top. This leads me to believe that the system is in such a broken state that even simple date validation isn’t implemented.
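To show how little code such a check needs, here is a minimal sketch in Ruby. The method name plausible_publish_date? is made up for this example, and the DD/MM/YYYY layout is an assumption based on the value quoted above; the real system may store dates differently.

```ruby
require 'date'

# Hypothetical helper: returns true when a "DD/MM/YYYY" string names a real
# calendar date that is not in the future relative to `today`.
# The DD/MM/YYYY layout is an assumption, not something EUR-Lex documents.
def plausible_publish_date?(text, today = Date.today)
  match = %r{\A(\d{1,2})/(\d{1,2})/(\d{4})\z}.match(text)
  return false unless match

  day, month, year = match.captures.map(&:to_i)
  # Reject impossible day/month combinations such as 80/80.
  return false unless Date.valid_date?(year, month, day)

  # Reject publish dates that have not happened yet.
  Date.new(year, month, day) <= today
end

puts plausible_publish_date?("80/80/2200")
```

A scraper could run a check like this on every document and log the ones that fail instead of storing the bad date.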

Someone delivered a really poor software project for our tax money. I would love to redo the European legal information website with proper standards (e.g. valid HTML, RDF and proper semantics).

Oh well…

Does your webserver give HEAD?

In the process of constructing a crawler that finds and checks PDF documents on a website, I discovered a lot of sites that don’t return information for HEAD requests. A HEAD request should return the same set of HTTP headers as a normal GET request, only without the actual payload.

The typical response seems to be status 500 (internal server error) on a lot of IIS sites. So, now is a good time to check your own sites to see what you get back from a:

curl --head http://www.mysite.com
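If you would rather run the same check from a crawler, the following is a rough sketch using Ruby's standard Net::HTTP library. The helper names head_request and head_supported? are invented for this example, and treating every 2xx or 3xx status as "HEAD works" is my own simplification.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: sends a HEAD request and returns
# [status_code, headers], where headers maps names to arrays of values.
def head_request(url)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    response = http.head(uri.request_uri)
    [response.code.to_i, response.to_hash]
  end
end

# Hypothetical helper: treats 2xx and 3xx as a usable HEAD response.
# A 405 or 500 means the crawler has to fall back to GET.
def head_supported?(status)
  (200..399).cover?(status)
end

# Example use (needs network access, so it is left commented out):
# status, headers = head_request("http://www.mysite.com")
# puts "HEAD gave #{status}" unless head_supported?(status)
```

A crawler can use the result to decide whether it has to fall back to a full GET just to read the Content-Type header.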

New release of the Ruby Accessibility Analysis Kit and online interface

The current version has some minor bug fixes that will speed up testing. The online test interface has been updated to support direct input of markup. This is for those of you unable to install Raakt locally.

This means that there is no reason to skip basic accessibility testing of whatever you are developing! To find out more about how you can integrate Raakt in your testing framework, check out the Raakt wiki, which now has a lot more information.

A new version of the Ruby Accessibility Analysis Kit

This is to announce that RAAKT (The Ruby Accessibility Analysis Kit) has been updated. This release includes more accessibility tests and an initial mapping of tests to the Unified Web Evaluation Methodology (UWEM). Also, thanks to Derek Perrault, RAAKT now uses Hpricot to parse the HTML document. This solves the problem where the previous parser (RubyfulSoup) declared a class “Tag” that was likely to clash with your local classes in Rails.

To install the new version, simply type gem install raakt, or gem update raakt if you have a previous version installed.

Changelog

Summary of changes from version 0.4 to version 0.5.1.

  • Example of how to use RAAKT in Watir unit tests.
  • Tests for area element alt attribute.
  • UWEM mapped in comments for relevant test methods.
  • Test to check that input fields of type image have an alt attribute with text.
  • Refactoring of some methods for more compact syntax. Patch by Derek Perrault.
  • Added test to verify that fieldsets have legends.
  • Fixed alt_to_text that needed to check element type before attempting to read attribute value.
  • Fixed language attribute check (downcased value). Added iso language code list.
  • Applied patch from Derek Perrault (better use of Hpricot features).
  • Fixed check for lang attribute (now requires a value as well).
  • Test for charset mismatch in http headers and document meta element.
  • Switch to Hpricot. Patch by Derek Perrault.

An article on the value of basic accessibility tests, and on how to integrate them in your development process, is in the works for standards-schmandards.com. In the meantime, check out the Raakt wiki.

If you are using Watir it is very simple:

require 'watir'
require 'raakt'
require 'test/unit'

class TC_myTest < Test::Unit::TestCase
	attr_accessor :ie

	def setup
		#start Internet Explorer on the page to be tested
		@ie = Watir::IE.start("http://www.peterkrantz.com")
	end

	def teardown
		#close the browser window after each test
		@ie.close
	end

	def test_startPagePassesBasicAccessibilityCheck
		#set up the accessibility test and pass html to raakt
		raakttest = Raakt::Test.new(@ie.document.body.parentelement.outerhtml)

		#run all tests on the current page
		result = raakttest.all

		#make sure raakt didn't return any error messages
		#(the failure message must be a string, so inspect the result)
		assert(result.empty?, result.inspect)
	end
end

Parsing ASP.NET sites with WWW::Mechanize and Hpricot

Users of Hpricot (which WWW::Mechanize uses as its default HTML parser) may have discovered that the buffer size for attribute values is set to 16384 bytes by default. Typically this isn’t a problem. I mean, who would put 16 KB of data into an HTML attribute? Well, ASP.NET uses a hidden input field to store view state in order to save a few clock cycles on the server side (and spare developers the hassle of coding view state).

Developers tend to forget to turn off view state, resulting in a lot of data that is never used. The guy who made the decision to have this default view state behaviour has probably caused a lot of unnecessary bytes to clog your internet connection (as the view state typically is included in each request).

If you are using mechanize and/or Hpricot to parse such a site you may have come across this error:

ran out of buffer space on element <input>, starting on line 38. (Hpricot::ParseError)

If you want to try it out, load this sample viewstate file into Hpricot. The buffer space error has been reported in the Hpricot issue tracker.

Fortunately, from version 0.5 of Hpricot it is easy to increase the buffer size before loading data. This is done by setting the buffer_size attribute to a sufficiently large number:

[source:ruby]
require 'hpricot'
Hpricot.buffer_size = 262144
[/source]
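How large is "sufficiently large"? One way to get a rough idea is to measure the longest attribute value in the page before handing it to the parser. The sketch below is plain Ruby with no Hpricot calls; the helper names are made up for this example, the regular expression is a crude heuristic rather than a real HTML parser, and doubling from the 16384 default is just my assumption about a reasonable amount of headroom.

```ruby
# Hypothetical helper: returns the size in bytes of the longest quoted
# attribute value found in the markup (crude regex scan, not a real parser).
def longest_attribute_value(html)
  values = html.scan(/=\s*"([^"]*)"|=\s*'([^']*)'/)
  values.map { |double_quoted, single_quoted| (double_quoted || single_quoted).bytesize }.max || 0
end

# Hypothetical helper: doubles the default buffer size until the longest
# attribute value fits, with one byte to spare.
def suggested_buffer_size(html, minimum = 16_384)
  needed = longest_attribute_value(html) + 1
  size = minimum
  size *= 2 while size < needed
  size
end

# A fake view state field of 20000 bytes, similar to what ASP.NET emits.
sample = '<input type="hidden" name="__VIEWSTATE" value="' + ('A' * 20_000) + '" />'
puts suggested_buffer_size(sample)
```

The number it returns could then be assigned to Hpricot.buffer_size before parsing.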

Fixing Mechanize

As mechanize uses Hpricot as the default parser, this error will happen when loading many ASP.NET pages. Fortunately, mechanize allows the user to specify a custom parser class through the pluggable_parser attribute. To make mechanize use Hpricot with a larger buffer size:

[source:ruby]
require 'hpricot'
require 'mechanize'

Hpricot.buffer_size = 262144
agent = WWW::Mechanize.new
agent.pluggable_parser.default = Hpricot
agent.get('http://www.peterkrantz.com/wp-content/uploads/2007/02/viewstatesample.htm')
[/source]

…and we’re back on track mechanizing the world again.

Using Selenium for functional testing in Ruby on Rails

Update: There is now a nice demo of how selenium on rails works.

Jonas Bengtsson has created an initial version of a Selenium plugin for RoR.

I have been using Selenium for a while now and this certainly looks promising. There are some minor details in this release that need to be fixed, such as coloring of completed test actions and test cases (mine are not highlighted). A nice addition would be if RadRails supported code completion of Selenium actions.