Users of Hpricot (which WWW::Mechanize uses as its default HTML parser) may have discovered that the buffer size for attribute values is set to 16384 bytes by default. Typically this isn’t a problem; who would put 16 KB of data into an HTML attribute? Well, ASP.NET uses a hidden input field to store view state in order to save a few clock cycles on the server side (and spare developers the hassle of coding view state).

Developers often forget to turn off view state, resulting in a lot of data that is never used. Whoever decided to make this the default view state behaviour has probably caused a lot of unnecessary bytes to clog your internet connection (as the view state is typically included in each request).

If you are using mechanize and/or Hpricot to parse such a site, you may have come across this error:

ran out of buffer space on element , starting on line 38. (Hpricot::ParseError)

If you want to try it out, load this sample viewstate file into Hpricot. The buffer space error has been reported in the Hpricot issue tracker.

Fortunately, as of version 0.5 of Hpricot it is easy to increase the buffer size before loading data. This is done by setting the buffer_size attribute to a sufficiently large number of bytes (262144 is 256 KB):

[source:ruby]
require 'hpricot'

# Raise the attribute buffer from the 16384-byte default to 256 KB
Hpricot.buffer_size = 262144
[/source]
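How large is "sufficiently large"? One rough way to find out is to measure the longest attribute value in the page before handing it to the parser. The sketch below uses only plain Ruby; the helper names `longest_attribute_value` and `suggested_buffer_size` are made up for this post and are not part of Hpricot or mechanize. It also assumes that the buffer simply has to be bigger than the longest quoted attribute value, which is an approximation of how Hpricot uses it.

```ruby
# Hypothetical helpers for illustration; not part of Hpricot or mechanize.

# Returns the byte length of the longest quoted attribute value in html
# (0 if there are no quoted attributes). A regex is a rough approximation
# of real HTML parsing, but good enough for an estimate.
def longest_attribute_value(html)
  html.scan(/=\s*(?:"([^"]*)"|'([^']*)')/)
      .map { |double_quoted, single_quoted| (double_quoted || single_quoted).bytesize }
      .max || 0
end

# Doubles the size, starting from the 16384-byte default, until it is
# larger than the longest attribute value.
def suggested_buffer_size(html, minimum = 16_384)
  needed = longest_attribute_value(html) + 1
  size = minimum
  size *= 2 while size < needed
  size
end

# A fake ASP.NET-style page with a 20000-byte view state value.
html = %(<input type="hidden" name="__VIEWSTATE" value="#{'A' * 20_000}" />)

puts longest_attribute_value(html)  # 20000
puts suggested_buffer_size(html)    # 32768
```

The result can then be assigned to Hpricot.buffer_size before parsing. In practice it is probably simpler to pick a generous fixed value such as 262144 and only measure when that still fails.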

Fixing Mechanize

As mechanize uses Hpricot as its default parser, this error will occur when loading many ASP.NET pages. Fortunately, mechanize allows the user to specify a custom parser class through the pluggable_parser attribute. To make mechanize use Hpricot with a larger buffer size:

[source:ruby]
require 'hpricot'
require 'mechanize'

# Enlarge Hpricot's buffer before any page is parsed
Hpricot.buffer_size = 262144

agent = WWW::Mechanize.new
# Tell mechanize to hand pages to Hpricot
agent.pluggable_parser.default = Hpricot
agent.get('http://www.peterkrantz.com/wp-content/uploads/2007/02/viewstatesample.htm')
[/source]

…and we’re back on track mechanizing the world again.