Users of Hpricot (which WWW::Mechanize uses as its default HTML parser) may have discovered that the buffer size for attribute values is set to 16384 bytes by default. Typically this isn’t a problem. I mean, who would put 16 KB of data into an HTML attribute? Well, ASP.NET uses a hidden input field to store view state in order to save a few clock cycles on the server side (and spare developers the hassle of coding view state).
Typically, developers forget to turn off view state, resulting in a lot of data that is never used. The guy who made the decision to have this default view state behaviour has probably caused a lot of unnecessary bytes to clog your internet connection (as it is typically included in each request).
If you are using mechanize and/or Hpricot to parse such a site you may have come across this error:
ran out of buffer space on element , starting on line 38. (Hpricot::ParseError)
Fortunately, from version 0.5 of Hpricot it is easy to increase the buffer size before loading data. This is done by setting the buffer_size attribute to a sufficiently large number:
[source:ruby]
require 'hpricot'

# Raise the attribute buffer from the 16384-byte default to 256 KB
Hpricot.buffer_size = 262144
[/source]
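How large is "sufficiently large"? One rough way to find out is to measure the longest attribute value in the page before parsing it. The sketch below is plain Ruby and does not use Hpricot at all. Both `longest_attribute_value` and `suggested_buffer_size` are hypothetical helper names made up for this example, the regex only handles quoted attributes, and the single byte of headroom is a guess rather than something documented by Hpricot, so treat the result as a lower bound and round up generously.

```ruby
# Hypothetical helpers (not part of Hpricot or mechanize).

# Returns the byte length of the longest quoted attribute value in raw HTML.
# Uses a rough regex, not a real HTML parser, so unquoted values are ignored.
def longest_attribute_value(html)
  lengths = html.scan(/[\w:-]+\s*=\s*(?:"([^"]*)"|'([^']*)')/).map do |dq, sq|
    (dq || sq).bytesize
  end
  lengths.max || 0
end

# Doubles the starting size (Hpricot's 16384-byte default) until the longest
# attribute value fits. The +1 headroom is an assumption.
def suggested_buffer_size(html, minimum = 16_384)
  needed = longest_attribute_value(html) + 1
  size = minimum
  size *= 2 while size < needed
  size
end

# A fake ASP.NET-style page with a 20000-byte view state field
sample = %(<input type="hidden" name="__VIEWSTATE" value="#{'A' * 20_000}" />)

puts longest_attribute_value(sample)  # 20000
puts suggested_buffer_size(sample)    # 32768
```

The number returned by `suggested_buffer_size` could then be assigned to `Hpricot.buffer_size` before loading the page.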
Because mechanize uses Hpricot as its default parser, the same error will happen when loading many ASP.NET pages. Fortunately, mechanize allows the user to specify a custom parser class through the pluggable_parser attribute. To make mechanize use Hpricot with a larger buffer size:
[source:ruby]
require 'hpricot'
require 'mechanize'

# Enlarge the buffer before any page is loaded
Hpricot.buffer_size = 262144

agent = WWW::Mechanize.new
agent.pluggable_parser.default = Hpricot
agent.get('http://www.peterkrantz.com/wp-content/uploads/2007/02/viewstatesample.htm')
[/source]
…and we’re back on track mechanizing the world again.