Working with UTF-8 in PDF::Writer and Ruby on Rails

Googling for information on how to use PDF::Writer shows that many European developers are frustrated with its lack of UTF-8 support. As Ruby on Rails works great with UTF-8 these days, this can be a bit of an issue.

Part of the problem lies in the fact that the PDF specification (at least up to 1.6) does not support UTF-8 (you can use UTF-16 if you like). I had the misfortune of plowing through it a couple of years ago when developing a PDF form filler library for a customer (don’t ask).
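To make the UTF-16 remark concrete: in the PDF specification, a Unicode text string is UTF-16 big-endian preceded by the byte order mark U+FEFF (bytes 254 and 255). Below is a minimal sketch of that byte layout, assuming Ruby 2.0 or later. The helper name pdf_text_string is made up for illustration, and the sketch only shows the encoding; it does not make PDF::Writer render such strings.

```ruby
# Hypothetical helper (not part of PDF::Writer): builds the raw bytes of a
# PDF-style Unicode text string from UTF-8 input.
def pdf_text_string(utf8)
  bom = "\xFE\xFF".b                  # byte order mark, bytes 254 and 255
  bom + utf8.encode('UTF-16BE').b     # UTF-16 big-endian payload as raw bytes
end

# "é" is U+00E9, so the payload is the two bytes 0x00 0xE9 after the BOM.
puts pdf_text_string("\u00E9").bytes.inspect
```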

In Ruby on Rails, this is easy to solve as long as you only use Latin characters with diacritics. The solution is to switch encoding back to ISO-8859-15 for text strings you feed to PDF::Writer.

A simple extension to the String class will do the trick:

require 'iconv'

class String
  # Returns a copy of this UTF-8 string converted to ISO-8859-15.
  def to_iso
    Iconv.new('ISO-8859-15', 'UTF-8').iconv(self)
  end
end

If you are working in Rails you can put this code in the lib folder (I usually call the file string_extensions.rb).

Then, when you call the text method on your PDF::Writer instance, you can easily pass a correctly encoded string, for example pdf.text(@user.name.to_iso).
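Iconv was dropped from Ruby’s standard library in 2.0, so on a newer Ruby the same helper can be sketched with the built-in String#encode instead. This is my substitution rather than what the code above does; to_iso is the same method name as before, and the sketch assumes the input strings are UTF-8.

```ruby
# Sketch assuming Ruby 1.9+ and UTF-8 input strings.
class String
  # Returns a copy of the string transcoded from UTF-8 to ISO-8859-15.
  # Raises Encoding::UndefinedConversionError for characters that
  # ISO-8859-15 cannot represent (for example Cyrillic or Japanese).
  def to_iso
    encode('ISO-8859-15', 'UTF-8')
  end
end

name = "Ren\u00E9e".to_iso
puts name.encoding        # the converted copy is tagged ISO-8859-15
puts name.bytes.length    # one byte per character after conversion
```

Unlike the //IGNORE//TRANSLIT flags used with Iconv further down, this version fails loudly on characters it cannot map, which may or may not be what you want.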

Overriding PDF::Writer text method

A much cleaner approach, as Aníbal describes in the comment below, is to override PDF::Writer’s text method.

Put the following code in a file called pdfwriter_extensions.rb (or whatever you choose to call it) in your lib directory:

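require 'iconv'

# Shared converter: UTF-8 in, ISO-8859-15 out.
CONVERTER = Iconv.new('ISO-8859-15//IGNORE//TRANSLIT', 'utf-8')

module PDF
  class Writer
    # Keep a handle on the original method before replacing it.
    alias_method :old_text, :text

    def text(textto, options = {})
      old_text(CONVERTER.iconv(textto), options)
    end
  end
end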
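Here is a small sketch of the same wrapping pattern on a stand-in class, so it can be tried without PDF::Writer installed. FakeWriter and to_latin9 are names I made up for illustration. The sketch uses String#encode rather than Iconv, guards against aliasing twice, and calls to_s so that non-string values such as Time objects go through (the last two are tips from the comments below).

```ruby
# FakeWriter is a hypothetical stand-in for PDF::Writer, for illustration only.
class FakeWriter
  attr_reader :lines

  def initialize
    @lines = []
  end

  # Pretend to draw text by recording what was passed in.
  def text(str, options = {})
    @lines << str
  end
end

# Hypothetical helper: converts any value to an ISO-8859-15 string,
# replacing characters that have no ISO-8859-15 equivalent with "?".
def to_latin9(value)
  value.to_s.encode('ISO-8859-15', 'UTF-8',
                    invalid: :replace, undef: :replace, replace: '?')
end

class FakeWriter
  # Guard: alias only once, so loading this code again cannot make
  # old_text point at the wrapper and recurse forever.
  alias_method :old_text, :text unless method_defined?(:old_text)

  def text(str, options = {})
    old_text(to_latin9(str), options)
  end
end

pdf = FakeWriter.new
pdf.text("Caf\u00E9")
pdf.text(2024)
puts pdf.lines.map { |line| line.bytes.length }.inspect
```

Replacing unmappable characters with "?" is only one possible policy; it keeps the output readable but silently loses text in scripts outside Latin-9.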

In your controller that handles the PDF output you add:

  require 'pdf/writer'
  require 'pdfwriter_extensions'

…after which you can use PDF::Writer like in the tutorial:

    pdf = PDF::Writer.new
    pdf.select_font "Helvetica", :encoding => nil 
    pdf.text "User name: <b>#{@user.name}</b>", :font_size => 16, :justification => :left
    send_data pdf.render, :disposition => 'inline', :filename => "user_details.pdf", :type => "application/pdf"

  • http://www.peterkrantz.com Pete

    Aníbal: That is a much nicer solution if you want to change the behaviour application-wide. Good work!

  • Johan Lind

    What a coincidence: yesterday I did a very similar override of WWW::Mechanize to get values automatically converted into UTF-8. Aliasing methods is a really powerful tool to have in your Ruby toolbox.

  • rossnet

    Thanks for the helpful code. I’ve tried to post my comment here, but it failed. Don’t want to write everything again. So you can find my addendum to the code here.

  • slowjack

    Don’t forget:

    alias_method :old_add_text, :add_text

    def add_text(x, y, text, *args)
      old_add_text(x, y, ICONV_CONVERTER.iconv(text), *args)
    end

  • Drew

    Everything is ok with Latin characters, but what should I do with Russian??? PDF::Writer does not understand them :( Do you know any solution???

  • Rel

    thank you for the patch, but for the tables, how to proceed?

  • http://kaishome.de/ Kai

    def add_text(x, y, text, *args)
      old_add_text(x,y,ICONV_CONVERTER.iconv(text), *args)
    end

    does not work with Time objects. The trick is to call iconv with text.to_s instead of just text. Then you are able to print dates (for example to put Time.now in the page footer).

  • kikito and fjuan

    We love you

  • http://www.jayway.dk Erik L. Underbjerg

    Brilliant! This was just what I needed :-)

  • BuGo

    Am I the only one who gets:

    stack level too deep

    ?

  • Boris

    I also ran into the recursion error “stack level too deep” and resolved it like this:

    alias_method :old_add_text, :add_text unless method_defined?(:old_add_text)

    This prevents the method from being aliased more than once.

    Cheers,
    boris

  • http://www.ahabman.com Andy

    Are there any ideas or techniques floating around here about how to do this if one is not only using Latin characters with diacritics? I have Japanese UTF-8 text that has to go into PDF::Writer. Maybe I need to learn how to manipulate strings at a lower level or understand Unicode more thoroughly, but here’s what I’ve got so far:

    Currently in PDF::Writer:
    Iconv.conv('utf-16BE', 'utf-8', 'ローラー') ==> 0í0ü0é0ü

    While in irb:
    Iconv.conv('utf-16BE', 'utf-8', 'ローラー') ==> "0\3550\3740\3510\374"

  • gpm

    The PDF::Writer newsgroup brings this up briefly (http://groups.google.com/group/comp.lang.ruby/browse_thread/thread/ff2b849a9fc39a2b). It sounds like you need to convert to UTF-16 and put a U+FEFF BOM before the text string (or, as I understand it, the string must start with the bytes 254 and then 255).

    I tried adding this to the solution here (http://pastie.org/252651), but it looks like the PDF is still rendering Latin characters, with the 254 and 255 bytes rendered and prepended to everything.

    Please post if you get this working! Another alternative is to move over to Prawn, which appears to support UTF-8 out of the box.

  • Andy

    @gpm

    No solution, I’m rewriting using Prawn. The folks around its Google group are wonderfully helpful, and I look forward to an easier syntax. Cheers

  • http://blogamundo.net/dev Patrick Hall

    “The solution is to switch encoding back to ISO-8859-15 for text strings you feed to PDF::Writer.”

    How is this a solution to working with UTF-8 in PDF::Writer and Rails?

    It’s a way of *not* working with UTF-8 in PDF::Writer and Rails.

  • http://www.donortools.com Ryan Heneise

    Worked for me as well – thanks for posting this solution! I’m subscribing in hopes of seeing something about your experiences with Prawn.

  • Stefan Kroes

    Thanks dude!! I was googling for almost half an hour before I found your article. My accented characters finally look right in my pdf document!

  • http://fernandoguillen.info fguillen

    Just what I needed :).. it works!

  • http://www.clementyne.com Guillaume

    Thank you very much :-)

  • Paul Verschoor

    CONVERTER = Iconv.new('ISO-8859-15//IGNORE//TRANSLIT', 'utf-8')

    module PDF
      class Writer
        alias_method :old_add_text, :add_text

        def add_text(x, y, textto, size, angle, word_space_adjust)
          old_add_text(x, y, CONVERTER.iconv(textto), size, angle, word_space_adjust)
        end
      end
    end
