Working with UTF-8 in PDF::Writer and Ruby on Rails
Googling for information on how to use PDF::Writer shows that many European developers are frustrated with its lack of UTF-8 support. Since Ruby on Rails works well with UTF-8 these days, this can be a bit of an issue.
Part of the problem lies in the fact that the PDF specification (at least up to 1.6) does not support UTF-8 (you can use UTF-16 if you like). I had the misfortune of plowing through it a couple of years ago when developing a PDF form filler library for a customer (don’t ask).
In Ruby on Rails, this is easy to solve as long as you only use Latin characters with diacritics. The solution is to switch encoding back to ISO-8859-15 for text strings you feed to PDF::Writer.
A simple extension to the String class will do the trick:
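A minimal sketch of such an extension, assuming Ruby 1.9 or later, where `String#encode` is built in (older setups would use Iconv instead). The method name `to_iso` is an assumption for illustration:

```ruby
# Hypothetical String extension; the method name to_iso is an assumption.
class String
  # Returns a copy of the string transcoded from UTF-8 to ISO-8859-15.
  # Invalid bytes and characters with no ISO-8859-15 equivalent become "?".
  def to_iso
    encode("ISO-8859-15", "UTF-8",
           invalid: :replace, undef: :replace, replace: "?")
  end
end
```

Characters outside ISO-8859-15, such as Cyrillic or Japanese, come out as `?` in this sketch.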
If you are working in Rails you can put this code in the lib folder (I usually call the file string_extensions.rb).
Then, when you call the text method on your PDF::Writer instance, you can easily pass a correctly encoded string.
Overriding PDF::Writer text method
A much cleaner approach, as Aníbal describes in the comment below, is to override PDF::Writer’s text method.
Put the following code in a file called pdfwriter_extensions.rb (or whatever you choose to call it) in your lib directory:
In your controller that handles the PDF output you add:
…after which you can use PDF::Writer like in the tutorial:







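module PDF
  class Writer
    # Keep the original method reachable under a new name.
    alias :text_old :text
    # Convert the text with CONVERTER before handing it to the original.
    def text( texto, options = {} )
      text_old( CONVERTER.iconv(texto), options )
    end
  end
end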
Sorry, I was missing the iconv converter:
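With Ruby 1.8 the converter would typically be an Iconv instance such as `Iconv.new('ISO-8859-15', 'UTF-8')` (target encoding first). Iconv was removed from the standard library in Ruby 2.0, so the sketch below is a stand-in that answers to `iconv` in the same way but relies on `String#encode`. The class name `Utf8ToLatin9` is an assumption, and this is not necessarily what the commenter posted:

```ruby
# Hypothetical stand-in for an Iconv instance. It responds to #iconv like
# Iconv.new("ISO-8859-15", "UTF-8") would, but is built on String#encode.
class Utf8ToLatin9
  # Transcodes a UTF-8 string to ISO-8859-15; unmappable characters become "?".
  def iconv(text)
    text.encode("ISO-8859-15", "UTF-8", undef: :replace, replace: "?")
  end
end

# A constant, so the patched PDF::Writer methods can reach it from anywhere.
CONVERTER = Utf8ToLatin9.new
```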
Aníbal: That is a much nicer solution if you want to change the behaviour application-wide. Good work!
What a coincidence: yesterday I did a very similar override of WWW::Mechanize to get values automatically converted into UTF-8. Aliasing methods is a really powerful tool to have in your Ruby toolbox.
Thanks for the helpful code. I’ve tried to post my comment here, but it failed. Don’t want to write everything again. So you can find my addendum to the code here.
Don’t forget:
alias_method :old_add_text, :add_text
def add_text(x, y, text, *args)
  old_add_text(x, y, ICONV_CONVERTER.iconv(text), *args)
end
Everything is OK with Latin characters, but what should I do with Russian? PDF::Writer does not understand Cyrillic characters.
Do you know of any solution?
Thank you for the patch, but how should I proceed for tables?
Thank you for this little patch, which helped me a lot!
I have a small question: Why is the converter a constant?
I want to add that it’s better to override the method “add_text” instead of “text”, because SimpleTable uses “add_text”, and “text” uses “add_text” internally.
This does not work with Time objects. The trick is to call iconv with text.to_s instead of just text. Then you are able to print dates (for example, to put Time.now in the page footer).
We love you
Brilliant! This was just what I needed
Am I the only one who gets this?
stack level too deep
I also ran into the recursion error “stack level too deep” and resolved it like this:
alias_method :old_add_text, :add_text unless method_defined?(:old_add_text)
This prevents the method from being aliased more than once.
Cheers,
boris
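A small sketch of why the guard helps, using a stand-in class so it runs without PDF::Writer. `FakeWriter`, `apply_patch`, and the `upcase` call (standing in for the encoding conversion) are assumptions for illustration. Calling `apply_patch` twice mimics the extension file being loaded more than once, for example when Rails reloads code in development:

```ruby
# Stand-in for PDF::Writer so the sketch runs without the gem (hypothetical;
# the real add_text accepts more arguments).
class FakeWriter
  attr_reader :drawn

  def initialize
    @drawn = []
  end

  def add_text(x, y, text)
    @drawn << text
  end
end

# Mimics loading the extension file; each call applies the patch again.
def apply_patch
  FakeWriter.class_eval do
    # Without the guard, a second load would alias old_add_text to the
    # already patched add_text, which would then call itself until the
    # stack overflows ("stack level too deep").
    alias_method :old_add_text, :add_text unless method_defined?(:old_add_text)

    define_method(:add_text) do |x, y, text|
      # upcase stands in for the encoding conversion.
      old_add_text(x, y, text.to_s.upcase)
    end
  end
end

apply_patch
apply_patch # second load: the guard keeps old_add_text on the original

writer = FakeWriter.new
writer.add_text(0, 0, "hello")
```

After both loads, `writer.drawn` holds the converted text once, and no recursion occurs.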
Are there any ideas or techniques floating around here about how to do this if one is not only using Latin characters with diacritics? I have Japanese UTF-8 text that has to go into PDF::Writer. Maybe I need to learn how to manipulate strings at a lower level or understand Unicode more thoroughly, but here’s what I’ve got so far -
Currently in PDF::Writer:
Iconv.conv('utf-16BE', 'utf-8', 'ローラー') ==> 0í0ü0é0ü
While irb:
Iconv.conv('utf-16BE', 'utf-8', 'ローラー') ==> "0\3550\3740\3510\374"
The PDF::Writer newsgroup brings this up briefly (http://groups.google.com/group/comp.lang.ruby/browse_thread/thread/ff2b849a9fc39a2b). It sounds like you need to convert to UTF-16 and put a U+FEFF BOM before the text string (or, as I understand it, the string must start with the bytes 254 then 255).
I tried adding this to the solution here (http://pastie.org/252651), but the PDF still renders Latin characters, with the 254 and 255 bytes rendered as well and prepended to everything.
Please post if you get this working! Another alternative is to move over to Prawn, which appears to support UTF-8 out of the box.
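For reference, the byte layout discussed above (bytes 254 and 255, then the text in UTF-16BE) can be sketched as follows. The helper name `pdf_utf16` is an assumption, and the sketch uses `String#encode` rather than Iconv. Note that the PDF specification defines this form for text strings outside page content, such as document info entries and bookmarks; text drawn on a page is interpreted through the selected font's encoding, which may be why the marker bytes show up as visible characters in the attempt above.

```ruby
# Hypothetical helper; the name pdf_utf16 is an assumption.
# Returns a binary string: the byte order mark FE FF (254, 255)
# followed by the given UTF-8 text transcoded to UTF-16BE.
def pdf_utf16(text)
  bom = "\xFE\xFF".b
  bom + text.encode("UTF-16BE", "UTF-8").b
end

bytes = pdf_utf16("ローラー").bytes
# The first two bytes are 254 and 255; the rest are two bytes per character.
```

With the standard 14 fonts that PDF::Writer works with, this alone is unlikely to make Japanese text render; it only shows what the encoded string looks like.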
[...] by the way, anyone who wants to work with UTF-8 in a PDF document should have a look at Peter Krantz [...]