Sunday, October 21, 2012

Putting XML::Parser back to work

This is my first post on this blog. Well, at least it is my first post in English, since I already have another blog about technology in Brazilian Portuguese.

Whenever I think it makes sense to publish something in English (because the subject is intended for a wider audience), I'll do it here.

The subject of this post was not chosen at random: it is an issue I recently stumbled on, and only after a hard time searching Google did I finally find an answer.

XML::Parser is a Perl parser for XML. It is based on the famous Expat library and is used by a lot of other XML-related modules on CPAN. Although it does not have all the latest features, it is quite reliable.

I was starting a small project of mine and decided to use XPath for it (through the XML::XPath module). Well, XML::XPath uses XML::Parser internally, and that's how I ran into this issue.

My XML was something like this:

<?xml version='1.0'  encoding='UTF8' ?>

        <COLUMN NAME="STEP">Some Step Name</COLUMN>
        <COLUMN NAME="ERROR_MESSAGE">Some error message</COLUMN>

My first attempt to parse the XML file ended like this:

user@foobar:~/Projects$ ./ new.xml
Couldn't open encmap utf8.enc:
File or directory not found
 at /usr/local/lib/perl/5.14.2/XML/ line 187.

I'm using Ubuntu 12.04, so I started looking for missing libraries (especially the dependencies of Expat). I even removed the DEB package of XML::Parser and reinstalled it from CPAN. The automated tests ran during the installation without so much as a warning.

After reading the XML::Parser documentation, I found out how to list the directories where those ".enc" encoding map files are expected to be:

user@foobar:~/Projects$ perl -MXML::Parser -e 'foreach (@XML::Parser::Expat::Encoding_Path) { print "$_\n" }'

Of course, the utf8.enc file was not there... but should it have been? XML::Parser supports UTF-8 natively, but the XML declaration of the file says "UTF8" and not "utf-8", so the parser treated it as an unknown encoding and went looking for an encoding map called utf8.enc. After changing the XML document declaration to "utf-8", the parser worked fine.
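
If you cannot fix the file by hand (say, it is generated by another tool), the same change can be made in code before the document reaches the parser. The snippet below is only a minimal sketch of that idea: normalize_utf8_declaration is a hypothetical helper of mine, not something XML::Parser provides, and the sample document is a simplified stand-in for my real file. It uses plain Perl and rewrites the declaration only.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of XML::Parser): rewrite a non-standard
# "UTF8" label in the XML declaration to "UTF-8", a name Expat knows.
sub normalize_utf8_declaration {
    my ($xml) = @_;

    # Touch only the declaration at the very start of the document, and
    # only when the encoding is spelled "UTF8" (in any letter case).
    # ${2} is the quote character, so single and double quotes both work.
    $xml =~ s{\A(<\?xml\b[^>]*?\bencoding\s*=\s*)(['"])utf8\2}{${1}${2}UTF-8${2}}i;

    return $xml;
}

# Simplified sample (an assumption for illustration): one element only.
my $broken = qq{<?xml version='1.0'  encoding='UTF8' ?>\n}
           . qq{<COLUMN NAME="STEP">Some Step Name</COLUMN>\n};

my $fixed = normalize_utf8_declaration($broken);
print $fixed;
```

The resulting string can then be handed to the parser instead of the original file content. A document that already declares "utf-8" passes through unchanged, because the pattern only matches the exact "UTF8" spelling.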

A silly problem, isn't it? Well, at least it is once you understand a little better how XML::Parser works. Let's hope this is useful for somebody else.