Sunday, October 21, 2012

Putting XML::Parser back to work

This is my first post on this blog. Well, at least it is my first post in English, since I already have another blog about technology in Brazilian Portuguese.

Whenever I think it makes sense to publish something in English (when the subject is intended for a wider audience), I'll do it.

The subject of this post was not chosen at random: it is an issue I recently stumbled upon, and after a hard time searching on Google I finally figured it out.

XML::Parser is a Perl module for parsing XML. It is based on the well-known Expat library and is used by many other XML-related modules on CPAN. Although it does not have all the latest features, it is quite reliable.

I was starting a small project of mine and decided to use XPath to query the data (through the XML::XPath module). Well, XML::XPath uses XML::Parser internally, and that's how I ran into this issue.

My XML was something like this:

<?xml version='1.0'  encoding='UTF8' ?>
<RESULTS>

    <ROW>
        <COLUMN NAME="SALES_CHANNEL">Foobar</COLUMN>
        <COLUMN NAME="ECOMM_ORDER">987456</COLUMN>
        <COLUMN NAME="INSTANCE_ID">123456</COLUMN>
        <COLUMN NAME="STEP">Some Step Name</COLUMN>
        <COLUMN NAME="ERROR_MESSAGE">Some error message</COLUMN>
        <COLUMN NAME="CREATION_DATE">08/10/12</COLUMN>
    </ROW>
</RESULTS>
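
For context, a query against a document like this could look like the sketch below. It is not the original test.pl, only an illustration: the XML is inlined and shortened, the XML declaration is omitted to keep the snippet short, and the variable names are invented for the example.

```perl
use strict;
use warnings;
use XML::XPath;

# Shortened, inlined copy of the sample document (no XML declaration here).
my $xml = <<'END_XML';
<RESULTS>
    <ROW>
        <COLUMN NAME="SALES_CHANNEL">Foobar</COLUMN>
        <COLUMN NAME="STEP">Some Step Name</COLUMN>
    </ROW>
</RESULTS>
END_XML

my $xp = XML::XPath->new( xml => $xml );

# Store each COLUMN as NAME => value (the sample has a single ROW,
# so later rows overwriting earlier ones is not a concern here).
my %column;
foreach my $node ( $xp->findnodes('/RESULTS/ROW/COLUMN') ) {
    $column{ $node->getAttribute('NAME') } = $node->string_value;
}

print "$_ = $column{$_}\n" for sort keys %column;
```

With more than one ROW, it would probably make more sense to loop over the ROW nodes first and build one hash per row.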


My first attempt to parse the XML file ended like this:

user@foobar:~/Projects$ ./test.pl new.xml
Couldn't open encmap utf8.enc:
File or directory not found
 at /usr/local/lib/perl/5.14.2/XML/Parser.pm line 187.


I'm using Ubuntu 12.04, so I started by looking for missing libraries (especially the dependencies of Expat). I even removed the DEB package of XML::Parser and reinstalled the module from CPAN. The automated tests ran during the installation without so much as a warning.

After reading the XML::Parser documentation (http://search.cpan.org/~msergeant/XML-Parser-2.36/Parser.pm#ENCODINGS) I found out where those ".enc" files were supposed to be:

user@foobar:~/Projects$ perl -MXML::Parser -e 'foreach (@XML::Parser::Expat::Encoding_Path) { print "$_\n" }'
/usr/local/lib/perl/5.14.2/XML/Parser/Encodings
.


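Of course, there was no utf8.enc file there... but was one really needed? Expat, and therefore XML::Parser, supports UTF-8 natively and needs no encoding map for it, but the XML declaration of my file says "UTF8" and not "utf-8". Since "UTF8" is not the name of a built-in encoding, the parser went looking for an encoding map file named after it (lower-cased, with the ".enc" extension) and could not find one. After changing the XML declaration of the document to "utf-8", the parser worked fine.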
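
If editing the file by hand is not an option (say, the XML comes from another system), the same change could be applied in code before handing the document to the parser. The following is only a sketch under a few assumptions: the function name normalize_xml_encoding is made up for this example, it only handles the "UTF8" spelling, and it expects the XML declaration to be at the very start of the string.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Parser): rewrites encoding='UTF8'
# to encoding='utf-8' in the XML declaration and leaves the rest untouched.
sub normalize_xml_encoding {
    my ($xml) = @_;

    # $1 = declaration up to and including the opening quote,
    # $2 = the quote character, $3 = the matching closing quote.
    $xml =~ s/\A(<\?xml\b[^>]*?\bencoding\s*=\s*(["']))utf8(\2)/${1}utf-8$3/i;

    return $xml;
}

my $broken = qq{<?xml version='1.0'  encoding='UTF8' ?>\n<RESULTS/>\n};
print normalize_xml_encoding($broken);
```

Because the substitution is anchored at the start of the string, any "UTF8" text appearing later in the document content is left alone.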

A silly problem, isn't it? Well, at least once you understand a little better how XML::Parser works. I hope this is useful to somebody else.