The task at hand was analyzing patterns in a bunch of XML files. I had 270 MB of XML files, and the largest file was 32 MB. Because I was pretty sure I would be running the analysis more than once (errare humanum est, and I want to improve the analysis in the future), I decided to use SAX to read the files.
Having decided to use SAX, I then picked Perl to do the job. I'm pretty familiar with it and was able to quickly find a SAX sample. Besides, I had a few string matches and replacements to do, and Perl is a great language for that.
Turns out that using SAX in Perl demands that you define and use objects in Perl. And it turns out that defining objects in Perl is... well, terrible! I really hated the syntax, bless, and the way attributes are defined. It all looks like a big hack! To add insult to injury, perlSAX has a few quirks when changing handlers, and changing handlers is mandatory if you want to keep your SAX code maintainable... So I dropped Perl and went for Python.
To my surprise, the transition was really easy. I was able to convert my Perl code to Python very quickly, with only a few doubts now and then about specific details. Here's what I gained from the switch:
- I learned Python (finally!)
- Better SAX handlers (the quirks that happen in Perl don't happen in Python)
- Clearer attribute access (if you have an object with a reference to an array of references and want to print it in Perl... things can get weird)
- Clearer object definition and usage (no bless!)
- Fewer lines of code (from ~250 to ~150)
- Same performance (I was worried about this, but both scripts took the same time to execute!)
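To make the handler switching and the bless-free object definition a bit more concrete, here is a minimal sketch using Python's standard xml.sax module. It is not my actual analysis script: the element name item and the names RootHandler, ItemHandler and count_items are made up for illustration. One handler watches the document and hands control to a second, smaller handler whenever an item element starts. That one collects the text and then hands control back.

```python
import io
import xml.sax
from collections import Counter


class ItemHandler(xml.sax.ContentHandler):
    """Handles the inside of one <item> element (hypothetical element name)."""

    def __init__(self, parser, parent, counts):
        super().__init__()
        self.parser = parser    # needed to switch handlers later
        self.parent = parent    # handler to return control to
        self.counts = counts    # shared Counter, a plain attribute
        self.text = []

    def characters(self, content):
        # SAX may deliver text in several chunks, so collect them all.
        self.text.append(content)

    def endElement(self, name):
        if name == "item":
            self.counts["".join(self.text).strip()] += 1
            # Done with this element: give control back to the parent.
            self.parser.setContentHandler(self.parent)


class RootHandler(xml.sax.ContentHandler):
    """Top-level handler that delegates each <item> to an ItemHandler."""

    def __init__(self, parser):
        super().__init__()
        self.parser = parser
        self.counts = Counter()

    def startElement(self, name, attrs):
        if name == "item":
            child = ItemHandler(self.parser, self, self.counts)
            self.parser.setContentHandler(child)


def count_items(xml_text):
    """Stream-parse an XML string and count the text of each <item>."""
    parser = xml.sax.make_parser()
    root = RootHandler(parser)
    parser.setContentHandler(root)
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    return root.counts


if __name__ == "__main__":
    sample = "<root><item>a</item><item>b</item><item>a</item></root>"
    print(count_items(sample))
```

For real files you would pass a file name to parser.parse instead of the in-memory buffer, so that even a 32 MB file is never loaded as a whole tree.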
From now on, whenever I need to do something a bit more complex that requires complex data types or OO programming, I'll definitely turn to Python!