Sunday, February 22, 2009

Perl rocks, again.

As much as people have been knocking it lately, I have to say it's still a great tool to have around. I'm working on a project where we have to examine data from an external source in order to generate reasonable metrics/reports. Part of this is knowing the quality of the data. I tried several tools to get that quality information and ran into two major problems:

1) They were slow. It took a while to get the metrics, and the tools typically sucked up a huge chunk of memory doing so.

2) I couldn't save the results. That meant I had to rerun the analysis every time I wanted to see it, and a large chunk of memory stayed in use for as long as the program was open.

Granted, this is not to say these programs didn't do a good job. They really did, at least with the analysis. Those two points just happen to be requirements for me. If I can't get the info in a reasonable amount of time, it isn't useful to me.

Perl to the rescue. Well, mostly. I already had a script I had written earlier for analyzing a CSV file and making an educated guess at the data type of each column. I took that and revamped it a bit, as there were some more features I wanted to add. With a combination of language constructs that make things easy to program and the wealth of modules available in the Perl core and on CPAN, I managed to get a very nice script up and running that analyzes the CSV file, generates the info I need, and saves the reports as HTML files.
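To give a rough idea of what "making an educated guess" at column types can look like, here is a minimal sketch. It is not my actual script. The function names (guess_type, merge_types, analyze_csv), the type categories, and the sample data are all made up for illustration, and it assumes a simple CSV with a header row and no quoted or embedded commas. For real-world files a proper parser such as Text::CSV from CPAN is the safer choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Guess the narrowest type that fits a single value.
# The categories here are illustrative assumptions, not a fixed standard.
sub guess_type {
    my ($value) = @_;
    return 'empty'   if !defined $value || $value eq '';
    return 'integer' if $value =~ /^[+-]?\d+$/;
    return 'float'   if $value =~ /^[+-]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][+-]?\d+)?$/;
    return 'date'    if $value =~ /^\d{4}-\d{2}-\d{2}$/;
    return 'string';
}

# Widen a column's type when a new value does not fit the current guess.
sub merge_types {
    my ($current, $new) = @_;
    return $new     if $current eq 'empty';
    return $current if $new eq 'empty' || $new eq $current;
    my %numeric = (integer => 1, float => 1);
    return 'float' if $numeric{$current} && $numeric{$new};
    return 'string';    # anything else that disagrees falls back to string
}

# Read CSV lines from a filehandle one at a time and collect per-column stats.
# Assumes the first line is a header and that fields contain no quoted commas.
sub analyze_csv {
    my ($fh) = @_;
    my $header = <$fh>;
    return [] unless defined $header;
    $header =~ s/\r?\n\z//;
    my @columns = map { +{ name => $_, type => 'empty', empty => 0, rows => 0 } }
                  split /,/, $header, -1;

    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;
        next if $line eq '';
        my @fields = split /,/, $line, -1;
        for my $i (0 .. $#columns) {
            my $col  = $columns[$i];
            my $type = guess_type($fields[$i]);
            $col->{rows}++;
            $col->{empty}++ if $type eq 'empty';
            $col->{type} = merge_types($col->{type}, $type);
        }
    }
    return \@columns;
}

# Demo on a small in-memory sample (made-up data).
my $sample = "id,price,name,created\n1,9.99,widget,2009-02-22\n2,12,,2009-02-23\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $report = analyze_csv($fh);
close $fh;
printf "%-8s %-8s %d/%d empty\n", @{$_}{qw(name type empty rows)} for @$report;
```

The idea is that each column starts with no opinion and only gets widened when a value forces it: in the sample, price sees 9.99 and then 12, so it ends up as a float rather than a string. Counting empty values per column along the way gives a cheap first look at data quality.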

Now, I just have to export the data as CSV, which I prefer as it makes no assumptions about types, run it through the script, and I'm good to go. Not surprisingly, this is significantly faster than running a tool that does it directly from the database. Many of the tables have over 100K rows, and one has over 500K, so reading all that data into memory and analyzing it takes up a good bit of memory. For the analysis of one of the larger data sets, my Perl script used over 600MB.

Yes, I do realize that this isn't the most efficient code for the job. However, I needed it sooner rather than later, and it runs fast enough now. What used to bog down my computer for some time now takes a minute or two.
