Monthly Data Statistics

Ensuring the accuracy of automated data reports was one of my routine responsibilities. The process involved generating a PDF version of dynamic web reports. Some folks preferred a zip file containing an individual PDF document for each report; other users wanted the reports as one large PDF document. The report application also included a PDF export option and a web service capable of returning all reports as a PDF. Sounds simple enough: most of the heavy lifting was already complete!

That was unfortunately not the case. Most of the reports retrieved previously cached data, but some required heavy computation at runtime. The PDF export service was also buggy, owing to the complexity of rendering SWF charts as images and capturing the result as a PDF. It was not uncommon for the service to return partially rendered charts.

Completing this task required a fair bit of manual intervention. First, hit a specific URL with date parameters to delete any cached PDF reports for that particular month. Next, hit a different URL for a web service that would cycle through each report, convert the SWF objects to images, capture a PDF of the resulting page, and deliver a zip file containing the requested reports. After unzipping the provided file, I would verify that each chart and report was present. If a report was missing or contained unusable charts, I would replace the file by exporting it from the front end.
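
The two URL visits can be sketched as a small script. The base URL, endpoint paths, and parameter names below are hypothetical stand-ins, not the application's real routes; the actual curl calls are commented out so the sketch stays runnable offline.

```shell
#!/bin/sh
# Hypothetical report application base URL and target month.
BASE="https://reports.example.com"
YEAR="2013"; MONTH="05"

# Step 1: build the cache-clearing URL (endpoint name is an assumption).
clear_cache_url() {
    echo "$BASE/reports/clear-cache?year=$1&month=$2"
}

# Step 2: build the print-service URL that cycles through every report
# and returns a zip (again, a hypothetical route).
print_all_url() {
    echo "$BASE/reports/print-all?year=$1&month=$2"
}

# The actual manual visits would look like:
# curl -s "$(clear_cache_url "$YEAR" "$MONTH")"
# curl -s -o reports.zip "$(print_all_url "$YEAR" "$MONTH")"
# unzip reports.zip -d reports/   # then inspect each PDF by hand

clear_cache_url "$YEAR" "$MONTH"
print_all_url "$YEAR" "$MONTH"
```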

It was not a difficult or extremely time-consuming task, but it always popped up right in the middle of a push. I decided to take a stab at automating a portion of the process. A small bash script initiated the printing process for each report and recovered from any errors. This approach certainly made things a little easier, as it would automatically download any reports that were missing. However, there were still unusable results from the heavy server-side printing process. This workflow also required a fair bit of manipulation to combine all of the individual reports into a single PDF document.
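
The retry-and-recover idea behind that bash script can be sketched like this. The report names are invented, and `fetch_report` is a stub standing in for the real curl call to the print service, so the sketch runs offline.

```shell
#!/bin/sh
# Stand-in for the real download, e.g.:
#   curl -sf -o "$2" "$PRINT_URL?report=$1"
# Stubbed here so the sketch is self-contained.
fetch_report() {
    printf 'pdf-bytes-for-%s' "$1" > "$2"
}

# Try each report up to three times; flag anything that never arrives
# so it can be exported manually from the front end.
download_with_retry() {
    report="$1"; out="$2"; tries=0
    while [ "$tries" -lt 3 ]; do
        if fetch_report "$report" "$out"; then
            return 0                      # got a usable file
        fi
        tries=$((tries + 1))
        sleep 1                           # give the print service a breather
    done
    echo "FAILED: $report" >&2            # needs manual intervention
    return 1
}

for report in usage traffic errors; do
    download_with_retry "$report" "$report.pdf"
done
```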

After a few months of enjoying the slightly improved workflow, I started messing around with PhantomJS. A newer version of the charting software was also available, which rendered charts as JS rather than SWF. Migrating all of the charts to JS was a rather easy task, as the chart generation syntax was very similar. PhantomJS plus JavaScript charts allowed us to capture a PDF of a report extremely quickly, with predictable results. I rolled out the new chart printing engine, dusted off my bash script, and started on another round of automation.
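
PhantomJS ships with a `rasterize.js` example that loads a URL and writes the rendered page to a PDF, which is the usual way to drive this kind of capture from a script. The report names and base URL below are assumptions; the command is built with `echo` as a dry run, so drop the `echo` (or execute the string) to actually print.

```shell
#!/bin/sh
# Hypothetical report URL; each report renders its JS charts in-page.
BASE="https://reports.example.com/report"

# Build the PhantomJS invocation for one report. rasterize.js is the
# example script bundled with PhantomJS; A4 is its paper-size argument.
capture_cmd() {
    echo "phantomjs rasterize.js $BASE/$1 $1.pdf A4"
}

# Dry-run the capture for each report.
for report in usage traffic errors; do
    capture_cmd "$report"
done
```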

The automation process was migrated to PHP. Other folks were going to need this functionality, so I knew I might eventually build a front end, but time limited my ability to produce a fully automated solution. This round would reliably gather all of the reports, but I still needed to zip them or combine them into a single PDF.

One fateful Friday afternoon, in between projects, the report gathering process became fully automated. Combining the PDF documents server side required a PHP PDF editing library. I also added some basic compression support and a GET parameter to return all reports as either a zipped file or one large PDF. These last two pieces turned a rather arduous report gathering process into a simple URL visit. Bliss.
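
The real version ran in PHP behind that GET parameter, but the two delivery modes can be sketched as a shell dispatcher. The `zip` and `pdfunite` (poppler-utils) commands are assumptions about available tooling, and the commands are echoed as a dry run rather than executed.

```shell
#!/bin/sh
# Pick a delivery mode the way the GET parameter did: a compressed zip
# of individual reports, or everything merged into one large PDF.
deliver() {
    case "$1" in
        zip) echo "zip -9 reports.zip *.pdf" ;;        # compressed bundle
        pdf) echo "pdfunite *.pdf combined.pdf" ;;     # one large PDF
        *)   echo "unknown format: $1" >&2; return 1 ;;
    esac
}

deliver zip
deliver pdf
```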

This is the current state of my first automation. It was interesting to watch this process evolve over time. Though it would have been far more effective to get it right the first time, I learned throughout the process. I still need to build a front end or integrate the user interface into an existing staff portal.
