Elasticity 2.6 Released

Over the weekend we released the latest version of Elasticity, the gem we use to control our Amazon Elastic MapReduce flows. Two big features, job flow completion polling and debug support, went in between 2.5 and 2.6, along with a handful of incremental internal updates.

Completion Polling

One of the challenges in providing a gem like Elasticity is knowing where to draw the lines. So much goes into a data pipeline that if you’re not careful, you’ll end up creating your own Hadoop distribution.

Let’s take S3 asset upload as an example. Sure, we could say “Use Fog, right_aws or the ever-improving AWS SDK for Ruby” and people would be fine with that. However, asset upload is something so necessary for putting together an EMR workflow that Elasticity didn’t feel complete without offering at least a basic amount of functionality there, so we did.

There are many data workflow management tools, like Azkaban (which we use) and Oozie. That being said, a modicum of job flow state awareness and polling also felt necessary, as evidenced by the fact that we rolled our own at Sharethrough and that our users are both asking for it and writing it themselves.

Completion monitoring comes in two flavours, both via JobFlow#wait_for_completion: with and without providing a status callback.

# Blocks until status changes
jobflow.wait_for_completion

# Yields every 60 seconds until the status changes
jobflow.wait_for_completion do |elapsed_time, job_flow_status|
  puts "Waiting for #{elapsed_time}, jobflow status: #{job_flow_status.state}"
end
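
For anyone curious what a polling loop with an optional status callback can look like, here is a minimal sketch. It is not Elasticity's implementation: poll_until_complete, FakeStatus, ACTIVE_STATES and the injectable sleeper are all hypothetical names made up for illustration, and which EMR states count as "still in progress" is an assumption on our part.

```ruby
# Hypothetical sketch of completion polling; NOT Elasticity's actual code.
# FakeStatus stands in for whatever status object the real API returns.
FakeStatus = Struct.new(:state)

# Assumption: these EMR job flow states mean the flow is still in progress.
ACTIVE_STATES = %w[STARTING BOOTSTRAPPING RUNNING WAITING SHUTTING_DOWN].freeze

# Calls fetch_status (any callable) until it reports a non-active state,
# then returns that final status. If a block is given, it is yielded the
# elapsed seconds and the current status after each poll that finds the
# flow still active. sleeper is injectable so tests need not really sleep.
def poll_until_complete(fetch_status, interval: 60, sleeper: ->(seconds) { sleep(seconds) })
  elapsed = 0
  loop do
    status = fetch_status.call
    return status unless ACTIVE_STATES.include?(status.state)

    yield elapsed, status if block_given?
    sleeper.call(interval)
    elapsed += interval
  end
end

# Usage with canned statuses and a no-op sleeper, so nothing actually waits.
canned = %w[STARTING RUNNING COMPLETED].map { |state| FakeStatus.new(state) }
final = poll_until_complete(-> { canned.shift }, sleeper: ->(_seconds) {}) do |elapsed, status|
  puts "Waiting for #{elapsed}, jobflow status: #{status.state}"
end
puts "Finished in state #{final.state}"
```

Passing the sleeper in as a parameter is just a convenience for the sketch: it keeps the loop testable without a real 60 second wait between polls.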

Debug Support

If you’re unfamiliar with EMR debugging, there are two relevant portions of the Developer Guide to check out:

Enabling debug support in Elasticity is straightforward with only two requirements (enforced by the EMR API):

jobflow.log_uri = 's3n://examplebucket/logs'
jobflow.enable_debugging = true

The Little Things

Several small and important changes happened behind the scenes; you can get more information on them in the release history. Notably, Elasticity will no longer officially support Ruby 1.8.7 or REE, as 1.8.7 is EOL’d as of July 2013. For Elasticity developers, RSpec 2.14 has some nice changes, including spies, which should be fun to play with :)

Happy EMR'ing!