Letting It Flow with Spark Streaming

Today, Russ and Michael of the Sharethrough Engineering team are featured on the Cloudera blog, discussing our use of CDH and Spark Streaming. We’ve been running Spark Streaming in production for several months now and took the opportunity to share what we’ve learned with teams transitioning from a batch paradigm.

Apache Spark is a fast, general framework for large-scale data processing, with a programming model that supports building applications that would be more complex, or less feasible, with conventional MapReduce. When we began using Spark Streaming, we shipped quickly with minimal fuss. To get the most out of our new streaming jobs, though, we had to adjust to the Spark programming model. Here are some things we discovered along the way…

Hope it helps! Let us know if you have any questions down in the comments.

UPDATE: Also picked up by the kind folks at Databricks!