Spark Summit 2014 Recap

Sharethrough recently participated in Spark Summit 2014. Our own Russell Cardullo spoke on “Spark Streaming for Realtime Auctions.”

Russell offered advice on how we write composable, reusable Spark code, along with some of the strategies we use for testing our Spark jobs.

We also really enjoyed Chris Johnson’s Spark Summit talk, “Music Recommendations at Scale.” Chris’s talk stood out as a reminder of Spark’s power when implementing calculations that fall outside traditional map-reduce aggregations.

As part of their iterative and experimental approach to recommendation technology, Spotify runs alternating least squares with weighted-lambda regularization (the algorithm of Netflix Prize fame) in Spark, using a half-gridify method.
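
For readers who have not met the algorithm before, here is a minimal single-machine sketch of alternating least squares with weighted-lambda regularization, written in plain Python. It is our own illustration, not Spotify's Spark code, and it does not attempt the half-gridify partitioning from the talk. The function names (`als_wr`, `update`, `solve`, `objective`) and the default values for `rank`, `lam`, and `iters` are assumptions made for the example.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def update(observed, fixed, rank, lam):
    """Re-solve one side's factors while the other side is held fixed.

    observed: one dict per row, mapping the other side's index to a rating.
    """
    out = []
    for obs in observed:
        n_obs = len(obs)
        if n_obs == 0:
            out.append([0.0] * rank)  # nothing observed: leave at zero
            continue
        # Normal equations: (sum v v^T + lam * n_obs * I) x = sum r * v.
        # Scaling lam by n_obs is the "weighted lambda" part.
        a = [[lam * n_obs if p == q else 0.0 for q in range(rank)]
             for p in range(rank)]
        b = [0.0] * rank
        for j, r in obs.items():
            v = fixed[j]
            for p in range(rank):
                b[p] += r * v[p]
                for q in range(rank):
                    a[p][q] += v[p] * v[q]
        out.append(solve(a, b))
    return out

def als_wr(ratings, n_users, n_items, rank=2, lam=0.05, iters=10, seed=0):
    """ratings: dict mapping (user, item) to a rating. Returns (users, items)."""
    rng = random.Random(seed)
    by_user = [{} for _ in range(n_users)]
    by_item = [{} for _ in range(n_items)]
    for (u, i), r in ratings.items():
        by_user[u][i] = r
        by_item[i][u] = r
    items = [[rng.random() for _ in range(rank)] for _ in range(n_items)]
    users = [[0.0] * rank for _ in range(n_users)]
    for _ in range(iters):
        users = update(by_user, items, rank, lam)  # fix items, solve users
        items = update(by_item, users, rank, lam)  # fix users, solve items
    return users, items

def objective(ratings, users, items, lam):
    """Squared error on observed ratings plus the weighted-lambda penalty."""
    dot = lambda x, y: sum(p * q for p, q in zip(x, y))
    n_u = [0] * len(users)
    n_i = [0] * len(items)
    err = 0.0
    for (u, i), r in ratings.items():
        err += (r - dot(users[u], items[i])) ** 2
        n_u[u] += 1
        n_i[i] += 1
    reg = sum(n * dot(x, x) for n, x in zip(n_u, users))
    reg += sum(n * dot(x, x) for n, x in zip(n_i, items))
    return err + lam * reg
```

Calling `als_wr({(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0}, n_users=2, n_items=2)` returns one factor vector per user and per item; the dot product of a user vector and an item vector is the predicted rating. Because each half-step solves its least-squares problem exactly, the value of `objective` should not increase from one iteration to the next.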

Chris also conveyed how Spark’s MLlib delivers performance within an order of magnitude of the parallel-computation GraphLab API. The fact that ‘internet-scale’ companies are contributing to MLlib and beginning to use it in production is fantastic for us. We can’t wait to hear how other companies use Spark to transform their map-reduce and ML tasks in the future.