My Two Days at the R Unconference Hackathon 2016

Two weeks ago, I took part in the R unconference, a 2-day hackathon in San Francisco hosted and organized by rOpenSci. At the unconference, data scientists and developers get together to work on different projects related to R.

It’s impossible to describe everything I learned in those two days, but I’m hopeful I can give you a sense of what I experienced.

This year, #runconf16 took place in a funky house in SoMa. The first day started with breakfast and introductions, followed by a vote on the projects.

Then contributors got together on projects they were interested in working on.

I had the opportunity to work with Julia Silge and David Robinson on an R package called tidytext: text mining using dplyr, ggplot2 and tidy tools. Although I didn’t participate in writing code, it was an incredible learning experience to be around them and to brainstorm and share ideas.

In the afternoon, I learned about Kenneth Benoit’s great package quanteda and was able to apply topic modelling (specifically, latent Dirichlet allocation, or LDA) to some of my data. If you’re interested in learning more, start with the quanteda vignette.

At the end of the day, Hadley Wickham showed us that his skills aren’t limited to making R packages, teaching, and writing; he also knows how to make a good cocktail.

On my second day, I participated in a discussion of how best to extract metadata from directory and file names. Henrik Bengtsson brought use cases and ideas on the subject. The R package that came out of it is called dirdf, and it had great contributions from Joe Cheng, Jenny Bryan, Tiffany Timbers, Sean Kross and Melissa Guzman.

Later that morning, I paired with Erin LeDell and Ciera Martinez on a package that provides an R interface to the data.rio API. data.rio is a data transparency and open access effort built by the city of Rio de Janeiro; it makes available different types of data about the sixth-largest city in the Americas.

The riodata package opens up everything from metro, train, and bike station data to information about Rio health clinics. The package is in its early stages, but we are working on making more data available and writing better documentation.

At the end of the second day, conference-goers got together to listen to groups present what they worked on.

I was, once more, impressed by how good (and useful) the projects were, and how much we were able to accomplish in only two days. I know that a big part of that productivity comes from working in such a welcoming and supportive environment!

I’m really grateful to the people and organizations that made the hackathon possible. A special thank you to Karthik Ram, Rich FitzJohn, Scott Chamberlain, Jenny Bryan and the rOpenSci community.

Thank you for giving us a chance to meet IRL, for strengthening and growing the R community, for making gender balance a reality, and for providing so much learning and sharing!