I gave a talk on how to take Data Science projects to production with the help of Software Developers.

I work in a Data Innovation Lab with a horde of Data Scientists. Data Scientists gather and clean data, apply Machine Learning algorithms and produce results, all with specialized tools (Dataiku, Scikit-Learn, R…). These processes run on a single machine, on data that is fixed in time, with no constraint on execution speed.

My fellow Developers and I aim to bring these processes to production. Our constraints are very different: we want the code to be versioned, tested, deployed automatically, and to produce logs. We also need it to run in production on distributed architectures (Spark, Hadoop), with fixed versions of languages and frameworks (Scala…), and with data that changes every day.

In this talk, I will explain how we, Developers, work hand-in-hand with Data Scientists to shorten the path to running data workflows in production.

Here are the slides:


The talk was also recorded at the Big Data Meetup in Washington DC:


Want to know more about the subject? Invite me for a Brown Bag Lunch!