Abstract: Jupyter notebooks are a key tool for many data science teams. They allow for rapid prototyping, development, and sharing of results with both technical and non-technical audiences. As a data science team grows, in both headcount and volume of work, Jupyter notebooks can become difficult to manage and keep clean. This talk describes several best practices for working with Jupyter notebooks on a data science team. It will cover:
- Writing and organizing code within a notebook for maximum reproducibility
- How to effectively manage notebooks with version control, view legible diffs, and perform code reviews
- Ways to implement quality checks via linting, pre-commit hooks, and integration tests
- Quick and simple ways to share content with non-technical audiences
We will showcase many of these best practices with notebooks that the data science team at Saturn Cloud uses every day.
Bio: Aaron Richter is a software developer turned data engineer and data scientist. He has pioneered the development and implementation of large-scale data science infrastructure in both business and research environments. Inevitably, he has spent a lot of time finding efficient ways to clean data, run pipelines, and tune models. Aaron is currently a Senior Data Scientist at Saturn Cloud, where he works to make data scientists faster and happier. He holds a PhD in machine learning from Florida Atlantic University.