Abstract: Data science teams differ from traditional software groups in that they often tackle several short-lived projects at once. It’s not uncommon for a small group to have only a few days to build a dashboard. We have to balance the immediate business need against long-term reproducibility requirements. This workshop takes you through strategies that you can adopt.
Spoiler alert! There isn’t a perfect solution. Instead, there are multiple options, each with pros and cons. This training will help you decide what is best for your organisation and teams.
Session 1: Redlines
What rules should you have across repositories? For example, should every repository have a README or CI file? If a repository must have a README, then what is the minimum standard? We’ll discuss strategies for using templating or auto-generating files.
Session 2: Merging, what could possibly go wrong?
Merging multiple branches with Git is a joy to behold - when it works. However, combining branches doesn’t always work out so nicely. We’ll discuss merging strategies, such as “to rebase or not to rebase,” as well as the potential pitfalls and benefits of adopting these strategies.
Session 3: Getting the most out of Git with CI
Continuous integration is fantastic. This last session discusses the many ways you can leverage CI to optimise your workflow, from the standard use case of package checking to more exotic varieties such as deployment, auto-tagging and linting your commit messages.
Prerequisites: Some basic familiarity with Git, for example merging, committing and pushing.
Bio: Dr Colin Gillespie is the Co-Founder and CTO of Jumping Rivers, a data science consultancy that specialises in all things R and Python. He is also a Senior Statistics lecturer at Newcastle University, has published over eighty peer-reviewed papers, and co-authored the O'Reilly book Efficient R Programming.