Editor’s note: Dr. Jeff Saltz is a speaker for ODSC East 2022. Be sure to check out his talk, “Managing Data Science Projects via an Agile Framework,” there to learn more about data science project management.

Data science managers (and senior leaders managing data science teams) need to think through many questions about how best to execute their data science efforts. For example: How should the team brainstorm ideas? How should it prioritize those ideas? More generally, how can the team ensure that it delivers actionable insights?

While these challenges are very different from the technical machine learning challenges that most teams focus on, they are equally important to the success of a data science project. In other words, teams need to think not only about which algorithm to use, but also about the process they follow to work effectively and efficiently on a data science project.


How using an effective data science project management/process framework can help

In short, data science projects fail for many reasons, and many of those reasons are not “technical” in nature. That is why an effective data science process can prevent many potential issues. To see how a better team process can increase the value of data science projects, consider five typical reasons why a project fails to deliver on its potential:

  1. Not having the right data
  2. Solving the wrong problem
  3. Not having a model that can be operationalized
  4. Having a model that doesn’t deliver value
  5. Forgetting about potential bias and fairness

An effective team process can reduce each of these risks. For example, solving the wrong problem often stems from weak stakeholder engagement and poor communication within the data science team, so a collaboration and communication framework reduces the risk of solving the wrong problem. Similarly, operationalization challenges shrink when the team discusses how a model will be used throughout the project, not just at the end.

Remember that Data Science is not Software Engineering

When data science teams decide they need a team process, people often assume that the best path forward is to adopt a software development process framework. In other words, data science teams (or team leaders) might recognize the need for a better process and think, “let’s just use what works for software development.”

On the surface, data science and software development may seem similar. Both fields produce code. Both are high-tech fields led by skilled professionals. Both are built on top of computer science and mathematics. But there are key differences between data science projects and software development projects. For example, data science projects are often much more exploratory: at the outset, a team may not know whether the available data can support a useful model. Feel free to explore the differences in more depth. Given these differences, data science teams should use a project management or process framework suited to data science projects, rather than one designed for software development.

Do we need to use Scrum?

Some data science teams do use Scrum, the most popular agile framework for software development projects. However, many of these teams find Scrum challenging to apply in a data science context. For example, in Scrum, iterations (known as sprints) are always the same length. But it is often difficult for data science teams to know what can “fit in a sprint,” so adhering to Scrum’s fixed, time-boxed sprints can be problematic. Teams can also benefit from an iteration that is shorter (or longer) than the defined sprint length, since the right moment to stop and learn from the results does not always line up with the sprint boundary.

With this in mind, data science teams (and data science leaders) should establish an agile process framework for their projects, but should look beyond Scrum rather than assume it is the default choice. For example, Data Driven Scrum provides many of the benefits of agility but was defined within a data science context (as opposed to a software development context).

Don’t forget about Data Science Life Cycles

As part of defining a team process, the team should also define a data science life cycle: the steps required to complete a data science project (a life cycle is sometimes called a workflow). A team’s life cycle typically includes steps such as obtaining data, cleaning the data, and then creating a machine learning model.

A data science life cycle is useful, as it helps to make sure the team has a shared mental model (and common vocabulary) of the work required in a data science project. CRISP-DM is the most commonly used framework for defining a data science life cycle. Get an overview of CRISP-DM to understand its strengths and weaknesses. 
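To make the idea of a shared vocabulary concrete, here is a minimal Python sketch, assuming a team adopts the six standard CRISP-DM phases as its life cycle. The names `Phase`, `ALLOWED_MOVES`, and `is_valid_path` are hypothetical illustrations, not part of any CRISP-DM tooling. The transition map is an assumption based on the arrows in the standard CRISP-DM diagram, which let a team loop back to earlier phases instead of marching straight through; the outer loop that restarts the cycle after deployment is deliberately left out.

```python
from enum import Enum


class Phase(Enum):
    """The six CRISP-DM phases, used as a team's shared life-cycle vocabulary."""
    BUSINESS_UNDERSTANDING = "Business Understanding"
    DATA_UNDERSTANDING = "Data Understanding"
    DATA_PREPARATION = "Data Preparation"
    MODELING = "Modeling"
    EVALUATION = "Evaluation"
    DEPLOYMENT = "Deployment"


# Hypothetical transition map (assumption: based on the arrows in the standard
# CRISP-DM diagram). Each phase lists the phases a team may move to next,
# including the backward loops that make the process iterative.
ALLOWED_MOVES = {
    Phase.BUSINESS_UNDERSTANDING: {Phase.DATA_UNDERSTANDING},
    Phase.DATA_UNDERSTANDING: {Phase.BUSINESS_UNDERSTANDING, Phase.DATA_PREPARATION},
    Phase.DATA_PREPARATION: {Phase.MODELING},
    Phase.MODELING: {Phase.DATA_PREPARATION, Phase.EVALUATION},
    Phase.EVALUATION: {Phase.BUSINESS_UNDERSTANDING, Phase.DEPLOYMENT},
    Phase.DEPLOYMENT: set(),  # this sketch ends a pass at deployment
}


def is_valid_path(path):
    """Return True if every consecutive step in `path` is an allowed move."""
    return all(nxt in ALLOWED_MOVES[cur] for cur, nxt in zip(path, path[1:]))


# Example: a project that loops back from Modeling to Data Preparation
# once before moving on to Evaluation and Deployment.
history = [
    Phase.BUSINESS_UNDERSTANDING,
    Phase.DATA_UNDERSTANDING,
    Phase.DATA_PREPARATION,
    Phase.MODELING,
    Phase.DATA_PREPARATION,
    Phase.MODELING,
    Phase.EVALUATION,
    Phase.DEPLOYMENT,
]
print(is_valid_path(history))  # True
print(is_valid_path([Phase.BUSINESS_UNDERSTANDING, Phase.MODELING]))  # False
```

The point is less the code than the agreement it records: once the phases and the permitted loops are written down, the team can talk about where a piece of work sits in the life cycle using the same terms.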

Taking the next step

To wrap up: when working on data science projects, it is important to think through how the team will work together. This team process should include (1) an agile framework for how the team will communicate and collaborate, (2) a life cycle framework, and (3) a definition of how these two frameworks are integrated.

For more information on data science project management, browse my blog posts at www.datascience-pm.com.

About the author/ODSC East 2022 Speaker:

Dr. Jeff Saltz spends too much of his time exploring how to effectively manage and coordinate data science projects within and across teams. In addition to being a professor at Syracuse University, Jeff has worked with a wide range of organizations to improve their data science process and, via his research and consulting, has published 30+ peer-reviewed academic papers that (1) explore the challenges in executing data science projects, and (2) evaluate potential frameworks via experiments and real-world case studies. To help share his knowledge with practitioners, Jeff helped create the Data Science Process Alliance (DSPA), which combines his extensive data science project management research with industry-leading agile training expertise. DSPA offers courses and a wide range of resources to the thousands of data scientists and data science managers who are part of the DSPA community.