Abstract: project44 tracks the shipment of cargo on planes, trains, boats, and trucks across the globe. Our Data Science team is tasked with building models within the platform that let our customers see an estimated arrival time for those shipments, whether they travel by plane, train, boat, truck, or any combination thereof.
This Estimated Time of Arrival prediction problem requires careful and thoughtful selection of performance metrics. How do we map real-world use cases to statistical performance metrics? In this talk we’ll walk through how our thinking has evolved over time on the “right way” to measure Estimated Time of Arrival for Freight Shipments, all in the middle of a supply chain crisis!
Throughout this talk you’ll hear a discussion of the PROS and CONS of selecting various commonly used accuracy metrics for regression (Mean Absolute Error, Mean Relative Absolute Error, Median Absolute Error) as well as some less commonly used metrics.
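To make the comparison concrete, here is a minimal sketch of the three named metrics computed on a small set of ETA predictions. The shipment numbers are invented for illustration and are not project44 data, and the function name `eta_error_metrics` is hypothetical. The relative metric is assumed here to be absolute error divided by actual transit time; other definitions of Mean Relative Absolute Error exist, and the talk may use a different one.

```python
from statistics import mean, median


def eta_error_metrics(predicted_hours, actual_hours):
    """Compute three regression error metrics for ETA predictions.

    Assumption: "relative" means absolute error divided by the actual
    transit time; this is one of several definitions in use.
    """
    abs_errors = [abs(p - a) for p, a in zip(predicted_hours, actual_hours)]
    rel_errors = [e / a for e, a in zip(abs_errors, actual_hours)]
    return {
        "mean_abs_error": mean(abs_errors),
        "median_abs_error": median(abs_errors),
        "mean_rel_abs_error": mean(rel_errors),
    }


# Hypothetical transit times in hours; the last shipment is badly mispredicted.
actual = [24.0, 48.0, 72.0, 96.0, 240.0]
predicted = [26.0, 46.0, 75.0, 95.0, 180.0]

metrics = eta_error_metrics(predicted, actual)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this made-up sample, a single 60-hour miss pulls the mean absolute error far above the median absolute error, which is the kind of disagreement between metrics that makes the choice matter.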
In addition to talking about the metrics themselves, we’ll also talk about the PROS and CONS of various techniques for sampling our production data so that our metrics tell an accurate story, one that matches our customers’ experiences.
This talk will take an example-driven approach to show the impact of these metrics decisions (which often seem abstract and academic) on real-world Estimated Time of Arrival predictions.
While not everyone in the audience is tasked with predicting when a cargo container will arrive at the Port of Oakland, the lessons learned on how to select performance metrics and how to test their implications apply widely across Data Science.
Bio: Matt is Director of Data Science with over a decade of experience solving complex business problems with data, modeling and simulation. Over the past year at project44, Matt has been scaling the data science team from a few disparate efforts to a full department of 30 team members around the globe. The data science team at project44 uses the billions of shipments that are tracked through project44’s platform to extract insights that help customers make data-driven decisions: everything from “estimated time of delivery” to “impact of the latest disruptions”. project44’s data science team uses state-of-the-art Machine Learning techniques to capture the dynamic trends and patterns of today’s supply chain. Despite the pandemic and the global nature of Matt’s team, the data scientists at project44 routinely hold “virtual whiteboarding sessions” where they brainstorm, trade ideas about statistical techniques, and also discuss their latest Netflix favorites.