Abstract: Implementing data quality can be a daunting task, but that doesn’t make it any less critical. Low-quality data can cause anything from suboptimal business outcomes to regulatory noncompliance. Good data quality practices, though, make it possible to catch small problems before they spiral into big ones.
In this talk, I’ll introduce techniques and principles for getting started with data quality that apply to the vast majority of organizations and datasets. I’ll also show you how to implement them using Great Expectations OSS, a Python-based data quality platform that you can use across a very broad range of data sources and tech stacks.
Topics this talk will cover include:
Harnessing data in different formats, from diverse cloud storage repositories, and from numerous tables across multiple warehouses.
Using Data Assistants to automatically bootstrap entire suites of data quality validations for your specific dataset.
Generating documentation about your data to keep collaborating data engineering and data science teams on the same page.
Bio: Alex Sherstinsky is a staff machine learning and data products engineer on the team developing the core platform of Great Expectations, the leading open source data quality platform. Previously, Alex developed augmented intelligence systems that harness machine learning and gig work models to transform and scale customer service at Directly, Inc. He was a product and technical co-founder at GrowthHackers.com and Qualaroo, and a product and engineering executive at other venture capital-backed startups. Alex earned his Ph.D. in machine learning from MIT, with research conducted at the Media Lab. His scientific publications appear in refereed journals and conference proceedings, and he holds five U.S. patents.