Building a Robust Data Pipeline with the “dAG stack”: dbt, Airflow, and Great Expectations

Abstract: 

Data quality has become a much-discussed topic in data engineering and data science, and it is now clear that ensuring data quality is crucial to avoiding "garbage in, garbage out". Apache Airflow and dbt (data build tool) are two of the most prominent open source tools in the data engineering ecosystem. While dbt offers some data testing capabilities, adding data validation with Great Expectations gives the pipeline further layers of robustness.

This talk will outline a convenient pattern for using these tools in what we've been calling the "dAG stack": build a transformation layer and test those transformations with dbt; validate the source data and add more complex tests as well as data documentation with Great Expectations; and orchestrate the entire pipeline with Airflow. The audience will see examples of how the tools fit together and complement each other to build a robust data pipeline.
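
As a rough illustration of how the three tools might be sequenced, the sketch below models the pipeline as a small dependency graph using only the Python standard library. The step names, their ordering, and the `run` helper are assumptions made for illustration, not taken from the talk; in a real deployment each step would be an Airflow task that invokes dbt or a Great Expectations validation.

```python
from graphlib import TopologicalSorter

# Hypothetical step names (not from the talk). Each step maps to the set of
# steps that must finish before it starts. The comment names the tool that
# would do the work in a real pipeline.
PIPELINE = {
    "validate_source_data": set(),                      # Great Expectations
    "dbt_run": {"validate_source_data"},                # dbt transformations
    "dbt_test": {"dbt_run"},                            # dbt tests
    "validate_transformed_data": {"dbt_test"},          # Great Expectations
    "build_data_docs": {"validate_transformed_data"},   # Great Expectations
}


def execution_order(pipeline):
    """Return the step names in an order that respects every dependency."""
    return list(TopologicalSorter(pipeline).static_order())


def run(pipeline, steps):
    """Call each step in dependency order and stop at the first failure.

    `steps` maps a step name to a zero-argument callable returning True on
    success. Returns the names of the steps that completed successfully.
    """
    completed = []
    for name in execution_order(pipeline):
        if not steps[name]():
            break  # a failed check halts everything downstream
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Stub callables stand in for the real dbt and Great Expectations calls.
    stubs = {name: (lambda: True) for name in PIPELINE}
    print(run(PIPELINE, stubs))
```

The point of the ordering is that a failed validation or test stops the run before bad data reaches later steps; an orchestrator such as Airflow would enforce the same dependencies between tasks.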

Disclaimer: The speaker is an employee of Superconductive, the core maintainers behind Great Expectations. All technologies discussed in the talk (Great Expectations, dbt, Apache Airflow) are non-proprietary open source projects.

Bio: 

Sam Bail is a data professional with a passion for turning high-quality data into valuable insights. Sam holds a PhD in Computer Science and has worked for several data-focused startups. In her current role as Engineering Director at Superconductive, she works on Great Expectations, an open source Python library for data validation and documentation.

Open Data Science