Abstract: Data quality has become a much-discussed topic in data engineering and data science, and it has become clear that ensuring data quality is crucial to avoiding a case of "garbage in, garbage out". Apache Airflow and dbt (data build tool) are among the most prominent open source tools in the data engineering ecosystem. While dbt offers some data testing capabilities, adding data validation with Great Expectations makes the pipeline more robust.
This talk will outline a convenient pattern for using these tools together in what we've been calling the "dAG stack". Build a transformation layer and test those transformations with dbt. Validate the source data, add more complex tests, and generate data documentation with Great Expectations. Orchestrate the entire pipeline with Airflow. The audience will see examples of how the tools fit together and complement each other to build a robust data pipeline.
Disclaimer: The speaker is an employee of Superconductive, the core maintainers of Great Expectations. All technologies discussed in the talk (Great Expectations, dbt, Apache Airflow) are non-proprietary open source projects.
Bio: Sam Bail is a data professional with a passion for turning high quality data into valuable insights. Sam holds a PhD in Computer Science and has worked for several data-focused startups. In her current role as Engineering Director at Superconductive, she works on “Great Expectations”, an open source Python library for data validation and documentation.