Building Robust Data Pipelines with Airflow


The data science team at Zymergen is applying machine learning techniques to identify genetic targets. This work is supported by extensive analytical automation that systematically identifies outliers, removes process-related bias, and quantifies performance improvements. We’re using Apache Airflow to construct robust data pipelines that produce clean, reliable inputs to our predictive models. In this talk, I’ll discuss the unique data processing challenges we face in working with high-throughput biological data and provide an overview of how we’re using Airflow to meet those challenges.


Erin is a data scientist with experience in a broad range of industries, including retail, cloud computing, and biotechnology. She loves to tackle complex problems with her quantitative and computational skill set, and along the way she has built recommendation engines, web scrapers, and interactive visualizations, and has analyzed terabytes of data. Erin enjoys sharing what she's learned in her work and does so often, both through speaking engagements and as an instructor at the University of Washington's Professional and Continuing Education program.
