Abstract: The data science team at Zymergen is applying machine learning techniques to identify genetic targets, work that is supported by extensive analytical automation that systematically identifies outliers, removes process-related bias, and quantifies performance improvements. We’re using Apache Airflow to construct robust data pipelines that allow us to produce clean, reliable inputs to our predictive models. In this talk, I’ll discuss the unique data processing challenges we face in working with high-throughput biological data and provide an overview of how we’re using Apache Airflow to meet those challenges.
Bio: Erin is a data scientist with experience in a broad range of industries including retail, cloud computing, and biotechnology. She loves to tackle complex problems with her quantitative and computational skill set, and along the way she has built recommendation engines, web scrapers, and interactive visualizations, and has analyzed terabytes of data. Erin enjoys sharing what she's learned in her work and does so often through speaking engagements and as an instructor at the University of Washington's Professional and Continuing Education program.