Abstract: Being data-driven improves decision-making outcomes and enables automation, but building data-driven tooling and culture is complex and challenging, especially for startups with limited resources. We will discuss how we created an analytics platform from scratch at [you.com](http://you.com/) that protects user privacy while driving decision-making across the organization.
The amount of data created daily is rising exponentially, and harnessing that data effectively and ethically is crucial for success in today’s world. We’ll talk about automatic data collection under privacy constraints and the infrastructure setup for data ingestion (Kafka), persistence (Delta Lake, Azure Databricks Lakehouse), and processing (Spark Batch, Spark SQL, Spark Streaming). We’ll walk through the lessons learned in bringing large volumes of data into a single platform for data analytics. Manual ETL processes that took weeks were automated to run in 10 minutes or less. A rich data taxonomy borrowed from the medallion architecture (raw, refined, dataset) cultivated a deeper understanding of the data. With this new architecture, You.com built the analytics and experimentation platform to drive adoption of, and confidence in, data-centric decision-making.
Bio: Zairah is a Data Scientist at you.com, the AI search engine, where she leverages her expertise in statistical and machine-learning techniques to build analytics and experimentation platforms. She recently spoke at NeurIPS 2022, where she shared her expertise on data-driven decision-making in a privacy-focused, AI-first startup. Previously, Zairah was a Data Scientist at IBM Research, researching Natural Language Processing (NLP) and AI Fairness topics. She has published research and holds patents in these domains. Zairah obtained her M.S. in Computer Science from the University of Pennsylvania, where she researched scikit-learn model performance. Her findings have since been used as guidelines for applying machine learning to supervised classification tasks. Zairah has published her work in top AI conferences such as AAAI and has over 300 citations. Aside from work, Zairah enjoys adventure sports and poetry.