Abstract: In this talk, I will take you on a journey of building a centralized data repository that ingests from a wide variety of sources, such as service databases, SaaS applications, unstructured files, and conversational data. I give real-life examples of how migrating from proprietary data warehouses to an open data lake dramatically reduced cloud costs and vendor lock-in, and how cloud file targets decoupled compute from storage and improved data pipeline efficiency. I focus on the EL of extract, load, transform (ELT) and on the scalability of open table formats. I conclude with UniForm, the hope of ending the “battle of metastores”, and the future state of data lakes. My insights can help you choose the most appropriate technology to accommodate diverse analytics, machine learning, and product use cases.
I hope the audience is empowered to choose open source technology and open table formats. See https://delta.io/ for the UniForm project.
Bio: Christina is passionate about open source, multi-cloud, scalable and efficient data pipelines. She makes data-informed architectural decisions to build modern systems that support advanced analytics, machine learning, and customer-facing product use cases. She has a keen interest in interdisciplinary areas such as DevOps, MLOps, and Cloud FinOps. She loves to learn, share, and contribute to the open source community.