Abstract: Organizations face numerous challenges in managing data and putting it to use. ByteDance, a leading global technology company, has taken a significant step in addressing these challenges by open-sourcing its cloud native data warehouse, ByConity. ByConity enables businesses to store, process, and analyze vast amounts of data, unlocking valuable insights and supporting data-driven decision-making.
In this talk, we will explore the capabilities and benefits of ByConity as an open source data warehousing solution. We will delve into the technical aspects of ByConity, discussing its architecture, key features, and the principles that underpin its design. We will also explore the unique challenges that ByteDance encountered and overcame while developing ByConity, highlighting the lessons learned and best practices for implementing a cloud native data warehouse.
During the talk, we will cover the following key points:
1. Introduction to ByConity: Understand the motivation behind ByteDance's decision to open source its cloud native data warehouse and its significance in the current data ecosystem.
2. Architectural Overview: Gain insights into the architecture of ByConity, including its distributed storage, query optimization, and resource management components. Learn how ByConity leverages cloud native technologies such as containerization and orchestration for scalability and reliability.
3. Core Features and Functionality: Explore the key features of ByConity, including data ingestion, storage formats, query engine, and data governance. Discover how ByConity handles both structured and semi-structured data, providing flexibility for a wide range of use cases.
4. Performance and Scalability: Examine how ByConity achieves high performance and scalability through intelligent query optimization, parallel processing, and workload management techniques. Learn how it can handle large data volumes and complex analytics workloads.
5. Future Roadmap and Community Involvement: Gain insights into ByteDance's roadmap for ByConity's future development and how the open source community can contribute to its growth. Explore the possibilities of collaboration and the potential impact ByConity can have on the broader data ecosystem.
By the end of this talk, attendees will have a solid understanding of ByConity, ByteDance's performant and scalable open source data warehouse, and the advantages it offers for data management and analytics. They will be equipped to evaluate ByConity as a potential solution for their own data analytics needs and to explore opportunities for collaboration within the open source community.
Attendees should be familiar with data processing concepts and with open source tools such as GitHub and Docker.
Bio: Vini Jaiswal is an award-winning data and AI influencer, advisor, and engineer. As an Open Source leader at ByteDance and formerly at Databricks, she has played a pivotal role in shaping breakthrough technologies and has brought data and AI solutions to thousands of organizations globally and within the open-source community. Previously, as the VP of Data Science at Citi, she led the development of the data science tech stack and pioneered innovative financial use cases. Vini has made notable contributions to open-source technologies such as Apache Spark, Delta Lake, MLflow, ByConity, PyTorch, and incubating LLM and GenAI projects, showcasing her global impact in advancing data science and AI. Vini's commitment to education, diversity, and innovation is evident through her roles as co-chair at GraceHopper, member at ACM, Linux Foundation, AI/ML Dev Advisory Board, and contributor to academia.