A Dive into Delta Lake: A Modern File Format for the Next-Generation Lake

Abstract: 

Data Lakes are a mainstream part of our data platforms these days. They underpin a lot of our data science and engineering workflows, but it's hard to talk about Data Lakes without someone mentioning the Data Swamp, simply because lakes used to be hard to get right. That's no longer the case thanks to a wave of next-generation file formats, one of which is Delta. Whether you're a data scientist, a machine learning engineer, or a hardcore data engineer, there is a whole host of Delta features that completely change the landscape of how we build data platforms.

In this workshop, I will introduce the Delta file format and how it works, before taking a tour of the many features available in the open-source project. I will show you how to get started with Delta in a Spark environment, covering a range of features from simple merge statements and temporal querying right down to deeper performance tuning. You will leave this workshop ready to work in a Delta Lake architecture, confident that you can avoid the dreaded swamp!

Bio: 

Simon is the Director of Engineering for Advancing Analytics, a Microsoft Data Platform MVP, and one of the few Databricks Beacons globally. Simon has pioneered Lakehouse architectures for some of the world's largest companies, challenging traditional analytical solutions and pushing for the very best for the data industry. Simon runs the Advancing Spark YouTube channel, where he can often be found digging into Spark features, investigating new Microsoft technologies, and cheering on the Delta Lake project.

Open Data Science
