Data Pipeline Architecture – Stop Building Monoliths

Abstract: 

In modern software development we have fully embraced microservice architecture, and for good reason, but in data engineering monoliths are still accepted despite their pitfalls. Even when using the latest tooling associated with the “modern data stack”, we very often end up creating monoliths, and almost always live to regret it.

In small organizations with small central teams, we can get away with this architecture with limited discomfort for some time. In fact, as with any small software project, the monolith seems to save time and gives the impression of higher productivity. But as complexity increases, developer experience and productivity drop, and the system becomes more brittle, frustrating both our engineering teams and our stakeholders.

Monolithic architecture is even more cumbersome in larger teams, especially in organizations that allow for federated data product development.

So what’s the answer? How can we take inspiration from what’s been done in microservices and event-based architecture? How can we apply some of the concepts of Data Mesh architecture?

In this talk we will review how, and to what extent, these patterns and technologies can apply to data pipelines, starting from first principles and then working through implementation patterns in common open source frameworks. This will include multi-Airflow infrastructure, micro-DAG packaging and deployment, DBT multi-project implementation, rational use of containers, and data sharing/publication strategies.

We will also review some approaches for decomposing existing data monoliths, using a real-world scenario.

Bio: 

Elliott is an expert in data engineering, data warehousing, information management, and technology innovation, with a passion for transforming data into powerful information. He has more than a decade of experience implementing cutting-edge, data-driven applications, and he helps organizations understand the true potential of their data by working as a leader, architect, and hands-on contributor.

Elliott has built nearly a dozen cloud-native data platforms on AWS, from data warehouses and data lakes to real-time activation platforms, at companies ranging from small startups to large enterprises.

Open Data Science

 

 

 
