Founder, Architect, Builder at Datafutures
Elliott is an expert in data engineering, data warehousing, information management, and technology innovation, with a passion for helping organizations transform data into powerful information. He has more than a decade of experience implementing cutting-edge, data-driven applications, working as a leader, architect, and hands-on contributor to help organizations understand the true potential in their data. Elliott has built nearly a dozen cloud-native data platforms on AWS, from data warehouses and data lakes to real-time activation platforms, at companies ranging from small startups to large enterprises.
All Sessions by Elliott Cordo
Data Pipeline Architecture - Stop Building Monoliths
Data Engineering | Intermediate
In modern software development we have fully embraced microservice architecture, for good reason, yet in the data world monoliths remain accepted despite their pitfalls. Even when using the latest tooling associated with the “modern data stack,” we very often end up creating monoliths, and almost always live to regret it. In small organizations with small central teams, we can get away with this architecture with limited discomfort for some time. In fact, as with any small software project, the monolith seems to save time and gives the impression of higher productivity. But as complexity increases, developer experience and productivity drop, and our system grows more brittle, frustrating both our engineering teams and our stakeholders. Monolithic architecture is even more cumbersome in larger teams, especially in organizations that allow for federated data product development.

So what’s the answer? How can we take inspiration from what’s been done in microservices and event-based architecture? How can we apply some of the concepts of Data Mesh architecture? In this talk we will review how, and to what extent, these patterns and technologies can apply, starting from first principles and then working through the implementation patterns to common open source frameworks. This will include multi-Airflow infrastructure, micro-DAG packaging and deployment, dbt multi-project implementation, rational use of containers, and data sharing/publication strategies. We will also review some approaches for decomposing existing data monoliths, using a real-world scenario.
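To make the decomposition idea concrete, here is a minimal, hedged sketch (the pipeline names and the dict-based publication layer are hypothetical stand-ins, not from the talk): each team owns a small, independently deployable pipeline, and pipelines interact only through published datasets, never through each other's internals.

```python
# Hypothetical sketch: two independently owned "micro-pipelines" that
# communicate only through a publication layer. A dict stands in for
# shared storage such as an S3 bucket or a warehouse schema.

published: dict[str, list] = {}  # stand-in for the shared publication layer


def publish(dataset: str, rows: list) -> None:
    """Each pipeline writes only to its own namespaced datasets."""
    published[dataset] = rows


def orders_pipeline() -> None:
    # Owned and deployed by the orders team; its internals can change
    # freely as long as the published contract stays stable.
    raw = [{"id": 1, "amount": 120}, {"id": 2, "amount": 80}]
    publish("orders.cleaned", [r for r in raw if r["amount"] > 0])


def revenue_pipeline() -> None:
    # Owned by a downstream team; depends only on the published
    # "orders.cleaned" dataset, not on the orders team's code.
    orders = published["orders.cleaned"]
    publish("finance.daily_revenue", [sum(r["amount"] for r in orders)])


orders_pipeline()
revenue_pipeline()
print(published["finance.daily_revenue"])  # [200]
```

In a monolith, the revenue logic would reach directly into the orders job's intermediate tables; here the published dataset is the only coupling point, which is what lets each pipeline be packaged, deployed, and scheduled on its own (for example, as separate small Airflow DAGs).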