Abstract: Learn more about how we built, tested and delivered a near-real-time data pipeline using Apache Spark in the cloud in two weeks -- and still saw our families. We faced a looming deadline, and real-time analytics requirements. Using a cloud-based platform with Spark and Impala running on Microsoft Azure, and armed with a few hundred lines of Python code, we designed, tested and deployed an end-to-end data pipeline and analytics infrastructure in two weeks. The project had its challenges, both technical and operational; learn what we learned and our tips for success.
Bio: Dan Stair is a Senior Engineer with Cazena’s data engineering platform team. He has years of experience developing and running high-performance databases and Hadoop clusters. At Cazena, Dan focuses on developing software and processes that make cloud data platform deployments automated, efficient and secure. He also works closely with companies to configure and optimize cloud data environments for unique requirements.