Tower of Babel: Making Apache Spark, Apache Mahout, Kubeflow, and Kubernetes Play Nice


Working with big data matrices is challenging, Kubernetes allows users to elastically scale, but can only have a pod as large as a node, which may not be large enough to fit the matrix in memory. While Kubernetes allows for other paradigms on top of it which allows pods to coordinate on individual jobs, setting them up and making them play nice with ML platforms is not straightforward. Using Apache Spark and Apache Mahout we can work with matrices of any dimension and distribute them across an unbounded number of pods/nodes, and we can use Kubeflow to make our work quickly and easily reproducible. In this talk, we’ll discuss how we used Apache Spark and Mahout to denoise DICOM images of lungs of COVID patients and published our Pipeline with Kubeflow to make the process easily repeatable which could help doctors in more resource limited hospitals, as well as other researchers seeking to automate the detection of COVID.


Trevor is the Director of Developer Relations at Arrikto and an international speaker excited to be back on the road after a 2 year COVID hiatus. He is also a member and involved with leadership of several projects at the Apache Software Foundation, PMC Chair of Apache Mahout, and Author of Kubeflow For Machine Learning: From Lab to Production.

Open Data Science




Open Data Science
One Broadway
Cambridge, MA 02142

Privacy Settings
We use cookies to enhance your experience while using our website. If you are using our Services via a browser you can restrict, block or remove cookies through your web browser settings. We also use content and scripts from third parties that may use tracking technologies. You can selectively provide your consent below to allow such third party embeds. For complete information about the cookies we use, data we collect and how we process them, please check our Privacy Policy
Consent to display content from - Youtube
Consent to display content from - Vimeo
Google Maps
Consent to display content from - Google