Abstract: Using TensorFlow with big datasets has long been an impediment to building deep learning models, owing to the added complexity of running it in a distributed setting and the complicated MLOps code involved. Recent advancements in TensorFlow 2 and some extension libraries for Spark have now simplified a lot of this. This talk focuses on how we can leverage the best of both Spark and TensorFlow to build machine learning and deep learning models with minimal MLOps code, letting Spark handle the grunt work so that we can focus more on feature engineering and building the model itself. This design also enables us to use any of the libraries in the TensorFlow ecosystem (like TensorFlow Recommenders) with the same boilerplate code. For businesses like ours, fast prototyping and quick experimentation are key to building completely new experiences in an efficient and iterative way. It is always preferable to have tangible results before putting more resources into a project. This design provides us with that capability and lets us spend more time on research, building models, testing quickly, and iterating rapidly. It also gives us the flexibility to use our framework of choice at any stage of the machine learning lifecycle. In this talk, we will go through some of the best and newest features of both Spark and TensorFlow, how to go from single-node training to distributed training with very few extra lines of code, how to leverage MLflow as a central model store, and finally, how to use these models for batch and real-time inference.
Bio: Ronny Mathew is a Data Science lead at Rue Gilt Groupe (RGG), building next-generation online shopping experiences for its members. He is passionate about applied machine learning and deep learning, and works on recommendation systems, computer vision, and natural language processing for big data. At RGG, they are currently building the next generation of their personalization platform, leveraging cutting-edge tools and algorithms.