Abstract: The term "Feature Store" often conjures a simplistic image of a storage place for features. In reality, feature stores are robust frameworks and orchestrators for defining, managing, and deploying feature pipelines. That veneer of simplicity masks the significant operational gains organizations can achieve by integrating the right feature store into their ML platform. This session peels back the ambiguity surrounding feature stores, laying out the three distinct types and how each fits into a broader ML ecosystem.
In a hands-on section, we will walk through training and deploying an end-to-end fraud detection model using Featureform, Redis, Databricks, and SageMaker. The emphasis is on real-world, applicable examples rather than concepts and marketing talk.
This session aims to do more than explain the mechanics of feature stores. It provides a practical blueprint for harnessing feature stores efficiently within ML workflows, bridging the gap between theoretical understanding and actionable implementation. Participants will walk away with a solid grasp of feature stores and the knowledge to apply them to their own ML platforms and projects.
Lesson 1: Demystifying Feature Stores
"Feature Store" evokes a simplistic image of a place to store features. In practice, feature stores are frameworks and orchestrators for defining, managing, and deploying your feature pipelines. Beyond the catchy phrases, there is real confusion about what a feature store actually does. There are in fact three distinct types of feature stores, and we will lay them all out. We'll explore what feature stores are, look at common designs, see how they fit into an ML platform, and discuss when you might need one.
Lesson 2: Define feature pipelines with Featureform and run them on Databricks
After covering the basics, we'll start building. Using a dataset of transactions, we'll define our feature pipelines in Featureform and run them on Databricks. This gives hands-on experience turning concepts into action.
Lesson 3: Serving features for training and inference
Now that our features and training sets are ready, we'll use Featureform to serve them for training a model in SageMaker. Once the model is trained, we'll deploy it and see how to serve features to it for making predictions. This gives a full picture of how the features we define are actually used by a model, both in training and in inference.
Learning Objective: We will train and deploy a model end-to-end using common MLOps tooling in an AWS environment: Featureform, Databricks, Redis, and SageMaker.
Prerequisites:
Have trained and deployed a model before
Basics of Spark
Basics of SageMaker notebooks, training, and inference
Have reviewed Featureform and/or its docs
A Jupyter environment (Google Colab is fine)
Bio: Simba Khadder is the founder & CEO of Featureform. He started his ML career in recommender systems, where he architected a multi-modal personalization engine that powered hundreds of millions of users' experiences. He later open-sourced their feature store and built a company around it. Featureform is a virtual feature store: it enables data scientists to define, manage, and serve model features using a Python framework. Simba is also a published astrophysicist, an avid surfer, and ran a marathon in basketball shoes.