Abstract: Data scientists are drowning in mundane tasks, and that includes part of the feature engineering process. Yes, you read that right, and you know it’s true: not all feature engineering work is fun. Many of its steps apply to several different machine learning problems and use cases. So why are we still repeating ourselves each time?
It is time for a more systematic approach to feature engineering. Long gone are the days when it was considered a black art, a secret that every data scientist kept to themselves. Now we need a tool that allows data scientists as well as data analysts in an organisation to leverage and share their knowledge and their experience with a problem or a dataset. We need a tool that frees data scientists from the mundane little details of feature engineering and lets them focus on the big picture, on the big question: what kind of information can solve my business problem?
In this talk, we will explain our approach to building our “human-powered” automatic feature engineering tool. It leverages the user’s knowledge about the data to generate expressive features. The goal is to have a versatile, modular and interpretable pipeline that helps data scientists accelerate the trial-and-error process.
We will go through some common problems in e-commerce, such as churn prediction and fraud detection, and demonstrate the effectiveness as well as the flexibility of our method.
In an era where automatic machine learning approaches such as deep learning are gaining momentum, we believe that feature engineering still has its place, but the whole process needs to be improved. And this package might be the answer.
Bio: Jorie Koster-Hale is a broadly trained data scientist at Dataiku with expertise in healthcare data, neuroscience, and machine learning. She is an award-winning researcher and instructor. Prior to joining Dataiku, she completed her Ph.D. in Cognitive Neuroscience at the Massachusetts Institute of Technology and worked as a Postdoctoral Fellow at Harvard.
Jorie currently resides in Paris, where she helps clients research, analyze, build and deploy scalable data products.