Combining Millions of Products Into One Marketplace Using Computer Vision and Natural Language Processing


ShopRunner is an e-commerce company that receives product data feeds from more than 140 retail partners, including large department stores and retailers that specialize in clothing, electronics, appliances, nutritional products, and more. To provide a good user experience on our website and mobile app, we need a single, easy-to-navigate product taxonomy. However, each retailer has its own taxonomy, tuned to its specific needs. We would also like sets of attribute tags that make it easy for shoppers to filter down to exactly what they are looking for. For example, for any given product we would like to know not only that it is a women's dress, but also its color, its fabric type, and whether or not it has pockets.

In this talk I will describe how ShopRunner is working to use product images and text data (such as product names and descriptions) together to automatically combine all of our retailers' products into one easy-to-navigate shopping experience. With computer vision and natural language processing techniques, we can tell dresses from pants, green from blue, and maternity from petite sizing. I will cover the basic frameworks we use, how we combine everything into one pipeline, and some of the complications we have encountered along the way.


Ali Vanderveld is Head of Data Science at ShopRunner, where her team uses data from a network of more than 140 retailers to build products for the company's 6 million members. Before ShopRunner, she was a staff data scientist at Civis Analytics, a consulting and software startup that helps companies, nonprofits, and political organizations make better use of their data. She has also worked at Groupon and as a technical mentor for the Data Science for Social Good Fellowship. Ali has a PhD in theoretical astrophysics from Cornell University and began her career as an academic researcher at Caltech, the NASA Jet Propulsion Laboratory, and the University of Chicago, where she worked on the development teams for several space telescope missions, including ESA's Euclid.

Open Data Science