Applications of NLP in Retail/E-commerce


This talk covers three examples of using NLP to solve problems in a retail e-commerce context. The NLP techniques are topic modeling and string similarity. All the code examples use open source Python libraries. The business contexts in which they will be discussed are identifying customer complaints from online reviews, identifying sample products, and identifying similar products.

Customer complaints are identified from a corpus of Google reviews. The store operations team conducted this exercise to find out how often customers complain about problems such as a lack of adequate parking at large stores. New problems in stores were also identified. We walk through an example of topic modeling, as well as data cleaning for string data.

The second problem arises because many vendors sell products through the Home Depot website. An interesting effect of this is that some samples are sold by a vendor different from the one who sells the main item. Some tile samples are not linked to their items, so they do not show up on the product webpage. Our user research shows that items with samples have higher online sales. We walk through some measures of string similarity, such as Levenshtein distance and cosine similarity, and their advantages and disadvantages.

Lastly, a pretty standard recommender system on any e-commerce website nowadays is “Similar Items”. NLP techniques can be used on a corpus of all the product titles and descriptions to identify such products. The dataset here is too large to use the previously discussed methods. In this case we walk through string embeddings and nearest neighbors on the embeddings to identify the most similar items.

The first and third examples will be reproduced with small datasets, since Yelp reviews and product names are publicly available data. For the second, I am reasonably certain I can use a toy dataset.
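
As a rough illustration of the two string similarity measures named in the abstract, here is a minimal sketch using only the Python standard library. It is not the speaker's code: the talk uses open source libraries that the abstract does not name, and the product titles below are hypothetical, invented purely for illustration. The word-level tokenization used for cosine similarity is also an assumption; character n-grams or TF-IDF weights are common alternatives.

```python
from collections import Counter
from math import sqrt


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between word-count vectors of two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (sqrt(sum(c * c for c in va.values()))
            * sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


# Hypothetical product titles (assumptions, not real catalog data).
item = "Carrara Marble Floor Tile 12 in. x 12 in."
sample = "Sample 12 in. x 12 in. Carrara Marble Floor Tile"
other = "Slate Gray Ceramic Wall Tile 4 in. x 4 in."

for candidate in (sample, other):
    print(candidate)
    print("  edit distance:", levenshtein(item.lower(), candidate.lower()))
    print("  cosine similarity:", round(cosine_similarity(item, candidate), 3))
```

In this toy case the word-count cosine score ignores word order, so the reordered sample title still scores close to the main item, whereas edit distance penalizes the reordering character by character. Edit distance, on the other hand, is well suited to catching small typos within otherwise identical strings.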


Shoili Pal is a Data Scientist at The Home Depot, where she currently works on Recommendations and Personalization. She has also worked in product data science teams, a finance team, and two early-stage startups. She holds a Master's in Analytics from Georgia Tech and a Master's in Operations Research from the London School of Economics. In her spare time she reads fantasy and science fiction, builds Lego sets, and goes on bike rides.

Open Data Science




One Broadway
Cambridge, MA 02142
