Building an Industry classifier with the latest scraping, NLP and deployment tools


For BlueVine, and indeed for any Fintech company, figuring out the client’s industry is a critical factor in making precise financial decisions. Traditional sources are often pricey, inaccurate or unavailable, and as such leave an opening for an ML-based solution. We met that challenge by building a service that predicts the industry using the business’s publicly available web data. By employing the latest innovations in NLP (BERT) and some of the most powerful scraping and deployment tools available (Scrapy and Amazon SageMaker), we were able to dramatically surpass the performance achieved by any other such tool in the space.

This presentation will cover the entire development pipeline hands-on: crowdsourcing a tagged sample, building a smart and scalable web scraper, prepping and feeding the resulting raw data into BERT, fine-tuning the model, and finally deploying it as a cloud-based service behind an API. Both model training and deployment will be done through Amazon SageMaker.
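To give a flavor of the "prepping" step in that pipeline, here is a minimal sketch of how scraped HTML might be reduced to a short plain-text string before tokenization. It is an illustration only, not the pipeline presented in the talk: the names `html_to_model_input` and `max_words` are hypothetical, only the Python standard library is used, and the word cap is a rough stand-in for BERT's bounded input length (512 tokens for the standard pretrained checkpoints), since words are not the same as WordPiece tokens.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_model_input(html, max_words=300):
    """Hypothetical helper: strip markup, collapse whitespace, cap length.

    max_words is an assumed, crude proxy for the model's token limit.
    """
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    words = " ".join(parser.chunks).split()  # split() collapses whitespace
    return " ".join(words[:max_words])


page = (
    "<html><head><title>Acme Plumbing</title>"
    "<style>p{color:red}</style></head>"
    "<body><script>var x=1;</script>"
    "<p>Licensed   plumbers\n in Austin</p></body></html>"
)
print(html_to_model_input(page))
```

In a real system the truncation would more likely be done by the model's own tokenizer, and the choice of which page elements to keep would be part of the scraper design.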


Ido Shlomo is the head of BlueVine’s data science team in the US, where he works on applying machine learning and other automation solutions for risk management, fraud detection and marketing purposes. His recent work focuses on implementing complex NLP tasks in production systems, and specifically on dealing with the challenge of consuming unstructured data. Previously, Ido worked in the Economics department at Tel Aviv University as a researcher in structural macroeconomic modeling. Ido holds a dual BA in mathematics and philosophy and an MA in economics, both from Tel Aviv University.
