Riding the Tailwind of NLP Explosion


Keywords: # transformers # transfer learning # zero-shot # literature review # application

We ingest 2 million documents monthly at CB Insights (CBI) to empower tech decision-makers and researchers. From raw data to insights, the R&D team takes on many holy grail challenges, a major one being how to extract relevant information at scale, with speed and precision.

When we started at CBI, NLP was still in its prehistoric era, when "bag of words" walked the earth. Fast forward ten years: the birth of the "attention mechanism" set off an NLP explosion and created a strong tailwind for teams big and small to ride.

In this talk, we'll share how we modernized our NLP stack at CBI R&D and the challenges we met along the way. Part I will walk you through the timeline and milestones of NLP evolution, highlighting significant trends after the "attention" revolution. Part II will discuss battle-ready lessons gained from using transformer models across various tasks and languages, leveraging open source libraries such as Hugging Face Transformers and PyTorch Lightning.


Rongyao is a data scientist bootstrapped from stats and social science training and an industry-honed engineer. She has worked across domains from anti-corruption research to digital advertising and finance. She specializes in end-to-end problem solving from ideation to deployment. She has spent the past four years absorbing the explosion of deep learning and NLP and leveraging them to scale information extraction at CB Insights R&D.

Open Data Science




