Abstract: Keywords: # transformers # transfer learning # zero-shot # literature review # application
We ingest 2 million documents monthly at CB Insights (CBI) to empower tech decision-makers and researchers. From raw data to insights, the R&D team takes on many holy grail challenges, a major one being how to extract relevant information at scale, with speed and precision.
When we started at CBI, NLP was still in its prehistoric era, when the "bag of words" walked the earth. Fast forward ten years: the birth of the "attention mechanism" set off an NLP explosion and created a strong tailwind for teams big and small to ride.
In this talk, we'll share how we modernized our NLP stack at CBI R&D and the challenges we faced along the way. Part I will walk you through the timeline and milestones of NLP's evolution, highlighting significant trends after the "attention" revolution. Part II will discuss battle-tested lessons from using transformer models across various tasks and languages, leveraging open source libraries such as HuggingFace Transformers and PyTorch Lightning.
Bio: Rongyao is a data scientist bootstrapped from stats and social science training and honed as an engineer in industry. She has worked across domains from anti-corruption research to digital advertising and finance, and she specializes in end-to-end problem solving, from ideation to deployment. She has spent the past four years absorbing the explosion of deep learning and NLP and leveraging them to scale information extraction at CB Insights R&D.