Abstract: Twitter is what's happening in the world right now. To understand and organize content on the platform, we leverage a semantic text representation that is useful across a variety of tasks. Because content on Twitter spans a wide range of topics and changes constantly, supervised training, which traditionally relies on human-annotated corpora, is expensive and does not scale.
Sijun He and Kenny Leung share their experience building and serving self-supervised content representations for heterogeneous content on Twitter. They also highlight applications of content embeddings in recommendation systems, as well as the engineering challenges of maintaining such embeddings at scale.
Bio: Sijun He is a machine learning engineer at Twitter Cortex, where he works on content understanding with deep learning and NLP. Previously, he was a data scientist at Autodesk. Sijun holds an MS in statistics from Stanford University.