Abstract: When we consider disinformation narratives, we typically think of the most pervasive ones: QAnon, “pizzagate”, and false information about COVID-19. These are the narratives that receive widespread media attention and polarize individuals on a broad scale. The problem with knowing only the most “popular” disinformation narratives, for both the average Internet user and the analyst, is that by the time we become aware of them, they have already been shared by, and have influenced, many people. With this issue in mind, researchers should investigate ways to flag and identify disinformation narratives without a priori knowledge of what to look for.
In this talk, Carlos and Amber will walk through an NLP-based method they have used that combines open-source deep learning models (BERT, GPT-2) with topic modeling (LDA) to identify disinformation narratives in articles. The approach first uses a binary classifier to find texts that are potential sources of disinformation. Once a subject matter expert has reviewed and classified these texts, clusters of narratives can be extracted from the documents and used for further analysis. After discussing the technical approach, they will demonstrate the method in a case study using news articles posted on Twitter. Working with Twitter data allows for a real-time assessment of shared media, particularly in a high-traffic environment that encourages virality. The session will conclude by demonstrating how this approach can be used outside the realm of disinformation narrative detection, specifically as a tool to analyze public responses to new products, brands, or even government policies.
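As a rough illustration of the pipeline described above (flag candidate texts, have an expert review them, then extract narrative clusters), the following is a minimal Python sketch. It is not the speakers' implementation: the keyword scorer `toy_score` is a hypothetical stand-in for a fine-tuned BERT-style classifier, the expert review is reduced to a placeholder, the sample articles are invented, and the LDA step is a small collapsed Gibbs sampler written from scratch rather than a production library.

```python
import random


def flag_candidates(docs, score_fn, threshold=0.5):
    """Stage 1: keep documents whose classifier score meets the threshold."""
    return [d for d in docs if score_fn(d) >= threshold]


def lda_gibbs(tokenized_docs, num_topics, iterations=200,
              alpha=0.1, beta=0.01, seed=0):
    """Stage 3: fit LDA with collapsed Gibbs sampling (illustrative only).

    Returns the top words per topic and a topic mixture per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in tokenized_docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in tokenized_docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(tokenized_docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(tokenized_docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    top_words = [
        [vocab[i] for i in
         sorted(range(vocab_size), key=lambda i: -topic_word[k][i])[:5]]
        for k in range(num_topics)
    ]
    doc_mix = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in doc_topic[d]]
        for d, doc in enumerate(tokenized_docs)
    ]
    return top_words, doc_mix


def toy_score(text):
    """Hypothetical stand-in for a fine-tuned classifier's probability."""
    cues = {"hoax", "secret", "coverup"}
    return 1.0 if cues & set(text.lower().split()) else 0.0


# Invented sample texts, for illustration only.
articles = [
    "vaccine hoax exposed secret lab vaccine",
    "city council approves new park budget",
    "election coverup ballots secret ballots election",
    "vaccine lab hoax vaccine microchip",
    "ballots coverup election fraud ballots",
]

flagged = flag_candidates(articles, toy_score)
# Stage 2 placeholder: assume the expert confirmed every flagged text.
reviewed = flagged
topics, mixtures = lda_gibbs([a.split() for a in reviewed], num_topics=2)

for k, words in enumerate(topics):
    print(f"topic {k}: {', '.join(words)}")
for text, mix in zip(reviewed, mixtures):
    print([round(p, 2) for p in mix], text)
```

In practice the scoring function would wrap a trained model, and the number of topics would be chosen by inspecting the resulting clusters rather than fixed in advance.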
Bio: Carlos is a Machine Learning Engineer at Novetta and a graduate student in Computational Linguistics at Montclair State University. His primary interests are natural language processing, linguistics, and voice application development. Prior to joining Novetta, Carlos worked as a voice app developer for Voicefirst Tech and as an NLP research assistant for the Montclair State University research lab detecting censored language on Chinese social media.