Narrative Extraction for Disinformation Detection


When we consider disinformation narratives, we typically think of the most pervasive ones: QAnon, “pizzagate”, and false information about COVID-19. These receive widespread media attention and work to polarize individuals on a broad scale. The problem with knowing only the most “popular” disinformation narratives, for both the average Internet user and the analyst, is that by the time we become aware of them, they have already been shared widely and have influenced many people. With this issue in mind, researchers should investigate ways to flag and identify disinformation narratives without a priori knowledge of what to look for.
In this talk, Carlos and Amber will walk through an NLP-based method they have used that combines open source deep learning models (BERT, GPT-2) with topic modeling (LDA) to identify disinformation narratives in articles. The approach first uses a binary classifier to find texts that are potential sources of disinformation. Once a subject matter expert has reviewed and classified these texts, clusters of narratives can be extracted from the documents and used for further analyses. After discussing the technical approach, they will demonstrate the method in a case study using news articles posted on Twitter. This allows for a real-time assessment of shared media, particularly in a high-traffic environment that encourages virality. The session will conclude by demonstrating how the approach can be used outside disinformation narrative detection, specifically as a tool to analyze public responses to new products, brands, or even government policies.
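The two-stage idea described above (flag candidate texts, then extract topic clusters from the flagged set) can be illustrated with a minimal sketch. This is not the speakers' implementation: the classifier here is a placeholder callable standing in for a fine-tuned BERT or GPT-2 model, the expert review step is omitted, and LDA is implemented as a small collapsed Gibbs sampler using only the standard library. The names `flag_candidates`, `lda_gibbs`, `top_words`, and `toy_score`, as well as the toy texts and hyperparameters, are all hypothetical.

```python
import random
from collections import Counter


def flag_candidates(docs, score_fn, threshold=0.5):
    """Stage 1: keep documents whose classifier score meets the threshold.

    score_fn is a stand-in for a trained binary classifier (assumption).
    """
    return [d for d in docs if score_fn(d) >= threshold]


def lda_gibbs(token_docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Stage 2: LDA via collapsed Gibbs sampling over tokenized documents.

    Returns (topic_word, doc_topic): per-topic word counts and
    per-document topic counts after the final sampling sweep.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in token_docs for w in doc})
    doc_topic = [[0] * n_topics for _ in token_docs]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Random initial topic assignment for every token.
    assign = []
    for d, doc in enumerate(token_docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(n_iters):
        for d, doc in enumerate(token_docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word, doc_topic


def top_words(topic_word, n=5):
    """Most frequent words per topic, ignoring words with zero count."""
    return [[w for w, c in tw.most_common(n) if c > 0] for tw in topic_word]


# Toy demonstration with placeholder texts and a placeholder scorer.
docs = [
    "secret cure hidden by doctors",
    "city council meets on tuesday",
    "secret plan hidden in new policy",
    "doctors hide secret cure again",
]


def toy_score(text):
    # Hypothetical heuristic; a real system would call a trained model.
    return 1.0 if "secret" in text else 0.0


flagged = flag_candidates(docs, toy_score)
topic_word, doc_topic = lda_gibbs([d.split() for d in flagged], n_topics=2)
for k, words in enumerate(top_words(topic_word)):
    print(f"topic {k}: {words}")
```

In practice, a maintained LDA implementation and proper preprocessing (tokenization, stop word removal) would likely replace the hand-written sampler; the sketch only shows how the output of the classification stage feeds the topic-extraction stage.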


Carlos is a Machine Learning Engineer at Novetta and a graduate student in Computational Linguistics at Montclair State University. His primary interests are natural language processing, linguistics, and voice application development. Prior to working at Novetta, Carlos worked as a voice app developer for Voicefirst Tech and as an NLP research assistant in a Montclair State University research lab, where he worked on detecting censored language on Chinese social media.

Open Data Science




