Self-Supervised Learning and Natural Language Processing for Hate Speech Detection


Social media platforms have become a hotbed for hate speech. They are increasingly exploited to spread harmful content and abusive language, and violence and hate crimes attributed to online hate speech have increased worldwide.

It has become increasingly important to build AI systems that can automatically identify hate speech in text content. However, most machine learning classifiers rely on supervised training, and the shortage of labeled training data is one of the biggest challenges in building effective hate speech detection models.

Self-supervised training has become a prominent approach to this kind of problem. It allows the model to learn from entirely unlabeled data.
One of the architectures behind the breakthroughs of self-supervised learning in NLP is the Transformer. To compute the representation of a text sequence, the Transformer relies entirely on a self-attention mechanism that relates different positions of the sequence to one another. The Transformer marks an important paradigm shift in how we understand and model language. It is also behind the major NLP developments of 2019, including Google’s BERT self-supervised language understanding model and its multilingual variant mBERT, as well as Facebook AI’s cross-lingual language model XLM.
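
As a rough illustration of the self-attention idea described above, here is a minimal single-head sketch in NumPy. It is not the code from the talk: the function names (`softmax`, `self_attention`), the toy dimensions, and the random projection matrices are assumptions chosen only to show how every position in a sequence is related to every other position.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    X has shape (seq_len, d_model); each W_* has shape (d_model, d_k).
    """
    Q = X @ W_q                          # queries, one row per position
    K = X @ W_k                          # keys
    V = X @ W_v                          # values
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_len, seq_len) pairwise scores
    weights = softmax(scores, axis=-1)   # each row is a distribution over positions
    return weights @ V, weights

# Toy input: 4 token positions with 8-dimensional vectors (random, for shape only).
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))

output, weights = self_attention(X, W_q, W_k, W_v)
print(output.shape)    # one contextualized vector per position
print(weights.shape)   # one attention weight for every pair of positions
```

Each row of `weights` says how much one position draws on every other position when its new representation is computed. A full Transformer stacks several such heads with learned projections, feed-forward layers, and positional information.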

We will learn how the Transformer works and how it relates to language representation.
We will learn how to combine the embeddings of self-supervised language understanding models, such as BERT, with a small amount of labeled data to build ML models that can automatically identify hate speech in text content with high accuracy.
We will also go through the details of the different algorithms and code implementations to give you a hands-on learning experience.
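
To make the "pretrained embeddings plus a small amount of labeled data" recipe concrete, here is a hedged sketch of one common way to do it: keep the embeddings fixed and train a small logistic-regression classifier on top. This is not the speaker's implementation. The 768-dimensional vectors below are synthetic stand-ins for sentence embeddings (no real BERT model or real text is used), and the helper names `train_logistic_regression` and `predict` are hypothetical.

```python
import numpy as np

def train_logistic_regression(features, labels, lr=0.1, epochs=200):
    """Fit a linear classifier on fixed feature vectors with gradient descent."""
    n_samples, n_dims = features.shape
    weights = np.zeros(n_dims)
    bias = 0.0
    for _ in range(epochs):
        # Clip logits so the exponential cannot overflow.
        logits = np.clip(features @ weights + bias, -30, 30)
        probs = 1.0 / (1.0 + np.exp(-logits))
        error = probs - labels                       # gradient of the log loss
        weights -= lr * (features.T @ error) / n_samples
        bias -= lr * error.mean()
    return weights, bias

def predict(features, weights, bias):
    # Label 1 when the linear score is positive, otherwise 0.
    return (features @ weights + bias > 0).astype(int)

# ASSUMPTION: synthetic 768-dim vectors stand in for frozen sentence embeddings.
# In practice these would come from a pretrained language model.
rng = np.random.default_rng(0)
dim, n_per_class = 768, 20

def make_split():
    class0 = rng.normal(loc=-0.3, size=(n_per_class, dim))  # stand-in for "not hateful"
    class1 = rng.normal(loc=0.3, size=(n_per_class, dim))   # stand-in for "hateful"
    X = np.vstack([class0, class1])
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    return X, y

X_train, y_train = make_split()   # the small labeled set (40 examples)
X_test, y_test = make_split()     # held-out examples

weights, bias = train_logistic_regression(X_train, y_train)
accuracy = (predict(X_test, weights, bias) == y_test).mean()
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

The design point is that the expensive representation learning happens once, without labels, so the supervised step only has to fit a small number of parameters. The accuracy printed here reflects the synthetic data only and says nothing about real hate speech detection performance.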


Sihem Romdhani received her MASc degree in Machine Learning from the Department of Electrical and Computer Engineering at the University of Waterloo, Canada, where her research focused on Deep Learning for Speech Recognition. She currently works at Veeva Systems as a Data Scientist, building ML models for Natural Language Processing. She has led multiple projects on text parsing, sequence tagging, and information extraction from unstructured text data. She has also worked on recommendation systems using different ML algorithms, including Reinforcement Learning. Sihem is very interested in AI and in solving new and challenging problems. Through her education, academic research, and work in industry, she has gathered experience and knowledge that she enjoys sharing through public presentations.

Open Data Science




