Self-supervised Representation Learning for Speech Processing


Although deep learning models have revolutionized speech and audio processing, they have forced practitioners to build specialist models for individual tasks and application scenarios, and their reliance on large labeled datasets has left dialects and languages with limited labeled data behind.

Self-supervised representation learning methods promise a single universal model that benefits a collection of tasks and domains. These methods have recently succeeded in NLP and computer vision, reaching new levels of performance while reducing the labels required for many downstream scenarios. Speech representation learning is experiencing similar progress, with approaches falling into three main categories: generative, contrastive, and predictive. Other approaches rely on multi-modal data for pre-training, mixing text or visual data streams with speech. This talk will present self-supervised speech representation learning approaches and their connection to related research areas. Since many current methods focus solely on automatic speech recognition as a downstream task, we will also review recent efforts to benchmark learned representations and extend their application beyond speech recognition.
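As a rough illustration of the contrastive category mentioned above (a simplified form of the InfoNCE objective used in models such as wav2vec 2.0; the function name, dimensions, and temperature value here are illustrative, not taken from any particular implementation):

```python
import numpy as np

def info_nce_loss(context, target, distractors, temperature=0.1):
    """InfoNCE loss for one masked time step: the context vector should
    score the true target representation higher than K distractors."""
    # Stack the true target (index 0) with the distractors: shape (K + 1, D)
    candidates = np.vstack([target[None, :], distractors])
    # Cosine similarity between the context vector and each candidate
    sims = candidates @ context / (
        np.linalg.norm(candidates, axis=1) * np.linalg.norm(context)
    )
    # Temperature-scaled softmax cross-entropy, with the target at index 0
    logits = sims / temperature
    logits -= logits.max()  # subtract max for numerical stability
    return float(np.log(np.exp(logits).sum()) - logits[0])

# A context vector close to the target yields a low loss; a random one does not.
rng = np.random.default_rng(0)
target = rng.normal(size=16)
context = target + 0.01 * rng.normal(size=16)  # context "predicts" the target
distractors = rng.normal(size=(10, 16))
good = info_nce_loss(context, target, distractors)
bad = info_nce_loss(rng.normal(size=16), target, distractors)
```

Minimizing this loss pushes the model to distinguish the true future (or masked) speech representation from negatives sampled elsewhere in the utterance, which is the core idea behind contrastive speech pre-training.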


Abdelrahman Mohamed (PhD) is a research scientist at Meta AI Research (previously Facebook AI Research, FAIR). He was a principal scientist/manager at Amazon Alexa and a researcher at Microsoft Research. Abdelrahman was part of the team that started the deep learning revolution in spoken language processing in 2009. He received the 2016 IEEE Signal Processing Society Best Journal Paper Award. His current research focuses on improving, using, and benchmarking learned speech representations, e.g., HuBERT, wav2vec 2.0, TextlessNLP, and SUPERB.

Open Data Science
One Broadway
Cambridge, MA 02142