General and Efficient Self-supervised Learning with data2vec


While the general idea of self-supervised learning is identical across modalities, the actual algorithms and objectives differ widely because they were developed with a single modality in mind. In this talk, I will present data2vec, a framework for general self-supervised learning that uses the same learning method for any of speech, NLP, or computer vision. The core idea is to predict latent representations of the full input data based on a masked view of the input, in a self-distillation setup using a standard Transformer architecture. Instead of predicting modality-specific targets such as words, visual tokens, or units of human speech, which are local in nature, data2vec predicts contextualized latent representations that contain information from the entire input. Experiments on the major benchmarks of speech recognition, image classification, and natural language understanding demonstrate a new state of the art or performance competitive with predominant approaches.

Learning objectives: general self-supervised learning across multiple modalities.


Michael Auli is a principal research scientist/director at FAIR in Menlo Park, California. His work focuses on speech and NLP, and he helped create projects such as wav2vec/data2vec, the widely used fairseq toolkit, the first modern feed-forward seq2seq models to outperform RNNs for NLP, and several top-ranked submissions to the WMT news translation task in 2018 and 2019. Before that, Michael was at Microsoft Research, where he did early work on neural machine translation and on using neural language models for conversational applications. During his PhD at the University of Edinburgh he worked on natural language processing and parsing.

Open Data Science
One Broadway
Cambridge, MA 02142