Unified and Efficient Multimodal Pretraining Across Vision and Language


In this talk, I will present work on improving unification, generalization, and efficiency in large-scale pretrained models across the vision and language modalities, using several forms of visual grounding to improve both multimodal and text-only NLU tasks. We will start by discussing joint vision-and-language pretraining models such as LXMERT (large-scale cross-modal pretraining). Next, we will present VL-T5, which unifies several multimodal tasks (such as visual question answering, referring expression comprehension, visual reasoning/entailment, visual commonsense reasoning, captioning, and multimodal machine translation) by treating all of them as text generation. We will then discuss improving text-only NLU tasks via visually grounded supervision and distillation, transferring knowledge from images and videos (Vokenization, VidLanKD). Finally, we will look at parameter and memory efficiency in VL pretraining via adapter/side-tuning, sparse sampling, and audio replacement methods.


Dr. Mohit Bansal is the John R. & Louise S. Parker Professor and the Director of the MURGe-Lab in the Computer Science department at the University of North Carolina (UNC) at Chapel Hill. He received his PhD from UC Berkeley and his BTech from IIT Kanpur. His research expertise is in natural language processing and multimodal machine learning, with a particular focus on grounded and embodied semantics, human-like language generation, and interpretable and generalizable deep learning. He is a recipient of the DARPA Director's Fellowship, NSF CAREER Award, Army Young Investigator Award, Google Focused Research Award, Microsoft Investigator Fellowship, and outstanding paper awards at ACL, CVPR, EACL, COLING, and CoNLL. His service includes the ACL Executive Committee, the ACM Doctoral Dissertation Award Committee, Program Co-Chair for CoNLL 2019, ACL Americas Sponsorship Co-Chair, and Associate/Action Editor for the TACL, CL, IEEE/ACM TASLP, and CSL journals. Webpage: https://www.cs.unc.edu/~mbansal/

Open Data Science




