Audio-Visual Speech Enhancement and Separation Based on Deep Learning


A speech signal is often degraded by background noise, making the speech content hard to understand. Speech enhancement and separation systems reduce the noise level and improve the quality and intelligibility of the signal. Visual cues, such as mouth movements and facial expressions, can complement the audio and provide a substantial performance improvement.
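As a minimal illustration of the signal-processing side (audio-only, no visual stream), the sketch below applies classical spectral subtraction with NumPy: a deep model would instead learn to predict the time-frequency mask, but the frame/subtract/resynthesise pipeline is the same. All signals and parameters here are illustrative, not from any system discussed in the session.

```python
import numpy as np

def spectral_subtraction(noisy, noise_only, n_fft=256, hop=128):
    """Frame-wise spectral subtraction: estimate the average noise
    magnitude spectrum from a noise-only segment, subtract it from
    each noisy frame, and resynthesise with the noisy phase."""
    window = np.hanning(n_fft)
    # Average noise magnitude spectrum over noise-only frames.
    noise_frames = [noise_only[i:i + n_fft] * window
                    for i in range(0, len(noise_only) - n_fft, hop)]
    noise_mag = np.mean([np.abs(np.fft.rfft(f)) for f in noise_frames], axis=0)
    out = np.zeros(len(noisy))
    for start in range(0, len(noisy) - n_fft, hop):
        spec = np.fft.rfft(noisy[start:start + n_fft] * window)
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)   # floor at zero
        frame = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n_fft)
        out[start:start + n_fft] += frame                 # Hann OLA sums to ~1
    return out

def snr_db(reference, estimate):
    residual = reference - estimate
    return 10 * np.log10(np.sum(reference**2) / np.sum(residual**2))

rng = np.random.default_rng(0)
sr = 8000
t = np.arange(2 * sr) / sr
clean = 0.5 * np.sin(2 * np.pi * 440 * t)      # a tone standing in for speech
noise = 0.1 * rng.standard_normal(len(t))
noisy = clean + noise
enhanced = spectral_subtraction(noisy, noise)

# Compare SNR on the middle of the signal to avoid frame edge effects.
mid = slice(sr // 2, 3 * sr // 2)
print(round(snr_db(clean[mid], noisy[mid]), 1),
      round(snr_db(clean[mid], enhanced[mid]), 1))
```

The same framing and overlap-add machinery carries over to mask-based deep systems; only the rule that maps the noisy spectrum to a cleaned one changes.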

In this session, participants will be introduced to recent advances in audio-visual speech enhancement and separation, which have a variety of applications, including:
* hearing assistive devices;
* videoconference systems;
* noise reduction in live videos.

Session Outline
The objective of this session is to provide participants with the basic theoretical background needed to understand and build state-of-the-art systems for audio-visual speech enhancement and separation.

MODULE 1 - Introduction to deep learning and audio-visual speech corpora.
Participants will familiarise themselves with the concept of deep learning with a specific focus on audio-visual speech enhancement and separation. In addition, some characteristics of common audio-visual speech corpora will be explained.

MODULE 2 - Audio-visual speech enhancement and separation systems.
Participants will learn the main components of state-of-the-art approaches for audio-visual speech enhancement and separation.
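A typical mask-based audio-visual system has three parts: per-modality feature extraction, a fusion network that predicts a time-frequency mask, and resynthesis from the masked spectrogram. The untrained NumPy sketch below shows only the data flow (shapes, early fusion by concatenation, sigmoid masking); the dimensions and the two-layer network are illustrative placeholders, not a working separator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative): T frames, F frequency bins,
# V visual features per frame, H hidden units.
T, F, V, H = 50, 129, 64, 128

audio_feat = rng.standard_normal((T, F))    # e.g. log-magnitude spectrogram
visual_feat = rng.standard_normal((T, V))   # e.g. lip-region embeddings

# Early fusion: concatenate per-frame audio and visual features.
fused = np.concatenate([audio_feat, visual_feat], axis=1)   # shape (T, F + V)

# Untrained two-layer MLP standing in for the mask-estimation network.
W1 = 0.1 * rng.standard_normal((F + V, H))
W2 = 0.1 * rng.standard_normal((H, F))
hidden = np.maximum(fused @ W1, 0.0)                 # ReLU
mask = 1.0 / (1.0 + np.exp(-(hidden @ W2)))          # sigmoid mask in (0, 1)

# The mask gates the noisy magnitude spectrogram bin by bin.
noisy_mag = np.abs(rng.standard_normal((T, F)))
enhanced_mag = mask * noisy_mag
print(mask.shape, enhanced_mag.shape)
```

In a real system the MLP would be replaced by a trained network (often recurrent or convolutional), and the masked magnitudes would be combined with the noisy phase and inverted back to a waveform.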

MODULE 3 - Examples and demos.
Participants will learn about recent advances in audio-visual speech enhancement and separation systems and experience demos of these systems.

Background Knowledge
Basic knowledge of signal processing and deep learning.


Prof. Zheng-Hua Tan is a Professor of Machine Learning and Speech Processing, a Co-Head of the Centre for Acoustic Signal Processing Research (CASPR), and Machine Learning Research Group Leader in the Department of Electronic Systems at Aalborg University, Denmark.

He was a Visiting Scientist/Professor at the Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology (MIT), Cambridge, USA; an Associate Professor in the Department of Electronic Engineering at Shanghai Jiao Tong University, China; and a postdoctoral fellow at the AI Spoken Language Lab in the Department of Computer Science at KAIST, Korea.

He received the B.S. and M.S. degrees in electrical engineering from Hunan University, China, in 1990 and 1996, respectively, and the Ph.D. degree in electronic engineering from Shanghai Jiao Tong University, China, in 1999.

His research interests include machine learning, deep learning, pattern recognition, speech and speaker recognition, noise-robust speech processing, multimodal signal processing, and social robotics. He has over 200 publications. He edited the book Automatic Speech Recognition on Mobile Devices and over Communication Networks (Springer, 2008).

He is the elected Chair of the IEEE Machine Learning for Signal Processing Technical Committee and an Associate Editor for the IEEE/ACM Transactions on Audio, Speech, and Language Processing. He has served as an Editorial Board Member/Associate Editor for several journals, including Computer Speech and Language and Digital Signal Processing. He was a Lead Guest Editor of the IEEE Journal of Selected Topics in Signal Processing and a Guest Editor of several journals, including Neurocomputing. He was the General Chair of the IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP 2018), Aalborg, Denmark, and a Program Co-Chair of the IEEE Workshop on Spoken Language Technology (SLT 2016), San Diego, California, USA. He has served as a chair, program co-chair, area and session chair, and tutorial speaker at many international conferences. He is a Senior Member of the IEEE.

Open Data Science