Audio-Visual Speech Enhancement and Separation Based on Deep Learning


Background noise can degrade a speech signal and make its content hard to understand. Speech enhancement and separation systems can reduce the noise level and improve the quality and intelligibility of the signal. Visual cues, such as mouth movements and facial expressions, can help these systems and may provide a substantial performance improvement.

In this session, participants will be introduced to recent advances in audio-visual speech enhancement and separation, which have a variety of applications, including:
* hearing assistive devices;
* videoconference systems;
* noise reduction in live videos.

Session Outline
The objective of this session is to give participants the basic theoretical background needed to understand and build state-of-the-art systems for audio-visual speech enhancement and separation.

MODULE 1 - Introduction to deep learning and audio-visual speech corpora.
Participants will familiarise themselves with deep learning, with a specific focus on audio-visual speech enhancement and separation. Some characteristics of common audio-visual speech corpora will also be explained.

MODULE 2 - Audio-visual speech enhancement and separation systems.
Participants will learn the main components of state-of-the-art approaches for audio-visual speech enhancement and separation.

MODULE 3 - Examples and demos.
Participants will learn about recent advances in audio-visual speech enhancement and separation systems and try out some demos of these systems.

Background Knowledge
Basic knowledge of signal processing and deep learning.


Daniel Michelsanti is an Industrial Postdoctoral Researcher at Demant and Aalborg University. He holds a PhD in Electrical and Electronic Engineering from Aalborg University. He is currently investigating cutting-edge technologies for next-generation hearing assistive devices, with the goal of improving the quality of life of people with hearing loss.

Open Data Science