Abstract: Since its advent more than 40 years ago, the statistical approach based on Bayes decision rule has underpinned robust and high-performing automatic speech recognition (ASR). For decades, state-of-the-art ASR systems were built on standard signal processing for feature extraction, hidden Markov models (HMMs), complex data-driven acoustic and language models, and advanced search concepts based on dynamic programming. This classical approach was not challenged significantly until recently. Even when artificial neural networks began to boost ASR performance considerably, the general architecture of state-of-the-art ASR systems remained largely unchanged. Deep learning concepts were integrated gradually into the components of the common ASR architecture, leading to neural feature processing, acoustic modeling via hybrid deep neural network (DNN)/HMM systems, and neural language modeling, and improving ASR performance by more than 50% relative over the last 10 years. Today, the hybrid DNN/HMM approach, combined with recurrent long short-term memory (LSTM) neural network language modeling, marks the state of the art on many ASR tasks, covering a wide range of training set sizes. However, more and more alternative approaches are emerging, moving gradually towards so-called end-to-end approaches. These novel end-to-end approaches replace explicit modeling of model properties and dedicated search space organisation with more implicit, integrated neural-network-based representations, while also introducing new overall architectures. They show promising results, especially with large training sets. This presentation gives a contrastive overview of ASR architectures and modeling, including variations of both the more classical HMM-based and the end-to-end modeling approaches.
Bio: Ralf Schlüter serves as Academic Director and senior lecturer in the Computer Science Department at RWTH Aachen University, Germany. He leads the Automatic Speech Recognition Group at the Lehrstuhl Informatik 6: Human Language Technology and Pattern Recognition. He studied Physics at RWTH Aachen University and Edinburgh University, UK. He received the Diplom degree in Physics and the Dr.rer.nat. degree in Computer Science, and completed his habilitation in Computer Science, all at RWTH Aachen University. His research interests cover automatic speech recognition and machine learning in general, discriminative training, neural network modeling, information theory, stochastic modeling, and speech signal analysis.
Ralf Schlüter, PhD
Academic Director | RWTH Aachen University