Brand Voice: Deep Learning for Speech Synthesis


As in many other fields, text-to-speech (TTS) has reached a new level with recent advances in deep learning. TTS is a sequence-to-sequence (seq2seq) problem with its own peculiarities and challenges. The modern solution is a three-step approach: first, a sequence-to-sequence model aligns text and audio; second, a feed-forward network predicts spectrograms from the text input; and last, a vocoder model synthesizes the final waveform from the predicted spectrograms.
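
The data flow through the three steps can be sketched with toy stand-ins. This is only an illustration under stated assumptions, not the project's implementation: the function names, the 80 mel bands, the hop length of 256 samples, and the random matrices used in place of trained networks are all hypothetical.

```python
import numpy as np

N_MELS = 80        # assumed number of mel bands
HOP_LENGTH = 256   # assumed audio samples per spectrogram frame


def extract_durations(attention):
    """Step 1 (toy): turn an attention matrix of shape (frames, chars)
    into the number of frames assigned to each character."""
    best_char = attention.argmax(axis=1)
    return np.bincount(best_char, minlength=attention.shape[1])


def predict_spectrogram(char_embeddings, durations):
    """Step 2 (toy): repeat each character embedding for its duration,
    then project to mel bands. A fixed random matrix stands in for a
    trained feed-forward network."""
    expanded = np.repeat(char_embeddings, durations, axis=0)
    rng = np.random.default_rng(0)
    projection = rng.standard_normal((char_embeddings.shape[1], N_MELS))
    return expanded @ projection


def vocode(mel):
    """Step 3 (toy): emit HOP_LENGTH samples per spectrogram frame.
    A real vocoder is a trained model; this only reproduces the shapes."""
    frame_energy = np.tanh(mel.mean(axis=1))
    return np.repeat(frame_energy, HOP_LENGTH)


text = "hello"
rng = np.random.default_rng(1)
attention = rng.random((40, len(text)))             # stand-in for a learned alignment
embeddings = rng.standard_normal((len(text), 16))   # stand-in for text encodings

durations = extract_durations(attention)
mel = predict_spectrogram(embeddings, durations)
wave = vocode(mel)
print(durations.sum(), mel.shape, wave.shape)
```

Because the durations come from the alignment step, the second step can produce all spectrogram frames in one pass instead of generating them one at a time.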

The challenges of this problem include a high-dimensional target space, a non-bijective correspondence between text and spectrograms, a large discrepancy between input and target sequence lengths, the need for a student-teacher approach to avoid autoregression, very long output sequences, slow inference, long training times (over two weeks), and the lack of an explicit evaluation metric that correlates with perceived audio quality.
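
To give a feel for the length discrepancy, the following back-of-the-envelope sketch compares the number of input characters with the number of spectrogram frames and waveform samples. The sample rate, hop length, and speaking rate are illustrative assumptions, not values measured in this project.

```python
SAMPLE_RATE = 22050      # assumed audio sample rate in Hz
HOP_LENGTH = 256         # assumed samples per spectrogram frame
CHARS_PER_SECOND = 15    # assumed rough speaking rate


def sequence_lengths(n_chars):
    """Estimate spectrogram frames and waveform samples for a text length."""
    seconds = n_chars / CHARS_PER_SECOND
    n_samples = round(seconds * SAMPLE_RATE)
    n_frames = n_samples // HOP_LENGTH
    return n_frames, n_samples


frames, samples = sequence_lengths(150)
print(f"150 characters -> {frames} frames -> {samples} samples")
```

Under these assumptions the spectrogram is several times longer than the text, and the waveform is longer again by a factor of the hop length.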

During this project we, together with other teams (e.g. Mozilla), tackled many of these issues. We successfully trained current state-of-the-art architectures such as Tacotron and Transformer-based models, developed their feed-forward counterparts, and made all of it available as open source.
Using these models, we created the first brand voice for Axel Springer, which now allows for audio content on the news website.


Christian Schäfer drives AI research within the Axel Springer group and helps integrate machine learning systems into production. His goal is to create products that are both smart and pretty, so that everyone will like them. With an academic background in theoretical physics, Chris is interested in understanding complex phenomena, and his professional interests have moved from network theory via neuroscience to deep learning, fields that share some interesting similarities.

Open Data Science



