Abstract: In the field of Automatic Speech Recognition (ASR), the state of the art for generic conversations have reached super human levels. However, things are not nearly as good in specialized knowledge domains: attempting to transcribe vendor-customer or intra-vendor conversations often results in high double-digit error rates. Considering the low performance of ASR on real data, it becomes imperative to customize the end-to-end probabilistic model instead of analyzing Language Model and Acoustic Model separately. This session will focus on discussing grapheme based end-to-end Recurrent Neural Network architectures which can transcribe audios directly. We will also have a reality check to reduce latency by tweaking the model during inference time.
DL Data Scientist | Cisco Systems Inc.