
Abstract: Extracting text of various sizes, shapes, and orientations from images containing multiple objects is an important problem in many contexts, especially e-commerce, augmented reality assistance systems in natural scenes, content moderation on social media platforms, etc. Text from an image can be a richer and more accurate source of data than human input, and it can be used in several applications such as attribute extraction, profanity checks, etc.
Typically, text extraction is achieved in two stages:
- Text detection: this module identifies the regions of the input image where text is present.
- Text recognition: given the regions of the image where text is present, this module extracts the raw text from them.
In this session, I will talk about character-level text detection for detecting both regular and arbitrarily shaped text. Later, I will discuss the CRNN-CTC network and the need for CTC loss to obtain raw text from the images.
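To give an intuition for why CTC is needed: the recognition network emits one prediction per timestep, usually more timesteps than output characters, so decoding must collapse repeats and drop a special "blank" symbol. Below is a minimal sketch of greedy CTC decoding; the blank symbol and function name are illustrative assumptions, not details from the talk.

```python
BLANK = "-"  # hypothetical blank symbol used by this sketch

def ctc_greedy_decode(per_step_best):
    """Collapse per-timestep best characters into raw text.

    CTC decoding rule: drop a character if it repeats the previous
    timestep's character, then drop all blanks.
    """
    out = []
    prev = None
    for ch in per_step_best:
        if ch != prev and ch != BLANK:
            out.append(ch)
        prev = ch
    return "".join(out)

# A blank between two identical characters keeps both of them,
# which is how CTC distinguishes "hello" from "helo".
```

For example, the timestep sequence `"hh-e-ll-lo"` decodes to `"hello"`: repeated characters within a run collapse, but the blank between the two `l` runs preserves the double letter.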
Bio: Pranay Dugar is passionate about machine learning and deep learning. Specifically, he has worked on object detection and has done projects such as text detection, facial recognition, and DNA mutation detection. He is proficient in Python and the TensorFlow framework, as well as Keras, for creating and running his models. He has taken part in many hackathons at both national and international levels, even going on to win hackathons held by DreamWorks and Daimler. His 'Driver Distraction Detection' project, based on a convolutional neural network, was presented at the Mobile World Congress, Barcelona (2017) by Mercedes. He aims to create artificial intelligence that can observe and interact as well as any human.