
Abstract: In this session we will discuss state-of-the-art research in multimodal machine learning for surveillance. We will explore topics ranging from image captioning to video understanding, with a focus on multimodal data streams and the deep learning algorithms that have made deep multimodal learning increasingly widespread. We will also look at the challenges and opportunities in applying these techniques to the surveillance domain.
Session Outline:
Module 1: A Study in Multimodal Learning and Surveillance
We will review the concepts of multimodal learning and how they can be applied to surveillance systems.
Module 2: Dataset Availability and Challenges
A look at the available crime and surveillance datasets, along with the strengths and weaknesses of each.
Module 3: State of the Art in Multimodal Surveillance Learning
A review of state-of-the-art approaches and papers in multimodal surveillance learning.
Module 4: Experiments and the Road Ahead
A look at the experiments we have conducted in multimodal surveillance learning, along with our observations, conclusions, and the path forward.
Background Knowledge:
Basic concepts of machine learning and AI; a basic introduction to CNNs, RNNs, and Transformers.
Bio: Utkarsh Contractor is the VP of AI and Machine Learning at Aisera, where he leads the data science team working on machine learning and artificial intelligence applications in Natural Language Processing and Computer Vision. As a graduate student at Stanford University, his research focused on computer vision experiments that used deep neural networks to analyze surveillance scene imagery and footage. Utkarsh has a decade of industry experience in Computer Vision, NLP, and other Machine Learning domains, having worked at companies such as Aisera, LinkedIn, and AT&T Labs.

Utkarsh Contractor
Vice President of Artificial Intelligence | Aisera Inc.
