Abstract: In recent years, Computer Vision (CV) has seen rapid growth in quality and usability, which has helped drive business adoption of artificial intelligence solutions. Researchers have been moving away from traditional methods and towards solutions based on deep neural networks.
However, training and deploying deep networks in realistic business scenarios remains a challenge for both data scientists and engineers. The goal of this workshop is therefore to (i) educate the audience on the state of the art in the CV domain; (ii) explain common pitfalls and, more generally, demystify the "magic" around deep learning; and (iii) provide resources and code examples for various CV tasks by leveraging open-source libraries.
This workshop will begin with an overview of common real-world tasks in the CV domain, including examples of problems our customers have faced in recent years. We will then give a brief introduction to deep learning models for CV.
The main part of this session will demonstrate how to train and evaluate CV models by executing notebooks based on the fastai and torchvision libraries, both of which build on PyTorch. We will start with image classification, showing how to fine-tune a pre-trained ImageNet model on a custom dataset and how to deploy the model to the cloud. Next, we will train an object detection model and extend it to predict segmentation masks and keypoints. Finally, we will build an image similarity system and demo a fast image retrieval solution that can handle large numbers of images.
We will conclude by providing links to all examples and materials so that the audience can continue their learning journey. All resources and code examples will be based on the public Computer Vision Best-Practices GitHub repository: https://github.com/microsoft/ComputerVision
Bio: Patrick Buehler is a principal data scientist at Microsoft’s Cloud AI Group. He obtained his PhD in Computer Vision from the Visual Geometry Group (VGG) at the University of Oxford, working with Prof. Andrew Zisserman. He has over fifteen years of work experience in academic settings and with various external customers, spanning a wide range of Computer Vision problems.