
Abstract: Finding duplicates in an image collection is a widespread problem. Many online businesses rely on image galleries to deliver a good customer experience and, consequently, to generate more revenue, so these galleries need to be of the highest quality. Duplicates in such galleries can degrade the customer experience. Additionally, image-based machine learning models can produce misleading results when duplicates are present in the training, evaluation, or test sets.
Therefore, finding and removing duplicates is an important requirement across several use cases. In this talk, we present imagededup, a Python package we built to find exact and near-duplicates in an image collection. We will discuss the motivation behind building it, walk through its functionality, and give a demo.
Bio: I am Tanuj Jain, a Senior Machine Learning Engineer at Axel Springer AI in Berlin, where I have worked since December 2019. I was previously part of the Data Science team at idealo Internet GmbH. My current interests revolve around deep learning research for speech and image processing. I completed my M.Sc. in Electrical Engineering at Paderborn University in 2015 and my B.Tech at GGSIP University, New Delhi, in 2010. I’m very interested in leveraging machine learning to empower businesses and in measuring the impact it creates.

Tanuj Jain
Sr. Machine Learning Engineer | Axel Springer AI
