Abstract: The problem of finding duplicates in an image collection is widespread. Many online businesses rely on image galleries to deliver a good customer experience and, consequently, to generate more revenue. Hence, these galleries need to be of the highest quality, and duplicates in them can degrade the customer experience. Additionally, image-based machine learning models can produce misleading results when duplicates are present in the training, evaluation, or test sets.
Therefore, finding and removing duplicates is an important requirement across several use cases. In this talk, we present imagededup, a Python package we built to find exact and near-duplicates in an image collection. We will talk about the motivation behind building it, describe its functionality, and give a demo.
Bio: Dat is the Head of AI at Axel Springer Ideas Engineering (https://axelspringerideas.de/), the innovation unit of Axel Springer SE, the largest digital publishing house in Europe. He established and leads Axel Springer AI (https://ai.axelspringer.com/), where his goal is to make AI more accessible within Axel Springer and thereby drive innovation within the group. His ultimate plan is to turn Axel Springer into an AI-first company.
Dat's interests are diverse, ranging from traditional machine learning and deep learning to computer vision and AI in general. Previously, he co-headed the data team at idealo.de, where he built up the machine learning team from scratch. His team mainly focused on computer vision problems, from teaching a computer to understand aesthetics to upscaling low-resolution images. He is a regular speaker and has presented at several renowned conferences. He also blogs about his work on Medium. His background is in Operations Research and Econometrics. Dat received his MSc in Economics from the Humboldt University of Berlin.