Transformers Know More Than Meets the Eye


The Transformer has been around for a while now and has proven to be one of the most interesting models in modern deep learning.

Transformer models have turned out to be domain agnostic. Although they were first applied to seq2seq NLP tasks on 1D sequences of text, the 1D input to a transformer can take any form. For example, a 2D image unrolled into a long 1D sequence of pixels can still be understood in terms of its 2D image characteristics, such as object appearance and category, and can even be used to predict the next image appearance in very long sequences.
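To make the "unrolled image" idea concrete, here is a minimal sketch of turning a 2D image into a 1D pixel sequence in raster (row-by-row) order. The function names `image_to_sequence` and `index_to_position` are illustrative assumptions, not taken from any particular paper or library, and real models typically add further steps such as embeddings and positional encodings.

```python
def image_to_sequence(image):
    """Unroll a 2D image (a list of rows) into a 1D raster-order sequence."""
    return [pixel for row in image for pixel in row]


def index_to_position(index, width):
    """Map a position in the 1D sequence back to (row, column) in the image."""
    return divmod(index, width)


# A tiny 2x3 "image" whose pixel values are just their raster indices.
image = [
    [0, 1, 2],
    [3, 4, 5],
]

sequence = image_to_sequence(image)
row, col = index_to_position(4, width=3)
```

The 2D layout is not lost by flattening: given the image width, each sequence index still corresponds to exactly one (row, column) location, which is the structure a sequence model has to learn to exploit.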

Recent research shows that transformer-derived architectures for computer vision often tend to be simpler and perform, at worst, on par with modern architectures such as R-CNNs used for computer vision tasks. This presentation will discuss recent research on transformer applications in new domains.

Session Outline
Module 1: Introduction to Attention and Transformers
Introduce the concept of attention and its applications before it became an integral part of the transformer architecture. Describe the key building blocks of the transformer architecture, discussing their intuition and applications.

Module 2: Combining Attention with Dynamic Routing
Recent research shows that extracting object-centric representations with a new kind of attention, called Slot Attention, can enable generalization to unseen compositions. This relates to unsupervised object discovery from images. In this module we will explain Slot Attention based on recent research.

Module 3: Transformers for Object Detection
This part will focus on the intuitions behind recent research showing that the transformer architecture, which originated in NLP, is well suited to the computer vision domain, along with state-of-the-art research and examples. It is interesting to see how a 2D image can be used with a Transformer for object detection.
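
As a rough illustration of the attention concept introduced in Module 1, the sketch below implements the widely used scaled dot-product formulation in plain Python. The outline itself does not specify which attention variant the session covers, so this choice, along with the function names `softmax` and `scaled_dot_product_attention`, is an assumption made for illustration only.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """For each query, return a weighted average of the value vectors.

    Weights come from the query-key dot products, scaled by sqrt(d_k).
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Two identical keys receive equal weight, so the output is the mean value.
result = scaled_dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[2.0], [4.0]],
)
```

Each output is a mixture of the value vectors, with more weight on values whose keys match the query more closely. When the inputs are pixels or image patches rather than words, the same computation lets every image location draw on information from every other location.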

Background Knowledge
Attention, Transformer, CNNs


Dr Michał Chromiak is a Director at UBS, contributing to text document analytics and leading efforts to democratize AI in the financial sector and investigate applications for multiple ML-based tasks. Michał is also a member of the Department of Intelligent Systems at Maria Curie-Skłodowska University. His past research on integrating big data from distributed and heterogeneous sources led him to concentrate on data perception using deep learning. He is interested in improving and understanding ways to generalize modern deep learning algorithms and in finding their best-suited AI applications. He is fascinated by understanding how deep learning can be improved to match and exceed biological forms of intelligence in terms of performance.

Open Data Science