Ashish Vaswani - Self-attention for Vision

Abstract

Deep learning seeks to discover universal models that work across all modalities and tasks. While self-attention has enhanced the capabilities of models in natural language, convolutions still dominate the landscape of vision architectures. In this talk, we will present a series of results culminating in self-attention models that are competitive with state-of-the-art convolutional models for fundamental image tasks. We will also present the strengths and limitations of our models, and discuss the implications of our work for building universal primitives for all vision tasks.

Location
Virtual.