Doctoral colloquium - Štefan Pócoš (4.12.2023)
Monday 4.12.2023 at 13:10, Lecture room I/9
Štefan Pócoš:
RecViT: Enhancing Vision Transformer with Top-Down Information Flow
Abstract:
We propose and analyse a novel neural network architecture, the recurrent vision transformer (RecViT). Building upon the popular vision transformer (ViT), we add a biologically inspired top-down connection, letting the network ‘reconsider’ its initial prediction. Moreover, the recurrent connection creates space for feeding multiple similar, yet slightly modified or augmented, inputs into the network in a single forward pass. As it has been shown that a top-down connection can increase accuracy in the case of convolutional networks, we analyse our architecture, combined with multiple training strategies, in the adversarial examples (AEs) scenario. Our results show that some versions of RecViT may indeed be more robust than the baseline ViT. We also leverage the fact that transformer networks have a certain level of inherent explainability. By visualising attention maps of various input images, we gain some insight into the inner workings of our network.
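To give a rough idea of the kind of top-down recurrence described in the abstract, below is a minimal, illustrative sketch (not the authors' code): a toy ViT-style encoder whose class-token output is projected back and added to the token embeddings of a second pass, which may optionally receive an augmented version of the input. All module names, sizes and the specific feedback mechanism are assumptions for illustration only.

```python
import torch
import torch.nn as nn


class ToyRecViT(nn.Module):
    """Toy ViT with a top-down recurrent connection (illustrative sketch)."""

    def __init__(self, img_size=32, patch=4, dim=128, heads=4, depth=4, classes=10):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        # Assumed form of the top-down connection: a linear map of the
        # previous pass's class token, added to every token of the next pass.
        self.top_down = nn.Linear(dim, dim)
        self.head = nn.Linear(dim, classes)

    def embed(self, x):
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)   # (B, N, dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        return torch.cat([cls, tokens], dim=1) + self.pos_embed

    def forward(self, x, x_augmented=None, passes=2):
        feedback = None
        for step in range(passes):
            # Later passes may see an augmented view of the same input,
            # all within a single forward call.
            inp = x if (step == 0 or x_augmented is None) else x_augmented
            tokens = self.embed(inp)
            if feedback is not None:
                tokens = tokens + self.top_down(feedback).unsqueeze(1)
            encoded = self.encoder(tokens)
            feedback = encoded[:, 0]          # class token summarises this pass
        return self.head(feedback)


logits = ToyRecViT()(torch.randn(2, 3, 32, 32))
print(logits.shape)  # torch.Size([2, 10])
```

The sketch only shows the control flow of a recurrent top-down pass; the actual RecViT presented in the talk may feed back different representations, use a different combination rule, and a different number of passes.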