[Paper Notes] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (Vision Transformer, ViT)
Paper title: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Author: Dosovitskiy
“We show that this reliance on CNNs is not necessary and a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks.” In other words, the model does not depend on CNNs at all.
Reference: Vision T