Transformer
- Attention is All You Need
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT)
- Pre-Trained Image Processing Transformer
- A Survey on Visual Transformer
- Learning Texture Transformer Network for Image Super-Resolution
- Taming Transformers for High-Resolution Image Synthesis
- Temporal-Channel Transformer for 3D Lidar-Based Video Object Detection in Autonomous Driving
- Residual Non-Local Attention Networks for Image Restoration
- CPTR: Full Transformer Network for Image Captioning
- Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet