Taming Transformers for High-Resolution Image Synthesis
Flash linear attention kernels in Triton
Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)
LEAP: Linear Explainable Attention in Parallel for causal language modeling, with O(1) path length and O(1) inference
Implementation of Hydra Attention: Efficient Attention with Many Heads (https://arxiv.org/abs/2209.07484)
Official Implementation of SEA: Sparse Linear Attention with Estimated Attention Mask (ICLR 2024)
RWKV Wiki website (archived, please visit official wiki)
Implementation of Agent Attention in PyTorch
Explorations into the recently proposed Taylor Series Linear Attention
The semantic segmentation of remote sensing images
CUDA implementation of autoregressive linear attention, with all the latest research findings
RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable), combining the best of RNNs and transformers: strong performance, fast inference, low VRAM use, fast training, "infinite" ctx_len, and free sentence embeddings.
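The RWKV entry above touches on the core appeal of the whole linear-attention topic: the same layer can be computed in a parallel, GPT-style form for training and in a recurrent form with constant per-token cost for inference. Below is a minimal sketch of that duality for a generic linear-attention layer; the feature map, shapes, and function names are illustrative assumptions, not the formulation used by RWKV or any other repository listed here.

```python
# Minimal sketch, NOT RWKV's actual formulation: a generic linear-attention
# layer computed two equivalent ways, illustrating why such models can be
# trained in parallel like a GPT yet served as an RNN with a fixed-size state
# and constant per-token cost ("infinite" context length in that sense).
import torch

def feature_map(x):
    # ELU + 1 is one common positive feature map from the linear-attention
    # literature (an illustrative assumption; each repo above uses its own).
    return torch.nn.functional.elu(x) + 1

def linear_attention_parallel(q, k, v):
    # Parallel (training) form: causality handled with cumulative sums.
    q, k = feature_map(q), feature_map(k)
    kv = torch.cumsum(k.unsqueeze(-1) * v.unsqueeze(-2), dim=0)  # (T, d_k, d_v)
    z = torch.cumsum(k, dim=0)                                   # (T, d_k)
    num = torch.einsum('td,tdv->tv', q, kv)
    den = torch.einsum('td,td->t', q, z).unsqueeze(-1)
    return num / (den + 1e-6)

def linear_attention_recurrent(q, k, v):
    # Recurrent (inference) form: a fixed-size state is updated per token,
    # so memory and compute per step do not grow with context length.
    q, k = feature_map(q), feature_map(k)
    state = torch.zeros(k.shape[-1], v.shape[-1])
    z = torch.zeros(k.shape[-1])
    outs = []
    for t in range(q.shape[0]):
        state = state + k[t].unsqueeze(-1) * v[t].unsqueeze(0)
        z = z + k[t]
        outs.append((q[t] @ state) / (q[t] @ z + 1e-6))
    return torch.stack(outs)

if __name__ == "__main__":
    # Both forms produce the same outputs on random inputs.
    T, d = 8, 4
    q, k, v = torch.randn(T, d), torch.randn(T, d), torch.randn(T, d)
    assert torch.allclose(linear_attention_parallel(q, k, v),
                          linear_attention_recurrent(q, k, v), atol=1e-5)
```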