Official Implementation of SEA: Sparse Linear Attention with Estimated Attention Mask (ICLR 2024)
RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable), combining the best of RNNs and transformers: great performance, fast inference, low VRAM use, fast training, "infinite" ctx_len, and free sentence embeddings.
Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)
Flash linear attention kernels in Triton
Explorations into the recently proposed Taylor Series Linear Attention
Implementation of Agent Attention in Pytorch
LEAP: Linear Explainable Attention in Parallel for causal language modeling with O(1) path length, and O(1) inference
CUDA implementation of autoregressive linear attention, with all the latest research findings
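What the linear-attention repos above share is the "kernel trick": replacing softmax attention with a feature map so the N×N attention matrix is never materialized. A minimal sketch, assuming the elu+1 feature map of Katharopoulos et al. (the function names here are illustrative, not from any of these repos):

```python
import numpy as np

def elu_feature_map(x):
    # A common positive feature map: phi(x) = elu(x) + 1.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V, eps=1e-6):
    # Standard attention computes softmax(Q K^T) V, which costs O(N^2 d).
    # With a feature map phi, reassociating the matrix product gives
    #   phi(Q) (phi(K)^T V)  ->  O(N d^2), linear in sequence length N.
    Qf, Kf = elu_feature_map(Q), elu_feature_map(K)  # (N, d) each
    KV = Kf.T @ V                                    # (d, d) key/value summary
    Z = Qf @ Kf.sum(axis=0)                          # (N,) normalizer
    return (Qf @ KV) / (Z[:, None] + eps)

rng = np.random.default_rng(0)
N, d = 8, 4
Q, K, V = rng.normal(size=(3, N, d))
out = linear_attention(Q, K, V)
print(out.shape)  # (8, 4)
```

In the autoregressive case the same reassociation turns attention into an RNN: the (d, d) summary `KV` and the normalizer become running sums updated one token at a time, which is the recurrence these CUDA and Triton kernels optimize.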
RWKV Wiki website (archived, please visit official wiki)
Implementation of: Hydra Attention: Efficient Attention with Many Heads (https://arxiv.org/abs/2209.07484)
The semantic segmentation of remote sensing images
Taming Transformers for High-Resolution Image Synthesis