The official repo of Qwen (通义千问), the chat and pretrained large language model series from Alibaba Cloud. (Python; updated May 27, 2024)
Official release of the InternLM2 7B and 20B base and chat models, with 200K context support.
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
FlashInfer: Kernel Library for LLM Serving
🎉 CUDA notes / hand-written CUDA kernels for large models / C++ notes, updated sporadically: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
Train LLMs (BLOOM, LLaMA, Baichuan2-7B, ChatGLM3-6B) with DeepSpeed pipeline mode. Faster than ZeRO/ZeRO++/FSDP.
Python package for rematerialization-aware gradient checkpointing
Utilities for efficient fine-tuning, inference and evaluation of code generation models
Fast and memory-efficient PyTorch implementation of the Perceiver with FlashAttention.
A simple PyTorch implementation of flash multi-head attention.
Poplar implementation of FlashAttention for IPU
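The repositories above all build on the same core idea: FlashAttention avoids materializing the full attention matrix by processing keys and values in blocks, keeping a running row-max and running softmax denominator (the "online softmax" trick). A minimal NumPy sketch of that tiling scheme, checked against ordinary attention (illustrative only; real kernels fuse this into on-chip tiles):

```python
import numpy as np

def attention_reference(q, k, v):
    """Standard attention: softmax(q @ k.T / sqrt(d)) @ v."""
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

def attention_tiled(q, k, v, block=4):
    """FlashAttention-style pass: visit K/V in blocks, maintaining a
    running max `m` and running denominator `l` per query row, so the
    full softmax matrix is never stored."""
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(q, dtype=np.float64)
    m = np.full(n, -np.inf)   # running row max of the scores
    l = np.zeros(n)           # running softmax denominator
    for j in range(0, k.shape[0], block):
        kj, vj = k[j:j + block], v[j:j + block]
        s = (q @ kj.T) * scale                 # (n, block) partial scores
        m_new = np.maximum(m, s.max(axis=-1))
        p = np.exp(s - m_new[:, None])         # block probabilities, rescaled
        alpha = np.exp(m - m_new)              # correction for earlier blocks
        l = l * alpha + p.sum(axis=-1)
        out = out * alpha[:, None] + p @ vj
        m = m_new
    return out / l[:, None]

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 16)) for _ in range(3))
assert np.allclose(attention_tiled(q, k, v), attention_reference(q, k, v))
```

The rescaling by `alpha` is what lets partial results computed under an older row-max be corrected when a later block raises the max, which is why the tiled pass matches the reference exactly (up to floating-point rounding).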