中文说明 | English

Accelerate Chinese-CLIP with FlashAttention

Chinese-CLIP now supports accelerating the training process with FlashAttention.

Environmental Preparation

  • Nvidia GPUs with Turing, Ampere, Ada, or Hopper architecture (such as H100, A100, RTX 3090, T4, and RTX 2080). Please refer to this document for the GPUs corresponding to each Nvidia architecture.
  • CUDA 11.4 and above.
  • PyTorch 1.12 and above.
  • FlashAttention: install it by running `pip install flash-attn`.

Please refer to the FlashAttention project repository for more information.
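
Before installing, you can run a quick sanity check of the requirements above. This is a minimal sketch assuming a standard Python environment; the exact commands are not part of the official setup instructions.

```bash
# Verify the PyTorch and CUDA versions and the GPU compute capability
# (Turing and newer GPUs report compute capability >= 7.5).
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.get_device_capability())"

# Install FlashAttention and confirm that it imports.
pip install flash-attn
python -c "import flash_attn; print('flash-attn is available')"
```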

Use it in Chinese-CLIP!

Applying FlashAttention to Chinese-CLIP finetuning is very simple: just add `--use-flash-attention` to your finetune shell script. We provide the sample script `run_scripts/muge_finetune_vit-b-16_rbt-base_flashattn.sh`, as sketched below.
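
For reference, here is a minimal sketch of what this looks like in practice. Only `--use-flash-attention` and the sample script name come from this document; the launcher command, `DATAPATH`, `GPUS_PER_NODE`, and the remaining arguments are placeholders modeled on a typical Chinese-CLIP finetune script.

```bash
# Option 1: run the provided sample script (the DATAPATH layout is assumed
# to follow the standard Chinese-CLIP finetune setup).
bash run_scripts/muge_finetune_vit-b-16_rbt-base_flashattn.sh ${DATAPATH}

# Option 2: add the flag to your own finetune script; everything except
# --use-flash-attention stands in for the arguments you already pass.
python3 -m torch.distributed.launch --nproc_per_node=${GPUS_PER_NODE} \
    cn_clip/training/main.py \
    ${YOUR_EXISTING_FINETUNE_ARGS} \
    --use-flash-attention
```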

Training Speed and Memory Usage Comparison

Enabling FlashAttention significantly speeds up Chinese-CLIP finetuning and reduces memory usage without affecting precision. Our experiments were conducted on a machine with 8 A100 GPUs (80GB memory), FlashAttention 0.2.8, and PyTorch 1.10.1.

Below we compare the batch time and memory usage of FP16-precision finetuning for each model scale. The improvement in training speed and the reduction in memory usage are more significant for larger models.

Batch Time
| Unit: s/it | Batch size | w/o FlashAttention | w/ FlashAttention | Speedup |
| :---- | :---- | :---- | :---- | :---- |
| CN-CLIP<sub>RN50</sub> | 1200*8 | 1.710 | 1.680 | 1.02× |
| CN-CLIP<sub>ViT-B/16</sub> | 450*8 | 1.477 | 0.960 | 1.54× |
| CN-CLIP<sub>ViT-L/14</sub> | 128*8 | 1.293 | 0.785 | 1.65× |
| CN-CLIP<sub>ViT-L/14@336px</sub> | 40*8 | 1.397 | 0.587 | 2.38× |
| CN-CLIP<sub>ViT-H/14</sub> | 64*8 | 1.265 | 0.845 | 1.50× |

Memory
| Unit: GB | Batch size | w/o FlashAttention | w/ FlashAttention |
| :---- | :---- | :---- | :---- |
| CN-CLIP<sub>RN50</sub> | 1200*8 | 79 | 75 |
| CN-CLIP<sub>ViT-B/16</sub> | 450*8 | 80 | 56 |
| CN-CLIP<sub>ViT-L/14</sub> | 128*8 | 77 | 50 |
| CN-CLIP<sub>ViT-L/14@336px</sub> | 40*8 | 78 | 37 |
| CN-CLIP<sub>ViT-H/14</sub> | 64*8 | 76 | 57 |
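
If you want to reproduce a comparison like the above on your own hardware, one simple approach (a sketch, not part of the official tooling) is to poll per-GPU memory with nvidia-smi while the finetune job runs, and to read the batch time (s/it) from the training logs of runs with and without `--use-flash-attention`.

```bash
# Poll per-GPU memory usage every 5 seconds from a second terminal while
# finetuning runs; compare the peak values between the two runs.
nvidia-smi --query-gpu=index,memory.used --format=csv -l 5
```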