Parallel Transformer

A fast and elegant way to learn how to optimize a Transformer block.

Overview

The Transformer has been one of the most popular NLP models in recent years. It uses a seq2seq architecture; the original paper targeted machine translation, and subsequent models such as BERT have extended it to almost every NLP task.

Optimization target

The goal is to optimize the Transformer attention block with SIMD, OpenMP, and MPI. This repo implements loop unrolling, SSE, OpenMP, tiled, and CUDA variants and compares all of the parallel kernels against each other, as sketched below.
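
To make the comparison concrete, here is a minimal sketch (not the repository's actual code) of the kind of loop nest these techniques target: the attention score matrix Q·Kᵀ computed with a plain triple loop, parallelized over query rows with an OpenMP pragma. The function name, signatures, and data layout are illustrative assumptions, not the repo's API.

```cpp
// Minimal sketch: attention scores S = Q * K^T with OpenMP.
// Compile with e.g. -fopenmp. All names here are illustrative.
#include <vector>

// Q and K are row-major n x d matrices (n = sequence length,
// d = head dimension); S receives the n x n score matrix.
void attention_scores(const std::vector<float>& Q,
                      const std::vector<float>& K,
                      std::vector<float>& S,
                      int n, int d) {
    // Rows of S are independent, so threads need no synchronization.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            float acc = 0.0f;
            // The inner dot product is where unrolling and SSE apply;
            // tiling the i/j loops improves cache reuse of K.
            for (int k = 0; k < d; ++k)
                acc += Q[i * d + k] * K[j * d + k];
            S[i * n + j] = acc;  // scaling and softmax omitted for brevity
        }
    }
}
```

The other variants optimize this same loop nest: unrolling and SSE intrinsics speed up the inner dot product, tiling reorders the i/j loops for cache locality, and the CUDA version maps the loop nest onto thread blocks.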

How to run the code

mkdir build
cd build
cmake ..
cmake --build .

Experimental results

Speedup is used to compare all of the results.
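
Here, speedup presumably follows the standard definition: the serial runtime divided by the runtime of the parallel variant, so higher is better.

$$\text{speedup} = \frac{T_{\text{serial}}}{T_{\text{parallel}}}$$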

Figure: algorithm speedup comparison (1)

Figure: algorithm speedup comparison (3)

Figure: algorithm time comparison (1)

Figure: algorithm time comparison (3)
