Multi-Head Self-Attention Transformer Implementation

This project is a simple implementation of the paper "Attention Is All You Need" using PyTorch. Some of the code is inspired by The Annotated Transformer implementation.

Model Architecture

The model follows the architecture described in the paper exactly.
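
At the core of that architecture is scaled dot-product attention, which the paper defines as softmax(Q K^T / sqrt(d_k)) V. The snippet below is a minimal sketch of that formula in PyTorch, not the code from models.py: the function name `scaled_dot_product_attention`, the optional `mask` argument, and the tensor shapes are assumptions chosen for illustration (8 heads with d_k = 64 matches the paper's base configuration).

```python
import math

import torch


def scaled_dot_product_attention(query, key, value, mask=None):
    """Return (output, weights) for softmax(Q K^T / sqrt(d_k)) V."""
    d_k = query.size(-1)
    # Similarity of every query position with every key position.
    scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        # Hypothetical convention: positions where mask == 0 are hidden.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    # Normalize over the key dimension so each row of weights sums to 1.
    weights = torch.softmax(scores, dim=-1)
    return torch.matmul(weights, value), weights


# Self-attention: queries, keys and values all come from the same sequence.
torch.manual_seed(0)
x = torch.randn(2, 8, 5, 64)  # (batch, heads, sequence length, d_k)
output, weights = scaled_dot_product_attention(x, x, x)
```

Here `output` has the same shape as `x`, and `weights` holds one 5 x 5 attention matrix per head. In full multi-head attention, the per-head outputs are then concatenated and passed through a final linear projection.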

I created a simple German-to-English translator with this model to demonstrate how to train and test it.

Repository files: README.md, models.py, utils.py