This project is a simple implementation of the paper Attention Is All You Need using Pytorch. Some of the code here are inspired by The Annotated Transformer implementation.
This project's model follows the exact model architecture in the paper.
I created a simple German-to-English Translator using this model to demonstrate how can we train and test this model.