Transformer implementation from scratch in PyTorch. Optimized for the CUDA runtime and designed to integrate seamlessly with Azure ML workspaces.
Modularized version, currently clocked at 85.04M parameters. Automatic training pipeline configured for Azure ML.
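As a rough sanity check on a figure like 85.04M, a standard decoder-block parameter count can be estimated analytically from the model's hyperparameters. The sketch below uses illustrative GPT-2-small-like values (`d_model=768`, 12 layers, `d_ff=3072`, vocab 50257) — these are assumptions for demonstration, not this repo's actual configuration:

```python
def transformer_params(d_model: int, n_layers: int, vocab_size: int, d_ff: int) -> int:
    """Estimate parameter count for a vanilla pre/post-LN Transformer stack.

    Counts token embeddings plus, per layer: Q/K/V/output projections
    (with bias), a two-layer feed-forward block, and two LayerNorms.
    Positional embeddings and any final head are omitted for simplicity.
    """
    attn = 4 * (d_model * d_model + d_model)             # four d x d projections + biases
    ffn = 2 * d_model * d_ff + d_ff + d_model            # two linear layers + biases
    ln = 2 * (2 * d_model)                               # two LayerNorms, gain + bias each
    per_layer = attn + ffn + ln
    return vocab_size * d_model + n_layers * per_layer

# Illustrative hyperparameters (assumed, not the repo's real config):
total = transformer_params(d_model=768, n_layers=12, vocab_size=50257, d_ff=3072)
print(f"{total / 1e6:.2f}M parameters")
```

With a trained `torch.nn.Module`, the exact count is simply `sum(p.numel() for p in model.parameters())`.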