
min-max-gpt : minGPT that scales

So you've looked at minGPT. Now it's time to scale.

This codebase provides:

  • muP initialization and learning-rate settings, with easily tweakable GPT code (see the sketch after this list).
  • Mixed-precision training; FSDP-style sharding via DeepSpeed ZeRO-3.
  • ZERO-HUGGINGFACE: if you want training that does not rely on Hugging Face, this is it. It uses neither accelerate nor transformers, so you have maximal control. (It still uses default_data_collator and get_scheduler, but those are easy to remove.)
  • A clean, working DeepSpeed codebase.
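
For reference, muP ("maximal update parametrization") transfers hyperparameters tuned at a small width to a larger width by rescaling init variance and per-parameter Adam learning rates with the width multiplier. A minimal sketch of that recipe in the style of the mup package (the widths, learning rates, and variable names below are illustrative assumptions, not this repo's actual code):

import torch
import torch.nn as nn

# Illustrative muP-style width scaling (a sketch, not this repo's exact code).
base_width, width, vocab = 256, 1024, 50257
mult = width / base_width  # width multiplier relative to the tuned base model

hidden = nn.Linear(width, width)   # stand-in for a transformer hidden layer
readout = nn.Linear(width, vocab)  # logit head

# Hidden (matrix-like) weights: variance proportional to 1/fan_in.
nn.init.normal_(hidden.weight, std=width ** -0.5)
nn.init.zeros_(hidden.bias)

base_lr = 1e-2  # tuned at base_width, then transferred to the larger width
opt = torch.optim.AdamW([
    # Matrix-like weights: Adam LR shrinks by the width multiplier under muP.
    {"params": [hidden.weight], "lr": base_lr / mult},
    # Vector-like params (biases) and the readout keep the base LR.
    {"params": [hidden.bias, readout.weight, readout.bias], "lr": base_lr},
])

x = torch.randn(8, width)
# muP readout: logits are additionally scaled down by the width multiplier.
logits = readout(torch.relu(hidden(x))) / mult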

Usage

  • Install:
git clone https://github.com/cloneofsimo/min-max-gpt
cd min-max-gpt
pip install -r requirements.txt
  • Single-node training:
ROOT_DIR=/path/to/dir/min-max-gpt
cd $ROOT_DIR
export WORLD_SIZE=$(nvidia-smi -L | wc -l)
deepspeed --num_gpus $WORLD_SIZE run_trainer.py --learning_rate 1e-4 --head_width 32 --run_name "test"
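
Inside run_trainer.py, the launcher hands each rank to a DeepSpeed engine. For orientation, a ZeRO stage-3 setup generally has the shape below (a hedged sketch: the config values and the stand-in model are illustrative assumptions, not the repo's actual settings):

import deepspeed
import torch

# Illustrative ZeRO-3 + bf16 config; the repo's real config may differ.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "bf16": {"enabled": True},
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
    "zero_optimization": {
        "stage": 3,              # shard params, grads, and optimizer states
        "overlap_comm": True,    # overlap gather/reduce with compute
    },
    "gradient_clipping": 1.0,
}

model = torch.nn.Linear(512, 512)  # stand-in for the actual GPT model
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

x = torch.randn(4, 512, device=engine.device, dtype=torch.bfloat16)
loss = engine(x).float().pow(2).mean()  # dummy loss for illustration
engine.backward(loss)                   # sharded backward
engine.step()                           # sharded optimizer step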
  • Multi-node training:

You should create a hostfile and set the environment variables below. Nodes must be able to reach each other via passwordless SSH.

$ cat /path/to/dir/min-max-gpt/hostfile
xxx.xxx.xxx.xxx slots=8  # node0 IP; slots = number of GPUs on that node
xxx.xxx.xxx.xxx slots=8  # node1 IP; slots = number of GPUs on that node
ROOT_DIR=/path/to/dir/min-max-gpt
cd $ROOT_DIR
export HOST_FILE_PATH=$ROOT_DIR/hostfile
export NGPU_PER_NODE=?? # e.g. 8
export NUM_NODES=?? # e.g. 2
export SINGLE_RUN=True
./run_multi_node.sh $HOST_FILE_PATH $NGPU_PER_NODE $NUM_NODES $SINGLE_RUN
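
Under the hood, run_multi_node.sh wraps DeepSpeed's hostfile-based launcher; its core is a launch of roughly this shape (these are standard DeepSpeed launcher flags, but the exact invocation lives in the script itself):

deepspeed --hostfile=$HOST_FILE_PATH --num_nodes=$NUM_NODES --num_gpus=$NGPU_PER_NODE \
    run_trainer.py --learning_rate 1e-4 --head_width 32 --run_name "test"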

Current Test Status:

  • Tested on 8× 80 GB A100 GPUs; can train models up to 20B parameters.
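
A rough sanity check on that number: mixed-precision Adam keeps about 16 bytes of state per parameter (2 for bf16 weights, 2 for bf16 gradients, and 12 for the fp32 master copy plus the two Adam moments), so a 20B-parameter model carries roughly 320 GB of model state. ZeRO-3 shards that across the 8 GPUs, about 40 GB each, leaving headroom for activations on 80 GB A100s.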

TODO:

  • Multi-node training
  • muP scaling plot
  • Test on LLaMA

About

Minimal (400 LOC) implementation of maximal (multi-node, FSDP) GPT training.
