Setup tutorial

"Simplicity is prerequisite for reliability."

— Edsger W. Dijkstra

Initialize the project

```bash
$ higgsfield init my_llama_project
```

This creates a folder named my_llama_project with the following structure:

```
my_llama_project
├── src
│   ├── __init__.py
│   ├── experiment.py
│   └── config.py
├── Dockerfile
├── env
├── requirements.txt
└── README.md
```
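If you want to confirm that init produced this layout, a small check like the one below is enough. It is a sketch that only assumes the file names listed above. EXPECTED_FILES and missing_files are hypothetical helpers, not part of higgsfield.

```python
from pathlib import Path

# Hypothetical helper, not part of higgsfield.
# File layout shown above for a freshly initialized project.
EXPECTED_FILES = [
    "src/__init__.py",
    "src/experiment.py",
    "src/config.py",
    "Dockerfile",
    "env",
    "requirements.txt",
    "README.md",
]


def missing_files(project_dir: str) -> list:
    """Return the expected files that do not exist under project_dir."""
    root = Path(project_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files("my_llama_project")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Project layout looks complete.")
```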

Set up the environment

Get into the project folder:

```bash
$ cd my_llama_project
```

Then edit the env file. It should contain the path to a valid SSH private key for your training nodes. Make sure the key actually exists at that path on your machine. For example:

```bash
$ cat env
SSH_KEY=~/.ssh/id_rsa
```
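Before moving on, you can check that the key really exists at that path. The sketch below assumes the env file uses simple KEY=VALUE lines, as in the example. read_env_file and ssh_key_path are hypothetical helpers, and higgsfield may parse the file differently.

```python
import os
from pathlib import Path


# Hypothetical helper, not part of higgsfield: assumes plain KEY=VALUE lines.
def read_env_file(path: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def ssh_key_path(env: dict) -> Path:
    """Expand '~' in SSH_KEY and check that the key file exists."""
    if "SSH_KEY" not in env:
        raise KeyError("SSH_KEY is not set in the env file")
    key = Path(os.path.expanduser(env["SSH_KEY"]))
    if not key.is_file():
        raise FileNotFoundError(f"SSH key not found at {key}")
    return key


if __name__ == "__main__":
    print(ssh_key_path(read_env_file("env")))
```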

Great! Next, edit the src/config.py file, which contains your experiments' configuration.

For example:

```python
import os

NAME = "my_llama_project"

# Fill in the IP addresses of your training nodes.
HOSTS = [
    "1.2.3.4",
]

# The user name on your training nodes.
# It must be the same on all nodes,
# and it may be something other than 'ubuntu'.
HOSTS_USER = "ubuntu"

# The SSH port of your training nodes, the same for all nodes.
HOSTS_PORT = 22

# Number of processes per node; this depends on how many GPUs each node has.
NUM_PROCESSES = 4

# You can list other environment variables here.
WAN_DB_TOKEN = os.environ.get("WAN_DB_TOKEN", None)
```

Fill in these fields with your own configuration.
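A quick way to catch typos is to validate these values before anything runs on the nodes. validate_config is a hypothetical helper, not part of higgsfield. It assumes HOSTS holds IP addresses as in the example (hostnames would be reported as invalid) and that the port falls in the usual 1 to 65535 range.

```python
import ipaddress


# Hypothetical helper, not part of higgsfield.
def validate_config(hosts, user, port, num_processes):
    """Return a list of problems found in the config values (empty if none)."""
    problems = []
    if not hosts:
        problems.append("HOSTS is empty")
    for host in hosts:
        try:
            # Assumes HOSTS contains IP addresses, not hostnames.
            ipaddress.ip_address(host)
        except ValueError:
            problems.append(f"not a valid IP address: {host!r}")
    if not user:
        problems.append("HOSTS_USER is empty")
    if not (isinstance(port, int) and 1 <= port <= 65535):
        problems.append(f"HOSTS_PORT out of range: {port!r}")
    if not (isinstance(num_processes, int) and num_processes >= 1):
        problems.append(f"NUM_PROCESSES must be a positive integer: {num_processes!r}")
    return problems
```

From the project root you could then run it against your real values with from src import config followed by print(validate_config(config.HOSTS, config.HOSTS_USER, config.HOSTS_PORT, config.NUM_PROCESSES)).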

Set up git

Create a new repository on GitHub. Do not add a README, .gitignore, or LICENSE file; the repository must be completely empty.


Then follow the instructions on the GitHub page for pushing an existing repository from your terminal.
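To confirm that origin points at your new GitHub repository, you can parse the output of git remote -v. This is a hypothetical helper, not part of higgsfield, and current_remotes assumes git is on your PATH.

```python
import subprocess


# Hypothetical helper, not part of higgsfield.
def parse_remotes(output: str) -> dict:
    """Map remote name -> push URL from `git remote -v` output."""
    remotes = {}
    for line in output.splitlines():
        # Each line looks like: "<name>\t<url> (fetch)" or "<name>\t<url> (push)".
        parts = line.split()
        if len(parts) == 3 and parts[2] == "(push)":
            remotes[parts[0]] = parts[1]
    return remotes


def current_remotes(repo_dir: str = ".") -> dict:
    """Run `git remote -v` in repo_dir and parse its output."""
    result = subprocess.run(
        ["git", "remote", "-v"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return parse_remotes(result.stdout)


if __name__ == "__main__":
    print(current_remotes().get("origin", "origin is not set"))
```

An SSH remote looks like git@github.com:USER/REPO.git, while an HTTPS remote starts with https://github.com/, which makes it easy to see which of the two options you picked on the GitHub page.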


Time to set up your nodes!

Now set up your nodes by running:

```bash
$ higgsfield setup-nodes
```

This installs all the required tools on your nodes. It may take a while, but it is a one-time process. The output looks like this:

```
$ higgsfield setup-nodes
INSTALLING DOCKER
...
INSTALLING INVOKER
...
SETTING UP DEPLOY KEY
...
PULLING DOCKER IMAGE
```
If you're stuck

If you get stuck on this step because you haven't added your git origin, try switching between the SSH and HTTPS options at the top of the GitHub repository page, then run the git remote add origin command again (if origin already exists, use git remote set-url origin with the new URL instead). If that isn't the cause, check that the SSH key in the env file and the settings in src/config.py are correct.
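To rule out SSH problems, you can try a non-interactive login to every node with the same user, port, and key. The sketch below is a hypothetical helper, not part of higgsfield; it assumes the OpenSSH ssh client is installed and simply runs the remote command true on each host.

```python
import os
import subprocess


# Hypothetical helper, not part of higgsfield.
def ssh_check_command(host, user, port, key_path):
    """Build an ssh command that logs in non-interactively and runs `true`."""
    return [
        "ssh",
        "-i", os.path.expanduser(key_path),
        "-p", str(port),
        "-o", "BatchMode=yes",      # fail instead of prompting for a password
        "-o", "ConnectTimeout=10",  # don't hang forever on unreachable hosts
        f"{user}@{host}",
        "true",
    ]


def check_nodes(hosts, user, port, key_path):
    """Return {host: True/False} depending on whether the SSH login succeeded."""
    status = {}
    for host in hosts:
        cmd = ssh_check_command(host, user, port, key_path)
        result = subprocess.run(cmd, capture_output=True)
        status[host] = result.returncode == 0
    return status
```

For example, check_nodes(config.HOSTS, config.HOSTS_USER, config.HOSTS_PORT, "~/.ssh/id_rsa"). Because BatchMode disables prompts, a node whose host key you have never accepted will also show up as failing; connect to it once by hand first.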

Run your very first experiment

You're very close to running your first experiment. Take a look at src/experiment.py:

```python
@experiment("llama")
@param("size", options=["70b", "13b", "7b"])
def train_llama(params):
    print(f"Training llama with size {params.size}")
    ...
```

That's exactly how you will define experiments from now on: no need for Hydra, argparse, or any other boilerplate code. Just define your experiment, then run the following command:

```bash
$ higgsfield build-experiments
```

Notice anything new? There is a new folder, .github/workflows, with the following structure:

```
.github
└── workflows
    ├── run_llama.yml
    └── deploy.yml
```
Curious about them? These workflow files are the entry point for deploying your experiments. From now on, pushing your code to GitHub automatically deploys it to your nodes, and the workflows also let you run your training experiments and save checkpoints.
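If you add more experiments later, you can check that build-experiments generated a workflow for each one. The file naming here is an assumption inferred from this single example (@experiment("llama") produced run_llama.yml), and missing_workflows is a hypothetical helper, not part of higgsfield.

```python
from pathlib import Path


# Hypothetical helper, not part of higgsfield.
def expected_workflows(experiment_names):
    # Assumed naming, inferred from the example: "llama" -> run_llama.yml.
    return {"deploy.yml"} | {f"run_{name}.yml" for name in experiment_names}


def missing_workflows(project_dir, experiment_names):
    """Return the expected workflow files that are not in .github/workflows."""
    workflows = Path(project_dir) / ".github" / "workflows"
    present = {p.name for p in workflows.glob("*.yml")} if workflows.is_dir() else set()
    return sorted(expected_workflows(experiment_names) - present)


if __name__ == "__main__":
    print(missing_workflows(".", ["llama"]) or "All workflows present.")
```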

Fasten your seatbelt, it's time to deploy!

Add the contents of your SSH key file (the private key that SSH_KEY points to, not the path itself) to your GitHub secrets. On your repository page, open the Settings tab, then Secrets (Secrets and variables > Actions in newer versions of the GitHub UI), and click the New repository secret button. Paste the key contents as a secret named SSH_KEY.
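A common mistake here is pasting the path or the .pub file instead of the private key. The hypothetical helper below (not part of higgsfield) reads the file and does a rough sanity check on its BEGIN/END lines before you copy it; it does not validate the key cryptographically.

```python
import os
from pathlib import Path


# Hypothetical helper, not part of higgsfield: reads the private key that
# SSH_KEY points to and checks that it looks like a private key file.
def private_key_contents(key_path: str) -> str:
    lines = Path(os.path.expanduser(key_path)).read_text().strip().splitlines()
    if not lines:
        raise ValueError("key file is empty")
    if not (lines[0].startswith("-----BEGIN") and "PRIVATE KEY" in lines[0]):
        raise ValueError("not a private key (did you open the .pub file?)")
    if not (lines[-1].startswith("-----END") and "PRIVATE KEY" in lines[-1]):
        raise ValueError("private key looks truncated: END line is missing")
    # Keep a trailing newline; OpenSSH can reject key files that lack one.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(private_key_contents("~/.ssh/id_rsa"), end="")
```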


Then add your deploy key under Settings > Deploy keys in the same repository. You can print the key by running:

```bash
$ higgsfield show-deploy-key
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5A000THERESHOULDBEYOURDEPLOYKEYHEREso83os//
```

Copy the output and add it as a new deploy key. You can name it DEPLOY_KEY.
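If GitHub rejects the key, it is usually because of a copy-and-paste error. The hypothetical helper below (not part of higgsfield) checks that a line has the OpenSSH public key shape: a key type, a base64 blob whose first field (a 4-byte big-endian length followed by the type name) matches that type, and an optional comment.

```python
import base64
import binascii
import struct


# Hypothetical helper, not part of higgsfield.
def public_key_type(line: str) -> str:
    """Validate an OpenSSH public key line and return its key type."""
    parts = line.strip().split()
    if len(parts) < 2:
        raise ValueError("expected '<type> <base64-blob> [comment]'")
    key_type, blob_b64 = parts[0], parts[1]
    try:
        blob = base64.b64decode(blob_b64, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"key blob is not valid base64: {exc}") from exc
    if len(blob) < 4:
        raise ValueError("key blob is too short")
    # The blob starts with a 4-byte big-endian length, then the key type name.
    (name_len,) = struct.unpack(">I", blob[:4])
    embedded = blob[4:4 + name_len].decode("ascii", errors="replace")
    if embedded != key_type:
        raise ValueError(f"type mismatch: {key_type!r} vs {embedded!r}")
    return key_type
```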


Push your code:

```bash
git add .
git commit -m "Make the way for the LLAMA!"
git push origin main
```

Now go to the Actions tab of your GitHub repository. You should see a workflow run triggered by your push.


Once it turns green (meaning it has finished), select the run_llama.yml workflow in the left sidebar, enter any name you like, and click the Run workflow button.

The run is now in progress.

When it finishes, your experiment is training on your nodes!
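If you prefer to start runs from a script instead of clicking through the Actions tab, GitHub's REST API has a workflow dispatch endpoint that accepts the workflow file name. The sketch below only builds the request. The owner name and the input key "name" are placeholders; check the workflow_dispatch inputs declared in run_llama.yml for the real key. The token must be allowed to trigger Actions on your repository.

```python
import json
import urllib.request


# Hypothetical helper, not part of higgsfield: wraps GitHub's
# "create a workflow dispatch event" REST endpoint.
def build_dispatch_request(owner, repo, workflow_file, ref, inputs, token):
    """Build a POST request that triggers a workflow_dispatch run."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    body = json.dumps({"ref": ref, "inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    import os

    req = build_dispatch_request(
        owner="your-github-user",      # placeholder
        repo="my_llama_project",
        workflow_file="run_llama.yml",
        ref="main",
        inputs={"name": "first-run"},  # hypothetical input key
        token=os.environ["GITHUB_TOKEN"],
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.status)  # GitHub answers 204 No Content on success
```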