
Is it possible to full fine tuning llama3 70B with torchtune on a machine with 8*A100 80G ? #987

Open
jaywongs opened this issue May 16, 2024 · 12 comments

Comments

@jaywongs

I'm not sure if it's possible; the documentation doesn't seem to mention it.

@RdoubleA
Contributor

Hi @jaywongs, you are more than well equipped to run any of our recipes with that configuration :) Our documentation mentions some common hardware that users may run recipes with; anything beefier than what's mentioned should certainly be good.

@jaywongs
Author

> Hi @jaywongs, you are more than well equipped to run any of our recipes with that configuration :) Our documentation mentions some common hardware that users may run recipes with; anything beefier than what's mentioned should certainly be good.

Thanks for your quick reply!
I apologize for the confusion. What I actually meant was full fine-tuning, not LoRA or QLoRA.

@ebsmothers
Contributor

@jaywongs yep you should be able to without any problems. Just give

tune run --nproc_per_node 4 full_finetune_distributed --config llama3/8B_full

a try (after downloading the model). Or, if you want, you can go all the way to --nproc_per_node 8 to get a larger batch size (and faster training).

@kartikayk
Contributor

kartikayk commented May 16, 2024

@jaywongs If you're talking about 70B, then this should also be possible. I use the following config (with the PagedAdamW optimizer from bitsandbytes) with the Llama2 70B model for full finetuning, which takes ~51GB of memory on 8 80GB A100s. So this should be able to handle the larger vocab size of Llama3 70B as well. You'll need to make some updates, including pointing to the right checkpoint files and directory, etc. Let me know if this works for you.

Link: https://gist.github.com/kartikayk/a9bea0cca013a0f75e8859ad3250013c
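For readers following along, the swap kartikayk describes amounts to replacing the stock AdamW with bitsandbytes' paged 32-bit AdamW (in a torchtune config this is typically set through the optimizer's `_component_` field). Below is a minimal Python sketch of that optimizer outside the config system; the model and learning rate here are placeholders, not values from the gist:

```python
import bitsandbytes as bnb
import torch.nn as nn

model = nn.Linear(4096, 4096)  # placeholder; the recipe builds the real Llama model

# PagedAdamW keeps full-precision (32-bit) AdamW state, but allocates it in
# paged (unified) memory so it can spill to host RAM under GPU memory pressure.
optimizer = bnb.optim.PagedAdamW(model.parameters(), lr=2e-5)  # lr is a placeholder
```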

@rohan-varma
Member

@kartikayk Is there a config linked? Don't think we can use PagedAdamW with FSDP unfortunately

@kartikayk
Contributor

kartikayk commented May 17, 2024

@rohan-varma yeah, I copied the config into the gist above. I've been training the 70B full finetune with this setting for Llama2 (screenshot attached).

[Screenshot: Llama2 70B full finetune training run]

> Don't think we can use PagedAdamW with FSDP unfortunately

I don't think this is true? bitsandbytes had a recent release which supports FSDP, though 8-bit support is still WIP. I think this should extend to Llama3 as well, since the Llama2 70B run takes ~52GB of memory.

@jaywongs
Author

Thank you so much, I'll give it a try!

> @jaywongs If you're talking about 70B, then this should also be possible. I use the following config (with the PagedAdamW optimizer from bitsandbytes) with the Llama2 70B model for full finetuning, which takes ~51GB of memory on 8 80GB A100s. So this should be able to handle the larger vocab size of Llama3 70B as well. You'll need to make some updates, including pointing to the right checkpoint files and directory, etc. Let me know if this works for you.

> Link: https://gist.github.com/kartikayk/a9bea0cca013a0f75e8859ad3250013c

@rohan-varma
Member

rohan-varma commented May 17, 2024

> @rohan-varma yeah, I copied the config into the gist above. I've been training the 70B full finetune with this setting for Llama2 (screenshot attached).

> [Screenshot: Llama2 70B full finetune training run]

>> Don't think we can use PagedAdamW with FSDP unfortunately

> I don't think this is true? bitsandbytes had a recent release which supports FSDP, though 8-bit support is still WIP. I think this should extend to Llama3 as well, since the Llama2 70B run takes ~52GB of memory.

Oh ok, I think perhaps the initial version of the comment didn't have the gist linked.

Curious if you've tried optimizer state save/load with FSDP + bnb optimizers? Last time @ebsmothers and I checked, this was failing, which is why the bnb optimizers weren't shipped in any of the llama3 recipes, even though they helped memory. This might be a situation where it's failing for the 8-bit optimizers but passing for the paged ones, because the latter don't introduce any additional state.

@jaywongs jaywongs changed the title Is it possible to fine tuning llama3 with torchtune on a machine with 8*A100 80G ? Is it possible to full fine tuning llama3 70B with torchtune on a machine with 8*A100 80G ? May 17, 2024
@ebsmothers
Contributor

> Curious if you've tried optimizer state save/load with FSDP + bnb optimizers? Last time @ebsmothers and I checked, this was failing, which is why the bnb optimizers weren't shipped in any of the llama3 recipes, even though they helped memory. This might be a situation where it's failing for the 8-bit optimizers but passing for the paged ones, because the latter don't introduce any additional state.

Yeah @kartikayk can correct me if I'm wrong, but I believe he was actually able to save the checkpoint properly with PagedAdamW + FSDP

@rohan-varma
Member

That sounds good. If we do ship such a recipe, we should be careful to document that the 8-bit low-precision optimizers aren't supported, because it'll be pretty easy for users to swap out PagedAdamW for PagedAdamW8bit and not realize checkpointing will break. This is of course still possible in today's setup, so perhaps we should just hard-error for now when FSDP + a low-precision optimizer is detected.
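To make that suggestion concrete, here is a hypothetical sketch of such a guard; the function name and the exact set of disallowed classes are illustrative, not something torchtune actually ships:

```python
import bitsandbytes as bnb
import torch


def validate_optimizer_for_fsdp(optimizer: torch.optim.Optimizer) -> None:
    """Hypothetical check: fail fast when a low-precision bitsandbytes optimizer
    is paired with FSDP, since (per this thread) its quantized state did not
    save/load correctly, while 32-bit PagedAdamW did."""
    disallowed = (bnb.optim.AdamW8bit, bnb.optim.PagedAdamW8bit)
    if isinstance(optimizer, disallowed):
        raise RuntimeError(
            f"{type(optimizer).__name__} is not supported with FSDP checkpointing; "
            "use bitsandbytes.optim.PagedAdamW (32-bit) instead."
        )
```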

@rohan-varma
Member

@kartikayk @ebsmothers Have you been able to train llama3-70b on 8xA100 without CPU offload? Even with the paged optimizer, I'm getting an OOM in the backward pass, so I think we need CPU offload for llama3-70b at least.
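For context, CPU offload here refers to FSDP's option to keep sharded parameters (and their gradients) in host memory and move them to GPU only for compute. A rough sketch of the underlying PyTorch API, assuming a distributed launch such as torchrun; in torchtune this would be surfaced through the recipe/config rather than by wrapping the model manually:

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import CPUOffload
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Assumes launch via `torchrun --nproc_per_node 8 this_script.py` on a CUDA node.
dist.init_process_group("nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

model = nn.Linear(4096, 4096)  # placeholder for the actual 70B model

# offload_params=True keeps sharded parameters and gradients in CPU memory,
# moving shards to GPU only when needed, trading speed for memory headroom.
model = FSDP(model, cpu_offload=CPUOffload(offload_params=True))
```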

@ebsmothers
Contributor

@rohan-varma I haven't run it myself so would defer to your experience here.
