
Do the published training weights "7b_tiva_v0" include all three stages of training results simultaneously? #62

Open
pengxuan001 opened this issue Nov 1, 2023 · 4 comments


@pengxuan001 commented Nov 1, 2023

From the training code, it looks like the weights of each training stage are saved to a separate file. Do the published training weights "7b_tiva_v0" include the results of all three training stages at once? In addition, in the inference code, the input projection layer, output projection layer, and LoRA weights of the LLM appear to be freshly initialized rather than loaded from the released model "7b_tiva_v0".

@ChocoWu (Collaborator) commented Nov 2, 2023

Hi, the released checkpoint includes all training parameters across all three stages.
During the inference stage, we indeed load the pre-trained parameters from "7b_tiva_v0" after model initialization.
Please refer to the code snippet below:

NExT-GPT/code/inference.py, lines 110 to 111 in e2e2f94:

model = NextGPTModel(**args)
delta_ckpt = torch.load(os.path.join(args['nextgpt_ckpt_path'], 'pytorch_model.pt'), map_location=torch.device('cuda'))

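(Note: the snippet above only shows the checkpoint being read from disk; applying it to the model is presumably the very next step, along the lines of the single line below. The strict=False flag is an assumption, made because the delta checkpoint contains only the trained parameters rather than the full model state.)

model.load_state_dict(delta_ckpt, strict=False)  # assumed follow-up: apply the loaded delta checkpoint to the initialized model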
@pengxuan001 (Author) commented Nov 2, 2023


@ChocoWu Thank you for your reply. I have another question: when I run the training myself, how do I save the results of all three stages into a single weight file? Can I simply point every stage at the same path, such as "7b_tiva_v0"? Will the results of each stage be merged, or will later stages overwrite earlier ones?

It also seems that the results of the first-stage training are not used during the second stage, and the results of the first and second stages are not used during the third stage.

@ChocoWu (Collaborator) commented Nov 2, 2023

@pengxuan001, actually, the results of the previous training stage are loaded and used during the next stage of training:

self.load_parameters(self.args['save_path'], self.args['stage'])

If you want to keep the weights trained in the different stages separately, you need to specify a different save path via --save_path for each stage. Otherwise, the results will be overwritten.
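(To make the save/overwrite behaviour concrete, here is a minimal sketch. The function bodies, the pytorch_model.pt file name, and the strict=False loading are assumptions modelled on the inference snippet above, not the actual NExT-GPT training code.)

# Minimal sketch (assumptions, not the actual training code): each stage picks up
# whatever the previous stage left in save_path, then writes to the same file name,
# so a shared save_path is overwritten stage after stage.
import os
import torch

def load_parameters(model, save_path, stage):
    # Load the checkpoint written by the previous stage, if one exists.
    ckpt_file = os.path.join(save_path, 'pytorch_model.pt')  # assumed file name
    if stage > 1 and os.path.exists(ckpt_file):
        delta_ckpt = torch.load(ckpt_file, map_location='cpu')
        model.load_state_dict(delta_ckpt, strict=False)  # delta checkpoint holds only the trainable parameters

def save_parameters(model, save_path):
    # Every stage writes to the same file name, hence the overwriting when save_path is shared.
    os.makedirs(save_path, exist_ok=True)
    torch.save(model.state_dict(), os.path.join(save_path, 'pytorch_model.pt'))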

@jwzhi commented Feb 6, 2024

Are there any suggestions on how to load 7b_tiva_v0 during the training stage? I tried to continue instruction tuning on my own data starting from the provided 7b_tiva_v0 checkpoint. However, simply setting --save_path to 7b_tiva_v0 does not work: the checkpoint-loading function at training time seems to always load the Vicuna weights instead of the NExT-GPT weights. Thank you for your help.
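(Not an official answer, just an illustration of one possible workaround based on the inference snippet earlier in this thread: load the released delta checkpoint into the initialized model yourself before the training loop resumes. The checkpoint path, the cpu map_location, and the strict=False flag below are assumptions.)

# Possible workaround (untested sketch): mirror what inference.py does and load the
# released 7b_tiva_v0 delta checkpoint right after model initialization, before training.
import os
import torch

model = NextGPTModel(**args)  # same initialization as in the inference snippet above
delta_ckpt = torch.load(
    os.path.join(args['nextgpt_ckpt_path'], 'pytorch_model.pt'),  # point this at the 7b_tiva_v0 directory
    map_location=torch.device('cpu'),
)
model.load_state_dict(delta_ckpt, strict=False)  # only the trained (delta) parameters are in the checkpoint
# ...then continue with the normal instruction-tuning loop on your own data.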
