Model collapse #11

Closed
TrueNobility303 opened this issue May 22, 2024 · 8 comments

TrueNobility303 commented May 22, 2024

Dear authors,

Thanks for your excellent work!
I trained the CLLM model on GSM8K with Abel-7B-001 as the teacher model, using the cleaned_gsm8k_jacobi dataset you provided on Hugging Face, and ran train_cllm.sh with n_token_seq_size=16.
The training is now 1/5 complete, but the checkpoint does not look good. It seems that the model has collapsed to outputting the same tokens, as shown in the following picture.

[screenshot: the model's generations repeat the same tokens over and over]
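For reference, here is a minimal sketch (not from the CLLM repo; the checkpoint path and prompt are placeholders) of how I probe a checkpoint for this kind of repetition:

```python
# Minimal sketch, not part of the CLLM repo: probe a checkpoint for
# collapsed, repetitive generations. Path and prompt are placeholders.
from collections import Counter

from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "path/to/local/cllm-checkpoint"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, device_map="auto")

# Any GSM8K-style question works here.
prompt = "Janet has 3 apples and buys 4 more. How many apples does she have?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
new_tokens = output[0][inputs["input_ids"].shape[1]:].tolist()

# If a single token dominates the continuation, the model has likely collapsed.
top_token, top_count = Counter(new_tokens).most_common(1)[0]
print(tokenizer.decode(new_tokens))
print(f"most frequent new token covers {top_count / len(new_tokens):.0%} of the output")
```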

Is this phenomenon normal? Did I use the training scripts correctly?

I would greatly appreciate it if you could help me.

Best regards.

TrueNobility303 changed the title from "Model collaphse" to "Model collapse" on May 22, 2024
w32zhong commented:

Might be relevant to my observations as well. See #4

TrueNobility303 (Author) commented:

> Might be relevant to my observations as well. See #4

Thanks a lot.

snyhlxde1 (Collaborator) commented:

Hi, thank you for your interest in our work! Did it happen over a wide range of queries after training, or just for a small number?

In our experiments, we also observed that a CLLM's performance depends heavily on both the pre-trained model and the quality of the Jacobi trajectories. So data pre-processing is needed before training to clean up token-level and sentence-level repetition. For a small subset of queries, generation quality might not be as good because of both Jacobi trajectory quality and the reasons mentioned in #4.
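As a rough illustration only (this is not the actual preprocessing script, and the JSONL format and field name below are assumptions), such a repetition filter could look something like this:

```python
# Rough illustration only -- not the repo's preprocessing code. Assumes a
# JSONL file whose samples carry the generated text under "completion";
# both the format and the field name are assumptions.
import json
from collections import Counter


def repetition_ratio(text: str, n: int = 4) -> float:
    """Fraction of all n-grams accounted for by the single most frequent n-gram."""
    words = text.split()
    if len(words) < n:
        return 0.0
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    _, top_count = Counter(ngrams).most_common(1)[0]
    return top_count / len(ngrams)


def clean(in_path: str, out_path: str, threshold: float = 0.2) -> None:
    """Drop samples whose text is dominated by one repeated n-gram."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            sample = json.loads(line)
            if repetition_ratio(sample["completion"]) < threshold:
                fout.write(json.dumps(sample) + "\n")
```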

At the time our math CLLM was trained, Abel-7B-001 was the latest model. There is now a more recent math-solving model released by the same team that is significantly stronger: https://huggingface.co/GAIR/Abel-7B-002. I think it's worth trying.

TrueNobility303 (Author) commented:

> Hi, thank you for your interest in our work! Did it happen over a wide range of queries after training, or just for a small number?
>
> In our experiments, we also observed that a CLLM's performance depends heavily on both the pre-trained model and the quality of the Jacobi trajectories. So data pre-processing is needed before training to clean up token-level and sentence-level repetition. For a small subset of queries, generation quality might not be as good because of both Jacobi trajectory quality and the reasons mentioned in #4.
>
> At the time our math CLLM was trained, Abel-7B-001 was the latest model. There is now a more recent math-solving model released by the same team that is significantly stronger: https://huggingface.co/GAIR/Abel-7B-002. I think it's worth trying.

Thanks a lot for your response. It happens a lot, over a wide range of queries. I will finish the whole training procedure to see whether it gets better.
I used the default settings in the code and trained for one epoch. Is that the right amount of training?
Also, I wonder whether the checkpoint cllm/consistency-llm-7b-math was trained with the default hyper-parameters?

snyhlxde1 (Collaborator) commented:

If it happens a lot over a wide range of queries, then something is off. After training completes, you can test the model with our evaluation script on GSM8K and compare against our released checkpoints to see whether the performance is comparable.
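As a quick sanity check independent of the evaluation script, one can also just compare final numeric answers in the usual GSM8K style; a rough sketch (the answer formats below are assumptions):

```python
# Quick sanity check, independent of the repo's evaluation script.
# Assumes GSM8K references put the final answer after "#### " and that the
# model's final answer is the last number it prints.
import re


def extract_gold(reference: str) -> str:
    return reference.split("####")[-1].strip().replace(",", "")


def extract_pred(generation: str) -> str | None:
    numbers = re.findall(r"-?\d+(?:\.\d+)?", generation.replace(",", ""))
    return numbers[-1] if numbers else None


def accuracy(generations: list[str], references: list[str]) -> float:
    correct = sum(
        extract_pred(g) == extract_gold(r) for g, r in zip(generations, references)
    )
    return correct / len(references)
```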

Regarding the hyper-parameters, we will double check and get back to you.

TrueNobility303 (Author) commented:

Thanks a lot for your reply! I will keep trying.

TrueNobility303 (Author) commented May 29, 2024

I think I may have found the reason. The "use_gt_labels" flag in train_cllm_global.py should be set to False instead of its default value of True. After this modification, training looks good.
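If I understand the flag correctly (this is only my reading, and the names below are illustrative, not the actual implementation), it decides which sequence the loss is supervised on, roughly:

```python
# Illustrative only -- my rough understanding of the flag's effect, not the
# actual code in train_cllm_global.py. The batch field names are hypothetical.
def pick_target_ids(batch: dict, use_gt_labels: bool) -> list[int]:
    if use_gt_labels:
        # supervise on the dataset's ground-truth answer tokens
        return batch["labels"]
    # supervise on the converged (teacher/AR) output of the Jacobi trajectory
    return batch["teacher_output_ids"]
```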

snyhlxde1 (Collaborator) commented:

Yes, thanks for pointing it out! It should be set to False to match the training recipe used for the released GSM8K checkpoint.
