Model collapse #11
Comments
Might be relevant to my observations as well. See #4
Thanks a lot.
Hi, thank you for your interest in our work! Did it happen over a wide range of queries after training, or just on a small number? In our experiments, we observed that a CLLM's performance is highly related to both the pre-trained model and the quality of the Jacobi trajectories, so data pre-processing before training is needed to clean up token-level and sentence-level repetition. For some small subsets of queries, generation quality might not be as good, because of both the Jacobi trajectories' quality and the reasons mentioned in #4. At the time our math CLLM was trained, Abel-7B-001 was the latest model. A more recent and significantly stronger math-solving model has since been released by the same team: https://huggingface.co/GAIR/Abel-7B-002. I think it's worth trying.
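The token-level and sentence-level repetition cleaning mentioned above could look roughly like the sketch below. This is only an illustration, not the authors' actual pre-processing pipeline; the thresholds and function names are assumptions:

```python
import re

def has_token_repetition(text: str, max_run: int = 8) -> bool:
    """Flag a trajectory in which one token repeats max_run+ times in a row.

    max_run is an arbitrary threshold chosen for illustration.
    """
    tokens = text.split()
    run = 1
    for prev, cur in zip(tokens, tokens[1:]):
        run = run + 1 if cur == prev else 1
        if run >= max_run:
            return True
    return False

def has_sentence_repetition(text: str, max_dup: int = 2) -> bool:
    """Flag a trajectory that repeats the same sentence more than max_dup times."""
    sentences = [s.strip() for s in re.split(r"[.!?]\s*", text) if s.strip()]
    counts: dict[str, int] = {}
    for s in sentences:
        counts[s] = counts.get(s, 0) + 1
        if counts[s] > max_dup:
            return True
    return False

def clean_trajectories(texts):
    """Keep only trajectories that pass both repetition checks."""
    return [t for t in texts
            if not has_token_repetition(t) and not has_sentence_repetition(t)]
```

A filter along these lines, applied to the Jacobi trajectory dataset before training, is one way to realize the "clean up token-level and sentence-level repetition" step described above.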
Many thanks for your response. It happens a lot, over a wide range of queries. I will finish the whole training procedure to see whether it gets better.
If it happens a lot over a wide range of queries, then something is off. After training completes, you can test its performance with our evaluation script on GSM8K and compare against our checkpoints to see whether the performance is comparable. Regarding the hyper-parameters, we will double-check and get back to you.
Thanks a lot for your reply! I will keep trying. |
I think I may have found the reason. The "use_gt_labels" in the file
Yes, thanks for pointing it out! It should be set to False to match the training recipe for GSM8K released checkpoints. |
Hi!
Dear authors,
Thanks for your excellent work!
I trained the CLLM model on GSM8K with Abel-7B-001 as the teacher model, using the cleaned_gsm8k_jacobi dataset you provided on Hugging Face, and ran train_cllm.sh with n_token_seq_size=16. The training is now 1/5 complete, but the checkpoint does not look good. It seems the model has collapsed to outputting the same tokens, as shown in the following picture.
Is this phenomenon normal? Did I use the training scripts correctly?
I would greatly appreciate it if you could help me.
Best regards.
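As a quick sanity check during training, a collapse like the one described (the model emitting the same token over and over) can be detected automatically on a few validation generations. A minimal sketch, with an arbitrarily chosen threshold:

```python
from collections import Counter

def looks_collapsed(token_ids, top_fraction: float = 0.5) -> bool:
    """Heuristic collapse check: if a single token id accounts for more than
    top_fraction of the generated sequence, the sample is probably degenerate.
    """
    if not token_ids:
        return False
    most_common_count = Counter(token_ids).most_common(1)[0][1]
    return most_common_count / len(token_ids) > top_fraction
```

Running such a check on a handful of sampled generations every few hundred steps would surface this failure mode well before 1/5 of training has elapsed.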