Model collapse #11

Closed
TrueNobility303 opened this issue May 22, 2024 · 8 comments

TrueNobility303 commented May 22, 2024

Dear authors,

Thanks for your excellent work!
I trained the CLLM model on GSM8K with Abel-7B-001 as the teacher model, using the cleaned_gsm8k_jacobi dataset you provided on Hugging Face, and ran train_cllm.sh with n_token_seq_size=16.
The training is now 1/5 complete, but the checkpoint does not look good. It seems that the model has collapsed to outputting the same tokens, as shown in the following picture.

[screenshot: the model's generations repeat the same tokens over and over]
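For reference, here is a minimal sketch (not from the CLLM repo; the checkpoint path and prompt are placeholders) of how I probe a checkpoint for this kind of repetition:

```python
# Minimal sketch, not part of the CLLM repo: probe a checkpoint for
# collapsed, repetitive generations. Path and prompt are placeholders.
from collections import Counter

from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "path/to/local/cllm-checkpoint"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, device_map="auto")

# Any GSM8K-style question works here.
prompt = "Janet has 3 apples and buys 4 more. How many apples does she have?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
new_tokens = output[0][inputs["input_ids"].shape[1]:].tolist()

# If a single token dominates the continuation, the model has likely collapsed.
top_token, top_count = Counter(new_tokens).most_common(1)[0]
print(tokenizer.decode(new_tokens))
print(f"most frequent new token covers {top_count / len(new_tokens):.0%} of the output")
```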

Is this phenomenon normal? Did I use the training scripts correctly?

I would greatly appreciate it if you could help me.

Best regards.

TrueNobility303 changed the title from "Model collaphse" to "Model collapse" on May 22, 2024
w32zhong commented:

Might be relevant to my observations as well. See #4

TrueNobility303 (Author) commented:

> Might be relevant to my observations as well. See #4

Thanks a lot.

snyhlxde1 (Collaborator) commented:

Hi, thank you for your interest in our work! Did it happen over a wide range of queries after training, or just for a small number?

In our experiments, we also observed that a CLLM's performance depends heavily on both the pre-trained model and the quality of the Jacobi trajectories. So data pre-processing is needed before training to clean up token-level and sentence-level repetition. For a small subset of queries, generation quality might not be as good because of both Jacobi trajectory quality and the reasons mentioned in #4.
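As a rough illustration only (this is not the actual preprocessing script, and the JSONL format and field name below are assumptions), such a repetition filter could look something like this:

```python
# Rough illustration only -- not the repo's preprocessing code. Assumes a
# JSONL file whose samples carry the generated text under "completion";
# both the format and the field name are assumptions.
import json
from collections import Counter


def repetition_ratio(text: str, n: int = 4) -> float:
    """Fraction of all n-grams accounted for by the single most frequent n-gram."""
    words = text.split()
    if len(words) < n:
        return 0.0
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    _, top_count = Counter(ngrams).most_common(1)[0]
    return top_count / len(ngrams)


def clean(in_path: str, out_path: str, threshold: float = 0.2) -> None:
    """Drop samples whose text is dominated by one repeated n-gram."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            sample = json.loads(line)
            if repetition_ratio(sample["completion"]) < threshold:
                fout.write(json.dumps(sample) + "\n")
```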

At the time our math CLLM was trained, Abel-7B-001 was the latest model. There is now a more recent math-solving model released by the same team that is significantly stronger: https://huggingface.co/GAIR/Abel-7B-002. I think it's worth trying.

TrueNobility303 (Author) commented:

> Hi, thank you for your interest in our work! Did it happen over a wide range of queries after training, or just for a small number?
>
> In our experiments, we also observed that a CLLM's performance depends heavily on both the pre-trained model and the quality of the Jacobi trajectories. So data pre-processing is needed before training to clean up token-level and sentence-level repetition. For a small subset of queries, generation quality might not be as good because of both Jacobi trajectory quality and the reasons mentioned in #4.
>
> At the time our math CLLM was trained, Abel-7B-001 was the latest model. There is now a more recent math-solving model released by the same team that is significantly stronger: https://huggingface.co/GAIR/Abel-7B-002. I think it's worth trying.

Thanks a lot for your response. It happens a lot, over a wide range of queries. I will finish the whole training procedure to see whether it gets better.
I used the default settings in the code and trained for one epoch. Is that the right amount of training?
Also, I wonder whether the checkpoint cllm/consistency-llm-7b-math was trained with the default hyper-parameters?

snyhlxde1 (Collaborator) commented:

If it happens a lot over a wide range of queries, then something is off. After training completes, you can test the model with our evaluation script on GSM8K and compare against our released checkpoints to see whether the performance is comparable.
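As a quick sanity check independent of the evaluation script, one can also just compare final numeric answers in the usual GSM8K style; a rough sketch (the answer formats below are assumptions):

```python
# Quick sanity check, independent of the repo's evaluation script.
# Assumes GSM8K references put the final answer after "#### " and that the
# model's final answer is the last number it prints.
import re


def extract_gold(reference: str) -> str:
    return reference.split("####")[-1].strip().replace(",", "")


def extract_pred(generation: str) -> str | None:
    numbers = re.findall(r"-?\d+(?:\.\d+)?", generation.replace(",", ""))
    return numbers[-1] if numbers else None


def accuracy(generations: list[str], references: list[str]) -> float:
    correct = sum(
        extract_pred(g) == extract_gold(r) for g, r in zip(generations, references)
    )
    return correct / len(references)
```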

Regarding the hyper-parameters, we will double check and get back to you.

TrueNobility303 (Author) commented:

Thanks a lot for your reply! I will keep trying.

TrueNobility303 (Author) commented May 29, 2024

I think I may have found the reason. The "use_gt_labels" flag in train_cllm_global.py should be set to False instead of its default value of True. After this modification, training looks good.
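If I understand the flag correctly (this is only my reading, and the names below are illustrative, not the actual implementation), it decides which sequence the loss is supervised on, roughly:

```python
# Illustrative only -- my rough understanding of the flag's effect, not the
# actual code in train_cllm_global.py. The batch field names are hypothetical.
def pick_target_ids(batch: dict, use_gt_labels: bool) -> list[int]:
    if use_gt_labels:
        # supervise on the dataset's ground-truth answer tokens
        return batch["labels"]
    # supervise on the converged (teacher/AR) output of the Jacobi trajectory
    return batch["teacher_output_ids"]
```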

snyhlxde1 (Collaborator) commented:

Yes, thanks for pointing it out! It should be set to False to match the training recipe used for the released GSM8K checkpoint.
