
Do more pretraining steps lead to a better encoder for downstream tasks? #42

Open
dr-GitHub-account opened this issue May 24, 2023 · 0 comments

@dr-GitHub-account

Thank you for your contributions to pretraining. You trained the encoder for 12.5K steps per domain in the pretraining phase before applying it to the supervised downstream tasks. Is it possible that the checkpoints best suited for downstream tasks appear in the middle of pretraining rather than at the end? This phenomenon is common in many real applications. If so, it may not be a good choice to simply train the model for the maximum number of steps on the entire corpus and take the final checkpoint. Do you have any suggestions on that?
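
To make the question concrete, here is a minimal sketch of the checkpoint-selection procedure I have in mind. `load_encoder`, `finetune_and_eval`, the checkpoint paths, and the 2.5K-step saving interval are all placeholders for illustration, not anything taken from this repository:

```python
# Minimal sketch (not from this repo): compare intermediate pretraining
# checkpoints on the downstream dev set instead of always taking the final one.
# `load_encoder` and `finetune_and_eval` are placeholders for whatever loading
# and fine-tuning code the project actually uses.

def load_encoder(path):
    """Placeholder: load the pretrained encoder saved at `path`."""
    raise NotImplementedError

def finetune_and_eval(encoder):
    """Placeholder: fine-tune the encoder on the downstream train set and
    return the dev-set metric (e.g. accuracy or F1)."""
    raise NotImplementedError

# Assume a checkpoint was saved every 2.5K of the 12.5K pretraining steps.
checkpoint_steps = range(2500, 12501, 2500)
best_step, best_score = None, float("-inf")

for step in checkpoint_steps:
    encoder = load_encoder(f"checkpoints/step_{step}")
    score = finetune_and_eval(encoder)
    print(f"step {step}: downstream dev metric = {score:.4f}")
    if score > best_score:
        best_step, best_score = step, score

print(f"Best checkpoint: step {best_step} (dev metric {best_score:.4f})")
```

In other words, should checkpoint selection like the above be part of the recommended pipeline, or is taking the final checkpoint generally safe in your experience?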
