
[Help] How can I avoid saving early checkpoints during fine-tuning? #654

ybdesire opened this issue Jan 15, 2024 · 3 comments

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

For example, with the following fine-tuning configuration:

    --max_steps 3000 \
    --save_steps 5 \

checkpoints are saved at steps 5, 10, 15, 20, ..., 3000, which is far too many checkpoints.

I would like to skip the steps before 2000 and save only the checkpoints at 2000, 2005, 2010, ..., 3000. How should this be configured?

Expected Behavior

No response

Steps To Reproduce

    --max_steps 3000 \
    --save_steps 5 \

Environment

OS: Ubuntu 20.04
Python: 3.8
Transformers: 4.26.1
PyTorch: 1.12
CUDA Support: True

Anything else?

No response

hhy150 commented Jan 20, 2024

(1) You can first train to step 2000, with
--max_steps 2000
--save_steps 2000
(2) Then continue training on top of that, with
--max_steps 3000
--save_steps 5
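
In code, the two-stage idea looks roughly like the sketch below, assuming the script's flags are forwarded to the standard Hugging Face Trainer / TrainingArguments API; model, train_dataset, and the output directory layout are placeholders here, not this repo's actual fine-tuning entry point.

    from transformers import Trainer, TrainingArguments

    def two_stage_finetune(model, train_dataset, output_dir="output"):
        # Stage 1: train to step 2000 and save a single checkpoint at step 2000.
        stage1 = TrainingArguments(output_dir=output_dir, max_steps=2000, save_steps=2000)
        Trainer(model=model, args=stage1, train_dataset=train_dataset).train()

        # Stage 2: resume from checkpoint-2000 and continue to step 3000,
        # saving every 5 steps (checkpoint-2005, checkpoint-2010, ..., checkpoint-3000).
        stage2 = TrainingArguments(output_dir=output_dir, max_steps=3000, save_steps=5)
        Trainer(model=model, args=stage2, train_dataset=train_dataset).train(
            resume_from_checkpoint=f"{output_dir}/checkpoint-2000"
        )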

@ybdesire
Author


Thanks for the reply, that is one approach.
Is there a way to achieve this in a single training run? On some platforms a submitted training job cannot be interrupted and then resumed in that way.

hhy150 commented Jan 21, 2024


I'm not sure about that one, sorry.
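
For the single-run case: if the training script exposes the underlying transformers Trainer, one possibility is a custom TrainerCallback that suppresses saves before a chosen step. The sketch below assumes the standard callback API; SkipEarlySaveCallback and first_save_step are illustrative names, not something this repo's script provides.

    from transformers import TrainerCallback

    class SkipEarlySaveCallback(TrainerCallback):
        """Suppress checkpoint saves before a given global step."""

        def __init__(self, first_save_step=2000):
            self.first_save_step = first_save_step

        def on_step_end(self, args, state, control, **kwargs):
            # DefaultFlowCallback has already set control.should_save=True every
            # `save_steps` steps; cancel it while still before the threshold.
            if state.global_step < self.first_save_step:
                control.should_save = False
            return control

    # Usage: keep --save_steps 5 and register the callback, e.g.
    # trainer.add_callback(SkipEarlySaveCallback(first_save_step=2000))

With --save_steps 5 left unchanged, this would write only checkpoint-2000, checkpoint-2005, ..., checkpoint-3000 in a single run.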
