
[Help] How can I avoid saving early checkpoints during fine-tuning? #654

ybdesire opened this issue Jan 15, 2024 · 3 comments

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

For example, with the following fine-tuning configuration:

    --max_steps 3000 \
    --save_steps 5 \

checkpoints are saved at steps 5, 10, 15, 20, ..., 3000, which is far too many checkpoints.

I would like to skip the steps before 2000 and save only the checkpoints at 2000, 2005, 2010, ..., 3000. How should this be configured?

Expected Behavior

No response

Steps To Reproduce

    --max_steps 3000 \
    --save_steps 5 \

Environment

OS: Ubuntu 20.04
Python: 3.8
Transformers: 4.26.1
PyTorch: 1.12
CUDA Support: True

Anything else?

No response

hhy150 commented Jan 20, 2024

(1) You can first train to step 2000, with
--max_steps 2000
--save_steps 2000
(2) Then continue training on top of that, with
--max_steps 3000
--save_steps 5
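
In code, the two-stage idea looks roughly like the sketch below, assuming the script's flags are forwarded to the standard Hugging Face Trainer / TrainingArguments API; model, train_dataset, and the output directory layout are placeholders here, not this repo's actual fine-tuning entry point.

    from transformers import Trainer, TrainingArguments

    def two_stage_finetune(model, train_dataset, output_dir="output"):
        # Stage 1: train to step 2000 and save a single checkpoint at step 2000.
        stage1 = TrainingArguments(output_dir=output_dir, max_steps=2000, save_steps=2000)
        Trainer(model=model, args=stage1, train_dataset=train_dataset).train()

        # Stage 2: resume from checkpoint-2000 and continue to step 3000,
        # saving every 5 steps (checkpoint-2005, checkpoint-2010, ..., checkpoint-3000).
        stage2 = TrainingArguments(output_dir=output_dir, max_steps=3000, save_steps=5)
        Trainer(model=model, args=stage2, train_dataset=train_dataset).train(
            resume_from_checkpoint=f"{output_dir}/checkpoint-2000"
        )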

@ybdesire
Author


Thanks for the reply, that is one approach.
Is there a way to achieve this in a single training run? On some platforms a submitted training job cannot be interrupted and then resumed in that way.

hhy150 commented Jan 21, 2024


I'm not sure about that one, sorry.
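
For the single-run case: if the training script exposes the underlying transformers Trainer, one possibility is a custom TrainerCallback that suppresses saves before a chosen step. The sketch below assumes the standard callback API; SkipEarlySaveCallback and first_save_step are illustrative names, not something this repo's script provides.

    from transformers import TrainerCallback

    class SkipEarlySaveCallback(TrainerCallback):
        """Suppress checkpoint saves before a given global step."""

        def __init__(self, first_save_step=2000):
            self.first_save_step = first_save_step

        def on_step_end(self, args, state, control, **kwargs):
            # DefaultFlowCallback has already set control.should_save=True every
            # `save_steps` steps; cancel it while still before the threshold.
            if state.global_step < self.first_save_step:
                control.should_save = False
            return control

    # Usage: keep --save_steps 5 and register the callback, e.g.
    # trainer.add_callback(SkipEarlySaveCallback(first_save_step=2000))

With --save_steps 5 left unchanged, this would write only checkpoint-2000, checkpoint-2005, ..., checkpoint-3000 in a single run.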
