Release the GPU memory cache at the end of each training epoch #2044

Open
xderui opened this issue May 6, 2024 · 0 comments
Assignees
Labels
bug Something isn't working

Comments

xderui commented May 6, 2024

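Describe the bug
The GPU memory cached while evaluating on the validation set is not released afterwards. As a result, the next training epoch may run out of GPU memory and train more slowly.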
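
One way to check how much memory is cached but unused is to compare what PyTorch has allocated to live tensors with what its caching allocator has reserved. The sketch below is only illustrative: the helper names `cuda_memory_mib` and `cached_but_unused_mib` are hypothetical and are not part of RecBole. It uses `torch.cuda.memory_allocated()` and `torch.cuda.memory_reserved()`, and returns `None` when PyTorch or a CUDA device is not available.

```python
def cuda_memory_mib():
    """Return (allocated, reserved) GPU memory in MiB for the current device.

    Returns None when PyTorch is not installed or no CUDA device is available.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    mib = 1024 ** 2
    # allocated: memory occupied by live tensors
    # reserved: memory held by PyTorch's caching allocator (allocated + cached)
    return torch.cuda.memory_allocated() / mib, torch.cuda.memory_reserved() / mib


def cached_but_unused_mib(allocated_mib, reserved_mib):
    """Memory the allocator is holding but no tensor is using (hypothetical helper)."""
    return max(reserved_mib - allocated_mib, 0.0)


if __name__ == "__main__":
    stats = cuda_memory_mib()
    if stats is None:
        print("CUDA not available; nothing to report")
    else:
        allocated, reserved = stats
        print(f"allocated={allocated:.1f} MiB, reserved={reserved:.1f} MiB, "
              f"cached but unused={cached_but_unused_mib(allocated, reserved):.1f} MiB")
```

Calling this right after validation and again after `torch.cuda.empty_cache()` should show whether the gap between reserved and allocated memory shrinks.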

To reproduce
python .\run_recbole.py --dataset gowalla-merged --model GRU4RecCPR

Expected behavior
Call torch.cuda.empty_cache() at the end of each epoch to clear the cached GPU memory.

Screenshots
Without clearing the GPU memory cache:
image
With the GPU memory cache cleared:
image
Code added:
image
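
The screenshot of the added code is not reproduced in this text. As a rough illustration of the idea only, and not RecBole's actual Trainer code, the change amounts to something like the sketch below. The names `fit`, `train_epoch`, `valid_epoch`, and `release_cached_gpu_memory` are hypothetical placeholders for a generic training loop.

```python
def release_cached_gpu_memory():
    """Release unused cached GPU memory; return True if the cache was cleared.

    torch.cuda.empty_cache() only frees blocks that no live tensor occupies,
    so memory held by the model and optimizer is unaffected.
    """
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    torch.cuda.empty_cache()
    return True


def fit(train_epoch, valid_epoch, epochs):
    """Hypothetical training loop: train, validate, then clear the cache each epoch."""
    history = []
    for epoch in range(epochs):
        train_loss = train_epoch(epoch)
        valid_score = valid_epoch(epoch)
        # Clear the buffers cached during validation before the next epoch starts.
        release_cached_gpu_memory()
        history.append((epoch, train_loss, valid_score))
    return history


if __name__ == "__main__":
    # Dummy callables stand in for the real training and validation steps.
    print(fit(lambda e: 1.0 / (e + 1), lambda e: float(e), epochs=3))
```

Clearing the cache once per epoch rather than once per batch keeps the overhead small, since freed blocks have to be requested from the driver again when they are next needed.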

Environment (please complete the following information):

  • OS: Windows
  • RecBole: 1.2.0
  • Python: 3.10.13
  • PyTorch: 2.0.1+cu117
xderui added the bug (Something isn't working) label May 6, 2024

2 participants