Single machine with 8x 3090: after launch it exits with return code = -9. Where can I look up what this error code means? #38
Comments
I have the same question.
How did you end up solving this?
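I wouldn't call it solved. On the machine I was using at the time, the NVIDIA GPU driver and CUDA version were 10.x, but the CUDA in my conda environment was 11.6. I suspect that mismatch was the cause. The environment on that machine was too complicated, so I didn't dare upgrade the GPU driver. In the end I had no choice but to set up another 8x 3090 machine with CUDA 11.6, and I installed the CUDA 11.6 build of PyTorch as well. That error went away, but other problems showed up. Just getting a test to run is really hard. If you have the option, go straight to A100s; the 3090 is a real hassle.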
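Also, scroll back and look at the specific error messages. Various files are reported missing, and you need to find a way to supply them. In any case it is almost certainly still an environment problem. If you don't mind the trouble, reinstall the system, the driver, and everything else from scratch, and try to keep CUDA at 11.6/11.7 so that it matches the CUDA version of your PyTorch build.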
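For anyone who wants to check the driver/PyTorch mismatch described above, here is a minimal sketch. The helper names (`parse_version`, `driver_supports_runtime`, `installed_pytorch_cuda`) are made up for illustration, and "10.2" is only a stand-in for the "10.x" mentioned in the thread. The rule it applies (the CUDA version supported by the driver should be at least the CUDA version PyTorch was built with) is deliberately conservative; NVIDIA's compatibility options can relax it in some setups.

```python
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, int]:
    """Turn a version string such as '11.6' or '11.6.2' into (major, minor)."""
    parts = text.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def driver_supports_runtime(driver_cuda: str, runtime_cuda: str) -> bool:
    """Conservative check: the driver's CUDA version must be >= the runtime's."""
    return parse_version(driver_cuda) >= parse_version(runtime_cuda)


def installed_pytorch_cuda() -> Optional[str]:
    """Return the CUDA version PyTorch was built with, or None if unavailable."""
    try:
        import torch  # optional: only needed to inspect the local install
    except ImportError:
        return None
    return torch.version.cuda  # None for CPU-only builds


if __name__ == "__main__":
    # Hypothetical values: take the driver-side version from `nvidia-smi`.
    print(driver_supports_runtime("10.2", "11.6"))  # the mismatched machine
    print(driver_supports_runtime("11.6", "11.6"))  # the second machine
    print("PyTorch built for CUDA:", installed_pytorch_cuda())
```

The CUDA version a driver supports is shown in the header of `nvidia-smi`; compare it with what `installed_pytorch_cuda()` reports inside the conda environment.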
It may be caused by insufficient memory.
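On the original question of where to look up the code: assuming the launcher reports return codes the way Python's `subprocess` module does, a negative value means the child process was terminated by a signal with that number. The sketch below decodes it with the standard `signal` module; `describe_return_code` is a hypothetical helper name, not part of any launcher.

```python
import signal


def describe_return_code(rc: int) -> str:
    """Explain a subprocess-style return code (negative means killed by a signal)."""
    if rc < 0:
        try:
            name = signal.Signals(-rc).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {-rc} ({name})"
    if rc == 0:
        return "exited normally"
    return f"exited with status {rc}"


if __name__ == "__main__":
    print(describe_return_code(-9))
```

Under that assumption, -9 corresponds to SIGKILL. On Linux, one common source of SIGKILL is the kernel's out-of-memory killer, which would be consistent with the insufficient-memory suggestion above; the kernel log (for example via `dmesg`) usually records such kills.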