Single machine with 8x 3090: the run exits with return code = -9. Where can I look up what this error code means? #38

Open
equationdz opened this issue Apr 8, 2023 · 5 comments

Comments

@equationdz

Single machine with 8x 3090: the run exits with return code = -9. Where can I look up what this error code means?

@baketbek

I have the same question.

@baketbek

Single machine with 8x 3090: the run exits with return code = -9. Where can I look up what this error code means?

How did you end up solving it?

@equationdz
Author

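I can't really call it solved. On that machine the NVIDIA GPU driver and CUDA version were 10.x, but the CUDA in my conda environment was 11.6. I suspect that was the problem. The environment on that machine is too complicated, so I didn't dare upgrade the GPU driver.

Later I had no other option, so I set up another 8x 3090 machine where the CUDA version is 11.6, and installed the PyTorch build for CUDA 11.6 as well. That problem no longer occurred, but other problems appeared. I'm at a loss. Just running a test is really hard. If you have the means, go straight to A100s; the 3090 is a real hassle.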

@equationdz
Author

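Also, scroll back through the detailed error output. Various files were missing, and you have to find a way to fill them in. Either way, it is almost certainly still an environment problem. If you don't mind the trouble, reinstall the system, the drivers, and everything else from scratch, and try to use CUDA 11.6/11.7 so that it matches the CUDA version of your PyTorch build.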
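The version matching suggested above can be checked before launching anything. Below is a minimal sketch, assuming the rule of thumb that the CUDA version supported by the driver (as reported by `nvidia-smi`) should be at least the CUDA version PyTorch was built against (`torch.version.cuda`). The helper names `parse_version` and `driver_supports` are hypothetical, and `"10.2"` is only a stand-in for the "10.x" mentioned in this thread.

```python
def parse_version(text: str) -> tuple:
    """Turn a version string such as '11.6' into a comparable (major, minor) tuple."""
    major, _, minor = text.strip().partition(".")
    return int(major), int(minor or 0)


def driver_supports(driver_cuda: str, build_cuda: str) -> bool:
    """Rule-of-thumb check (an assumption, not an official NVIDIA rule):
    the driver's CUDA version should be >= the CUDA version of the PyTorch build."""
    return parse_version(driver_cuda) >= parse_version(build_cuda)


# With PyTorch installed, build_cuda could come from torch.version.cuda.
# "10.2" is an illustrative stand-in for the old machine's 10.x driver stack.
print(driver_supports("10.2", "11.6"))  # old machine: driver older than the build
print(driver_supports("11.6", "11.6"))  # new machine: versions match
```

A `False` result would point to the kind of driver/toolkit mismatch suspected earlier in the thread, though it does not by itself explain a -9 return code.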

@s1ghhh

s1ghhh commented Jul 11, 2023

It may be caused by running out of memory.
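As for where to look up the code itself: assuming the launcher reports exit status the way Python's `subprocess` module does, a negative return code `-N` means the child process was killed by signal `N`. Signal 9 is SIGKILL, which is what the Linux out-of-memory killer sends, so this is consistent with the memory explanation. A minimal sketch for decoding such codes on a POSIX system (the helper name `describe_return_code` is hypothetical):

```python
import signal


def describe_return_code(code: int) -> str:
    """Describe a subprocess-style return code (hypothetical helper, POSIX assumed)."""
    if code == 0:
        return "exited normally"
    if code > 0:
        return f"exited with status {code}"
    # subprocess convention: a return code of -N means "killed by signal N".
    try:
        name = signal.Signals(-code).name
    except ValueError:
        return f"killed by unknown signal {-code}"
    return f"killed by signal {-code} ({name})"


print(describe_return_code(-9))
```

If the result names SIGKILL, checking the kernel log (for example with `dmesg`) for out-of-memory messages around the time of the crash is a reasonable next step.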


3 participants