Multi-GPU training of MAE under projects fails with an error #508

Open
stonewjf opened this issue May 23, 2023 · 0 comments
Environment
Python 3.8
CUDA 11.7
OneFlow install command: `pip install --pre oneflow -f https://staging.oneflow.info/branch/master/cu117`

Running `bash tools/train.sh projects/MAE/train_net.py projects/MAE/configs/mae_pretraining.py 8` to train the MAE model on 8 GPUs produces the following error:
`[05/23 19:19:57 lb.engine.default]: >>> done with dataset index builder. Compilation time: 0.372 seconds
[05/23 19:19:57 lb.engine.default]: >>> done with compiling. Compilation time: 0.386 seconds
[05/23 19:19:57 lb.engine.default]: Prepare training, validating, testing set
libi40iw-i40iw_ucreate_cq: failed to create CQ
libi40iw-i40iw_ucreate_cq: failed to create CQ
libi40iw-i40iw_ucreate_cq: failed to create CQ
libi40iw-i40iw_ucreate_cq: failed to create CQ
libi40iw-i40iw_ucreate_cq: failed to create CQ
libi40iw-i40iw_ucreate_cq: failed to create CQ
libi40iw-i40iw_ucreate_cq: failed to create CQ
F20230523 19:20:03.780495 441071 ibverbs_comm_network.cpp:136] Check failed: cq_ : Invalid argument [22]
F20230523 19:20:03.780494 441068 ibverbs_comm_network.cpp:136] Check failed: cq_ : Invalid argument [22]
libi40iw-i40iw_ucreate_cq: failed to create CQ
F20230523 19:20:03.780660 441066 ibverbs_comm_network.cpp:136] Check failed: cq_ : Invalid argument [22]
F20230523 19:20:03.780586 441067 ibverbs_comm_network.cpp:136] Check failed: cq_ : Invalid argument [22]
F20230523 19:20:03.780730 441070 ibverbs_comm_network.cpp:136] Check failed: cq_ : Invalid argument [22]
F20230523 19:20:03.780752 441072 ibverbs_comm_network.cpp:136] Check failed: cq_ : Invalid argument [22]
F20230523 19:20:03.780771 441069 ibverbs_comm_network.cpp:136] Check failed: cq_ : Invalid argument [22]
F20230523 19:20:03.780956 441073 ibverbs_comm_network.cpp:136] Check failed: cq_ : Invalid argument [22]
*** Check failure stack trace: ***
*** Check failure stack trace: ***
*** Check failure stack trace: ***
*** Check failure stack trace: ***
*** Check failure stack trace: ***
*** Check failure stack trace: ***
*** Check failure stack trace: ***
*** Check failure stack trace: ***
@ 0x2af4452cce9a google::LogMessage::Fail()
@ 0x2ace0e495e9a google::LogMessage::Fail()
@ 0x2acdec915e9a google::LogMessage::Fail()
@ 0x2b123f062e9a google::LogMessage::Fail()
@ 0x2b7ed8370e9a google::LogMessage::Fail()
@ 0x2af4452cfbd1 google::LogMessage::SendToLog()
@ 0x2af50e270e9a google::LogMessage::Fail()
@ 0x2ace0e498bd1 google::LogMessage::SendToLog()
@ 0x2b7ed8373bd1 google::LogMessage::SendToLog()
@ 0x2b123f065bd1 google::LogMessage::SendToLog()
@ 0x2b9923993e9a google::LogMessage::Fail()
@ 0x2af4452cc998 google::LogMessage::Flush()
@ 0x2b5caa658e9a google::LogMessage::Fail()
@ 0x2ace0e495998 google::LogMessage::Flush()
@ 0x2b7ed8370998 google::LogMessage::Flush()
@ 0x2b123f062998 google::LogMessage::Flush()
@ 0x2af4452cde42 google::ErrnoLogMessage::~ErrnoLogMessage()
@ 0x2b123f063e42 google::ErrnoLogMessage::~ErrnoLogMessage()
@ 0x2ace0e496e42 google::ErrnoLogMessage::~ErrnoLogMessage()
@ 0x2b7ed8371e42 google::ErrnoLogMessage::~ErrnoLogMessage()
@ 0x2acdec918bd1 google::LogMessage::SendToLog()
@ 0x2af50e273bd1 google::LogMessage::SendToLog()
@ 0x2b123192c63a oneflow::IBVerbsCommNet::IBVerbsCommNet()
@ 0x2ace00d5f63a oneflow::IBVerbsCommNet::IBVerbsCommNet()
@ 0x2b7ecac3a63a oneflow::IBVerbsCommNet::IBVerbsCommNet()
@ 0x2af437b9663a oneflow::IBVerbsCommNet::IBVerbsCommNet()
@ 0x2b9923996bd1 google::LogMessage::SendToLog()
@ 0x2af50e270998 google::LogMessage::Flush()
@ 0x2acdec915998 google::LogMessage::Flush()
@ 0x2b5caa65bbd1 google::LogMessage::SendToLog()
@ 0x2ace066fa9f0 oneflow::InitRDMA()
@ 0x2b12372c79f0 oneflow::InitRDMA()
@ 0x2b7ed05d59f0 oneflow::InitRDMA()
@ 0x2af43d5319f0 oneflow::InitRDMA()
@ 0x2acdec916e42 google::ErrnoLogMessage::~ErrnoLogMessage()
@ 0x2af50e271e42 google::ErrnoLogMessage::~ErrnoLogMessage()
@ 0x2b5caa658998 google::LogMessage::Flush()
@ 0x2af500b3a63a oneflow::IBVerbsCommNet::IBVerbsCommNet()
@ 0x2acddf1df63a oneflow::IBVerbsCommNet::IBVerbsCommNet()
@ 0x2b9923993998 google::LogMessage::Flush()
@ 0x2af5064d59f0 oneflow::InitRDMA()
@ 0x2acde4b7a9f0 oneflow::InitRDMA()
@ 0x2b9923994e42 google::ErrnoLogMessage::~ErrnoLogMessage()
@ 0x2b5caa659e42 google::ErrnoLogMessage::~ErrnoLogMessage()
@ 0x2b991625d63a oneflow::IBVerbsCommNet::IBVerbsCommNet()
@ 0x2b5c9cf2263a oneflow::IBVerbsCommNet::IBVerbsCommNet()
@ 0x2b991bbf89f0 oneflow::InitRDMA()
@ 0x2b5ca28bd9f0 oneflow::InitRDMA()
@ 0x2b118b0e86d6 (unknown)
@ 0x2acd5a51b6d6 (unknown)
@ 0x2af3913526d6 (unknown)
@ 0x2acd3899b6d6 (unknown)
@ 0x2b5bf66de6d6 (unknown)
@ 0x2b986fa196d6 (unknown)
@ 0x2b7e243f66d6 (unknown)
@ 0x2af45a2f66d6 (unknown)
@ 0x2b5bf6800f96 (unknown)
@ 0x2af45a418f96 (unknown)
@ 0x2af391474f96 (unknown)
@ 0x2acd5a63df96 (unknown)
@ 0x2b7e24518f96 (unknown)
@ 0x2acd38abdf96 (unknown)
@ 0x2b118b20af96 (unknown)
@ 0x2b986fb3bf96 (unknown)
@ 0x55b706e7c052 PyCFunction_Call
@ 0x5611fb060052 PyCFunction_Call
@ 0x561700afb052 PyCFunction_Call
@ 0x55aaa8d1c052 PyCFunction_Call
@ 0x555ec4486052 PyCFunction_Call
@ 0x55d95affc052 PyCFunction_Call
@ 0x55636ff8e052 PyCFunction_Call
@ 0x55a39b6ad052 PyCFunction_Call
@ 0x55636ff7930b _PyObject_MakeTpCall
@ 0x55b706e6730b _PyObject_MakeTpCall
@ 0x561700ae630b _PyObject_MakeTpCall
@ 0x555ec447130b _PyObject_MakeTpCall
@ 0x55a39b69830b _PyObject_MakeTpCall
@ 0x55aaa8d0730b _PyObject_MakeTpCall
@ 0x55d95afe730b _PyObject_MakeTpCall
@ 0x5611fb04b30b _PyObject_MakeTpCall
@ 0x55636ff75341 _PyEval_EvalFrameDefault
@ 0x561700ae2341 _PyEval_EvalFrameDefault
@ 0x555ec446d341 _PyEval_EvalFrameDefault
@ 0x55aaa8d03341 _PyEval_EvalFrameDefault
@ 0x55b706e63341 _PyEval_EvalFrameDefault
@ 0x55d95afe3341 _PyEval_EvalFrameDefault
@ 0x55a39b694341 _PyEval_EvalFrameDefault
@ 0x561700aed8a6 _PyFunction_Vectorcall
@ 0x555ec44788a6 _PyFunction_Vectorcall
@ 0x55636ff808a6 _PyFunction_Vectorcall
@ 0x55a39b69f8a6 _PyFunction_Vectorcall
@ 0x55aaa8d0e8a6 _PyFunction_Vectorcall
@ 0x55d95afee8a6 _PyFunction_Vectorcall
@ 0x5611fb047341 _PyEval_EvalFrameDefault
@ 0x55b706e6e8a6 _PyFunction_Vectorcall
@ 0x555ec446cefb _PyEval_EvalFrameDefault
@ 0x561700ae1efb _PyEval_EvalFrameDefault
@ 0x55636ff74efb _PyEval_EvalFrameDefault
@ 0x55a39b693efb _PyEval_EvalFrameDefault
@ 0x55aaa8d02efb _PyEval_EvalFrameDefault
@ 0x55d95afe2efb _PyEval_EvalFrameDefault
@ 0x561700adc2f1 _PyEval_EvalCodeWithName
@ 0x555ec44672f1 _PyEval_EvalCodeWithName
@ 0x55636ff6f2f1 _PyEval_EvalCodeWithName
@ 0x55d95afdd2f1 _PyEval_EvalCodeWithName
@ 0x55a39b68e2f1 _PyEval_EvalCodeWithName
@ 0x55aaa8cfd2f1 _PyEval_EvalCodeWithName
@ 0x5611fb0528a6 _PyFunction_Vectorcall
@ 0x55b706e62efb _PyEval_EvalFrameDefault
@ 0x555ec4470a2b _PyObject_FastCallDict
@ 0x55636ff78a2b _PyObject_FastCallDict
@ 0x561700ae5a2b _PyObject_FastCallDict
@ 0x55d95afe6a2b _PyObject_FastCallDict
@ 0x55aaa8d06a2b _PyObject_FastCallDict
@ 0x55a39b697a2b _PyObject_FastCallDict
@ 0x55636ff8a4af slot_tp_init
@ 0x555ec44824af slot_tp_init
@ 0x55aaa8d184af slot_tp_init
@ 0x55d95aff84af slot_tp_init
@ 0x5611fb046efb _PyEval_EvalFrameDefault
@ 0x55a39b6a94af slot_tp_init
@ 0x555ec4471324 _PyObject_MakeTpCall
@ 0x55636ff79324 _PyObject_MakeTpCall
@ 0x561700af74af slot_tp_init
@ 0x55636ff74dd7 _PyEval_EvalFrameDefault
@ 0x55b706e5d2f1 _PyEval_EvalCodeWithName
@ 0x5611fb0412f1 _PyEval_EvalCodeWithName
@ 0x55a39b698324 _PyObject_MakeTpCall
@ 0x55636ff808a6 _PyFunction_Vectorcall
@ 0x561700ae6324 _PyObject_MakeTpCall
@ 0x55d95afe7324 _PyObject_MakeTpCall
@ 0x55aaa8d07324 _PyObject_MakeTpCall
@ 0x55b706e66a2b _PyObject_FastCallDict
@ 0x55636ff70729 _PyEval_EvalFrameDefault
@ 0x55d95afe2dd7 _PyEval_EvalFrameDefault
@ 0x5611fb04aa2b _PyObject_FastCallDict
@ 0x55aaa8d02dd7 _PyEval_EvalFrameDefault
@ 0x555ec446cdd7 _PyEval_EvalFrameDefault
@ 0x55636ff6f2f1 _PyEval_EvalCodeWithName
@ 0x55d95afee8a6 _PyFunction_Vectorcall
@ 0x561700ae1dd7 _PyEval_EvalFrameDefault
@ 0x556370021e99 PyEval_EvalCodeEx
@ 0x55d95afde729 _PyEval_EvalFrameDefault
@ 0x55aaa8d0e8a6 _PyFunction_Vectorcall
@ 0x555ec44788a6 _PyFunction_Vectorcall
@ 0x55a39b693dd7 _PyEval_EvalFrameDefault
@ 0x55b706e784af slot_tp_init
@ 0x556370021e5b PyEval_EvalCode
@ 0x55d95afdd2f1 _PyEval_EvalCodeWithName
@ 0x55a39b69f8a6 _PyFunction_Vectorcall
@ 0x55b706e67324 _PyObject_MakeTpCall
@ 0x55aaa8cfe729 _PyEval_EvalFrameDefault
@ 0x561700aed8a6 _PyFunction_Vectorcall
@ 0x5611fb05c4af slot_tp_init
@ 0x5563700427f9 run_eval_code_obj
@ 0x55d95b08fe99 PyEval_EvalCodeEx
@ 0x55b706e62dd7 _PyEval_EvalFrameDefault
@ 0x55aaa8cfd2f1 _PyEval_EvalCodeWithName
@ 0x5563700417f3 run_mod
@ 0x561700add729 _PyEval_EvalFrameDefault
@ 0x55d95b08fe5b PyEval_EvalCode
@ 0x55a39b68f729 _PyEval_EvalFrameDefault
@ 0x5611fb04b324 _PyObject_MakeTpCall
@ 0x55aaa8dafe99 PyEval_EvalCodeEx
@ 0x55636fef0f73 pyrun_file
@ 0x55d95b0b07f9 run_eval_code_obj
@ 0x561700adc2f1 _PyEval_EvalCodeWithName
@ 0x55b706e6e8a6 _PyFunction_Vectorcall
@ 0x5611fb046dd7 _PyEval_EvalFrameDefault
@ 0x55aaa8dafe5b PyEval_EvalCode
@ 0x5611fb0528a6 _PyFunction_Vectorcall
@ 0x55a39b68e2f1 _PyEval_EvalCodeWithName
@ 0x561700b8ee99 PyEval_EvalCodeEx
@ 0x55d95b0af7f3 run_mod
@ 0x555ec4468729 _PyEval_EvalFrameDefault
@ 0x55636fef0a77 PyRun_SimpleFileExFlags
@ 0x55a39b740e99 PyEval_EvalCodeEx
@ 0x55d95af5ef73 pyrun_file
@ 0x555ec44672f1 _PyEval_EvalCodeWithName
@ 0x55636fee3fdd Py_RunMain.cold
@ 0x5611fb042729 _PyEval_EvalFrameDefault
@ 0x561700b8ee5b PyEval_EvalCode
@ 0x55aaa8dd07f9 run_eval_code_obj
@ 0x55d95af5ea77 PyRun_SimpleFileExFlags
@ 0x555ec4519e99 PyEval_EvalCodeEx
@ 0x556370015679 Py_BytesMain
@ 0x2b9857274c05 __libc_start_main
@ 0x55b706e5e729 _PyEval_EvalFrameDefault
@ 0x561700baf7f9 run_eval_code_obj
@ 0x5611fb0412f1 _PyEval_EvalCodeWithName
@ 0x55a39b740e5b PyEval_EvalCode
@ 0x55d95af51fdd Py_RunMain.cold
@ 0x55aaa8dcf7f3 run_mod
@ 0x555ec4519e5b PyEval_EvalCode
@ 0x55b706e5d2f1 _PyEval_EvalCodeWithName
@ 0x5611fb0f3e99 PyEval_EvalCodeEx
@ 0x55a39b7617f9 run_eval_code_obj
@ 0x561700bae7f3 run_mod
@ 0x55d95b083679 Py_BytesMain
@ 0x555ec453a7f9 run_eval_code_obj
@ 0x2b7e0bc51c05 __libc_start_main
@ 0x55b706f0fe99 PyEval_EvalCodeEx
@ 0x55aaa8c7ef73 pyrun_file
@ 0x5611fb0f3e5b PyEval_EvalCode
@ 0x55637001557d (unknown)
@ 0x55d95b08357d (unknown)
@ 0x55b706f0fe5b PyEval_EvalCode
@ 0x555ec45397f3 run_mod
@ 0x55aaa8c7ea77 PyRun_SimpleFileExFlags
@ 0x55a39b7607f3 run_mod
@ 0x55b706f307f9 run_eval_code_obj
@ 0x561700a5df73 pyrun_file
@ 0x555ec43e8f73 pyrun_file
@ 0x55b706f2f7f3 run_mod
@ 0x55a39b60ff73 pyrun_file
@ 0x55aaa8c71fdd Py_RunMain.cold
@ 0x561700a5da77 PyRun_SimpleFileExFlags
@ 0x55b706ddef73 pyrun_file
@ 0x55a39b60fa77 PyRun_SimpleFileExFlags
@ 0x5611fb1147f9 run_eval_code_obj
@ 0x561700a50fdd Py_RunMain.cold
@ 0x55aaa8da3679 Py_BytesMain
@ 0x2af441b51c05 __libc_start_main
@ 0x55b706ddea77 PyRun_SimpleFileExFlags
@ 0x5611fb1137f3 run_mod
@ 0x561700b82679 Py_BytesMain
@ 0x2b1172943c05 __libc_start_main
@ 0x55a39b602fdd Py_RunMain.cold
@ 0x55b706dd1fdd Py_RunMain.cold
@ 0x5611fafc2f73 pyrun_file
@ 0x55aaa8da357d (unknown)
@ 0x555ec43e8a77 PyRun_SimpleFileExFlags
@ 0x55b706f03679 Py_BytesMain
@ 0x2acd201f6c05 __libc_start_main
@ 0x5611fafc2a77 PyRun_SimpleFileExFlags
@ 0x55a39b734679 Py_BytesMain
@ 0x2b5bddf39c05 __libc_start_main
@ 0x5611fafb5fdd Py_RunMain.cold
@ 0x55b706f0357d (unknown)
@ 0x5611fb0e7679 Py_BytesMain
@ 0x2acd41d76c05 __libc_start_main
@ 0x55a39b73457d (unknown)
@ 0x561700b8257d (unknown)
@ 0x555ec43dbfdd Py_RunMain.cold
@ 0x5611fb0e757d (unknown)
@ 0x555ec450d679 Py_BytesMain
@ 0x2af378badc05 __libc_start_main
@ 0x555ec450d57d (unknown)
Stack trace (most recent call last)Stack trace (most recent call last)Stack trace (most recent call last)Stack trace (most recent call last)Stack trace (most recent call last)Stack trace (most recent call last)Stack trace (most recent call last)Stack trace (most recent call last):
:
:
:
:
:
:
:
Object " Object " Object " Object " Object "/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/_oneflow_internal.cpython-38-x86_64-linux-gnu.so Object " Object "/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/_oneflow_internal.cpython-38-x86_64-linux-gnu.so/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/_oneflow_internal.cpython-38-x86_64-linux-gnu.so Object "/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/_oneflow_internal.cpython-38-x86_64-linux-gnu.so", at /home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/_oneflow_internal.cpython-38-x86_64-linux-gnu.so/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/_oneflow_internal.cpython-38-x86_64-linux-gnu.so", at /home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/_oneflow_internal.cpython-38-x86_64-linux-gnu.so/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/_oneflow_internal.cpython-38-x86_64-linux-gnu.so", at ", at ", at ", at ", at ", at 0x2b986fb3bf950x2af391474f950x2b5bf6800f95, in
, in
, in 0x2acd5a63df95
Object " Object ", in Object "/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/_oneflow_internal.cpython-38-x86_64-linux-gnu.so/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/_oneflow_internal.cpython-38-x86_64-linux-gnu.so/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/_oneflow_internal.cpython-38-x86_64-linux-gnu.so
", at ", at ", at 0x2b5bf66de6d5 Object "0x2b986fa196d5, in /home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/_oneflow_internal.cpython-38-x86_64-linux-gnu.so0x2af3913526d5
", at , in , in 0x2acd5a51b6d5
, in

0x2af45a418f950x2b7e24518f95, in , in
0x2b118b20af95
, in
Object "/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/_oneflow_internal.cpython-38-x86_64-linux-gnu.so0x2acd38abdf95", at Object " Object "/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/_oneflow_internal.cpython-38-x86_64-linux-gnu.so0x2af45a2f66d5/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/_oneflow_internal.cpython-38-x86_64-linux-gnu.so, in , in ", at ", at

0x2b118b0e86d50x2b7e243f66d5, in
, in
Object "/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/_oneflow_internal.cpython-38-x86_64-linux-gnu.so", at 0x2acd3899b6d5, in
Object "/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so", at 0x2b991bbf89ef Object " Object ", in /home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.soInitRDMA() Object " Object "", at Object "/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so
Object " Object "/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so0x2b12372c79ef/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so", at /home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so", at /home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so", at , in Object "0x2b5ca28bd9ef", at 0x2ace066fa9ef", at InitRDMA()0x2acde4b7a9ef/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so, in ", at , in InitRDMA()0x2b7ed05d59ef
, in 0x2af43d5319efInitRDMA()", at 0x2af5064d59ef
, in InitRDMA(), in Object "0x2b991625d639, in InitRDMA()
InitRDMA()/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so Object "
InitRDMA(), in

/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so", at
Object "IBVerbsCommNet::IBVerbsCommNet()0x2b123192c639 Object "/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so", at Object ", in
Object "", at 0x2ace00d5f639/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so Object "IBVerbsCommNet::IBVerbsCommNet()/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so, in 0x2b5c9cf22639 Object "/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so", at
", at ", at , in /home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.soIBVerbsCommNet::IBVerbsCommNet()0x2acddf1df6390x2b7ecac3a639", at Object "IBVerbsCommNet::IBVerbsCommNet()0x2af437b96639
, in ", at 0x2af500b3a639, in
/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.soIBVerbsCommNet::IBVerbsCommNet(), in Object ", in 0x2b9923994e41IBVerbsCommNet::IBVerbsCommNet()
", at Object "/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.soIBVerbsCommNet::IBVerbsCommNet(), in IBVerbsCommNet::IBVerbsCommNet()
/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so0x2b123f063e41 Object "
", at

, in Object "/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so", at 0x2ace0e496e41 Object "
Object "", at Object "/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so, in 0x2b5caa659e41/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so", at Object "", at /home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so, in
/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so0x2b9923993997", at ", at
0x2b7ed8371e41", at , in , in Object "0x2af50e271e410x2af4452cde41

0x2b123f0629970x2acdec916e41 Object "/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so, in , in , in Object "/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so Object "", at , in

", at /home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so0x2ace0e495997/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so
Object "", at , in 0x2b5caa658997 Object "", at Object " Object "0x2b9923996bd0
/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so0x2b7ed8370997, in /home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so, in /home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so", at ", at Object "
, in
", at 0x2b123f065bd0", at /home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so
0x2af4452cc997 Object "0x2acdec915997, in 0x2af50e270997 Object "", at Object "/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so, in
, in , in 0x2ace0e498bd0/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so", at
Object "

", at , in 0x2b5caa65bbd0", at /home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so Object "0x2b9923993e99 Object ", in Object "
", at 0x2b7ed8373bd0, in Object "
/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so, in Object "0x2b123f062e99/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so
", at ", at ", at
0x2acdec918bd0, in , in /home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so0x2af50e273bd0 Object "
", at Object "
, in /home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so0x2b5caa658e990x2ace0e495e99 Object "/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so
Object ", in , in ", at /home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so", at Object "

", at 0x2b9915341ef20x2b7ed8370e99, in ", at /home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so0x2acdec915e990x2b1230a10ef2 Object ", in
Object ", in /home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so, in

", at
", at 0x2b5c9c006ef2 Object "0x2acdffe43ef2, in /home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so Object "
, in ", at /home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so
0x2acdde2c3ef2", at
, in ", at Aborted (Signal sent by tkill() 441066 1006)

0x2b7ec9d1eef2, in 0x2af4452cfbd0
Aborted (Signal sent by tkill() 441069 1006)

Aborted (Signal sent by tkill() 441072 1006)
", at
0x2af50e270e99, in
, in

Object "
Aborted (Signal sent by tkill() 441068 1006)
Object "/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so", at 0x2af4452cce99, in
", at Aborted (Signal sent by tkill() 441071 1006)
Aborted (Signal sent by tkill() 441070 1006)
0x2af4ffc1eef2, in Object "
/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/../oneflow.libs/liboneflow-7f11bd0f.so", at
0x2af436c7aef2, in

Aborted (Signal sent by tkill() 441073 1006)
Aborted (Signal sent by tkill() 441067 1006)
Killing subprocess 441066
Killing subprocess 441067
Killing subprocess 441068
Killing subprocess 441069
Killing subprocess 441070
Killing subprocess 441071
Killing subprocess 441072
Killing subprocess 441073
Traceback (most recent call last):
  File "/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/distributed/launch.py", line 240, in <module>
    main()
  File "/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/distributed/launch.py", line 228, in main
    sigkill_handler(signal.SIGTERM, None)
  File "/home/haida_huanglei/anaconda3/envs/libai3/lib/python3.8/site-packages/oneflow/distributed/launch.py", line 196, in sigkill_handler
    raise subprocess.CalledProcessError(
subprocess.CalledProcessError: Command '['/home/haida_huanglei/anaconda3/envs/libai3/bin/python3', '-u', 'projects/MAE/train_net.py', '--config-file', 'projects/MAE/configs/mae_pretraining.py']' died with <Signals.SIGABRT: 6>.`
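
Because the eight workers write to the same terminal, the log above is heavily interleaved. A minimal sketch like the one below may help summarize it by collecting the glog fatal lines and grouping them by source location and message. The regular expression is an assumption based only on the line layout visible in this log, and `summarize_fatals` is a hypothetical helper, not part of OneFlow or LiBai.

```python
import re
from collections import Counter

# Assumed layout, inferred from the log above:
#   F<YYYYMMDD> <HH:MM:SS.micros> <thread id> <file>:<line>] <message>
# glog prints a thread id; in this log it matches the worker PIDs
# reported later by "Killing subprocess ...".
GLOG_FATAL = re.compile(
    r"^F(?P<date>\d{8}) (?P<time>[\d:.]+)\s+(?P<pid>\d+) "
    r"(?P<src>[\w./]+:\d+)\] (?P<msg>.*)$"
)


def summarize_fatals(log_text):
    """Return (sorted ids, Counter of (source, message)) for glog fatal lines."""
    ids = []
    reasons = Counter()
    for line in log_text.splitlines():
        match = GLOG_FATAL.match(line.strip())
        if match:
            ids.append(int(match.group("pid")))
            reasons[(match.group("src"), match.group("msg"))] += 1
    return sorted(ids), reasons


# Two fatal lines copied from the log above, plus one non-glog line.
sample = """\
libi40iw-i40iw_ucreate_cq: failed to create CQ
F20230523 19:20:03.780495 441071 ibverbs_comm_network.cpp:136] Check failed: cq_ : Invalid argument [22]
F20230523 19:20:03.780494 441068 ibverbs_comm_network.cpp:136] Check failed: cq_ : Invalid argument [22]
"""

ids, reasons = summarize_fatals(sample)
print(ids)
for (src, msg), count in reasons.items():
    print(f"{count} x {src}: {msg}")
```

Run against the full log, this should show whether every worker stops at the same check (`ibverbs_comm_network.cpp:136`, reached through `oneflow::InitRDMA()` in the stack trace) or whether some workers fail elsewhere.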
