Reproduce error: Non-existant physical address #82

Open
jieWANGforwork opened this issue Aug 15, 2021 · 0 comments

@jieWANGforwork

Hi!

I ran into this error when reproducing the VQA task. Could you please have a look and give me some suggestions based on your experience? Thanks a lot!

0%| | 0/6000 [00:00<?, ?it/s]
[1,0]:08/15/2021 10:08:00 - INFO - __main__ - ***** Running training with 4 GPUs *****
[1,0]:08/15/2021 10:08:00 - INFO - __main__ - Num examples = 471128
[1,0]:08/15/2021 10:08:00 - INFO - __main__ - Batch size = 1024
[1,0]:08/15/2021 10:08:00 - INFO - __main__ - Accumulate steps = 5
[1,0]:08/15/2021 10:08:00 - INFO - __main__ - Num steps = 6000
[1,0]:[1a62e574072d:00334] *** Process received signal ***
[1,0]:[1a62e574072d:00334] Signal: Bus error (7)
[1,0]:[1a62e574072d:00334] Signal code: Non-existant physical address (2)
[1,0]:[1a62e574072d:00334] Failing at address: 0x7f246888f00a
[1,0]:[1a62e574072d:00334] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f25b870a390]
[1,0]:[1a62e574072d:00334] [ 1] /opt/conda/lib/python3.6/site-packages/lmdb/cpython.cpython-36m-x86_64-linux-gnu.so(+0x128e0)[0x7f25aa5988e0]
[1,0]:[1a62e574072d:00334] [ 2] /opt/conda/lib/python3.6/site-packages/lmdb/cpython.cpython-36m-x86_64-linux-gnu.so(+0x12b74)[0x7f25aa598b74]
[1,0]:[1a62e574072d:00334] [ 3] /opt/conda/lib/python3.6/site-packages/lmdb/cpython.cpython-36m-x86_64-linux-gnu.so(+0x14ba5)[0x7f25aa59aba5]
[1,0]:[1a62e574072d:00334] [ 4] /opt/conda/lib/python3.6/site-packages/lmdb/cpython.cpython-36m-x86_64-linux-gnu.so(mdb_get+0xbc)[0x7f25aa59b40c]
[1,0]:[1a62e574072d:00334] [ 5] /opt/conda/lib/python3.6/site-packages/lmdb/cpython.cpython-36m-x86_64-linux-gnu.so(+0x9d9d)[0x7f25aa58fd9d]
[1,0]:[1a62e574072d:00334] [ 6] python(_PyCFunction_FastCallDict+0x154)[0x55f2e08c1744]
[1,0]:[1a62e574072d:00334] [ 7] python(+0x19842c)[0x55f2e094842c]
[1,0]:[1a62e574072d:00334] [ 8] python(_PyEval_EvalFrameDefault+0x30a)[0x55f2e096d38a]
[1,0]:[1a62e574072d:00334] [ 9] python(_PyFunction_FastCallDict+0x11b)[0x55f2e0942bab]
[1,0]:[1a62e574072d:00334] [10] python(_PyObject_FastCallDict+0x26f)[0x55f2e08c1b0f]
[1,0]:[1a62e574072d:00334] [11] python(_PyObject_Call_Prepend+0x63)[0x55f2e08c66a3]
[1,0]:[1a62e574072d:00334] [12] python(PyObject_Call+0x3e)[0x55f2e08c154e]
[1,0]:[1a62e574072d:00334] [13] python(+0x16b50a)[0x55f2e091b50a]
[1,0]:[1a62e574072d:00334] [14] python(_PyEval_EvalFrameDefault+0x877)[0x55f2e096d8f7]
[1,0]:[1a62e574072d:00334] [15] python(_PyFunction_FastCallDict+0x11b)[0x55f2e0942bab]
[1,0]:[1a62e574072d:00334] [16] python(_PyObject_FastCallDict+0x26f)[0x55f2e08c1b0f]
[1,0]:[1a62e574072d:00334] [17] python(_PyObject_Call_Prepend+0x63)[0x55f2e08c66a3]
[1,0]:[1a62e574072d:00334] [18] python(PyObject_Call+0x3e)[0x55f2e08c154e]
[1,0]:[1a62e574072d:00334] [19] python(+0x16b50a)[0x55f2e091b50a]
[1,0]:[1a62e574072d:00334] [20] python(_PyEval_EvalFrameDefault+0x877)[0x55f2e096d8f7]
[1,0]:[1a62e574072d:00334] [21] python(+0x19253b)[0x55f2e094253b]
[1,0]:[1a62e574072d:00334] [22] python(+0x198505)[0x55f2e0948505]
[1,0]:[1a62e574072d:00334] [23] python(_PyEval_EvalFrameDefault+0x30a)[0x55f2e096d38a]
[1,0]:[1a62e574072d:00334] [24] python(+0x191a76)[0x55f2e0941a76]
[1,0]:[1a62e574072d:00334] [25] python(_PyFunction_FastCallDict+0x1bc)[0x55f2e0942c4c]
[1,0]:[1a62e574072d:00334] [26] python(_PyObject_FastCallDict+0x26f)[0x55f2e08c1b0f]
[1,0]:[1a62e574072d:00334] [27] python(_PyObject_Call_Prepend+0x63)[0x55f2e08c66a3]
[1,0]:[1a62e574072d:00334] [28] python(PyObject_Call+0x3e)[0x55f2e08c154e]
[1,0]:[1a62e574072d:00334] [29] python(+0x16b50a)[0x55f2e091b50a]
[1,0]:[1a62e574072d:00334] *** End of error message ***
[1,2]:[1a62e574072d:00336] *** Process received signal ***
[1,2]:[1a62e574072d:00336] Signal: Bus error (7)
[1,2]:[1a62e574072d:00336] Signal code: Non-existant physical address (2)
[1,2]:[1a62e574072d:00336] Failing at address: 0x7f56e824f00a
[1,2]:[1a62e574072d:00336] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f582a139390]
[1,2]:[1a62e574072d:00336] [ 1] /opt/conda/lib/python3.6/site-packages/lmdb/cpython.cpython-36m-x86_64-linux-gnu.so(+0x128e0)[0x7f58106a88e0]
[1,2]:[1a62e574072d:00336] [ 2] /opt/conda/lib/python3.6/site-packages/lmdb/cpython.cpython-36m-x86_64-linux-gnu.so(+0x12b74)[0x7f58106a8b74]
[1,2]:[1a62e574072d:00336] [ 3] /opt/conda/lib/python3.6/site-packages/lmdb/cpython.cpython-36m-x86_64-linux-gnu.so(+0x14ba5)[0x7f58106aaba5]
[1,2]:[1a62e574072d:00336] [ 4] /opt/conda/lib/python3.6/site-packages/lmdb/cpython.cpython-36m-x86_64-linux-gnu.so(mdb_get+0xbc)[0x7f58106ab40c]
[1,2]:[1a62e574072d:00336] [ 5] /opt/conda/lib/python3.6/site-packages/lmdb/cpython.cpython-36m-x86_64-linux-gnu.so(+0x9d9d)[0x7f581069fd9d]
[1,2]:[1a62e574072d:00336] [ 6] python(_PyCFunction_FastCallDict+0x154)[0x558e13464744]
[1,2]:[1a62e574072d:00336] [ 7] python(+0x19842c)[0x558e134eb42c]
[1,2]:[1a62e574072d:00336] [ 8] python(_PyEval_EvalFrameDefault+0x30a)[0x558e1351038a]
[1,2]:[1a62e574072d:00336] [ 9] python(_PyFunction_FastCallDict+0x11b)[0x558e134e5bab]
[1,2]:[1a62e574072d:00336] [10] python(_PyObject_FastCallDict+0x26f)[0x558e13464b0f]
[1,2]:[1a62e574072d:00336] [11] python(_PyObject_Call_Prepend+0x63)[0x558e134696a3]
[1,2]:[1a62e574072d:00336] [12] python(PyObject_Call+0x3e)[0x558e1346454e]
[1,2]:[1a62e574072d:00336] [13] python(+0x16b50a)[0x558e134be50a]
[1,2]:[1a62e574072d:00336] [14] python(_PyEval_EvalFrameDefault+0x877)[0x558e135108f7]
[1,2]:[1a62e574072d:00336] [15] python(_PyFunction_FastCallDict+0x11b)[0x558e134e5bab]
[1,2]:[1a62e574072d:00336] [16] python(_PyObject_FastCallDict+0x26f)[0x558e13464b0f]
[1,2]:[1a62e574072d:00336] [17] python(_PyObject_Call_Prepend+0x63)[0x558e134696a3]
[1,2]:[1a62e574072d:00336] [18] python(PyObject_Call+0x3e)[0x558e1346454e]
[1,2]:[1a62e574072d:00336] [19] python(+0x16b50a)[0x558e134be50a]
[1,2]:[1a62e574072d:00336] [20] python(_PyEval_EvalFrameDefault+0x877)[0x558e135108f7]
[1,2]:[1a62e574072d:00336] [21] python(+0x19253b)[0x558e134e553b]
[1,2]:[1a62e574072d:00336] [22] python(+0x198505)[0x558e134eb505]
[1,2]:[1a62e574072d:00336] [23] python(_PyEval_EvalFrameDefault+0x30a)[0x558e1351038a]
[1,2]:[1a62e574072d:00336] [24] python(+0x191a76)[0x558e134e4a76]
[1,2]:[1a62e574072d:00336] [25] python(_PyFunction_FastCallDict+0x1bc)[0x558e134e5c4c]
[1,2]:[1a62e574072d:00336] [26] python(_PyObject_FastCallDict+0x26f)[0x558e13464b0f]
[1,2]:[1a62e574072d:00336] [27] python(_PyObject_Call_Prepend+0x63)[0x558e134696a3]
[1,2]:[1a62e574072d:00336] [28] python(PyObject_Call+0x3e)[0x558e1346454e]
[1,2]:[1a62e574072d:00336] [29] python(+0x16b50a)[0x558e134be50a]
[1,2]:[1a62e574072d:00336] *** End of error message ***
[1,3]:[1a62e574072d:00337] *** Process received signal ***
[1,3]:[1a62e574072d:00337] Signal: Bus error (7)
[1,3]:[1a62e574072d:00337] Signal code: Non-existant physical address (2)
[1,3]:[1a62e574072d:00337] Failing at address: 0x7fd82888f00a
[1,3]:[1a62e574072d:00337] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7fd96b06d390]
[1,3]:[1a62e574072d:00337] [ 1] /opt/conda/lib/python3.6/site-packages/lmdb/cpython.cpython-36m-x86_64-linux-gnu.so(+0x128e0)[0x7fd955ee38e0]
[1,3]:[1a62e574072d:00337] [ 2] /opt/conda/lib/python3.6/site-packages/lmdb/cpython.cpython-36m-x86_64-linux-gnu.so(+0x12b74)[0x7fd955ee3b74]
[1,3]:[1a62e574072d:00337] [ 3] /opt/conda/lib/python3.6/site-packages/lmdb/cpython.cpython-36m-x86_64-linux-gnu.so(+0x14ba5)[0x7fd955ee5ba5]
[1,3]:[1a62e574072d:00337] [ 4] /opt/conda/lib/python3.6/site-packages/lmdb/cpython.cpython-36m-x86_64-linux-gnu.so(mdb_get+0xbc)[0x7fd955ee640c]
[1,3]:[1a62e574072d:00337] [ 5] /opt/conda/lib/python3.6/site-packages/lmdb/cpython.cpython-36m-x86_64-linux-gnu.so(+0x9d9d)[0x7fd955edad9d]
[1,3]:[1a62e574072d:00337] [ 6] python(_PyCFunction_FastCallDict+0x154)[0x5600587e3744]
[1,3]:[1a62e574072d:00337] [ 7] python(+0x19842c)[0x56005886a42c]
[1,3]:[1a62e574072d:00337] [ 8] python(_PyEval_EvalFrameDefault+0x30a)[0x56005888f38a]
[1,3]:[1a62e574072d:00337] [ 9] python(_PyFunction_FastCallDict+0x11b)[0x560058864bab]
[1,3]:[1a62e574072d:00337] [10] python(_PyObject_FastCallDict+0x26f)[0x5600587e3b0f]
[1,3]:[1a62e574072d:00337] [11] python(_PyObject_Call_Prepend+0x63)[0x5600587e86a3]
[1,3]:[1a62e574072d:00337] [12] python(PyObject_Call+0x3e)[0x5600587e354e]
[1,3]:[1a62e574072d:00337] [13] python(+0x16b50a)[0x56005883d50a]
[1,3]:[1a62e574072d:00337] [14] python(_PyEval_EvalFrameDefault+0x877)[0x56005888f8f7]
[1,3]:[1a62e574072d:00337] [15] python(_PyFunction_FastCallDict+0x11b)[0x560058864bab]
[1,3]:[1a62e574072d:00337] [16] python(_PyObject_FastCallDict+0x26f)[0x5600587e3b0f]
[1,3]:[1a62e574072d:00337] [17] python(_PyObject_Call_Prepend+0x63)[0x5600587e86a3]
[1,3]:[1a62e574072d:00337] [18] python(PyObject_Call+0x3e)[0x5600587e354e]
[1,3]:[1a62e574072d:00337] [19] python(+0x16b50a)[0x56005883d50a]
[1,3]:[1a62e574072d:00337] [20] python(_PyEval_EvalFrameDefault+0x877)[0x56005888f8f7]
[1,3]:[1a62e574072d:00337] [21] python(+0x19253b)[0x56005886453b]
[1,3]:[1a62e574072d:00337] [22] python(+0x198505)[0x56005886a505]
[1,3]:[1a62e574072d:00337] [23] python(_PyEval_EvalFrameDefault+0x30a)[0x56005888f38a]
[1,3]:[1a62e574072d:00337] [24] python(+0x191a76)[0x560058863a76]
[1,3]:[1a62e574072d:00337] [25] python(_PyFunction_FastCallDict+0x1bc)[0x560058864c4c]
[1,3]:[1a62e574072d:00337] [26] python(_PyObject_FastCallDict+0x26f)[0x5600587e3b0f]
[1,3]:[1a62e574072d:00337] [27] python(_PyObject_Call_Prepend+0x63)[0x5600587e86a3]
[1,3]:[1a62e574072d:00337] [28] python(PyObject_Call+0x3e)[0x5600587e354e]
[1,3]:[1a62e574072d:00337] [29] python(+0x16b50a)[0x56005883d50a]
[1,3]:[1a62e574072d:00337] *** End of error message ***
[1,1]:[1a62e574072d:00335] *** Process received signal ***
[1,1]:[1a62e574072d:00335] Signal: Bus error (7)
[1,1]:[1a62e574072d:00335] Signal code: Non-existant physical address (2)
[1,1]:[1a62e574072d:00335] Failing at address: 0x7fd41730e00a
[1,1]:[1a62e574072d:00335] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7fd5591ad390]
[1,1]:[1a62e574072d:00335] [ 1] /opt/conda/lib/python3.6/site-packages/lmdb/cpython.cpython-36m-x86_64-linux-gnu.so(+0x128e0)[0x7fd44f81f8e0]
[1,1]:[1a62e574072d:00335] [ 2] /opt/conda/lib/python3.6/site-packages/lmdb/cpython.cpython-36m-x86_64-linux-gnu.so(+0x12b74)[0x7fd44f81fb74]
[1,1]:[1a62e574072d:00335] [ 3] /opt/conda/lib/python3.6/site-packages/lmdb/cpython.cpython-36m-x86_64-linux-gnu.so(+0x14ba5)[0x7fd44f821ba5]
[1,1]:[1a62e574072d:00335] [ 4] /opt/conda/lib/python3.6/site-packages/lmdb/cpython.cpython-36m-x86_64-linux-gnu.so(mdb_get+0xbc)[0x7fd44f82240c]
[1,1]:[1a62e574072d:00335] [ 5] /opt/conda/lib/python3.6/site-packages/lmdb/cpython.cpython-36m-x86_64-linux-gnu.so(+0x9d9d)[0x7fd44f816d9d]
[1,1]:[1a62e574072d:00335] [ 6] python(_PyCFunction_FastCallDict+0x154)[0x55e2b8f02744]
[1,1]:[1a62e574072d:00335] [ 7] python(+0x19842c)[0x55e2b8f8942c]
[1,1]:[1a62e574072d:00335] [ 8] python(_PyEval_EvalFrameDefault+0x30a)[0x55e2b8fae38a]
[1,1]:[1a62e574072d:00335] [ 9] python(_PyFunction_FastCallDict+0x11b)[0x55e2b8f83bab]
[1,1]:[1a62e574072d:00335] [10] python(_PyObject_FastCallDict+0x26f)[0x55e2b8f02b0f]
[1,1]:[1a62e574072d:00335] [11] python(_PyObject_Call_Prepend+0x63)[0x55e2b8f076a3]
[1,1]:[1a62e574072d:00335] [12] python(PyObject_Call+0x3e)[0x55e2b8f0254e]
[1,1]:[1a62e574072d:00335] [13] python(+0x16b50a)[0x55e2b8f5c50a]
[1,1]:[1a62e574072d:00335] [14] python(_PyEval_EvalFrameDefault+0x877)[0x55e2b8fae8f7]
[1,1]:[1a62e574072d:00335] [15] python(_PyFunction_FastCallDict+0x11b)[0x55e2b8f83bab]
[1,1]:[1a62e574072d:00335] [16] python(_PyObject_FastCallDict+0x26f)[0x55e2b8f02b0f]
[1,1]:[1a62e574072d:00335] [17] python(_PyObject_Call_Prepend+0x63)[0x55e2b8f076a3]
[1,1]:[1a62e574072d:00335] [18] python(PyObject_Call+0x3e)[0x55e2b8f0254e]
[1,1]:[1a62e574072d:00335] [19] python(+0x16b50a)[0x55e2b8f5c50a]
[1,1]:[1a62e574072d:00335] [20] python(_PyEval_EvalFrameDefault+0x877)[0x55e2b8fae8f7]
[1,1]:[1a62e574072d:00335] [21] python(+0x19253b)[0x55e2b8f8353b]
[1,1]:[1a62e574072d:00335] [22] python(+0x198505)[0x55e2b8f89505]
[1,1]:[1a62e574072d:00335] [23] python(_PyEval_EvalFrameDefault+0x30a)[0x55e2b8fae38a]
[1,1]:[1a62e574072d:00335] [24] python(+0x191a76)[0x55e2b8f82a76]
[1,1]:[1a62e574072d:00335] [25] python(_PyFunction_FastCallDict+0x1bc)[0x55e2b8f83c4c]
[1,1]:[1a62e574072d:00335] [26] python(_PyObject_FastCallDict+0x26f)[0x55e2b8f02b0f]
[1,1]:[1a62e574072d:00335] [27] python(_PyObject_Call_Prepend+0x63)[0x55e2b8f076a3]
[1,1]:[1a62e574072d:00335] [28] python(PyObject_Call+0x3e)[0x55e2b8f0254e]
[1,1]:[1a62e574072d:00335] [29] python(+0x16b50a)[0x55e2b8f5c50a]
[1,1]:[1a62e574072d:00335] *** End of error message ***

Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.


mpirun noticed that process rank 0 with PID 0 on node 1a62e574072d exited on signal 7 (Bus error).
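All four ranks crash inside `mdb_get` in py-lmdb, and the signal code "Non-existant physical address (2)" is Linux's `BUS_ADRERR`. A bus error with that code typically means the process touched a page of a memory-mapped file that has no data behind it on disk, for example a page past the end of a truncated file. LMDB reads values through a memory map of `data.mdb`, so one plausible (unconfirmed) suspect is an incomplete or truncated LMDB database, e.g. from an interrupted download or copy. The hypothetical helper below is not part of this repository. It assumes the default LMDB directory layout (a folder containing `data.mdb`) and compares the file size with the last page number recorded in the database's meta page, using only the py-lmdb calls `lmdb.open`, `Environment.info()` and `Environment.stat()`.

```python
import os
import sys


def min_expected_size(last_pgno: int, page_size: int) -> int:
    """Smallest size data.mdb can have if pages 0..last_pgno are all on disk."""
    return (last_pgno + 1) * page_size


def is_truncated(file_size: int, last_pgno: int, page_size: int) -> bool:
    """True if the file ends before the last page recorded in the meta page."""
    return file_size < min_expected_size(last_pgno, page_size)


def check_lmdb(db_dir: str) -> bool:
    """Report whether <db_dir>/data.mdb looks truncated (hypothetical helper).

    Assumes the default subdir layout. Only the meta pages are consulted, so
    the check itself should not fault on a short file.
    """
    import lmdb  # py-lmdb, the package shown in the backtraces

    env = lmdb.open(db_dir, readonly=True, lock=False, readahead=False)
    try:
        last_pgno = env.info()["last_pgno"]  # last page number in use
        page_size = env.stat()["psize"]      # database page size in bytes
    finally:
        env.close()
    file_size = os.path.getsize(os.path.join(db_dir, "data.mdb"))
    expected = min_expected_size(last_pgno, page_size)
    status = "TRUNCATED" if is_truncated(file_size, last_pgno, page_size) else "ok"
    print(f"{db_dir}: data.mdb is {file_size} bytes, needs >= {expected} ({status})")
    return status == "ok"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_lmdb.py DB_DIR [DB_DIR ...]
    # A list (not a generator) so every directory is checked and reported.
    all_ok = all([check_lmdb(d) for d in sys.argv[1:]])
    sys.exit(0 if all_ok else 1)
```

Run it on each LMDB directory the VQA config points to (the script name `check_lmdb.py` is arbitrary). A "TRUNCATED" result would suggest re-downloading or re-copying that database; an "ok" result rules out only this one cause.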
