
Runtime Error: Segmentation fault #10

Open

TriLoo opened this issue Apr 16, 2020 · 5 comments

TriLoo commented Apr 16, 2020

I ran demo.py on my own image data and got this error, with no other information displayed.

I have located the position causing the error: it is the ROI layer call. However, when I tested ROIAlign_cuda.cu using raw float * pointers as parameters instead of PyTorch Tensors, no error was raised.

My gcc version is 4.8.5. Is the gcc version critical? Any advice? Thanks.
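
In case it helps to reproduce: below is roughly how the ROI layer can be called in isolation. This is a minimal sketch; the import path and the tensor shapes are assumptions on my side, so adjust them to your checkout.

import torch
# assumed import path for the compiled ROIAlign wrapper in this repo; adjust if yours differs
from external.maskrcnn_benchmark.roi_layers import ROIAlign

torch.cuda.set_device(0)
feat = torch.randn(1, 256, 64, 64, device='cuda')             # dummy feature map
rois = torch.tensor([[0., 0., 0., 32., 32.]], device='cuda')  # one box: (batch_idx, x1, y1, x2, y2)
layer = ROIAlign(output_size=(7, 7), spatial_scale=1.0, sampling_ratio=2)
out = layer(feat, rois)
torch.cuda.synchronize()  # force the kernel to finish so the crash (if any) surfaces here
print(out.shape)          # expected: torch.Size([1, 256, 7, 7])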

xyang35 (Contributor) commented Apr 17, 2020

Thanks for your interest in our work! How many GPUs are you using for your job? Have you tried using 1 GPU?

TriLoo (Author) commented Apr 17, 2020

My server has 8 P40 GPUs. I tried using just one GPU (cuda:0) and the same error happened.

I used

os.environ['CUDA_VISIBLE_DEVICES'] = "0"

# OR

torch.cuda.set_device(0)

to use cuda:0 only.
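
(Side note, based on my understanding of CUDA_VISIBLE_DEVICES rather than anything specific to this repo: the variable only takes effect if it is set before the process makes its first CUDA call, so I set it at the very top of the script. A minimal sketch of the ordering:)

import os
os.environ['CUDA_VISIBLE_DEVICES'] = "0"  # must be set before the first CUDA call in the process

import torch
print(torch.cuda.device_count())          # should print 1 if the mask took effect
torch.cuda.set_device(0)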

Also, I manually set the gpu_count to 0.

The top frames of the call stack stored in the core file are shown below:

#0  0x00007f6df6eb13ac in construct<_object*, _object*> (__p=0xb, this=0x7f6e4ab5c318) at /usr/include/c++/4.8.2/ext/new_allocator.h:120
#1  _S_construct<_object*, _object*> (__p=0xb, __a=...) at /usr/include/c++/4.8.2/bits/alloc_traits.h:254
#2  construct<_object*, _object*> (__p=0xb, __a=...) at /usr/include/c++/4.8.2/bits/alloc_traits.h:393
#3  emplace_back<_object*> (this=0x7f6e4ab5c318) at /usr/include/c++/4.8.2/bits/vector.tcc:96
#4  push_back (__x=<unknown type in /search/odin/songminghui/githubs/STEP/external/maskrcnn_benchmark/roi_layers/_C.cpython-36m-x86_64-linux-gnu.so, CU 0x0, DIE 0x12877a>, this=0x7f6e4ab5c318) at /usr/include/c++/4.8.2/bits/stl_vector.h:920
#5  loader_life_support (this=0x7ffd998f01f0) at /search/odin/songminghui/anaconda3/lib/python3.6/site-packages/torch/lib/include/pybind11/cast.h:44
#6  pybind11::cpp_function::dispatcher (self=<optimized out>, args_in=0x7f6deee3be28, kwargs_in=0x0) at /search/odin/songminghui/anaconda3/lib/python3.6/site-packages/torch/lib/include/pybind11/pybind11.h:618

quanh1990 commented

@TriLoo Hello, I ran into the same problem. I am using one Tesla P100 GPU with 16 GB of memory. Did you solve the problem? Looking forward to your reply!

TriLoo (Author) commented Apr 27, 2020

Sorry, not yet ...
It may be caused by the PyTorch version or the gcc version, but I am not sure.

By the way, my PyTorch version is 1.0.2.
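
If it helps to compare environments, this is roughly what I check on my side (a minimal sketch; I believe torch.version.cuda and torch.compiled_with_cxx11_abi() are available on PyTorch 1.0, but treat the exact calls as assumptions):

import subprocess
import torch

print('torch:', torch.__version__)                    # PyTorch version, e.g. 1.0.x
print('built with CUDA:', torch.version.cuda)         # CUDA version the wheel was built against
print('cxx11 ABI:', torch.compiled_with_cxx11_abi())  # C++ ABI the prebuilt binaries use
print('local gcc:', subprocess.check_output(['gcc', '--version']).decode().splitlines()[0])  # gcc that compiled _C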

quanh1990 commented

> Thanks for your interest in our work! How many GPUs are you using for your job? Have you tried using 1 GPU?

@xyang35 Could you please provide your version of gcc? Thanks a lot.
