Can't run preprocessing #66

Open
steremma opened this issue Jan 27, 2021 · 1 comment

Comments

@steremma

After reproducing the paper's results, I want to evaluate the model's ITM performance on other datasets. That requires running the preprocessing pipeline on my own images, but it fails on every GPU setup I have tried (K80, V100, A100) with what looks like a CUDA version error. Could you please help me troubleshoot? Would switching the nvidia/cuda version on the host help?

The stack trace is below. This particular host has 8 V100 GPUs, CUDA 10.2, and nvidia-driver 440.

$ bash scripts/extract_imgfeat.sh $PATH_TO_IMG $PATH_TO_NP 
extracting image features...

Status: Downloaded newer image for chenrocks/butd-caffe:nlvr2

==================
== NVIDIA Caffe ==
==================

NVIDIA Release 19.08 (build 7603994)
NVIDIA Caffe Version 0.17.3

Container image Copyright (c) 2019, NVIDIA CORPORATION.  All rights reserved.
Copyright (c) 2014, 2015, The Regents of the University of California (Regents)
All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use 'nvidia-docker run' to start this container; see
   https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker .

NOTE: MOFED driver for multi-node communication was not detected.
      Multi-node communication performance may be reduced.

/src/tools/../lib/fast_rcnn/config.py:288: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  yaml_cfg = edict(yaml.load(f))
I0127 11:23:48.148620     1 _caffe.cpp:64] Using devices [0]
Called with args:
Namespace(caffemodel=None, caffemodelDir='./data/faster_rcnn_models/', cfg_file='./experiments/cfgs/faster_rcnn_end2end_resnet.yml', file_id=-1, gpu_id='0', outfile=None, prefix='nlvr2', prototxt='./models/vg/ResNet-101/faster_rcnn_end2end_final/test.prototxt')
Using config:
{'DATA_DIR': '/src/data',
 'DEDUP_BOXES': 0.0625,
 'EPS': 1e-14,
 'EXP_DIR': 'faster_rcnn_resnet',
 'GPU_ID': 0,
 'MATLAB': 'matlab',
 'MODELS_DIR': '/src/models/pascal_voc',
 'PIXEL_MEANS': array([[[102.9801, 115.9465, 122.7717]]]),
 'RNG_SEED': 3,
 'ROOT_DIR': '/src',
 'TEST': {'AGNOSTIC': False,
          'BBOX_REG': True,
          'HAS_ATTRIBUTES': True,
          'HAS_RELATIONS': False,
          'HAS_RPN': True,
          'MAX_SIZE': 1000,
          'NMS': 0.3,
          'PROPOSAL_METHOD': 'selective_search',
          'RPN_MIN_SIZE': 16,
          'RPN_NMS_THRESH': 0.7,
          'RPN_POST_NMS_TOP_N': 300,
          'RPN_PRE_NMS_TOP_N': 6000,
          'SCALES': [600],
          'SOFT_NMS': 0,
          'SVM': False},
 'TRAIN': {'AGNOSTIC': False,
           'ASPECT_GROUPING': True,
           'BATCH_SIZE': 64,
           'BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
           'BBOX_NORMALIZE_MEANS': [0.0, 0.0, 0.0, 0.0],
           'BBOX_NORMALIZE_STDS': [0.1, 0.1, 0.2, 0.2],
           'BBOX_NORMALIZE_TARGETS': True,
           'BBOX_NORMALIZE_TARGETS_PRECOMPUTED': True,
           'BBOX_REG': True,
           'BBOX_THRESH': 0.5,
           'BG_THRESH_HI': 0.5,
           'BG_THRESH_LO': 0.0,
           'FG_FRACTION': 0.5,
           'FG_THRESH': 0.5,
           'HAS_ATTRIBUTES': True,
           'HAS_RELATIONS': False,
           'HAS_RPN': True,
           'IMS_PER_BATCH': 1,
           'MAX_SIZE': 1000,
           'MIN_RELATION_FRACTION': 0.25,
           'PROPOSAL_METHOD': 'gt',
           'RPN_BATCHSIZE': 64,
           'RPN_BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
           'RPN_CLOBBER_POSITIVES': False,
           'RPN_FG_FRACTION': 0.5,
           'RPN_MIN_SIZE': 16,
           'RPN_NEGATIVE_OVERLAP': 0.3,
           'RPN_NMS_THRESH': 0.7,
           'RPN_NORMALIZE_MEANS': [0.0, 0.0, 0.0, 0.0],
           'RPN_NORMALIZE_STDS': [0.1, 0.1, 0.2, 0.2],
           'RPN_NORMALIZE_TARGETS': False,
           'RPN_POSITIVE_OVERLAP': 0.7,
           'RPN_POSITIVE_WEIGHT': -1.0,
           'RPN_POST_NMS_TOP_N': 2000,
           'RPN_PRE_NMS_TOP_N': 12000,
           'SCALES': [600],
           'SNAPSHOT_INFIX': '',
           'SNAPSHOT_ITERS': 10000,
           'USE_FLIPPED': True,
           'USE_PREFETCH': False},
 'USE_GPU_NMS': False}
./data/faster_rcnn_models/resnet101_faster_rcnn_final.caffemodel
E0127 11:23:48.156255   107 common.cpp:114] Cannot create Cublas handle. Cublas won't be available.
E0127 11:23:48.156461   107 common.cpp:121] Cannot create Curand generator. Curand won't be available.
F0127 11:23:48.156692   107 common.cpp:152] Check failed: error == cudaSuccess (35 vs. 0)  CUDA driver version is insufficient for CUDA runtime version
*** Check failure stack trace: ***
    @     0x7fe4481b00cd  google::LogMessage::Fail()
    @     0x7fe4481b1f33  google::LogMessage::SendToLog()
    @     0x7fe4481afc28  google::LogMessage::Flush()
    @     0x7fe4481b2999  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fe4485ac742  caffe::Caffe::SetDevice()
    @     0x7fe4493836b1  boost::python::objects::caller_py_function_impl<>::operator()()
    @     0x7fe4478f7e55  boost::python::objects::function::call()
    @     0x7fe4478f8008  (unknown)
    @     0x7fe4478fe9a3  boost::python::handle_exception_impl()
    @     0x7fe4478f56c9  (unknown)
    @     0x55c10b7ffe30  (unknown)
    @     0x55c10b7f852a  (unknown)
    @     0x55c10b81403c  (unknown)
    @     0x55c10b7e3f1e  (unknown)
    @     0x55c10b7fd2d5  (unknown)
    @     0x55c10b7ffbe2  (unknown)
    @     0x55c10b7ffbe2  (unknown)
    @     0x55c10b7f852a  (unknown)
    @     0x55c10b813d99  (unknown)
    @     0x55c10b82c95e  (unknown)
    @     0x55c10b82c56a  (unknown)
    @     0x55c10b7e905b  (unknown)
    @     0x55c10b7ffe30  (unknown)
    @     0x55c10b7ffbe2  (unknown)
    @     0x55c10b7f852a  (unknown)
    @     0x55c10b7f7fb9  (unknown)
    @     0x55c10b828e7f  (unknown)
    @     0x55c10b823c12  (unknown)
    @     0x55c10b82309d  (unknown)
    @     0x55c10b7d1d6b  (unknown)
    @     0x7fe4f0f52b97  __libc_start_main
    @     0x55c10b7d15ea  (unknown)
done
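
Two lines in the log above are worth reading together: the container banner's warning that the NVIDIA driver was not detected, and the fatal check that failed with CUDA error 35. A minimal sketch for pulling those lines out of a saved log follows. The function name `diagnose`, the `CHECKS` table, and the hint strings are illustrative assumptions and not part of the repository. The hints are an interpretation of the log, suggesting the container may have been started without GPU access rather than the host driver being too old.

```python
import re

# Patterns are copied from the log in this issue.
# The hint strings are an interpretation (assumption), not output of any tool.
CHECKS = [
    (
        "driver_not_detected",
        re.compile(r"The NVIDIA Driver was not detected"),
        "container may have been started without GPU access "
        "(the banner itself suggests 'nvidia-docker run')",
    ),
    (
        "cuda_error_35",
        re.compile(r"error == cudaSuccess \(35 vs\. 0\)"),
        "the CUDA runtime in the container found no sufficient driver",
    ),
    (
        "cublas_unavailable",
        re.compile(r"Cannot create Cublas handle"),
        "likely a follow-on symptom of the driver problem",
    ),
]


def diagnose(log_text):
    """Return (name, hint) for each known pattern found in log_text."""
    return [
        (name, hint)
        for name, pattern, hint in CHECKS
        if pattern.search(log_text)
    ]


if __name__ == "__main__":
    # Two lines taken from the log above, used as sample input.
    sample = (
        "WARNING: The NVIDIA Driver was not detected.  "
        "GPU functionality will not be available.\n"
        "F0127 11:23:48.156692   107 common.cpp:152] Check failed: "
        "error == cudaSuccess (35 vs. 0)  CUDA driver version is "
        "insufficient for CUDA runtime version\n"
    )
    for name, hint in diagnose(sample):
        print(f"{name}: {hint}")
```

If `driver_not_detected` appears alongside `cuda_error_35`, checking how `scripts/extract_imgfeat.sh` launches Docker may be a better first step than changing the CUDA version on the host.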

@TriLoo

TriLoo commented Apr 7, 2021

Same error. Maybe you could have a look at issue #72.
