Can't run preprocessing #66

steremma · 2021-01-27T11:31:46Z

After reproducing the paper's results, I am interested in the model's (ITM) performance on other datasets. To do that I need to run the preprocessing pipeline on my own images, but it fails with CUDA version errors on every GPU setup I have tried (K80, V100, A100). Could you please help me troubleshoot? Would switching the NVIDIA driver or CUDA version on the host help?

You can see the stack trace below. This particular host machine has 8 V100 GPUs, CUDA 10.2 and NVIDIA driver 440.
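The fatal line at the end of the log, `error == cudaSuccess (35 vs. 0)`, is the CUDA runtime's `cudaErrorInsufficientDriver`: the runtime in use needs a newer driver than the one it can see. As a rough sanity check on the host, a driver version can be compared against NVIDIA's published minimums. This is a minimal sketch, assuming the minimum Linux driver versions listed in NVIDIA's CUDA Toolkit release notes (410.48 for CUDA 10.0, 418.39 for 10.1, 440.33 for 10.2); verify them against the current table before relying on them. The helper names are hypothetical.

```python
"""Sketch: does a given NVIDIA driver meet the minimum for a CUDA runtime?

Assumption: the minimum Linux driver versions below are taken from NVIDIA's
CUDA Toolkit release notes; verify them before relying on this table.
"""

# Minimum Linux driver version per CUDA runtime (major, minor).
MIN_LINUX_DRIVER = {
    (10, 0): "410.48",
    (10, 1): "418.39",
    (10, 2): "440.33",
}


def parse_version(text: str) -> tuple:
    """Turn a dotted version string such as '440.33.01' into (440, 33, 1)."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports_runtime(driver: str, runtime: str) -> bool:
    """Return True if `driver` is at least the minimum listed for `runtime`.

    `runtime` is a CUDA version like '10.1'; unlisted runtimes raise KeyError.
    A bare branch such as '440' parses as (440,), which sorts below
    (440, 33), so pass the full version reported by nvidia-smi.
    """
    major, minor = parse_version(runtime)[:2]
    required = MIN_LINUX_DRIVER[(major, minor)]
    return parse_version(driver) >= parse_version(required)


if __name__ == "__main__":
    # Hypothetical driver/runtime pairs, for illustration only.
    for driver, runtime in [("440.33", "10.2"), ("418.39", "10.2"), ("440.33", "10.1")]:
        print(driver, runtime, driver_supports_runtime(driver, runtime))
```

The full host driver version can be read with `nvidia-smi --query-gpu=driver_version --format=csv,noheader`. Note that a driver passing this check may still fail inside a container that cannot see the driver at all, which is what the `WARNING: The NVIDIA Driver was not detected` line in the log reports.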

$ bash scripts/extract_imgfeat.sh $PATH_TO_IMG $PATH_TO_NP 
extracting image features...

Status: Downloaded newer image for chenrocks/butd-caffe:nlvr2

==================
== NVIDIA Caffe ==
==================

NVIDIA Release 19.08 (build 7603994)
NVIDIA Caffe Version 0.17.3

Container image Copyright (c) 2019, NVIDIA CORPORATION.  All rights reserved.
Copyright (c) 2014, 2015, The Regents of the University of California (Regents)
All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use 'nvidia-docker run' to start this container; see
   https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker .

NOTE: MOFED driver for multi-node communication was not detected.
      Multi-node communication performance may be reduced.

/src/tools/../lib/fast_rcnn/config.py:288: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  yaml_cfg = edict(yaml.load(f))
I0127 11:23:48.148620     1 _caffe.cpp:64] Using devices [0]
Called with args:
Namespace(caffemodel=None, caffemodelDir='./data/faster_rcnn_models/', cfg_file='./experiments/cfgs/faster_rcnn_end2end_resnet.yml', file_id=-1, gpu_id='0', outfile=None, prefix='nlvr2', prototxt='./models/vg/ResNet-101/faster_rcnn_end2end_final/test.prototxt')
Using config:
{'DATA_DIR': '/src/data',
 'DEDUP_BOXES': 0.0625,
 'EPS': 1e-14,
 'EXP_DIR': 'faster_rcnn_resnet',
 'GPU_ID': 0,
 'MATLAB': 'matlab',
 'MODELS_DIR': '/src/models/pascal_voc',
 'PIXEL_MEANS': array([[[102.9801, 115.9465, 122.7717]]]),
 'RNG_SEED': 3,
 'ROOT_DIR': '/src',
 'TEST': {'AGNOSTIC': False,
          'BBOX_REG': True,
          'HAS_ATTRIBUTES': True,
          'HAS_RELATIONS': False,
          'HAS_RPN': True,
          'MAX_SIZE': 1000,
          'NMS': 0.3,
          'PROPOSAL_METHOD': 'selective_search',
          'RPN_MIN_SIZE': 16,
          'RPN_NMS_THRESH': 0.7,
          'RPN_POST_NMS_TOP_N': 300,
          'RPN_PRE_NMS_TOP_N': 6000,
          'SCALES': [600],
          'SOFT_NMS': 0,
          'SVM': False},
 'TRAIN': {'AGNOSTIC': False,
           'ASPECT_GROUPING': True,
           'BATCH_SIZE': 64,
           'BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
           'BBOX_NORMALIZE_MEANS': [0.0, 0.0, 0.0, 0.0],
           'BBOX_NORMALIZE_STDS': [0.1, 0.1, 0.2, 0.2],
           'BBOX_NORMALIZE_TARGETS': True,
           'BBOX_NORMALIZE_TARGETS_PRECOMPUTED': True,
           'BBOX_REG': True,
           'BBOX_THRESH': 0.5,
           'BG_THRESH_HI': 0.5,
           'BG_THRESH_LO': 0.0,
           'FG_FRACTION': 0.5,
           'FG_THRESH': 0.5,
           'HAS_ATTRIBUTES': True,
           'HAS_RELATIONS': False,
           'HAS_RPN': True,
           'IMS_PER_BATCH': 1,
           'MAX_SIZE': 1000,
           'MIN_RELATION_FRACTION': 0.25,
           'PROPOSAL_METHOD': 'gt',
           'RPN_BATCHSIZE': 64,
           'RPN_BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
           'RPN_CLOBBER_POSITIVES': False,
           'RPN_FG_FRACTION': 0.5,
           'RPN_MIN_SIZE': 16,
           'RPN_NEGATIVE_OVERLAP': 0.3,
           'RPN_NMS_THRESH': 0.7,
           'RPN_NORMALIZE_MEANS': [0.0, 0.0, 0.0, 0.0],
           'RPN_NORMALIZE_STDS': [0.1, 0.1, 0.2, 0.2],
           'RPN_NORMALIZE_TARGETS': False,
           'RPN_POSITIVE_OVERLAP': 0.7,
           'RPN_POSITIVE_WEIGHT': -1.0,
           'RPN_POST_NMS_TOP_N': 2000,
           'RPN_PRE_NMS_TOP_N': 12000,
           'SCALES': [600],
           'SNAPSHOT_INFIX': '',
           'SNAPSHOT_ITERS': 10000,
           'USE_FLIPPED': True,
           'USE_PREFETCH': False},
 'USE_GPU_NMS': False}
./data/faster_rcnn_models/resnet101_faster_rcnn_final.caffemodel
E0127 11:23:48.156255   107 common.cpp:114] Cannot create Cublas handle. Cublas won't be available.
E0127 11:23:48.156461   107 common.cpp:121] Cannot create Curand generator. Curand won't be available.
F0127 11:23:48.156692   107 common.cpp:152] Check failed: error == cudaSuccess (35 vs. 0)  CUDA driver version is insufficient for CUDA runtime version
*** Check failure stack trace: ***
    @     0x7fe4481b00cd  google::LogMessage::Fail()
    @     0x7fe4481b1f33  google::LogMessage::SendToLog()
    @     0x7fe4481afc28  google::LogMessage::Flush()
    @     0x7fe4481b2999  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fe4485ac742  caffe::Caffe::SetDevice()
    @     0x7fe4493836b1  boost::python::objects::caller_py_function_impl<>::operator()()
    @     0x7fe4478f7e55  boost::python::objects::function::call()
    @     0x7fe4478f8008  (unknown)
    @     0x7fe4478fe9a3  boost::python::handle_exception_impl()
    @     0x7fe4478f56c9  (unknown)
    @     0x55c10b7ffe30  (unknown)
    @     0x55c10b7f852a  (unknown)
    @     0x55c10b81403c  (unknown)
    @     0x55c10b7e3f1e  (unknown)
    @     0x55c10b7fd2d5  (unknown)
    @     0x55c10b7ffbe2  (unknown)
    @     0x55c10b7ffbe2  (unknown)
    @     0x55c10b7f852a  (unknown)
    @     0x55c10b813d99  (unknown)
    @     0x55c10b82c95e  (unknown)
    @     0x55c10b82c56a  (unknown)
    @     0x55c10b7e905b  (unknown)
    @     0x55c10b7ffe30  (unknown)
    @     0x55c10b7ffbe2  (unknown)
    @     0x55c10b7f852a  (unknown)
    @     0x55c10b7f7fb9  (unknown)
    @     0x55c10b828e7f  (unknown)
    @     0x55c10b823c12  (unknown)
    @     0x55c10b82309d  (unknown)
    @     0x55c10b7d1d6b  (unknown)
    @     0x7fe4f0f52b97  __libc_start_main
    @     0x55c10b7d15ea  (unknown)
done
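The log shows two related signals: `WARNING: The NVIDIA Driver was not detected` when the container starts, and later `Check failed: error == cudaSuccess (35 vs. 0)`. One way to tell whether the container sees a driver at all is to query the driver library directly. This is a minimal probe, assuming it is run with Python inside the container; it loads `libcuda.so.1` through `ctypes` and calls the CUDA driver API function `cuDriverGetVersion`, which reports the highest CUDA version the driver supports, encoded as `1000 * major + 10 * minor` (e.g. 10020 for 10.2). The function names in the sketch are hypothetical.

```python
"""Sketch: probe the CUDA driver visible to this process (e.g. inside Docker)."""
import ctypes
from typing import Optional, Tuple


def decode_cuda_version(encoded: int) -> Tuple[int, int]:
    """Decode CUDA's integer version encoding (1000 * major + 10 * minor)."""
    return encoded // 1000, (encoded % 1000) // 10


def driver_cuda_version() -> Optional[Tuple[int, int]]:
    """Return the CUDA version supported by the loaded driver, or None.

    None means libcuda.so.1 could not be loaded (for example, no driver was
    mounted into the container) or the driver call did not succeed.
    """
    try:
        libcuda = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return None
    version = ctypes.c_int(0)
    # CUresult cuDriverGetVersion(int *driverVersion); 0 means CUDA_SUCCESS.
    if libcuda.cuDriverGetVersion(ctypes.byref(version)) != 0:
        return None
    return decode_cuda_version(version.value)


if __name__ == "__main__":
    found = driver_cuda_version()
    if found is None:
        print("No usable NVIDIA driver visible to this process.")
    else:
        print("Driver supports CUDA up to %d.%d" % found)
```

If this reports no driver inside the container while `nvidia-smi` works on the host, the container was most likely started without GPU access. The warning in the log suggests `nvidia-docker run`; Docker 19.03 and later also accept `docker run --gpus all`.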

TriLoo · 2021-04-07T10:45:45Z

Same error. Maybe you could have a look at issue #72.