[tf_2_9] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 5.3. #1124
Letting the JIT compile run does work, but it takes 20 minutes and runs every time. I tried explicitly keeping the cache enabled, but that didn't avoid the recompile.
There were some further warnings I still need to read.
But the model did work to drive the car!
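One thing that might explain the recompile on every run: the CUDA JIT cache defaults to a fairly small size, which the JIT-compiled TensorFlow kernels can easily exceed. A sketch of forcing the cache on and enlarging it before TensorFlow initializes CUDA (the specific size and path here are assumptions; the same variables can be `export`ed in the shell instead):

```python
# Sketch: enable and enlarge the CUDA JIT compile cache so the 20-minute
# PTX JIT only happens once. Must run before TensorFlow touches the GPU.
import os

os.environ["CUDA_CACHE_DISABLE"] = "0"               # keep caching enabled
os.environ["CUDA_CACHE_MAXSIZE"] = str(4 * 1024**3)  # 4 GiB (driver may cap lower)
os.environ["CUDA_CACHE_PATH"] = os.path.expanduser("~/.nv/ComputeCache")

import tensorflow as tf  # import only after the variables are set
print(tf.config.list_physical_devices("GPU"))
```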
@BrianHenryIE - Newer versions of TensorFlow are no longer built with TensorRT support. The idea for using TRT in the tf_2_9 branch is to train the model in standard format (which is …). Please note, in training you can omit the …
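For context, the convert-after-training step that comment describes looks roughly like this (a sketch, assuming a TensorRT-enabled TensorFlow build such as NVIDIA's Jetson wheels; the model paths are hypothetical):

```python
# Sketch: convert a trained SavedModel to TF-TRT on the Jetson.
# Requires a TensorFlow build with TensorRT support; paths are examples.
from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="models/mypilot.savedmodel")
converter.convert()                   # swap supported subgraphs for TRT ops
converter.save("models/mypilot.trt")  # write the converted SavedModel
```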
Trying to reproduce with the following set of parameters:
Getting output/error:
Training was successful (with one exception: see the last lines) after setting BATCH_SIZE = 1.
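For reference, the batch size lives in the car's myconfig.py; a minimal sketch of the override used above:

```python
# myconfig.py — override the default batch size so training fits in the
# Nano's 4 GB of shared CPU/GPU memory (value from the run above).
BATCH_SIZE = 1
```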
My conclusion so far: we need to rebuild TensorFlow with "compute_53" capability (this should be asked during the build process), since caching failed; see the comments by Brian. However, a quick fix is to use TensorFlow 2.12 without TRT, which allows driving & training on the Jetson Nano.
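A quick way to confirm the Nano's compute capability from the TF runtime (a sketch; `get_device_details` is available from TF 2.3 onward):

```python
# Sketch: print each visible GPU's compute capability.
# A Jetson Nano should report (5, 3).
import tensorflow as tf

for gpu in tf.config.list_physical_devices("GPU"):
    details = tf.config.experimental.get_device_details(gpu)
    print(details.get("device_name"), details.get("compute_capability"))
```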
When I install the tf_2_9 branch on a Jetson Nano 4GB B01, capture some training data, and train on macOS M1, I get the following error when I try to run the model on the Nano:
Training was done with:
I tried training on-device to see whether the fault was with macOS M1, but got:
(donkey) brian@brian-desktop:~/mycar$ donkey train --model=~/mycar/models/mypilot2.h5 --type=linear --tub=~/mycar/data
which was fixed by
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/mambaforge/envs/donkey/lib
but then gave the same "TensorFlow was not built..." error:
donkey train --model=~/mycar/models/mypilot2.h5 --type=linear --tub=~/mycar/data
I also tried
export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libnvinfer.so.8:/usr/lib/aarch64-linux-gnu/libgomp.so.1
from the Jetson Xavier instructions, to no effect.

From what I understand, TensorFlow was built with compute capability 3.5 and 7.0, but not 5.3, which I guess is the Jetson Nano's compute capability. But it should be able to run at the 3.5 level, so maybe there is a flag to disable the JIT compiling, which I thought I had found (https://stackoverflow.com/a/70117331/336146 and https://davy.ai/disable-cuda-ptx-to-binary-jit-compilation/), but the result was the same:
$ CUDA_CACHE_DISABLE=1 CUDA_DISABLE_PTX_JIT=1 python manage.py drive --model=models/mypilot.h5 --type=linear
I'm now trying to let the JIT compilation run.
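A sketch for checking which compute capabilities the installed wheel was actually compiled for (the build-info key is not present in every build):

```python
# Sketch: list the CUDA compute capabilities baked into this TF wheel.
# If 5.3 is missing, the Nano has to fall back to the slow PTX JIT path.
import tensorflow as tf

info = tf.sysconfig.get_build_info()
print(info.get("cuda_compute_capabilities", "key not present in this build"))
```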