
[tf_2_9] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 5.3. #1124

Open
BrianHenryIE opened this issue Apr 22, 2023 · 5 comments


BrianHenryIE commented Apr 22, 2023

After installing the tf_2_9 branch on a Jetson Nano 4GB B01, capturing some training data, and training on macOS M1, I get this error when I try to run the model back on the Nano:

TensorFlow was not built with CUDA kernel binaries compatible with compute capability 5.3. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.

 python manage.py drive --model=models/mypiolt.h5 --type=linear
________             ______                   _________              
___  __ \_______________  /___________  __    __  ____/_____ ________
__  / / /  __ \_  __ \_  //_/  _ \_  / / /    _  /    _  __ `/_  ___/
_  /_/ // /_/ /  / / /  ,<  /  __/  /_/ /     / /___  / /_/ /_  /    
/_____/ \____//_/ /_//_/|_| \___/_\__, /      \____/  \__,_/ /_/     
                                 /____/                              

using donkey v5.0.dev1 ...
INFO:donkeycar.config:loading config file: /home/brian/mycar/config.py
INFO:donkeycar.config:loading personal config over-rides from myconfig.py
INFO:__main__:PID: 7669
WARNING:donkeycar.parts.pins:pigpio was not imported.
cfg.CAMERA_TYPE CSIC
INFO:__main__:cfg.CAMERA_TYPE CSIC
GST_ARGUS: Creating output stream
CONSUMER: Waiting until producer is connected...
GST_ARGUS: Available Sensor modes :
GST_ARGUS: 3264 x 2464 FR = 21.000000 fps Duration = 47619048 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;

GST_ARGUS: 3264 x 1848 FR = 28.000001 fps Duration = 35714284 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;

GST_ARGUS: 1920 x 1080 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;

GST_ARGUS: 1640 x 1232 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;

GST_ARGUS: 1280 x 720 FR = 59.999999 fps Duration = 16666667 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;

GST_ARGUS: 1280 x 720 FR = 120.000005 fps Duration = 8333333 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;

GST_ARGUS: Running with following settings:
   Camera index = 0 
   Camera mode  = 5 
   Output Stream W = 1280 H = 720 
   seconds to Run    = 0 
   Frame Rate = 120.000005 
GST_ARGUS: Setup Complete, Starting captures for 0 seconds
GST_ARGUS: Starting repeat capture requests.
CONSUMER: Producer has connected; continuing.
[ WARN:[email protected]] global /home/brian/opencv/modules/videoio/src/cap_gstreamer.cpp (1405) open OpenCV | GStreamer warning: Cannot query video position: status=0, value=-1, duration=-1
INFO:donkeycar.parts.camera:CSICamera opened...
INFO:donkeycar.parts.camera:...warming camera
INFO:donkeycar.parts.camera:CSICamera ready.
INFO:donkeycar.vehicle:Adding part CSICamera.
INFO:donkeycar.parts.web_controller.web:Starting Donkey Server...
INFO:donkeycar.parts.web_controller.web:You can now go to brian-desktop.local:8887 to drive your car.
INFO:donkeycar.vehicle:Adding part LocalWebController.
INFO:donkeycar.vehicle:Adding part PS4JoystickController.
INFO:donkeycar.vehicle:Adding part Pipe.
INFO:donkeycar.vehicle:Adding part ExplodeDict.
INFO:donkeycar.vehicle:Adding part Lambda.
INFO:donkeycar.vehicle:Adding part Lambda.
INFO:donkeycar.vehicle:Adding part Lambda.
INFO:donkeycar.vehicle:Adding part Lambda.
INFO:donkeycar.vehicle:Adding part Lambda.
INFO:donkeycar.vehicle:Adding part ThrottleFilter.
INFO:donkeycar.vehicle:Adding part UserPilotCondition.
INFO:donkeycar.vehicle:Adding part RecordTracker.
/home/brian/mambaforge/envs/donkey/lib/python3.9/site-packages/numpy/core/getlimits.py:500: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/home/brian/mambaforge/envs/donkey/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
/home/brian/mambaforge/envs/donkey/lib/python3.9/site-packages/numpy/core/getlimits.py:500: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/home/brian/mambaforge/envs/donkey/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
INFO:numexpr.utils:NumExpr defaulting to 4 threads.
INFO:donkeycar.utils:get_model_by_type: model type is: linear
2023-04-22 16:09:30.601102: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-22 16:09:30.936621: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-22 16:09:30.937182: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-22 16:09:30.937303: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1943] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 5.3. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
2023-04-22 16:09:30.939402: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-22 16:09:30.939821: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-22 16:09:30.940259: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-22 16:09:30.940427: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1943] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 5.3. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.

Training was done with:

donkey train --model=/Users/brianhenry/Sites/donkeycar-2023-04/mycar/models/mypilot.h5 --type=linear --tub=/Users/brianhenry/Sites/donkeycar-2023-04/mycar/data

I then tried to train on-device, to see whether the problem was the fault of macOS M1, but got:

(donkey) brian@brian-desktop:~/mycar$ donkey train --model=~/mycar/models/mypilot2.h5 --type=linear --tub=~/mycar/data

________             ______                   _________              
___  __ \_______________  /___________  __    __  ____/_____ ________
__  / / /  __ \_  __ \_  //_/  _ \_  / / /    _  /    _  __ `/_  ___/
_  /_/ // /_/ /  / / /  ,<  /  __/  /_/ /     / /___  / /_/ /_  /    
/_____/ \____//_/ /_//_/|_| \___/_\__, /      \____/  \__,_/ /_/     
                                 /____/                              

using donkey v5.0.dev1 ...
INFO:donkeycar.config:loading config file: ./config.py
INFO:donkeycar.config:loading personal config over-rides from ./myconfig.py
INFO:numexpr.utils:NumExpr defaulting to 4 threads.
Traceback (most recent call last):
  File "/home/brian/mambaforge/envs/donkey/bin/donkey", line 33, in <module>
    sys.exit(load_entry_point('donkeycar', 'console_scripts', 'donkey')())
  File "/home/brian/projects/donkeycar/donkeycar/management/base.py", line 626, in execute_from_command_line
    c.run(args[2:])
  File "/home/brian/projects/donkeycar/donkeycar/management/base.py", line 561, in run
    from donkeycar.pipeline.training import train
  File "/home/brian/projects/donkeycar/donkeycar/pipeline/training.py", line 13, in <module>
    from donkeycar.pipeline.database import PilotDatabase
  File "/home/brian/projects/donkeycar/donkeycar/pipeline/database.py", line 8, in <module>
    import pandas as pd
  File "/home/brian/mambaforge/envs/donkey/lib/python3.9/site-packages/pandas/__init__.py", line 48, in <module>
    from pandas.core.api import (
  File "/home/brian/mambaforge/envs/donkey/lib/python3.9/site-packages/pandas/core/api.py", line 47, in <module>
    from pandas.core.groupby import (
  File "/home/brian/mambaforge/envs/donkey/lib/python3.9/site-packages/pandas/core/groupby/__init__.py", line 1, in <module>
    from pandas.core.groupby.generic import (
  File "/home/brian/mambaforge/envs/donkey/lib/python3.9/site-packages/pandas/core/groupby/generic.py", line 76, in <module>
    from pandas.core.frame import DataFrame
  File "/home/brian/mambaforge/envs/donkey/lib/python3.9/site-packages/pandas/core/frame.py", line 172, in <module>
    from pandas.core.generic import NDFrame
  File "/home/brian/mambaforge/envs/donkey/lib/python3.9/site-packages/pandas/core/generic.py", line 169, in <module>
    from pandas.core.window import (
  File "/home/brian/mambaforge/envs/donkey/lib/python3.9/site-packages/pandas/core/window/__init__.py", line 1, in <module>
    from pandas.core.window.ewm import (
  File "/home/brian/mambaforge/envs/donkey/lib/python3.9/site-packages/pandas/core/window/ewm.py", line 15, in <module>
    import pandas._libs.window.aggregations as window_aggregations
ImportError: /usr/lib/aarch64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /home/brian/mambaforge/envs/donkey/lib/python3.9/site-packages/pandas/_libs/window/aggregations.cpython-39-aarch64-linux-gnu.so)

which was fixed by pointing the loader at the conda environment's newer libstdc++: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/mambaforge/envs/donkey/lib

but then gave the same "TensorFlow was not built..." error:

donkey train --model=~/mycar/models/mypilot2.h5 --type=linear --tub=~/mycar/data

________             ______                   _________              
___  __ \_______________  /___________  __    __  ____/_____ ________
__  / / /  __ \_  __ \_  //_/  _ \_  / / /    _  /    _  __ `/_  ___/
_  /_/ // /_/ /  / / /  ,<  /  __/  /_/ /     / /___  / /_/ /_  /    
/_____/ \____//_/ /_//_/|_| \___/_\__, /      \____/  \__,_/ /_/     
                                 /____/                              

using donkey v5.0.dev1 ...
INFO:donkeycar.config:loading config file: ./config.py
INFO:donkeycar.config:loading personal config over-rides from ./myconfig.py
INFO:numexpr.utils:NumExpr defaulting to 4 threads.
INFO:donkeycar.pipeline.database:Found model database /home/brian/mycar/models/database.json
INFO:donkeycar.utils:get_model_by_type: model type is: linear
2023-04-22 16:24:50.328467: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-22 16:24:50.701236: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-22 16:24:50.702429: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-22 16:24:50.702601: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1943] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 5.3. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
2023-04-22 16:24:50.705488: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-22 16:24:50.707020: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-22 16:24:50.707491: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-22 16:24:50.707580: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1943] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 5.3. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.

I also tried export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libnvinfer.so.8:/usr/lib/aarch64-linux-gnu/libgomp.so.1 from the Jetson Xavier instructions, to no effect.

>>> tf.sysconfig.get_build_info()
OrderedDict([('cpu_compiler', '/usr/bin/aarch64-linux-gnu-gcc-8'), ('cuda_compute_capabilities', ['compute_35', 'compute_70']), ('cuda_version', '10.2'), ('cudnn_version', '8'), ('is_cuda_build', True), ('is_rocm_build', False), ('is_tensorrt_build', True)])

From what I understand, TensorFlow was built with CUDA binaries for compute capabilities 3.5 and 7.0, but not 5.3, which I gather is the Jetson Nano's compute capability. I thought it should still be able to run at the 3.5 level, so perhaps there is a flag to disable the JIT compiling. I thought I had found one (https://stackoverflow.com/a/70117331/336146 and https://davy.ai/disable-cuda-ptx-to-binary-jit-compilation/), but the result was the same:
$ CUDA_CACHE_DISABLE=1 CUDA_DISABLE_PTX_JIT=1 python manage.py drive --model=models/mypiolt.h5 --type=linear

I'm now trying to let the JIT compilation run.
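For context, the reason the 3.5 binaries can't simply be reused (and the PTX JIT always fires) comes down to CUDA binary compatibility: a cubin runs only on GPUs of the same major architecture, while PTX is forward-portable via JIT compilation. A minimal sketch of that selection logic, as my own illustration rather than TensorFlow's or the driver's actual code:

```python
# Illustration only: mimics how the CUDA driver picks code for a GPU.
# A cubin built for sm_XY runs only on devices with the same major
# architecture and minor >= Y; otherwise the driver falls back to
# JIT-compiling the embedded PTX (the slow path seen in the logs above).

def parse_cc(s):
    """'compute_35' or 'sm_53' -> (major, minor)."""
    digits = s.rsplit("_", 1)[1]
    return int(digits[:-1]), int(digits[-1])

def cubin_compatible(device, cubin):
    dev_major, dev_minor = parse_cc(device)
    bin_major, bin_minor = parse_cc(cubin)
    return dev_major == bin_major and dev_minor >= bin_minor

def needs_ptx_jit(device, built_capabilities):
    return not any(cubin_compatible(device, b) for b in built_capabilities)

# The Nano's Maxwell GPU (sm_53) vs this wheel's ['compute_35', 'compute_70']:
# 3.5 is a different major architecture (Kepler), so it can't be reused.
print(needs_ptx_jit("sm_53", ["compute_35", "compute_70"]))  # True -> JIT from PTX
```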


BrianHenryIE commented Apr 23, 2023

Letting the JIT compilation run does work, but it takes 20 minutes and it runs every time.

I tried explicitly not disabling the cache (CUDA_CACHE_DISABLE=0 CUDA_DISABLE_PTX_JIT=0), but that didn't avoid the recompile:
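One thing worth ruling out (my speculation, based on NVIDIA's documented JIT-cache environment variables, not something verified on the Nano): the driver's compute cache has a small default size limit, and JIT-compiling a full TensorFlow build from PTX can exceed it, evicting the compiled kernels and forcing a recompile on every run. A sketch of enlarging the cache before TensorFlow is imported:

```python
import os

# Speculative workaround: enlarge the CUDA driver's JIT compute cache so the
# PTX-compiled kernels survive between runs. CUDA_CACHE_DISABLE,
# CUDA_CACHE_MAXSIZE and CUDA_CACHE_PATH are documented by NVIDIA; whether
# the cache size limit is the culprit here is an assumption. They must be
# set before the CUDA context is created, i.e. before `import tensorflow`.
os.environ["CUDA_CACHE_DISABLE"] = "0"                    # keep the cache enabled
os.environ["CUDA_CACHE_MAXSIZE"] = str(4 * 1024**3 - 1)   # raise the size cap (~4 GiB)
os.environ["CUDA_CACHE_PATH"] = os.path.expanduser("~/.nv/ComputeCache")
```

The same variables could equally be exported in the shell before running manage.py; the point is only that they take effect at context creation.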

$ python manage.py drive --model=models/mypilot.h5 --type=linear

________             ______                   _________              
___  __ \_______________  /___________  __    __  ____/_____ ________
__  / / /  __ \_  __ \_  //_/  _ \_  / / /    _  /    _  __ `/_  ___/
_  /_/ // /_/ /  / / /  ,<  /  __/  /_/ /     / /___  / /_/ /_  /    
/_____/ \____//_/ /_//_/|_| \___/_\__, /      \____/  \__,_/ /_/     
                                 /____/                              

using donkey v5.0.dev1 ...
INFO:donkeycar.config:loading config file: /home/brian/mycar/config.py
INFO:donkeycar.config:loading personal config over-rides from myconfig.py
INFO:__main__:PID: 8467
WARNING:donkeycar.parts.pins:pigpio was not imported.
cfg.CAMERA_TYPE CSIC
INFO:__main__:cfg.CAMERA_TYPE CSIC
GST_ARGUS: Creating output stream
CONSUMER: Waiting until producer is connected...
GST_ARGUS: Available Sensor modes :
GST_ARGUS: 3264 x 2464 FR = 21.000000 fps Duration = 47619048 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;

GST_ARGUS: 3264 x 1848 FR = 28.000001 fps Duration = 35714284 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;

GST_ARGUS: 1920 x 1080 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;

GST_ARGUS: 1640 x 1232 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;

GST_ARGUS: 1280 x 720 FR = 59.999999 fps Duration = 16666667 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;

GST_ARGUS: 1280 x 720 FR = 120.000005 fps Duration = 8333333 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;

GST_ARGUS: Running with following settings:
   Camera index = 0 
   Camera mode  = 5 
   Output Stream W = 1280 H = 720 
   seconds to Run    = 0 
   Frame Rate = 120.000005 
GST_ARGUS: Setup Complete, Starting captures for 0 seconds
GST_ARGUS: Starting repeat capture requests.
CONSUMER: Producer has connected; continuing.
[ WARN:[email protected]] global /home/brian/opencv/modules/videoio/src/cap_gstreamer.cpp (1405) open OpenCV | GStreamer warning: Cannot query video position: status=0, value=-1, duration=-1
INFO:donkeycar.parts.camera:CSICamera opened...
INFO:donkeycar.parts.camera:...warming camera
INFO:donkeycar.parts.camera:CSICamera ready.
INFO:donkeycar.vehicle:Adding part CSICamera.
INFO:donkeycar.parts.web_controller.web:Starting Donkey Server...
INFO:donkeycar.parts.web_controller.web:You can now go to brian-desktop.local:8887 to drive your car.
INFO:donkeycar.vehicle:Adding part LocalWebController.
INFO:donkeycar.vehicle:Adding part PS4JoystickController.
INFO:donkeycar.vehicle:Adding part Pipe.
INFO:donkeycar.vehicle:Adding part ExplodeDict.
INFO:donkeycar.vehicle:Adding part Lambda.
INFO:donkeycar.vehicle:Adding part Lambda.
INFO:donkeycar.vehicle:Adding part Lambda.
INFO:donkeycar.vehicle:Adding part Lambda.
INFO:donkeycar.vehicle:Adding part Lambda.
INFO:donkeycar.vehicle:Adding part ThrottleFilter.
INFO:donkeycar.vehicle:Adding part UserPilotCondition.
INFO:donkeycar.vehicle:Adding part RecordTracker.
/home/brian/mambaforge/envs/donkey/lib/python3.9/site-packages/numpy/core/getlimits.py:500: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/home/brian/mambaforge/envs/donkey/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
/home/brian/mambaforge/envs/donkey/lib/python3.9/site-packages/numpy/core/getlimits.py:500: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/home/brian/mambaforge/envs/donkey/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
INFO:numexpr.utils:NumExpr defaulting to 4 threads.
INFO:donkeycar.utils:get_model_by_type: model type is: linear
2023-04-23 14:31:23.595465: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-23 14:31:23.672188: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-23 14:31:23.672859: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-23 14:31:23.672995: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1943] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 5.3. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
2023-04-23 14:31:23.675269: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-23 14:31:23.675711: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-23 14:31:23.676174: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-23 14:31:23.676324: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1943] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 5.3. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
2023-04-23 14:59:56.510719: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-23 14:59:56.511140: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-23 14:59:56.511236: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Could not identify NUMA node of platform GPU id 0, defaulting to 0.  Your kernel may not have been built with NUMA support.
2023-04-23 14:59:56.511493: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-23 14:59:56.511646: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 174 MB memory:  -> device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3
INFO:donkeycar.parts.keras:Created KerasLinear with interpreter: KerasInterpreter
loading model models/mypilot.h5
INFO:donkeycar.parts.keras:Loading model models/mypilot.h5
INFO:donkeycar.parts.interpreter:Loading model models/mypilot.h5
finished loading in 0.7269484996795654 sec.
INFO:donkeycar.vehicle:Adding part FileWatcher.
INFO:donkeycar.vehicle:Adding part FileWatcher.
INFO:donkeycar.vehicle:Adding part DelayedTrigger.
INFO:donkeycar.vehicle:Adding part TriggeredCallback.
Adding inference transformations
INFO:__main__:Adding inference transformations
INFO:donkeycar.parts.image_transformations:Creating ImageTransformations []
INFO:donkeycar.vehicle:Adding part ImageTransformations.
INFO:donkeycar.vehicle:Adding part KerasLinear.
INFO:donkeycar.vehicle:Adding part AiLaunch.
INFO:donkeycar.vehicle:Adding part DriveMode.
INFO:donkeycar.vehicle:Adding part ToggleRecording.
INFO:donkeycar.parts.actuator:PWM Steering created
INFO:donkeycar.parts.actuator:Init ESC
INFO:donkeycar.parts.actuator:PWM Throttle created
INFO:donkeycar.vehicle:Adding part PWMSteering.
INFO:donkeycar.vehicle:Adding part PWMThrottle.
INFO:donkeycar.parts.datastore_v2:Found datastore at /home/brian/mycar/data
INFO:donkeycar.parts.datastore_v2:Using last catalog /home/brian/mycar/data/catalog_3.catalog
INFO:donkeycar.vehicle:Adding part TubWriter.
You can now go to <your hostname.local>:8887 to drive your car.
You can now move your controller to drive your car.
Joystick Controls:
+------------------+--------------------------+
|     control      |          action          |
+------------------+--------------------------+
|      share       |       toggle_mode        |
|      circle      | show_record_count_status |
|     triangle     |   erase_last_N_records   |
|      cross       |      emergency_stop      |
|        L1        |  increase_max_throttle   |
|        R1        |  decrease_max_throttle   |
|     options      | toggle_constant_throttle |
|        R2        |     enable_ai_launch     |
| left_stick_horz  |       set_steering       |
| right_stick_vert |       set_throttle       |
+------------------+--------------------------+
INFO:donkeycar.parts.controller:Opening %s... /dev/input/js0
INFO:donkeycar.parts.controller:Device name: Wireless Controller
INFO:donkeycar.vehicle:Starting vehicle at 20 Hz

There were some further warnings I still need to read:

2023-04-23 15:01:13.524103: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8201

2023-04-23 15:01:17.975780: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2023-04-23 15:01:18.055015: W tensorflow/stream_executor/gpu/asm_compiler.cc:111] *** WARNING *** You are using ptxas 10.2.300, which is older than 11.1. ptxas before 11.1 is known to miscompile XLA code, leading to incorrect results or invalid-address errors.

You may not need to update to CUDA 11.1; cherry-picking the ptxas binary is often sufficient.
2023-04-23 15:01:20.587361: W tensorflow/core/kernels/conv_ops_gpu.cc:336] None of the algorithms provided by cuDNN frontend heuristics worked; trying fallback algorithms.  Conv: batch: 1
in_depths: 64
out_depths: 64
in: 25
in: 25
data_format: 1
filter: 3
filter: 3
filter: 64
dilation: 1
dilation: 1
stride: 1
stride: 1
padding: 0
padding: 0
dtype: DT_FLOAT
group_count: 1
device_identifier: "NVIDIA Tegra X1 sm_5.3 with 4156514304B RAM and 1 cores"
version: 1

2023-04-23 15:01:20.784663: W tensorflow/core/kernels/conv_ops_gpu.cc:336] None of the algorithms provided by cuDNN frontend heuristics worked; trying fallback algorithms.  Conv: batch: 1
in_depths: 64
out_depths: 64
in: 23
in: 23
data_format: 1
filter: 3
filter: 3
filter: 64
dilation: 1
dilation: 1
stride: 1
stride: 1
padding: 0
padding: 0
dtype: DT_FLOAT
group_count: 1
device_identifier: "NVIDIA Tegra X1 sm_5.3 with 4156514304B RAM and 1 cores"
version: 1

But the model did work to drive the car!

@DocGarbanzo (Contributor) commented:

@BrianHenryIE - Newer versions of TensorFlow are no longer built with TensorRT support. The idea for using TRT on the tf 2.9 branch is to train the model in the standard format (which is .savedmodel, by the way, not .h5) and then load the model on the Jetson with, for example, ... --model models/mymodel.savedmodel --type tensorrt_linear. It should then create the TensorRT engines at runtime and give you the fastest version of TensorRT on the target architecture. I believe only this procedure will give you the full performance of TensorRT on the Jetson platform. If the JIT compilation is too slow, I suggest you file an issue with NVIDIA support. I agree that a 20-minute wait is unacceptable, given that this is, in the end, a very small model.

Please note: in training you can omit --model ..., as a name will be auto-generated for you (and it will also create the .savedmodel format by default).


Heavy02011 commented Jun 6, 2023

Trying to reproduce with the following set of parameters:

DEFAULT_MODEL_TYPE = 'linear'
#DEFAULT_MODEL_TYPE = 'tensorrt_linear'
BATCH_SIZE = 8 #16 # 8 #128               #how many records to use when doing one pass of gradient descent. Use a smaller number if your gpu is running out of memory.
# TRAIN_TEST_SPLIT = 0.8          #what percent of records to use for training. the remaining used for validation.
# MAX_EPOCHS = 100                #how many times to visit all records of your data
# SHOW_PLOT = True                #would you like to see a pop up display of final loss?
# VERBOSE_TRAIN = True            #would you like to see a progress bar with text during training?
# USE_EARLY_STOP = True           #would you like to stop the training if we see it's not improving fit?
# EARLY_STOP_PATIENCE = 5         #how many epochs to wait before no improvement
# MIN_DELTA = .0005               #early stop will want this much loss change before calling it improved.
# PRINT_MODEL_SUMMARY = True      #print layers and weights to stdout
# OPTIMIZER = None                #adam, sgd, rmsprop, etc.. None accepts default
# LEARNING_RATE = 0.001           #only used when OPTIMIZER specified
# LEARNING_RATE_DECAY = 0.0       #only used when OPTIMIZER specified
# SEND_BEST_MODEL_TO_PI = False   #change to true to automatically send best model during training
# CREATE_TF_LITE = True           # automatically create tflite model in training
CREATE_TENSOR_RT = True #False        # automatically create tensorrt model in training
# SAVE_MODEL_AS_H5 = False        # if old keras format should be used instead of savedmodel
CACHE_IMAGES = False             # if images are cached in training for speed up

Python 3.9.16 | packaged by conda-forge | (main, Feb  1 2023, 22:05:40) 
[GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.sysconfig.get_build_info()
OrderedDict([('cpu_compiler', '/usr/bin/aarch64-linux-gnu-gcc-8'), ('cuda_compute_capabilities', ['compute_35', 'compute_70']), ('cuda_version', '10.2'), ('cudnn_version', '8'), ('is_cuda_build', True), ('is_rocm_build', False), ('is_tensorrt_build', True)])

Getting this output/error:

(donkey) rainer@donkeynano11:~/data/d2$ python train.py --tub=data/tub_7_23-06-05/ --model=models/trt1
________             ______                   _________              
___  __ \_______________  /___________  __    __  ____/_____ ________
__  / / /  __ \_  __ \_  //_/  _ \_  / / /    _  /    _  __ `/_  ___/
_  /_/ // /_/ /  / / /  ,<  /  __/  /_/ /     / /___  / /_/ /_  /    
/_____/ \____//_/ /_//_/|_| \___/_\__, /      \____/  \__,_/ /_/     
                                 /____/                              

using donkey v5.0.dev1 ...
INFO:numexpr.utils:NumExpr defaulting to 4 threads.
INFO:donkeycar.config:loading config file: /home/rainer/data/d2/config.py
INFO:donkeycar.config:loading personal config over-rides from myconfig.py
WARNING:donkeycar.pipeline.database:No model database found at /home/rainer/data/d2/models/database.json
INFO:donkeycar.utils:get_model_by_type: model type is: linear
2023-06-06 20:31:02.974359: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-06-06 20:31:03.292123: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-06-06 20:31:03.292877: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-06-06 20:31:03.292962: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1943] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 5.3. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
2023-06-06 20:31:03.294917: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-06-06 20:31:03.295515: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-06-06 20:31:03.295998: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-06-06 20:31:03.296096: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1943] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 5.3. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
2023-06-06 20:48:53.593966: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-06-06 20:48:53.595140: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-06-06 20:48:53.595306: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Could not identify NUMA node of platform GPU id 0, defaulting to 0.  Your kernel may not have been built with NUMA support.
2023-06-06 20:48:53.610903: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-06-06 20:48:53.697298: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10 MB memory:  -> device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3
INFO:donkeycar.parts.keras:Created KerasLinear with interpreter: KerasInterpreter
Model: "linear"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 img_in (InputLayer)            [(None, 120, 160, 3  0           []                               
                                )]                                                                
                                                                                                  
 conv2d_1 (Conv2D)              (None, 58, 78, 24)   1824        ['img_in[0][0]']                 
                                                                                                  
 dropout (Dropout)              (None, 58, 78, 24)   0           ['conv2d_1[0][0]']               
                                                                                                  
 conv2d_2 (Conv2D)              (None, 27, 37, 32)   19232       ['dropout[0][0]']                
                                                                                                  
 dropout_1 (Dropout)            (None, 27, 37, 32)   0           ['conv2d_2[0][0]']               
                                                                                                  
 conv2d_3 (Conv2D)              (None, 12, 17, 64)   51264       ['dropout_1[0][0]']              
                                                                                                  
 dropout_2 (Dropout)            (None, 12, 17, 64)   0           ['conv2d_3[0][0]']               
                                                                                                  
 conv2d_4 (Conv2D)              (None, 10, 15, 64)   36928       ['dropout_2[0][0]']              
                                                                                                  
 dropout_3 (Dropout)            (None, 10, 15, 64)   0           ['conv2d_4[0][0]']               
                                                                                                  
 conv2d_5 (Conv2D)              (None, 8, 13, 64)    36928       ['dropout_3[0][0]']              
                                                                                                  
 dropout_4 (Dropout)            (None, 8, 13, 64)    0           ['conv2d_5[0][0]']               
                                                                                                  
 flattened (Flatten)            (None, 6656)         0           ['dropout_4[0][0]']              
                                                                                                  
 dense_1 (Dense)                (None, 100)          665700      ['flattened[0][0]']              
                                                                                                  
 dropout_5 (Dropout)            (None, 100)          0           ['dense_1[0][0]']                
                                                                                                  
 dense_2 (Dense)                (None, 50)           5050        ['dropout_5[0][0]']              
                                                                                                  
 dropout_6 (Dropout)            (None, 50)           0           ['dense_2[0][0]']                
                                                                                                  
 n_outputs0 (Dense)             (None, 1)            51          ['dropout_6[0][0]']              
                                                                                                  
 n_outputs1 (Dense)             (None, 1)            51          ['dropout_6[0][0]']              
                                                                                                  
==================================================================================================
Total params: 817,028
Trainable params: 817,028
Non-trainable params: 0
__________________________________________________________________________________________________
INFO:donkeycar.parts.datastore_v2:Found datastore at /home/rainer/data/d2/data/tub_7_23-06-05
INFO:donkeycar.parts.datastore_v2:Using last catalog /home/rainer/data/d2/data/tub_7_23-06-05/catalog_1.catalog
INFO:donkeycar.pipeline.types:Loading tubs from paths ['data/tub_7_23-06-05/']
INFO:donkeycar.pipeline.training:Records # Training 1188
INFO:donkeycar.pipeline.training:Records # Validation 298
INFO:donkeycar.parts.tub_v2:Closing tub data/tub_7_23-06-05/
INFO:donkeycar.parts.image_transformations:Creating ImageTransformations []
INFO:donkeycar.parts.image_transformations:Creating ImageTransformations []
INFO:donkeycar.parts.image_transformations:Creating ImageTransformations []
INFO:donkeycar.parts.image_transformations:Creating ImageTransformations []
INFO:donkeycar.pipeline.training:Train with image caching: False
INFO:donkeycar.parts.keras:////////// Starting training //////////
Epoch 1/100
2023-06-06 20:49:07.989612: W tensorflow/core/common_runtime/bfc_allocator.cc:479] Allocator (GPU_0_bfc) ran out of memory trying to allocate 7.0KiB (rounded to 7424)requested by op Fill
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation. 
Current allocation summary follows.
2023-06-06 20:49:07.989703: I tensorflow/core/common_runtime/bfc_allocator.cc:1027] BFCAllocator dump for GPU_0_bfc
2023-06-06 20:49:07.989737: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (256): 	Total Chunks: 52, Chunks in use: 52. 13.0KiB allocated for chunks. 13.0KiB in use in bin. 3.3KiB client-requested in use in bin.
2023-06-06 20:49:07.989770: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (512): 	Total Chunks: 2, Chunks in use: 2. 1.0KiB allocated for chunks. 1.0KiB in use in bin. 800B client-requested in use in bin.
2023-06-06 20:49:07.989802: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (1024): 	Total Chunks: 2, Chunks in use: 1. 2.5KiB allocated for chunks. 1.2KiB in use in bin. 1.0KiB client-requested in use in bin.
2023-06-06 20:49:07.989828: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (2048): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-06-06 20:49:07.989898: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (4096): 	Total Chunks: 2, Chunks in use: 2. 14.5KiB allocated for chunks. 14.5KiB in use in bin. 14.1KiB client-requested in use in bin.
2023-06-06 20:49:07.989993: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (8192): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-06-06 20:49:07.990058: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (16384): 	Total Chunks: 1, Chunks in use: 1. 19.8KiB allocated for chunks. 19.8KiB in use in bin. 19.5KiB client-requested in use in bin.
2023-06-06 20:49:07.990151: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (32768): 	Total Chunks: 1, Chunks in use: 1. 32.2KiB allocated for chunks. 32.2KiB in use in bin. 19.5KiB client-requested in use in bin.
2023-06-06 20:49:07.990242: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (65536): 	Total Chunks: 2, Chunks in use: 2. 159.8KiB allocated for chunks. 159.8KiB in use in bin. 150.0KiB client-requested in use in bin.
2023-06-06 20:49:07.990287: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (131072): 	Total Chunks: 5, Chunks in use: 5. 838.0KiB allocated for chunks. 838.0KiB in use in bin. 832.0KiB client-requested in use in bin.
2023-06-06 20:49:07.990326: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (262144): 	Total Chunks: 1, Chunks in use: 1. 256.0KiB allocated for chunks. 256.0KiB in use in bin. 144.0KiB client-requested in use in bin.
2023-06-06 20:49:07.990364: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (524288): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-06-06 20:49:07.990403: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (1048576): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-06-06 20:49:07.990437: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (2097152): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-06-06 20:49:07.990519: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (4194304): 	Total Chunks: 2, Chunks in use: 2. 9.09MiB allocated for chunks. 9.09MiB in use in bin. 5.08MiB client-requested in use in bin.
2023-06-06 20:49:07.990589: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (8388608): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-06-06 20:49:07.990657: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (16777216): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-06-06 20:49:07.990723: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (33554432): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-06-06 20:49:07.990789: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (67108864): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-06-06 20:49:07.990855: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (134217728): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-06-06 20:49:07.990936: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] Bin (268435456): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-06-06 20:49:07.991005: I tensorflow/core/common_runtime/bfc_allocator.cc:1050] Bin for 7.2KiB was 4.0KiB, Chunk State: 
2023-06-06 20:49:07.991055: I tensorflow/core/common_runtime/bfc_allocator.cc:1063] Next region of size 10899456
2023-06-06 20:49:07.991112: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a70000 of size 1280 next 1
2023-06-06 20:49:07.991164: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a70500 of size 256 next 2
2023-06-06 20:49:07.991213: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a70600 of size 256 next 3
2023-06-06 20:49:07.991261: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a70700 of size 256 next 4
2023-06-06 20:49:07.991310: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a70800 of size 256 next 5
2023-06-06 20:49:07.991366: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a70900 of size 256 next 8
2023-06-06 20:49:07.991417: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a70a00 of size 256 next 9
2023-06-06 20:49:07.991468: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a70b00 of size 256 next 10
2023-06-06 20:49:07.991517: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a70c00 of size 256 next 13
2023-06-06 20:49:07.991566: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a70d00 of size 256 next 14
2023-06-06 20:49:07.991614: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a70e00 of size 256 next 15
2023-06-06 20:49:07.991663: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a70f00 of size 256 next 18
2023-06-06 20:49:07.991714: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a71000 of size 256 next 19
2023-06-06 20:49:07.991762: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a71100 of size 256 next 20
2023-06-06 20:49:07.991810: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a71200 of size 256 next 23
2023-06-06 20:49:07.991858: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a71300 of size 256 next 24
2023-06-06 20:49:07.991907: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a71400 of size 256 next 25
2023-06-06 20:49:07.991957: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a71500 of size 512 next 26
2023-06-06 20:49:07.992005: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a71700 of size 256 next 27
2023-06-06 20:49:07.992054: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a71800 of size 256 next 29
2023-06-06 20:49:07.992101: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a71900 of size 256 next 30
2023-06-06 20:49:07.992148: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a71a00 of size 256 next 33
2023-06-06 20:49:07.992195: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a71b00 of size 256 next 34
2023-06-06 20:49:07.992242: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a71c00 of size 256 next 35
2023-06-06 20:49:07.992289: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a71d00 of size 256 next 36
2023-06-06 20:49:07.992336: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a71e00 of size 256 next 37
2023-06-06 20:49:07.992383: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a71f00 of size 256 next 38
2023-06-06 20:49:07.992430: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a72000 of size 256 next 39
2023-06-06 20:49:07.992477: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a72100 of size 256 next 40
2023-06-06 20:49:07.992525: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a72200 of size 256 next 41
2023-06-06 20:49:07.992574: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a72300 of size 256 next 42
2023-06-06 20:49:07.992621: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a72400 of size 256 next 43
2023-06-06 20:49:07.992668: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a72500 of size 256 next 44
2023-06-06 20:49:07.992759: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a72600 of size 256 next 45
2023-06-06 20:49:07.992812: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a72700 of size 256 next 46
2023-06-06 20:49:07.992861: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a72800 of size 256 next 47
2023-06-06 20:49:07.992909: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a72900 of size 256 next 48
2023-06-06 20:49:07.992957: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a72a00 of size 256 next 49
2023-06-06 20:49:07.993004: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a72b00 of size 256 next 50
2023-06-06 20:49:07.993052: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a72c00 of size 256 next 51
2023-06-06 20:49:07.993099: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a72d00 of size 256 next 52
2023-06-06 20:49:07.993147: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a72e00 of size 256 next 53
2023-06-06 20:49:07.993196: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a72f00 of size 256 next 54
2023-06-06 20:49:07.993246: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a73000 of size 256 next 55
2023-06-06 20:49:07.993299: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a73100 of size 256 next 57
2023-06-06 20:49:07.993352: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a73200 of size 256 next 58
2023-06-06 20:49:07.993402: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a73300 of size 256 next 60
2023-06-06 20:49:07.993451: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a73400 of size 256 next 61
2023-06-06 20:49:07.993501: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a73500 of size 256 next 63
2023-06-06 20:49:07.993552: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a73600 of size 512 next 64
2023-06-06 20:49:07.993602: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a73800 of size 256 next 65
2023-06-06 20:49:07.993652: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a73900 of size 256 next 66
2023-06-06 20:49:07.993704: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a73a00 of size 256 next 67
2023-06-06 20:49:07.993753: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a73b00 of size 256 next 68
2023-06-06 20:49:07.993801: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a73c00 of size 256 next 69
2023-06-06 20:49:07.993848: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] Free  at f00a73d00 of size 1280 next 6
2023-06-06 20:49:07.993898: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a74200 of size 7424 next 7
2023-06-06 20:49:07.993950: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a75f00 of size 153600 next 12
2023-06-06 20:49:07.994000: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00a9b700 of size 76800 next 11
2023-06-06 20:49:07.994049: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00aae300 of size 7424 next 56
2023-06-06 20:49:07.994098: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00ab0000 of size 33024 next 32
2023-06-06 20:49:07.994148: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00ab8100 of size 20224 next 31
2023-06-06 20:49:07.994196: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00abd000 of size 86784 next 21
2023-06-06 20:49:07.994245: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00ad2300 of size 262144 next 17
2023-06-06 20:49:07.994293: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00b12300 of size 204800 next 16
2023-06-06 20:49:07.994343: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00b44300 of size 147456 next 22
2023-06-06 20:49:07.994393: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00b68300 of size 204800 next 59
2023-06-06 20:49:07.994441: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00b9a300 of size 147456 next 62
2023-06-06 20:49:07.994491: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f00bbe300 of size 4972544 next 28
2023-06-06 20:49:07.994541: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at f0107c300 of size 4558080 next 18446744073709551615
2023-06-06 20:49:07.994588: I tensorflow/core/common_runtime/bfc_allocator.cc:1088]      Summary of in-use Chunks by size: 
2023-06-06 20:49:07.994645: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 52 Chunks of size 256 totalling 13.0KiB
2023-06-06 20:49:07.994698: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 2 Chunks of size 512 totalling 1.0KiB
2023-06-06 20:49:07.994749: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 1280 totalling 1.2KiB
2023-06-06 20:49:07.994801: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 2 Chunks of size 7424 totalling 14.5KiB
2023-06-06 20:49:07.994853: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 20224 totalling 19.8KiB
2023-06-06 20:49:07.994907: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 33024 totalling 32.2KiB
2023-06-06 20:49:07.994960: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 76800 totalling 75.0KiB
2023-06-06 20:49:07.995012: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 86784 totalling 84.8KiB
2023-06-06 20:49:07.995075: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 2 Chunks of size 147456 totalling 288.0KiB
2023-06-06 20:49:07.995132: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 153600 totalling 150.0KiB
2023-06-06 20:49:07.995182: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 2 Chunks of size 204800 totalling 400.0KiB
2023-06-06 20:49:07.995222: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 262144 totalling 256.0KiB
2023-06-06 20:49:07.995259: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 4558080 totalling 4.35MiB
2023-06-06 20:49:07.995293: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 4972544 totalling 4.74MiB
2023-06-06 20:49:07.995327: I tensorflow/core/common_runtime/bfc_allocator.cc:1095] Sum Total of in-use chunks: 10.39MiB
2023-06-06 20:49:07.995360: I tensorflow/core/common_runtime/bfc_allocator.cc:1097] total_region_allocated_bytes_: 10899456 memory_limit_: 10899456 available bytes: 0 curr_region_allocation_bytes_: 21798912
2023-06-06 20:49:07.995402: I tensorflow/core/common_runtime/bfc_allocator.cc:1103] Stats: 
Limit:                        10899456
InUse:                        10898176
MaxInUse:                     10898176
NumAllocs:                          96
MaxAllocSize:                  4972544
Reserved:                            0
PeakReserved:                        0
LargestFreeBlock:                    0

2023-06-06 20:49:07.995526: W tensorflow/core/common_runtime/bfc_allocator.cc:491] *************************************xxxxxxxxxxxxxxxxxxxxx*************************xxxxxxxxxxxxxxxxx
2023-06-06 20:49:07.995717: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at constant_op.cc:175 : RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[5,5,3,24] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "/home/rainer/data/d2/train.py", line 31, in <module>
    main()
  File "/home/rainer/data/d2/train.py", line 27, in main
    train(cfg, tubs, model, model_type, comment)
  File "/home/rainer/projects/donkeycar/donkeycar/pipeline/training.py", line 157, in train
    history = kl.train(model_path=model_path,
  File "/home/rainer/projects/donkeycar/donkeycar/parts/keras.py", line 168, in train
    history: tf.keras.callbacks.History = model.fit(
  File "/home/rainer/mambaforge/envs/donkey/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/rainer/mambaforge/envs/donkey/lib/python3.9/site-packages/tensorflow/python/framework/func_graph.py", line 1127, in autograph_handler
    raise e.ag_error_metadata.to_exception(e)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: in user code:

    File "/home/rainer/mambaforge/envs/donkey/lib/python3.9/site-packages/keras/engine/training.py", line 1051, in train_function  *
        return step_function(self, iterator)
    File "/home/rainer/mambaforge/envs/donkey/lib/python3.9/site-packages/keras/engine/training.py", line 1040, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/home/rainer/mambaforge/envs/donkey/lib/python3.9/site-packages/keras/engine/training.py", line 1030, in run_step  **
        outputs = model.train_step(data)
    File "/home/rainer/mambaforge/envs/donkey/lib/python3.9/site-packages/keras/engine/training.py", line 893, in train_step
        self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    File "/home/rainer/mambaforge/envs/donkey/lib/python3.9/site-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 539, in minimize
        return self.apply_gradients(grads_and_vars, name=name)
    File "/home/rainer/mambaforge/envs/donkey/lib/python3.9/site-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 646, in apply_gradients
        self._create_all_weights(var_list)
    File "/home/rainer/mambaforge/envs/donkey/lib/python3.9/site-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 860, in _create_all_weights
        self._create_slots(var_list)
    File "/home/rainer/mambaforge/envs/donkey/lib/python3.9/site-packages/keras/optimizers/optimizer_v2/adam.py", line 124, in _create_slots
        self.add_slot(var, 'v')
    File "/home/rainer/mambaforge/envs/donkey/lib/python3.9/site-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 946, in add_slot
        weight = tf.Variable(
    File "/home/rainer/mambaforge/envs/donkey/lib/python3.9/site-packages/keras/initializers/initializers_v2.py", line 152, in __call__
        return tf.zeros(shape, dtype)

    ResourceExhaustedError: OOM when allocating tensor with shape[5,5,3,24] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:Fill]

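The allocator dump above shows TensorFlow was created with only a ~10 MB GPU memory limit, so even a 7 KiB `Fill` op OOMs. A possible mitigation (a sketch, not a confirmed fix for this issue) is to let the GPU allocator grow on demand and to try the async allocator that the log message itself suggests; both environment variables must be set before `import tensorflow`:

```python
import os

# Hedged sketch: configure TensorFlow's GPU allocator via environment
# variables. These must be set BEFORE tensorflow is imported, because
# the allocator is chosen at import/initialization time.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"      # grow instead of pre-reserving
os.environ["TF_GPU_ALLOCATOR"] = "cuda_malloc_async"  # suggested by the log above

# import tensorflow as tf  # import only after the environment is configured
```

Alternatively, these can be exported in the shell before running `train.py`. Whether this is enough on a 4 GB Nano, where the OS and camera pipeline already consume most of the memory, is untested here.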
Heavy02011 commented Jun 6, 2023

Training was successful (with one exception: see the last lines) after setting BATCH_SIZE = 1,
BUT the wait for JIT compilation was ca. 20 min.

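For reference, the batch-size override might look like this in `myconfig.py` (a sketch: `BATCH_SIZE` is the standard donkeycar override key; the `MAX_EPOCHS` line is shown only as an illustration of the same override pattern):

```python
# myconfig.py -- personal overrides loaded on top of config.py.
# BATCH_SIZE = 1 avoided the GPU OOM on the 4 GB Nano, at the cost of
# much slower epochs.
BATCH_SIZE = 1

# Illustrative only: other training settings are overridden the same way.
MAX_EPOCHS = 100
```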
(donkey) rainer@donkeynano11:~/data/d2$ python train.py --tub=data/tub_7_23-06-05/ --model=models/trt1
________             ______                   _________              
___  __ \_______________  /___________  __    __  ____/_____ ________
__  / / /  __ \_  __ \_  //_/  _ \_  / / /    _  /    _  __ `/_  ___/
_  /_/ // /_/ /  / / /  ,<  /  __/  /_/ /     / /___  / /_/ /_  /    
/_____/ \____//_/ /_//_/|_| \___/_\__, /      \____/  \__,_/ /_/     
                                 /____/                              

using donkey v5.0.dev1 ...
INFO:numexpr.utils:NumExpr defaulting to 4 threads.
INFO:donkeycar.config:loading config file: /home/rainer/data/d2/config.py
INFO:donkeycar.config:loading personal config over-rides from myconfig.py
WARNING:donkeycar.pipeline.database:No model database found at /home/rainer/data/d2/models/database.json
INFO:donkeycar.utils:get_model_by_type: model type is: linear
2023-06-06 20:54:16.979935: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-06-06 20:54:17.307749: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-06-06 20:54:17.308414: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-06-06 20:54:17.308521: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1943] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 5.3. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
2023-06-06 20:54:17.310622: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-06-06 20:54:17.311170: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-06-06 20:54:17.311605: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-06-06 20:54:17.311684: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1943] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 5.3. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
2023-06-06 21:11:25.993902: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-06-06 21:11:25.994608: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-06-06 21:11:25.994845: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Could not identify NUMA node of platform GPU id 0, defaulting to 0.  Your kernel may not have been built with NUMA support.
2023-06-06 21:11:25.995356: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-06-06 21:11:25.995610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 177 MB memory:  -> device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3
INFO:donkeycar.parts.keras:Created KerasLinear with interpreter: KerasInterpreter
Model: "linear"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 img_in (InputLayer)            [(None, 120, 160, 3  0           []                               
                                )]                                                                
                                                                                                  
 conv2d_1 (Conv2D)              (None, 58, 78, 24)   1824        ['img_in[0][0]']                 
                                                                                                  
 dropout (Dropout)              (None, 58, 78, 24)   0           ['conv2d_1[0][0]']               
                                                                                                  
 conv2d_2 (Conv2D)              (None, 27, 37, 32)   19232       ['dropout[0][0]']                
                                                                                                  
 dropout_1 (Dropout)            (None, 27, 37, 32)   0           ['conv2d_2[0][0]']               
                                                                                                  
 conv2d_3 (Conv2D)              (None, 12, 17, 64)   51264       ['dropout_1[0][0]']              
                                                                                                  
 dropout_2 (Dropout)            (None, 12, 17, 64)   0           ['conv2d_3[0][0]']               
                                                                                                  
 conv2d_4 (Conv2D)              (None, 10, 15, 64)   36928       ['dropout_2[0][0]']              
                                                                                                  
 dropout_3 (Dropout)            (None, 10, 15, 64)   0           ['conv2d_4[0][0]']               
                                                                                                  
 conv2d_5 (Conv2D)              (None, 8, 13, 64)    36928       ['dropout_3[0][0]']              
                                                                                                  
 dropout_4 (Dropout)            (None, 8, 13, 64)    0           ['conv2d_5[0][0]']               
                                                                                                  
 flattened (Flatten)            (None, 6656)         0           ['dropout_4[0][0]']              
                                                                                                  
 dense_1 (Dense)                (None, 100)          665700      ['flattened[0][0]']              
                                                                                                  
 dropout_5 (Dropout)            (None, 100)          0           ['dense_1[0][0]']                
                                                                                                  
 dense_2 (Dense)                (None, 50)           5050        ['dropout_5[0][0]']              
                                                                                                  
 dropout_6 (Dropout)            (None, 50)           0           ['dense_2[0][0]']                
                                                                                                  
 n_outputs0 (Dense)             (None, 1)            51          ['dropout_6[0][0]']              
                                                                                                  
 n_outputs1 (Dense)             (None, 1)            51          ['dropout_6[0][0]']              
                                                                                                  
==================================================================================================
Total params: 817,028
Trainable params: 817,028
Non-trainable params: 0
__________________________________________________________________________________________________
INFO:donkeycar.parts.datastore_v2:Found datastore at /home/rainer/data/d2/data/tub_7_23-06-05
INFO:donkeycar.parts.datastore_v2:Using last catalog /home/rainer/data/d2/data/tub_7_23-06-05/catalog_1.catalog
INFO:donkeycar.pipeline.types:Loading tubs from paths ['data/tub_7_23-06-05/']
INFO:donkeycar.pipeline.training:Records # Training 1188
INFO:donkeycar.pipeline.training:Records # Validation 298
INFO:donkeycar.parts.tub_v2:Closing tub data/tub_7_23-06-05/
INFO:donkeycar.parts.image_transformations:Creating ImageTransformations []
INFO:donkeycar.parts.image_transformations:Creating ImageTransformations []
INFO:donkeycar.parts.image_transformations:Creating ImageTransformations []
INFO:donkeycar.parts.image_transformations:Creating ImageTransformations []
INFO:donkeycar.pipeline.training:Train with image caching: False
INFO:donkeycar.parts.keras:////////// Starting training //////////
Epoch 1/100
2023-06-06 21:11:38.987688: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8201
2023-06-06 21:11:54.167305: W tensorflow/stream_executor/gpu/asm_compiler.cc:111] *** WARNING *** You are using ptxas 10.2.300, which is older than 11.1. ptxas before 11.1 is known to miscompile XLA code, leading to incorrect results or invalid-address errors.

You may not need to update to CUDA 11.1; cherry-picking the ptxas binary is often sufficient.
1187/1188 [============================>.] - ETA: 0s - loss: 0.0653 - n_outputs0_loss: 0.0019 - n_outputs1_loss: 0.0635          
Epoch 1: val_loss improved from inf to 0.04609, saving model to /home/rainer/data/d2/models/trt1
WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 5 of 5). These functions will not be directly callable after loading.
INFO:tensorflow:Assets written to: /home/rainer/data/d2/models/trt1/assets
1188/1188 [==============================] - 159s 61ms/step - loss: 0.0653 - n_outputs0_loss: 0.0019 - n_outputs1_loss: 0.0634 - val_loss: 0.0461 - val_n_outputs0_loss: 2.1981e-05 - val_n_outputs1_loss: 0.0461
Epoch 2/100
1188/1188 [==============================] - ETA: 0s - loss: 0.0433 - n_outputs0_loss: 3.0325e-04 - n_outputs1_loss: 0.0430   
Epoch 2: val_loss improved from 0.04609 to 0.03934, saving model to /home/rainer/data/d2/models/trt1
WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 5 of 5). These functions will not be directly callable after loading.
INFO:tensorflow:Assets written to: /home/rainer/data/d2/models/trt1/assets
1188/1188 [==============================] - 61s 51ms/step - loss: 0.0433 - n_outputs0_loss: 3.0325e-04 - n_outputs1_loss: 0.0430 - val_loss: 0.0393 - val_n_outputs0_loss: 2.1081e-05 - val_n_outputs1_loss: 0.0393
Epoch 3/100
1187/1188 [============================>.] - ETA: 0s - loss: 0.0394 - n_outputs0_loss: 1.2233e-04 - n_outputs1_loss: 0.0393  
Epoch 3: val_loss improved from 0.03934 to 0.03111, saving model to /home/rainer/data/d2/models/trt1
WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 5 of 5). These functions will not be directly callable after loading.
INFO:tensorflow:Assets written to: /home/rainer/data/d2/models/trt1/assets
1188/1188 [==============================] - 89s 75ms/step - loss: 0.0394 - n_outputs0_loss: 1.2223e-04 - n_outputs1_loss: 0.0393 - val_loss: 0.0311 - val_n_outputs0_loss: 3.2221e-06 - val_n_outputs1_loss: 0.0311
Epoch 4/100
1187/1188 [============================>.] - ETA: 0s - loss: 0.0410 - n_outputs0_loss: 5.5577e-05 - n_outputs1_loss: 0.0409           
Epoch 4: val_loss improved from 0.03111 to 0.03081, saving model to /home/rainer/data/d2/models/trt1
WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 5 of 5). These functions will not be directly callable after loading.
INFO:tensorflow:Assets written to: /home/rainer/data/d2/models/trt1/assets
1188/1188 [==============================] - 59s 49ms/step - loss: 0.0409 - n_outputs0_loss: 5.5533e-05 - n_outputs1_loss: 0.0409 - val_loss: 0.0308 - val_n_outputs0_loss: 2.5976e-07 - val_n_outputs1_loss: 0.0308
Epoch 5/100
1187/1188 [============================>.] - ETA: 0s - loss: 0.0335 - n_outputs0_loss: 1.4143e-05 - n_outputs1_loss: 0.0335  
Epoch 5: val_loss improved from 0.03081 to 0.03010, saving model to /home/rainer/data/d2/models/trt1
WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 5 of 5). These functions will not be directly callable after loading.
INFO:tensorflow:Assets written to: /home/rainer/data/d2/models/trt1/assets
1188/1188 [==============================] - 54s 46ms/step - loss: 0.0334 - n_outputs0_loss: 1.4131e-05 - n_outputs1_loss: 0.0334 - val_loss: 0.0301 - val_n_outputs0_loss: 6.3786e-07 - val_n_outputs1_loss: 0.0301
Epoch 6/100
1188/1188 [==============================] - ETA: 0s - loss: 0.0335 - n_outputs0_loss: 5.8276e-06 - n_outputs1_loss: 0.0335  
Epoch 6: val_loss did not improve from 0.03010
1188/1188 [==============================] - 48s 41ms/step - loss: 0.0335 - n_outputs0_loss: 5.8276e-06 - n_outputs1_loss: 0.0335 - val_loss: 0.0304 - val_n_outputs0_loss: 7.5140e-06 - val_n_outputs1_loss: 0.0304
Epoch 7/100
1187/1188 [============================>.] - ETA: 0s - loss: 0.0330 - n_outputs0_loss: 4.6304e-06 - n_outputs1_loss: 0.0330  
Epoch 7: val_loss did not improve from 0.03010
1188/1188 [==============================] - 47s 40ms/step - loss: 0.0329 - n_outputs0_loss: 4.6280e-06 - n_outputs1_loss: 0.0329 - val_loss: 0.0302 - val_n_outputs0_loss: 3.8068e-07 - val_n_outputs1_loss: 0.0302
Epoch 8/100
1188/1188 [==============================] - ETA: 0s - loss: 0.0328 - n_outputs0_loss: 6.6695e-06 - n_outputs1_loss: 0.0328  
Epoch 8: val_loss did not improve from 0.03010
1188/1188 [==============================] - 47s 40ms/step - loss: 0.0328 - n_outputs0_loss: 6.6695e-06 - n_outputs1_loss: 0.0328 - val_loss: 0.0304 - val_n_outputs0_loss: 9.3644e-08 - val_n_outputs1_loss: 0.0304
Epoch 9/100
1187/1188 [============================>.] - ETA: 0s - loss: 0.0318 - n_outputs0_loss: 3.5991e-06 - n_outputs1_loss: 0.0318  
Epoch 9: val_loss improved from 0.03010 to 0.02996, saving model to /home/rainer/data/d2/models/trt1
WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 5 of 5). These functions will not be directly callable after loading.
INFO:tensorflow:Assets written to: /home/rainer/data/d2/models/trt1/assets
1188/1188 [==============================] - 53s 45ms/step - loss: 0.0318 - n_outputs0_loss: 3.5966e-06 - n_outputs1_loss: 0.0318 - val_loss: 0.0300 - val_n_outputs0_loss: 1.1442e-07 - val_n_outputs1_loss: 0.0300
Epoch 10/100
1187/1188 [============================>.] - ETA: 0s - loss: 0.0314 - n_outputs0_loss: 7.2566e-06 - n_outputs1_loss: 0.0314  
Epoch 10: val_loss improved from 0.02996 to 0.02993, saving model to /home/rainer/data/d2/models/trt1
WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 5 of 5). These functions will not be directly callable after loading.
INFO:tensorflow:Assets written to: /home/rainer/data/d2/models/trt1/assets
1188/1188 [==============================] - 54s 46ms/step - loss: 0.0314 - n_outputs0_loss: 7.2507e-06 - n_outputs1_loss: 0.0314 - val_loss: 0.0299 - val_n_outputs0_loss: 4.3600e-07 - val_n_outputs1_loss: 0.0299
INFO:donkeycar.parts.keras:////////// Finished training in: 0:11:13.267213 //////////
INFO:donkeycar.parts.interpreter:Convert model /home/rainer/data/d2/models/trt1 to TFLite /home/rainer/data/d2/models/trt1.tflite
WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 5 of 5). These functions will not be directly callable after loading.
INFO:tensorflow:Assets written to: /tmp/tmpdz9384s9/assets
2023-06-06 21:23:04.551175: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:362] Ignored output_format.
2023-06-06 21:23:04.551257: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:365] Ignored drop_control_dependency.
2023-06-06 21:23:04.581315: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/tmpdz9384s9
2023-06-06 21:23:04.615103: I tensorflow/cc/saved_model/reader.cc:81] Reading meta graph with tags { serve }
2023-06-06 21:23:04.628192: I tensorflow/cc/saved_model/reader.cc:122] Reading SavedModel debug info (if present) from: /tmp/tmpdz9384s9
2023-06-06 21:23:04.695728: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:354] MLIR V1 optimization pass is not enabled
2023-06-06 21:23:04.710452: I tensorflow/cc/saved_model/loader.cc:228] Restoring SavedModel bundle.
2023-06-06 21:23:05.275281: I tensorflow/cc/saved_model/loader.cc:212] Running initialization op on SavedModel bundle at path: /tmp/tmpdz9384s9
2023-06-06 21:23:05.419491: I tensorflow/cc/saved_model/loader.cc:301] SavedModel load for tags { serve }; Status: success: OK. Took 838208 microseconds.
2023-06-06 21:23:05.818251: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:263] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
INFO:donkeycar.parts.interpreter:TFLite conversion done.
INFO:donkeycar.parts.interpreter:Converting SavedModel /home/rainer/data/d2/models/trt1.savedmodel to TensorRT /home/rainer/data/d2/models/trt1.trt
INFO:tensorflow:Linked TensorRT version: (8, 2, 1)
INFO:tensorflow:Loaded TensorRT version: (8, 2, 1)
ERROR:donkeycar.parts.interpreter:TensorRT conversion failed because: SavedModel file does not exist at: /home/rainer/data/d2/models/trt1.savedmodel/{saved_model.pbtxt|saved_model.pb}
INFO:donkeycar.pipeline.database:Writing database file: /home/rainer/data/d2/models/database.json

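A possible workaround for the TensorRT error at the end of the log above: training wrote the SavedModel to `models/trt1`, but the converter looks for `models/trt1.savedmodel`. Symlinking the expected path to the existing directory may let the conversion proceed. This is an untested sketch, not part of donkeycar; the helper name and paths are illustrative.

```python
from pathlib import Path

def link_savedmodel(models_dir: str, name: str) -> Path:
    """Symlink <name>.savedmodel to the directory training actually wrote,
    so the TensorRT converter can find saved_model.pb.
    Hypothetical helper, not part of the donkeycar API."""
    models = Path(models_dir)
    src = models / name                   # e.g. models/trt1 (written by training)
    dst = models / f"{name}.savedmodel"   # path the TRT converter expects
    if (src / "saved_model.pb").exists() and not dst.exists():
        dst.symlink_to(src.resolve())
    return dst
```

After creating the link, re-running the conversion step should find `saved_model.pb` at the expected location.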
Heavy02011 (Contributor) commented Jun 8, 2023

My conclusion so far: we need to rebuild TensorFlow with compute capability 5.3 ("compute_53", which should be offered during the build configuration), since the JIT kernel caching failed; see Brian's comments above.

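To check whether an installed wheel actually embeds sm_53 kernels before driving, the build-info dict can be inspected. A small sketch; the `cuda_compute_capabilities` key is what `tf.sysconfig.get_build_info()` returns on CUDA builds, but treat the key names as an assumption:

```python
def has_sm_53(build_info: dict) -> bool:
    """True if the TF build embeds kernels for compute capability 5.3
    (Tegra X1). Expects the dict from tf.sysconfig.get_build_info();
    entries in 'cuda_compute_capabilities' look like 'sm_53' or 'compute_53'."""
    if not build_info.get("is_cuda_build", False):
        return False
    caps = build_info.get("cuda_compute_capabilities", [])
    return any(c.endswith("53") for c in caps)
```

If this returns False on a CUDA build, the Nano will JIT-compile from PTX at startup, producing the slow "30 minutes or longer" warning from the original report.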
However, here is a quick fix using TensorFlow 2.12 without TensorRT that still allows driving and training on the Jetson Nano:

conda create --name donkey212 --clone donkey
conda activate donkey212
pip uninstall tensorflow
pip install tensorflow
>>> import tensorflow as tf
>>> tf.sysconfig.get_build_info()
OrderedDict([('is_cuda_build', False), ('is_rocm_build', False), ('is_tensorrt_build', False)])

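The OrderedDict printed above can be summarized mechanically. A minimal sketch, assuming only the flag names shown in that output:

```python
def describe_build(build_info: dict) -> str:
    """One-line summary of GPU/TensorRT support from the build-info dict.
    A CPU-only wheel, like the pip tensorflow 2.12 used here, reports
    is_cuda_build, is_rocm_build and is_tensorrt_build all False."""
    if not build_info.get("is_cuda_build", False):
        return "CPU-only build: driving and training work, but no GPU and no TensorRT"
    if build_info.get("is_tensorrt_build", False):
        return "CUDA build with TensorRT support"
    return "CUDA build without TensorRT support"
```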