
[HW Accel Support]: Fresh pull of v13 in Unraid - CUDA initialization failure #9575

Closed
usafle opened this issue Feb 1, 2024 · 17 comments

usafle commented Feb 1, 2024

Describe the problem you are having

I removed the original Frigate container and template and pulled down a fresh copy of v13, choosing the NVIDIA branch when Community Apps asked which branch to install. So I did not upgrade from v12 to v13; I started with a fresh pull of the container. I have a CUDA-capable GPU installed and visible.

Version

v13

Frigate config file

mqtt:
  enabled: true
  host: 192.168.1.102
  user: frigate
  password: PASSWORD
# detectors:
#  cpu1:
#    type: cpu
#    num_threads: 2

# birdseye:
#   enabled: True
#   restream: false
#   mode: continuous
#   width: 1280
#   height: 720
#   quality: 8

go2rtc:
  streams:
    Rear_Deck:
      - rtsp://admin:[email protected]:554/h264Preview_01_main
    Rear_Deck_sub:
      - rtsp://admin:[email protected]:554/h264Preview_01_sub
    Garage_Camera:
      - rtsp://admin:[email protected]:554/cam/realmonitor?channel=1&subtype=0
    Garage_Camera_sub:
      - rtsp://admin:[email protected]:554/cam/realmonitor?channel=1&subtype=1

ffmpeg:
  hwaccel_args: preset-nvidia-h265

rtmp:
  enabled: False 

cameras:
############## REAR DECK ##################
  Rear_Deck:
    ffmpeg:
      inputs:
        - path: rtsp://127.0.0.1:8554/Rear_Deck_sub
          input_args: preset-rtsp-restream
          roles:
            - detect
        - path: rtsp://127.0.0.1:8554/Rear_Deck
          input_args: preset-rtsp-restream
          roles:
            - record
      output_args:
        record: -f segment -segment_time 10 -segment_format mp4 -reset_timestamps 1 -strftime 1 -c:v copy -c:a aac
    objects:
      track:
        - person
        - dog
        - bird
        - cat
    detect:
      width: 1280
      height: 720
      fps: 4
    record:
      enabled: True
      events:
        retain:
          default: 2
    snapshots:
      enabled: True

  Garage_Camera:
    ffmpeg:
      inputs:
        - path: rtsp://127.0.0.1:8554/Garage_Camera_sub
          input_args: preset-rtsp-restream
          roles:
            - detect
        - path: rtsp://127.0.0.1:8554/Garage_Camera
          input_args: preset-rtsp-restream
          roles:
            - record
      output_args:
        record: -f segment -segment_time 10 -segment_format mp4 -reset_timestamps 1 -strftime 1 -c:v copy -c:a aac
        # record: preset-record-generic-audio-aac
    objects:
      track:
        - person
        - dog
        - cat
        - car
        - package
    detect:
      width: 1280
      height: 720
      fps: 4
    record:
      enabled: True
      events:
        retain:
          default: 2
    snapshots:
      enabled: True

docker-compose file or Docker CLI command

Installed Via Community Apps

Relevant log output

s6-rc: info: service s6rc-fdholder: starting
s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service s6rc-fdholder successfully started
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service trt-model-prepare: starting
s6-rc: info: service log-prepare: starting
s6-rc: info: service log-prepare successfully started
s6-rc: info: service nginx-log: starting
s6-rc: info: service go2rtc-log: starting
s6-rc: info: service frigate-log: starting
s6-rc: info: service nginx-log successfully started
s6-rc: info: service go2rtc-log successfully started
s6-rc: info: service go2rtc: starting
s6-rc: info: service frigate-log successfully started
s6-rc: info: service go2rtc successfully started
s6-rc: info: service go2rtc-healthcheck: starting
s6-rc: info: service go2rtc-healthcheck successfully started
Generating the following TRT Models: yolov4-416,yolov4-tiny-416
Downloading yolo weights
2024-02-01 10:30:12.079536551  [INFO] Preparing new go2rtc config...
2024-02-01 10:30:13.159361526  [INFO] Starting go2rtc...
2024-02-01 10:30:13.279687971  10:30:13.279 INF go2rtc version 1.8.4 linux/amd64
2024-02-01 10:30:13.280390371  10:30:13.280 INF [api] listen addr=:1984
2024-02-01 10:30:13.280428249  10:30:13.280 INF [rtsp] listen addr=:8554
2024-02-01 10:30:13.280808608  10:30:13.280 INF [webrtc] listen addr=:8555

Creating yolov4-tiny-416.cfg and yolov4-tiny-416.weights
Creating yolov4-416.cfg and yolov4-416.weights

Done.
2024-02-01 10:30:21.744747617  [INFO] Starting go2rtc healthcheck service...

Generating yolov4-416.trt. This may take a few minutes.

Traceback (most recent call last):
  File "/usr/local/src/tensorrt_demos/yolo/onnx_to_tensorrt.py", line 214, in <module>
    main()
  File "/usr/local/src/tensorrt_demos/yolo/onnx_to_tensorrt.py", line 202, in main
    engine = build_engine(
  File "/usr/local/src/tensorrt_demos/yolo/onnx_to_tensorrt.py", line 112, in build_engine
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network(*EXPLICIT_BATCH) as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
TypeError: pybind11::init(): factory function returned nullptr
[02/01/2024-10:30:38] [TRT] [W] Unable to determine GPU memory usage
[02/01/2024-10:30:38] [TRT] [W] Unable to determine GPU memory usage
[02/01/2024-10:30:38] [TRT] [W] CUDA initialization failure with error: 35. Please check your CUDA installation:  http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
Loading the ONNX file...

Generating yolov4-tiny-416.trt. This may take a few minutes.

Traceback (most recent call last):
  File "/usr/local/src/tensorrt_demos/yolo/onnx_to_tensorrt.py", line 214, in <module>
    main()
  File "/usr/local/src/tensorrt_demos/yolo/onnx_to_tensorrt.py", line 202, in main
    engine = build_engine(
  File "/usr/local/src/tensorrt_demos/yolo/onnx_to_tensorrt.py", line 112, in build_engine
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network(*EXPLICIT_BATCH) as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
TypeError: pybind11::init(): factory function returned nullptr
[02/01/2024-10:30:41] [TRT] [W] Unable to determine GPU memory usage
[02/01/2024-10:30:41] [TRT] [W] Unable to determine GPU memory usage
[02/01/2024-10:30:41] [TRT] [W] CUDA initialization failure with error: 35. Please check your CUDA installation:  http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
Loading the ONNX file...
Available tensorrt models:
ls: cannot access '*.trt': No such file or directory
s6-rc: warning: unable to start service trt-model-prepare: command exited 2

FFprobe output from your camera

Can't access Frigate due to above error

Operating system

UNRAID

Install method

Docker Compose

Network connection

Wired

Camera make and model

Reolink + Amcrest

Any other information that may be helpful

No response


NickM-27 commented Feb 1, 2024

What version of the NVIDIA driver is installed, and what is your docker CLI command?


usafle commented Feb 1, 2024

Nvidia Driver Version: 545.29.06 / NVIDIA GeForce GTX 1050

I don't have a CLI command, I installed it via Community Apps. Hopefully I'm answering that question correctly?


NickM-27 commented Feb 1, 2024

You do have a CLI command; Unraid Community Apps just sets up a docker command for you, and that command is shown when you apply changes to the Unraid container.


usafle commented Feb 1, 2024

Perhaps you are looking for the template when it pulls down the container from C.A.?

[Screenshot 2024-02-01 12-26-03: CozsNAS UpdateContainer template]
[Screenshot 2024-02-01 12-26-19: CozsNAS UpdateContainer template]


NickM-27 commented Feb 1, 2024

No, after you press Apply at the bottom it shows you the CLI command.


usafle commented Feb 1, 2024

[Screenshot 2024-02-01 12-30-52: CozsNAS UpdateContainer docker command]

Thanks for the clarification


NickM-27 commented Feb 1, 2024

You need to add `--gpus=all` to the extra arguments list.
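
For readers hitting the same wall, here is a minimal sketch of the kind of docker run command Unraid ends up generating once `--gpus=all` is in the Extra Parameters field. The image tag, ports, and host paths below are illustrative assumptions, not values taken from this template.

```shell
# Sketch only: the key fix is --gpus=all; without it the container
# cannot see the GPU and TensorRT fails with CUDA error 35.
docker run -d \
  --name frigate \
  --gpus=all \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -e NVIDIA_DRIVER_CAPABILITIES=compute,video,utility \
  -v /mnt/user/appdata/frigate:/config \
  -v /mnt/user/media/frigate:/media/frigate \
  -p 5000:5000 -p 8554:8554 \
  ghcr.io/blakeblackshear/frigate:stable-tensorrt
```

The `NVIDIA_*` environment variables are honored by the NVIDIA container runtime; on Unraid they are typically already present in the NVIDIA-branch template, so `--gpus=all` is the only piece that usually needs adding by hand.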


usafle commented Feb 1, 2024

S.O.B. That fixed it. She starts now. Question while I have your attention: what do I now do with the

# detectors:
#  cpu1:
#    type: cpu
#    num_threads: 2

Will it automatically utilize the GPU for detection now, or do I have to specifically put a different line of code in the YAML?


NickM-27 commented Feb 1, 2024

if you want it to use the GPU for object detection then you should follow https://docs.frigate.video/configuration/object_detectors#nvidia-tensorrt-detector


usafle commented Feb 1, 2024

So I should be paying attention to this specific paragraph?

detectors:
  tensorrt:
    type: tensorrt
    device: 0 # This is the default, select the first GPU

model:
  path: /config/model_cache/tensorrt/yolov7-320.trt
  input_tensor: nchw
  input_pixel_format: rgb
  width: 320
  height: 320
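
Worth noting alongside that config: the `.trt` model files are generated inside the container at startup, driven by the `YOLO_MODELS` environment variable, so the `model.path` above must name a model that was actually built. A compose-style sketch (on Unraid this would be an extra environment variable in the container template; the service name and layout here are assumptions):

```yaml
# docker-compose fragment; only the environment entry matters here.
services:
  frigate:
    environment:
      # Comma-separated list; must match the .trt filename referenced
      # under model.path in the Frigate config.
      - YOLO_MODELS=yolov7-320
```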


NickM-27 commented Feb 1, 2024

Yes


usafle commented Feb 1, 2024

So now there is some sort of Python error: segmentation fault.

It's probably my fault.

2024-02-01 17:45:24.201421738  [2024-02-01 17:45:24] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +67, now: CPU 0, GPU 451 (MiB)
2024-02-01 17:45:26.505945177  [INFO] Starting go2rtc healthcheck service...
2024-02-01 17:45:28.031121208  Fatal Python error: Segmentation fault
2024-02-01 17:45:28.031132500  
2024-02-01 17:45:28.031143201  Thread 0x000014c0c25ee6c0 (most recent call first):
2024-02-01 17:45:28.031181405    File "/usr/lib/python3.9/threading.py", line 312 in wait
2024-02-01 17:45:28.031752450    File "/usr/lib/python3.9/multiprocessing/queues.py", line 233 in _feed
2024-02-01 17:45:28.031834378    File "/usr/lib/python3.9/threading.py", line 892 in run
2024-02-01 17:45:28.031839453    File "/usr/lib/python3.9/threading.py", line 954 in _bootstrap_inner
2024-02-01 17:45:28.032128629    File "/usr/lib/python3.9/threading.py", line 912 in _bootstrap
2024-02-01 17:45:28.032139296  
2024-02-01 17:45:28.032142998  Current thread 0x000014c0e90f8740 (most recent call first):
2024-02-01 17:45:28.032147248    File "/opt/frigate/frigate/detectors/plugins/tensorrt.py", line 168 in <listcomp>
2024-02-01 17:45:28.032223130    File "/opt/frigate/frigate/detectors/plugins/tensorrt.py", line 167 in _do_inference
2024-02-01 17:45:28.032229472    File "/opt/frigate/frigate/detectors/plugins/tensorrt.py", line 286 in detect_raw
2024-02-01 17:45:28.032309661    File "/opt/frigate/frigate/object_detection.py", line 75 in detect_raw
2024-02-01 17:45:28.032314928    File "/opt/frigate/frigate/object_detection.py", line 125 in run_detector
2024-02-01 17:45:28.032318383    File "/usr/lib/python3.9/multiprocessing/process.py", line 108 in run
2024-02-01 17:45:28.032321965    File "/usr/lib/python3.9/multiprocessing/process.py", line 315 in _bootstrap
2024-02-01 17:45:28.032325542    File "/usr/lib/python3.9/multiprocessing/popen_fork.py", line 71 in _launch
2024-02-01 17:45:28.032329065    File "/usr/lib/python3.9/multiprocessing/popen_fork.py", line 19 in __init__
2024-02-01 17:45:28.032368478    File "/usr/lib/python3.9/multiprocessing/context.py", line 277 in _Popen
2024-02-01 17:45:28.032405583    File "/usr/lib/python3.9/multiprocessing/context.py", line 224 in _Popen
2024-02-01 17:45:28.032452351    File "/usr/lib/python3.9/multiprocessing/process.py", line 121 in start
2024-02-01 17:45:28.032514176    File "/opt/frigate/frigate/object_detection.py", line 183 in start_or_restart
2024-02-01 17:45:28.032610256    File "/opt/frigate/frigate/object_detection.py", line 151 in __init__
2024-02-01 17:45:28.032677035    File "/opt/frigate/frigate/app.py", line 453 in start_detectors
2024-02-01 17:45:28.032756244    File "/opt/frigate/frigate/app.py", line 683 in start
2024-02-01 17:45:28.032838373    File "/opt/frigate/frigate/__main__.py", line 17 in <module>
2024-02-01 17:45:28.032920488    File "/usr/lib/python3.9/runpy.py", line 87 in _run_code
2024-02-01 17:45:28.033003527    File "/usr/lib/python3.9/runpy.py", line 197 in _run_module_as_main
2024-02-01 17:45:42.751911555  [2024-02-01 17:45:42] frigate.watchdog               INFO    : Detection appears to be stuck. Restarting detection process...
2024-02-01 17:45:42.774446711  [2024-02-01 17:45:42] detector.tensorrt              INFO    : Starting detection process: 1257
2024-02-01 17:45:43.696120537  [2024-02-01 17:45:43] frigate.detectors.plugins.tensorrt INFO    : Loaded engine size: 382 MiB
2024-02-01 17:45:44.224290941  [2024-02-01 17:45:44] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +8, now: CPU 506, GPU 572 (MiB)
2024-02-01 17:45:44.237778500  [2024-02-01 17:45:44] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 508, GPU 582 (MiB)
2024-02-01 17:45:44.244529126  [2024-02-01 17:45:44] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +384, now: CPU 0, GPU 384 (MiB)
2024-02-01 17:45:44.321038782  [2024-02-01 17:45:44] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 126, GPU 576 (MiB)
2024-02-01 17:45:44.325099439  [2024-02-01 17:45:44] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 126, GPU 584 (MiB)
2024-02-01 17:45:44.325186438  [2024-02-01 17:45:44] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +67, now: CPU 0, GPU 451 (MiB)
2024-02-01 17:45:44.330534549  Fatal Python error: Segmentation fault
2024-02-01 17:45:44.330541049  
2024-02-01 17:45:44.330566522  Thread 0x000014c0d9bf96c0 (most recent call first):
2024-02-01 17:45:44.330629019    File "/usr/lib/python3.9/threading.py", line 312 in wait
2024-02-01 17:45:44.330699041    File "/usr/lib/python3.9/multiprocessing/queues.py", line 233 in _feed
2024-02-01 17:45:44.330752018    File "/usr/lib/python3.9/threading.py", line 892 in run
2024-02-01 17:45:44.330826565    File "/usr/lib/python3.9/threading.py", line 954 in _bootstrap_inner
2024-02-01 17:45:44.330882447    File "/usr/lib/python3.9/threading.py", line 912 in _bootstrap
2024-02-01 17:45:44.330884860  
2024-02-01 17:45:44.330905318  Current thread 0x000014c0d97f76c0 (most recent call first):
2024-02-01 17:45:44.330974805    File "/opt/frigate/frigate/detectors/plugins/tensorrt.py", line 168 in <listcomp>
2024-02-01 17:45:44.331058941    File "/opt/frigate/frigate/detectors/plugins/tensorrt.py", line 167 in _do_inference
2024-02-01 17:45:44.331145620    File "/opt/frigate/frigate/detectors/plugins/tensorrt.py", line 286 in detect_raw
2024-02-01 17:45:44.331218329    File "/opt/frigate/frigate/object_detection.py", line 75 in detect_raw
2024-02-01 17:45:44.331295305    File "/opt/frigate/frigate/object_detection.py", line 125 in run_detector
2024-02-01 17:45:44.331365046    File "/usr/lib/python3.9/multiprocessing/process.py", line 108 in run
2024-02-01 17:45:44.331445566    File "/usr/lib/python3.9/multiprocessing/process.py", line 315 in _bootstrap
2024-02-01 17:45:44.331529044    File "/usr/lib/python3.9/multiprocessing/popen_fork.py", line 71 in _launch
2024-02-01 17:45:44.331622515    File "/usr/lib/python3.9/multiprocessing/popen_fork.py", line 19 in __init__
2024-02-01 17:45:44.331702885    File "/usr/lib/python3.9/multiprocessing/context.py", line 277 in _Popen
2024-02-01 17:45:44.331781844    File "/usr/lib/python3.9/multiprocessing/context.py", line 224 in _Popen
2024-02-01 17:45:44.331849831    File "/usr/lib/python3.9/multiprocessing/process.py", line 121 in start
2024-02-01 17:45:44.331918830    File "/opt/frigate/frigate/object_detection.py", line 183 in start_or_restart
2024-02-01 17:45:44.331989877    File "/opt/frigate/frigate/watchdog.py", line 34 in run
2024-02-01 17:45:44.332064820    File "/usr/lib/python3.9/threading.py", line 954 in _bootstrap_inner
2024-02-01 17:45:44.332133000    File "/usr/lib/python3.9/threading.py", line 912 in _bootstrap
2024-02-01 17:45:52.766515491  [2024-02-01 17:45:52] frigate.watchdog               INFO    : Detection appears to have stopped. Exiting Frigate...
2024-02-01 17:45:52.790404169  [INFO] The go2rtc-healthcheck service exited with code 256 (by signal 15)
2024-02-01 17:45:52.849888651  [INFO] Service NGINX exited with code 0 (by signal 0)
2024-02-01 17:45:52.853521917  [2024-02-01 17:45:52] frigate.app                    INFO    : Stopping...
2024-02-01 17:45:52.854274087  [2024-02-01 17:45:52] frigate.ptz.autotrack          INFO    : Exiting autotracker...
2024-02-01 17:45:52.854802363  [2024-02-01 17:45:52] frigate.storage                INFO    : Exiting storage maintainer...
2024-02-01 17:45:52.860287936  [2024-02-01 17:45:52] frigate.stats                  INFO    : Exiting stats emitter...
2024-02-01 17:45:52.860293391  [2024-02-01 17:45:52] frigate.watchdog               INFO    : Exiting watchdog...
2024-02-01 17:45:52.867881182  [2024-02-01 17:45:52] frigate.record.cleanup         INFO    : Exiting recording cleanup...
2024-02-01 17:45:52.868901156  [2024-02-01 17:45:52] frigate.events.cleanup         INFO    : Exiting event cleanup...
2024-02-01 17:45:52.868906792  [2024-02-01 17:45:52] frigate.object_processing      INFO    : Exiting object processor...
2024-02-01 17:45:53.002306542  [2024-02-01 17:45:53] frigate.comms.ws               INFO    : Exiting websocket client...
2024-02-01 17:45:53.779081109  [2024-02-01 17:45:53] frigate.events.maintainer      INFO    : Exiting event processor...
2024-02-01 17:45:53.779503732  [2024-02-01 17:45:53] peewee.sqliteq                 INFO    : writer received shutdown request, exiting.
2024-02-01 17:45:53.783142912  [2024-02-01 17:45:53] frigate.record.maintainer      INFO    : Exiting recording maintenance..


NickM-27 commented Feb 1, 2024

Might be the driver; 535 is generally the recommended version.
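
As a quick diagnostic for this class of failure: CUDA error 35 (cudaErrorInsufficientDriver) generally means the driver the container sees is absent or older than what the CUDA runtime expects. A sketch of environment-dependent checks (the container name `frigate` is an assumption):

```shell
# On the Unraid host: report the driver version the kernel module exposes.
nvidia-smi --query-gpu=driver_version,name --format=csv

# Inside the container: confirms the NVIDIA runtime actually passed the
# GPU through; if this fails, revisit --gpus=all in Extra Parameters.
docker exec frigate nvidia-smi
```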


usafle commented Feb 1, 2024

...and here I thought keeping everything up to date was the best idea...


NickM-27 commented Feb 1, 2024

I believe 545 is not marked stable yet


usafle commented Feb 5, 2024

I'm about to downgrade my driver to get this working, or at least to see if it IS actually the driver causing these issues. There are multiple v535 drivers:

  1. v535.129.03
  2. v535.113.01
  3. v535.104.05
  4. v530.41.03

Do you have a preference in which should be tried?

NickM-27 closed this as not planned (see #9801) Feb 11, 2024