Issues running model on raspberrypi5 + edgetpu #50

Open
han-so1omon opened this issue May 15, 2024 · 90 comments

@han-so1omon

I have some issues running the DeGirum models on my Raspberry Pi 5 + Edge TPU environment with Raspberry Pi OS 12 (bookworm). Moving here from ultralytics/ultralytics#1185. @shashichilappagari Can you provide some assistance?

First step

# Download the models
degirum download-zoo --path /home/errc/v --device EDGETPU --runtime TFLITE --precision QUANT --token dg_4JRLnVvtfNdKLzj4oL816wNtL9gQBT5dfqmi3 --url https://cs.degirum.com/degirum/edgetpu

Try to run with DeGirum PySDK

import cv2
import degirum as dg

image = cv2.imread("./test-posenet.jpg")

zoo = dg.connect(dg.LOCAL, "/home/errc/v/yolov8n_relu6_coco_pose--640x640_quant_tflite_edgetpu_1/yolov8n_relu6_coco_pose--640x640_quant_tflite_edgetpu_1.json")
model = zoo.load_model("yolov8n_relu6_coco_pose--640x640_quant_tflite_edgetpu_1")
print(model)

result = model.predict(image)
result_image = result.image_overlay
cv2.imwrite("./test-posenet-degirum.jpg", result_image)
# Result
> python pose-tracking-debug-degirum.py 
<degirum.model._ClientModel object at 0x7f73d62ed0>
terminate called without an active exception
terminate called without an active exception
Aborted

Try to run with the base Ultralytics YOLO library

from ultralytics import YOLO

# Load model
model = YOLO('/home/errc/v/yolov8n_relu6_coco_pose--640x640_quant_tflite_edgetpu_1/yolov8n_relu6_coco_pose--640x640_quant_tflite_edgetpu_1.tflite')

# Track with the model
results = model.track(source="/home/errc/e/ai/test-infrared.mp4", save=True)
# Result
>  python pose-tracking-debug-yolo.py 
WARNING ⚠️ Unable to automatically guess model task, assuming 'task=detect'. Explicitly define task for your model, i.e. 'task=detect', 'segment', 'classify','pose' or 'obb'.
Loading /home/errc/v/yolov8n_relu6_coco_pose--640x640_quant_tflite_edgetpu_1/yolov8n_relu6_coco_pose--640x640_quant_tflite_edgetpu_1.tflite for TensorFlow Lite inference...
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Traceback (most recent call last):
  File "/home/errc/e/ai/pose-tracking-debug-yolo.py", line 8, in <module>
    results = model.track(source="/home/errc/e/ai/test-infrared.mp4", save=True)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/errc/e/ai/venv/lib/python3.11/site-packages/ultralytics/engine/model.py", line 492, in track
    return self.predict(source=source, stream=stream, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/errc/e/ai/venv/lib/python3.11/site-packages/ultralytics/engine/model.py", line 445, in predict
    self.predictor.setup_model(model=self.model, verbose=is_cli)
  File "/home/errc/e/ai/venv/lib/python3.11/site-packages/ultralytics/engine/predictor.py", line 297, in setup_model
    self.model = AutoBackend(
                 ^^^^^^^^^^^^
  File "/home/errc/e/ai/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/errc/e/ai/venv/lib/python3.11/site-packages/ultralytics/nn/autobackend.py", line 341, in __init__
    interpreter.allocate_tensors()  # allocate
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/errc/e/ai/venv/lib/python3.11/site-packages/tflite_runtime/interpreter.py", line 531, in allocate_tensors
    return self._interpreter.AllocateTensors()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Encountered unresolved custom op: edgetpu-custom-op.
See instructions: https://www.tensorflow.org/lite/guide/ops_custom Node number 0 (edgetpu-custom-op) failed to prepare.Encountered unresolved custom op: edgetpu-custom-op.
See instructions: https://www.tensorflow.org/lite/guide/ops_custom Node number 0 (edgetpu-custom-op) failed to prepare.

I can see that the Edge TPU is connected, although I am not sure that it is being used:

> lspci
0000:00:00.0 PCI bridge: Broadcom Inc. and subsidiaries Device 2712 (rev 21)
0000:01:00.0 System peripheral: Global Unichip Corp. Coral Edge TPU
0001:00:00.0 PCI bridge: Broadcom Inc. and subsidiaries Device 2712 (rev 21)
0001:01:00.0 Ethernet controller: Device 1de4:0001
> ls /dev/apex_0 
/dev/apex_0
@shashichilappagari
Contributor

@kteodorovich or @boristeo Can you please help @han-so1omon? I suspect the PCIe driver is not installed properly.
@han-so1omon Can you please provide us the output of the following command:

degirum sys-info

@han-so1omon
Author

han-so1omon commented May 16, 2024

Sure, here's the output @shashichilappagari @boristeo @kteodorovich

$ degirum sys-info
Devices:
  N2X/CPU:
  - '@Index': 0
  - '@Index': 1
  TFLITE/CPU:
  - '@Index': 0
  - '@Index': 1
  TFLITE/EDGETPU:
  - '@Index': 0
Software Version: 0.12.1

@shashichilappagari
Contributor

@han-so1omon So, it appears that PySDK is able to recognize the Edge TPU. To ensure that the driver is properly installed and working, we made a small test script that does not depend on PySDK (this will allow us to diagnose whether the problem is with PySDK or with the basic setup). Please see if you can run the following code without errors:

import tflite_runtime.interpreter as tflite
from PIL import Image
import numpy as np
import os

print('Downloading test model and test image')
os.system('wget -nc https://raw.githubusercontent.com/google-coral/test_data/master/mobilenet_v1_1.0_224_quant_edgetpu.tflite')
os.system('wget -nc https://github.com/DeGirum/PySDKExamples/blob/main/images/Cat.jpg?raw=true -O Cat.jpg')

print('Running...')
m = tflite.Interpreter('mobilenet_v1_1.0_224_quant_edgetpu.tflite', experimental_delegates=[tflite.load_delegate('libedgetpu.so.1')])
img = Image.open('Cat.jpg')

m.allocate_tensors()
n, h, w, c = m.get_input_details()[0]['shape']
m.set_tensor(m.get_input_details()[0]['index'], np.array(img.resize((h, w)))[np.newaxis,...])
m.invoke()
out = m.get_tensor(m.get_output_details()[0]['index']).flatten()

assert np.argmax(out) == 288, 'Wrong output result'
assert out[np.argmax(out)] == 83, 'Wrong output probability'
print('OK')

@han-so1omon
Author

han-so1omon commented May 16, 2024

Here is the output:

$ python edge-tpu-debug-degirum.py 
Downloading test model and test image
--2024-05-16 10:45:04--  https://raw.githubusercontent.com/google-coral/test_data/master/mobilenet_v1_1.0_224_quant_edgetpu.tflite
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8000::154, 2606:50c0:8001::154, 2606:50c0:8002::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8000::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4749998 (4.5M) [application/octet-stream]
Saving to: ‘mobilenet_v1_1.0_224_quant_edgetpu.tflite’

mobilenet_v1_1.0_224_quant_edgetpu 100%[==============================================================>]   4.53M  13.5MB/s    in 0.3s    

2024-05-16 10:45:05 (13.5 MB/s) - ‘mobilenet_v1_1.0_224_quant_edgetpu.tflite’ saved [4749998/4749998]

--2024-05-16 10:45:05--  https://github.com/DeGirum/PySDKExamples/blob/main/images/Cat.jpg?raw=true
Resolving github.com (github.com)... 140.82.114.4
Connecting to github.com (github.com)|140.82.114.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github.com/DeGirum/PySDKExamples/raw/main/images/Cat.jpg [following]
--2024-05-16 10:45:05--  https://github.com/DeGirum/PySDKExamples/raw/main/images/Cat.jpg
Reusing existing connection to github.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/DeGirum/PySDKExamples/main/images/Cat.jpg [following]
--2024-05-16 10:45:06--  https://raw.githubusercontent.com/DeGirum/PySDKExamples/main/images/Cat.jpg
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8000::154, 2606:50c0:8001::154, 2606:50c0:8002::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8000::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 334467 (327K) [image/jpeg]
Saving to: ‘Cat.jpg’

Cat.jpg                            100%[==============================================================>] 326.63K  --.-KB/s    in 0.1s    

2024-05-16 10:45:06 (3.15 MB/s) - ‘Cat.jpg’ saved [334467/334467]

Running...
OK

@shashichilappagari
Contributor

@han-so1omon Thanks for checking this. So, it does appear that the Edge TPU is functioning properly. Can you please try other models in the zoo to make sure that it is not a problem specific to this model?

@han-so1omon
Author

han-so1omon commented May 16, 2024

@shashichilappagari I have tried another model and it works fine. Using the test image from Google's posenet project, the yolov8n_relu6_coco--640x640_quant_tflite_edgetpu_1 model is able to detect objects in the image:

import cv2
import degirum as dg

image = cv2.imread("./test-posenet.jpg")

#zoo = dg.connect(dg.LOCAL, "/home/errc/v/yolov8n_relu6_coco_pose--640x640_quant_tflite_edgetpu_1/yolov8n_relu6_coco_pose--640x640_quant_tflite_edgetpu_1.json")
zoo = dg.connect(dg.LOCAL, "/home/errc/v/yolov8n_relu6_coco--640x640_quant_tflite_edgetpu_1/yolov8n_relu6_coco--640x640_quant_tflite_edgetpu_1.json")
model = zoo.load_model("yolov8n_relu6_coco--640x640_quant_tflite_edgetpu_1")
print(model)

result = model.predict(image)
result_image = result.image_overlay
cv2.imwrite("./test-posenet-degirum.jpg", result_image)

@shashichilappagari
Contributor

@han-so1omon Is there a typo in the above code? Did you load the pose model or the coco detection model? Assuming you loaded the detection model and that it is working, this is partially good news, as it shows that PySDK is working with the local Edge TPU. It now seems that the problem is specific to the model. We will upload models compiled at lower resolution to see if they resolve the issue. Thanks for your patience and quick responses.

@han-so1omon
Author

Some more context: I'm using feranick's Edge TPU runtime and Python libraries, as recommended in the Ultralytics setup instructions, since these runtimes have been kept up to date after Google abandoned the Coral project. I'm also using the default Python 3.11 on Raspberry Pi OS bookworm. Should I perhaps try running from the Docker container? If so, do you have examples of that?

@han-so1omon
Author

Thank you, I am trying to get this issue resolved this week, and I appreciate your responsiveness quite a lot

@han-so1omon
Author

Yes, I corrected the typo. I loaded the coco detection model, and it seems to work fine. I commented out loading the coco pose model, as it was throwing the error.

@shashichilappagari
Contributor

@han-so1omon Since other models are working, the issue seems to be specific to the pose model. As you can see from our cloud platform, all models in the edgetpu model zoo are working properly on our cloud farm machines, which have the Google Edge TPU PCIe module. As I mentioned before, we will compile pose models at lower resolution and see if the problem goes away. Another option is to use Google's mobilenet posenet model.

@han-so1omon
Author

han-so1omon commented May 16, 2024

@shashichilappagari Do you have instructions on how you've set up your Google Edge TPU PCIe modules? Additionally, do you have the mobilenet posenet in your model zoos for edgetpu?

@shashichilappagari
Contributor

@han-so1omon We will add it and let you know. Please give us a couple of hours to get lower resolution pose models. We will also share our setup guide with you.

@han-so1omon
Author

Ok, thank you

@shashichilappagari
Contributor

@han-so1omon From the error message you are seeing, there could be some race condition in the code. We are unable to replicate it on our side, but we have some ideas to test. Before I explain the ideas, I want to mention that you do not have to download the models to run locally. You can connect to the cloud zoo and PySDK will automatically download the models. This will make your code simpler. Once you have finished debugging, you can of course switch to a local zoo in case you want offline deployment. Your code should look like the example below:

import cv2
import degirum as dg

image = cv2.imread("./test-posenet.jpg")

zoo = dg.connect(dg.LOCAL, "https://cs.degirum.com/degirum/edgetpu", <your token>)
model = zoo.load_model("yolov8n_relu6_coco--640x640_quant_tflite_edgetpu_1")
print(model)

result = model.predict(image)
result_image = result.image_overlay
cv2.imwrite("./test-posenet-degirum.jpg", result_image)

With the above code, you can just change the model name every time you want to experiment with a different model.

Now, to rule out the race condition that could be killing your Python process, we can try the following. PySDK supports three types of inference: cloud, ai_server, and local. We can try ai_server. In a terminal window, activate the Python environment in which you installed PySDK. Then type:

degirum server

You will see a message saying that the DeGirum server has started.

Then run the following code:

import cv2
import degirum as dg

image = cv2.imread("./test-posenet.jpg")

zoo = dg.connect('localhost', "https://cs.degirum.com/degirum/edgetpu", <your token>)
model = zoo.load_model("yolov8n_relu6_coco--640x640_quant_tflite_edgetpu_1")
print(model)

result = model.predict(image)
result_image = result.image_overlay
cv2.imwrite("./test-posenet-degirum.jpg", result_image)

Note that we changed dg.LOCAL to 'localhost' to switch to ai_server.

Please try this code and see if it works. We also added a 512x512 pose model and a 320x320 pose model. You can try those models as well. We are in the process of adding mobilenet_posenet to the zoo and will let you know once it is added.

Hope that this helps.

@han-so1omon
Author

Ok. What do you think the race condition is, and is there a way to perform a wait to prevent it?

@shashichilappagari
Contributor

@han-so1omon At this point we are not sure, as it could be system dependent. That is why we want you to try the localhost option. There will not be any performance impact from using this option. If the localhost option works, we will at least know that the problem is localized to the local inference case, and we will investigate further. But if localhost also does not work on your side, we will have to think of other ways to debug.

@shashichilappagari
Contributor

@han-so1omon We also added the mobilenet_v1_posenet model to the edgetpu model zoo. Please see if it works on your side.

@han-so1omon
Author

@shashichilappagari I will try all of them later in the day. Can you share how you've set up the Edge TPU modules?

@shashichilappagari
Contributor

@kteodorovich Can you share our user guide for Edge TPU with @han-so1omon?

@kteodorovich
Contributor

@han-so1omon Hello! Our guide for USB Edge TPU is available here. You might be past all these steps already, given that you got the detection model to work.

By the way, the base Ultralytics library will only recognize a model for Edge TPU if the filename ends with _edgetpu.tflite. Also, our export modifies the structure of the model in a way that is incompatible with the Ultralytics built-in postprocessor.
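
For illustration, here is a minimal sketch of what the base Ultralytics flow expects (the filename is hypothetical, and per the note above a DeGirum-exported model would still fail in the built-in postprocessor):

from ultralytics import YOLO

# Ultralytics engages the Edge TPU delegate only when the filename ends with
# "_edgetpu.tflite"; the task must also be given explicitly for .tflite models.
model = YOLO("yolov8n-pose_full_integer_quant_edgetpu.tflite", task="pose")
results = model.predict("test-posenet.jpg")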

@han-so1omon
Author

@shashichilappagari @kteodorovich Thank you! It looks like it was indeed a race condition with the local DeGirum setup. All of the models appear to work correctly with the localhost-based AI server as recommended. Do you have recommendations on how to set up a pose tracking algorithm on top of the pose prediction algorithm?

@shashichilappagari
Contributor

@han-so1omon We are glad to hear that the localhost option is working with the models you need. Just to give you some background information: the pose models have a Python-based postprocessor, and it could be causing some issues with the Python interpreter running the model. In the localhost case, the Python interpreter running the inference and the one running the postprocessor are two separate instances and hence do not have issues. We are currently investigating how to fix the issue for the dg.LOCAL use case and will let you know when we release a PySDK version that fixes it. Until then you can use localhost, as it does not have any real performance impact.

@shashichilappagari
Contributor

@han-so1omon Here is an example of how you can add tracking on top of a model: https://github.com/DeGirum/PySDKExamples/blob/main/examples/specialized/multi_object_tracking_video_file.ipynb

@shashichilappagari
Contributor

@vlad-nn The Python postprocessor indeed seems to have a race condition when using the dg.LOCAL option, as confirmed by @han-so1omon.

@han-so1omon
Author

han-so1omon commented May 17, 2024

@shashichilappagari Does your pose algorithm from yolo_pose support landmarks?

@shashichilappagari
Contributor

@han-so1omon Do you mean if the tracking algorithm tracks landmarks?

@han-so1omon
Author

@shashichilappagari Basically, is there a way to present it as a skeleton with each part of the body denoted, like 'right ear', 'right forearm', etc.?

@shashichilappagari
Contributor

@han-so1omon So you want the output of the prediction to have a label for each keypoint?

@han-so1omon
Author

Basically, yes. I would like to know what part of the body the keypoint comes from.

@vlad-nn
Contributor

vlad-nn commented May 31, 2024

top 40 or however many body waypoints

It selects top_k objects, not keypoints. You see, you need object tracking for only one purpose: to match a person detected in one frame with the same person detected in another frame, knowing only the person bboxes. This is the task of the object tracker. Then, in some cases, you need to track one (or a few) particular people from frame to frame. This is the task of the object selector.

how does FIRFilterLP work?

Regarding FIRFilterLP: it is a simple FIR low-pass filter, which can be used for many purposes, but in that example it is used to suppress high-frequency noise in bbox coordinates (to smooth them in time). When you create the filter object, you specify the normalized cutoff frequency (as a fraction of the sampling rate, so 0.5 is the Nyquist frequency), the number of taps aka filter length, and the dimension of the input signal (size of the input vector): lpf = degirum_tools.FIRFilterLP(normalized_cutoff=0.1, taps_cnt=7, dimension=2). Then you apply the filter just by "calling" it: xf, yf = lpf([x, y])
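
For example, a minimal sketch using the constructor and call pattern above (the per-frame coordinates are made up for illustration):

import degirum_tools

# 7-tap low-pass FIR filter over 2-D points, cutoff at 0.1 of the sampling rate
lpf = degirum_tools.FIRFilterLP(normalized_cutoff=0.1, taps_cnt=7, dimension=2)

# feed one bbox center per frame; the filter returns the smoothed coordinates
for x, y in [(100, 50), (103, 49), (180, 52), (104, 51)]:
    xf, yf = lpf([x, y])
    print(f"raw=({x}, {y}) filtered=({xf:.1f}, {yf:.1f})")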

@han-so1omon
Author

@vlad-nn Should it not be that object tracking is frame-to-frame for the prescribed object, and object selection is grouping of multiple objects, perhaps for more semantic analysis?

Got it, so I will possibly try to apply the low-pass filter on a selection of landmarks that semantically define limb and body-region movement in projected 2D space.

@vlad-nn
Contributor

vlad-nn commented May 31, 2024

Should it not be that object tracking is frame-to-frame for the prescribed object, and object selection is grouping of multiple objects, perhaps for more semantic analysis?

Yes, like in the example above, where we first select the largest hand on the screen and then track only that hand, ignoring all other hands.

@han-so1omon
Author

han-so1omon commented Jun 1, 2024

@vlad-nn It seems like it is running OK when the person is in the first frame, but otherwise it has issues detecting a person. It can run at about 10 fps. Full script:

import os
import threading
import multiprocessing
import subprocess
import time
import cv2
import degirum as dg
import degirum_tools

def run_degirum_server():
    # Start the degirum server
    subprocess.run(["degirum", "server"])

def run_prediction_code():
    print('Connecting to Degirum...')
    zoo = dg.connect('localhost', "https://cs.degirum.com/degirum/edgetpu", os.getenv('DEGIRUM_PRIVATE_KEY'))
    #model = zoo.load_model("yolov8n_relu6_coco_pose--640x640_quant_tflite_edgetpu_1")
    model = zoo.load_model("yolov8n_silu_coco_pose--512x512_quant_tflite_edgetpu_1")

    print('Creating object tracker...')
    # create object tracker
    tracker = degirum_tools.ObjectTracker(
        track_thresh=0.35,
        track_buffer=100,
        match_thresh=0.9999,
        trail_depth=20,
        anchor_point=degirum_tools.AnchorPoint.BOTTOM_CENTER,
    )

    # create object selector to track only one person
    selector = degirum_tools.ObjectSelector(
        top_k=1,
        use_tracking=True,
        selection_strategy=degirum_tools.ObjectSelectionStrategies.LARGEST_AREA,
    )

    # Attaching analyzers
    degirum_tools.attach_analyzers(model, [tracker, selector])

    print('Predicting stream...')
    next_res = degirum_tools.predict_stream(
        model, 'rtsp://0.0.0.0:8554/stream1', analyzers=[tracker, selector], fps=11,
    )

    '''
    next_res = degirum_tools.predict_stream(
        model, 0, analyzers=[tracker], fps=15,
    )
    '''

    start_time = time.time()
    loop_count = 0

    try:
        while True:
            res = next(next_res)
            print(res.results)
            loop_count += 1
            elapsed_time = time.time() - start_time
            loops_per_second = loop_count / elapsed_time
            if loop_count % 10 == 0:
                print(f'Loops per second: {loops_per_second:.2f}')

    except KeyboardInterrupt:
        print('Stopping prediction stream...')

if __name__ == "__main__":
    # Create and start the thread for the Degirum server
    server_thread = threading.Thread(target=run_degirum_server)
    server_thread.start()
    time.sleep(3)
    
    # Create and start the process for the prediction code
    prediction_process = multiprocessing.Process(target=run_prediction_code)
    prediction_process.start()

    try:
        # Wait for the process to complete
        prediction_process.join()
    except KeyboardInterrupt:
        # Handle keyboard interrupt
        prediction_process.terminate()
        prediction_process.join()
        print('Processes terminated.')
    
    # Ensure the server thread is also stopped
    if server_thread.is_alive():
        print('Stopping Degirum server...')
        # This assumes the server can be stopped by interrupting the subprocess
        # If not, additional logic might be needed to stop the server gracefully

@shashichilappagari
Contributor

shashichilappagari commented Jun 1, 2024

Seems like it is running ok when the person is in the first frame, but otherwise it is having issues detecting a person.

@han-so1omon Can you clarify what this means? Does it mean that if the video starts with no person in frame, it has trouble detecting a person when they enter the frame later? Is there a sample video you can share on which we can reproduce the issue?

@han-so1omon
Author

@shashichilappagari Of course. It means in my case that the results array from the script in the previous comment is empty when a person does not start in frame. When they start in frame, it looks pretty good.

No person starting in frame:

res = []

Person starting in frame:

res = [{'bbox': [152.6190751346088, 170.64564324386376, 580.8882319904127, 476.01401956566485], 'category_id': 0, 'label': 'person', 'landmarks': [{'category_id': 0, 'connect': [], 'label': 'Nose', 'landmark': [337.8222351074219, 282.5333251953125], 'score': 0.9959291815757751}, {'category_id': 1, 'connect': [], 'label': 'LeftEye', 'landmark': [346.73333740234375, 251.34442138671878], 'score': 0.9885194897651672}, {'category_id': 2, 'connect': [], 'label': 'RightEye', 'landmark': [297.72222900390625, 264.7110900878906], 'score': 0.9982303977012634}, {'category_id': 3, 'connect': [], 'label': 'leftEar', 'landmark': [364.5555725097656, 282.5333251953125], 'score': 0.48260247707366943}, {'category_id': 4, 'connect': [], 'label': 'RightEar', 'landmark': [235.34442138671875, 318.1777648925781], 'score': 0.9772321581840515}, {'category_id': 5, 'connect': [], 'label': 'leftShoulder', 'landmark': [462.57781982421875, 425.11114501953125], 'score': 0.8321882486343384}, {'category_id': 6, 'connect': [], 'label': 'RightShoulder', 'landmark': [226.4333038330078, 447.388916015625], 'score': 0.8009689450263977}, {'category_id': 7, 'connect': [], 'label': 'LeftElbow', 'landmark': [556.14453125, 480], 'score': 0.03194461390376091}, {'category_id': 8, 'connect': [], 'label': 'RightElbow', 'landmark': [204.155517578125, 480], 'score': 0.04177384078502655}, {'category_id': 9, 'connect': [], 'label': 'LeftWrist', 'landmark': [498.2222900390625, 442.933349609375], 'score': 0.05445735901594162}, {'category_id': 10, 'connect': [], 'label': 'RightWrist', 'landmark': [244.25552368164062, 447.388916015625], 'score': 0.09133881330490112}, {'category_id': 11, 'connect': [], 'label': 'LeftHip', 'landmark': [489.3111572265625, 480], 'score': 0.0025045101065188646}, {'category_id': 12, 'connect': [], 'label': 'RightHip', 'landmark': [346.73333740234375, 480], 'score': 0.0026845973916351795}, {'category_id': 13, 'connect': [], 'label': 'LeftKnee', 'landmark': [475.9444885253906, 353.8222351074219], 'score': 0.002179690171033144}, {'category_id': 14, 'connect': [], 'label': 'RightKnee', 'landmark': [346.73333740234375, 358.27777099609375], 'score': 0.0025045101065188646}, {'category_id': 15, 'connect': [], 'label': 'LeftAnkle', 'landmark': [418.0222473144531, 385.0111083984375], 'score': 0.0026845973916351795}, {'category_id': 16, 'connect': [], 'label': 'RightAnkle', 'landmark': [351.18890380859375, 380.5555725097656], 'score': 0.0026845973916351795}], 'score': 0.8749055862426758, 'track_id': 1}]
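
For reference, this structure answers my earlier question about body-part labels. A minimal sketch of reading them out of a result like the one above:

# walk the pose results and print each keypoint with its body-part label
for obj in res.results:
    for kp in obj.get("landmarks", []):
        x, y = kp["landmark"]
        print(f'{kp["label"]}: ({x:.0f}, {y:.0f}) score={kp["score"]:.2f}')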

@shashichilappagari
Contributor

@han-so1omon We will take a look to see if this has anything to do with tracking. So the symptom you are seeing is that if the first frame does not have a person, the predict stream does not detect a person even in subsequent frames which actually do have a person. Is that a correct assessment? Can you disable the tracking to see if the behavior is different?

@han-so1omon
Author

It appears that the combination of the tracker and the selector is the issue. The tracker can yield landmark results when people are walking in and out, with a delay (~2 s). The selector can only yield landmark results when the person stays in the frame. If a person is not in the first frame while the selector is active, then it seems the results stay empty.

@shashichilappagari
Contributor

@han-so1omon Thanks for this additional information. We will take a look and keep you posted.

@han-so1omon
Author

For results from predict_stream, how do I get the timestamp of a result? I see I can get res.results.

@shashichilappagari
Contributor

shashichilappagari commented Jun 2, 2024

@han-so1omon This is an interesting question. Does the original stream provide timestamp info? In PySDK, we support attaching frame_info to every frame; it is just passed through so that you can keep track: see https://docs.degirum.com/documentation/PySDK-0.12.2/api-ref/model/?h=frame#degirum.model.Model.predict
However, in this case you need to write your own video source that yields such info for every frame along with the frame data. Please let us know if that serves your purpose.

@han-so1omon
Author

han-so1omon commented Jun 2, 2024

@shashichilappagari I do not see a parameter called frame_info in the degirum.model.Model.predict object. I am using the RTSP image stream as before: `rpicam-vid -t 0 --codec libav --libav-format mpegts --libav-audio --inline --framerate 15 -o - | cvlc - --sout '#rtp{sdp=rtsp://0.0.0.0:8554/stream1}' :demux=ts`

Edit: Will something like this work to create timestamps?

rpicam-vid -t 0 --codec libav --libav-format mpegts --libav-audio --inline --framerate 8 -o - | ffmpeg -i - -vf "drawtext=fontfile=/path/to/font.ttf:fontsize=20:fontcolor=white:x=10:y=10:text='%{localtime\:%Y-%m-%d %H\\\\\:%M\\\\\:%S}',format=yuv420p" -f mpegts - | cvlc - --sout '#rtp{sdp=rtsp://0.0.0.0:8554/stream1}' :demux=ts

@vlad-nn
Contributor

vlad-nn commented Jun 3, 2024

@han-so1omon, the problem in your code is that you are attaching analyzers twice:

  1. Once by calling attach_analyzers
  2. A second time by passing them to predict_stream

Please remove either one of them, and the code will work correctly.
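
For example, a minimal sketch of one of the two options, keeping only the predict_stream attachment (the attach_analyzers call from your script is removed):

# analyzers are attached exactly once, via predict_stream only
next_res = degirum_tools.predict_stream(
    model, 'rtsp://0.0.0.0:8554/stream1', analyzers=[tracker, selector], fps=11,
)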

@vlad-nn
Contributor

vlad-nn commented Jun 3, 2024

I do not see a parameter called frame_info in the degirum.model.Model.predict object

@han-so1omon, predict_stream() does not have this functionality; it is too high-level a function. In order to pass frame info, one needs to use the lower-level model.predict_batch() API. That model-level method accepts an iterator which produces input data frames. If such an iterator yields just images (either file path strings, numpy arrays, or PIL objects), then model.predict_batch() uses them as is. But if the iterator yields a two-element tuple, it considers the first element of the tuple as frame data and the second element as frame info. That frame info will eventually be assigned to the corresponding result object returned by model.predict_batch().

Example:

import degirum as dg, time, numpy as np

zoo = dg.connect("@cloud", "https://cs.degirum.com/degirum/public", <your token here>)
model = zoo.load_model("mobilenet_v2_imagenet--224x224_quant_n2x_cpu_1")

# define frame data generator
def source():
    for i in range(10):
        # frame info can be any object
        frame_info = {"iteration": i, "timestamp": time.ctime()}

        # yield tuple with image and frame info
        yield (np.zeros((10,10,3)), frame_info)

for result in model.predict_batch(source()):
    # result.info contains frame info, passed as second element of frame tuple
    print(result.info)

@han-so1omon
Author

@vlad-nn Got it, thank you, I will try. Separate question: how do I get access to the https://cs.degirum.com/degirum/raspberry_pi_models model zoo?

@shashichilappagari
Contributor

@han-so1omon You should now be able to access the Raspberry Pi models. Please note that these are models for CPU.

@han-so1omon
Author

@shashichilappagari I am aware. This is for an art piece, but as of Saturday it is already in the gallery. I am fixing some things, though, so I am testing on the Raspberry Pi CPU.

@shashichilappagari
Contributor

@han-so1omon Sounds like an interesting application. Please let us know if there is somewhere online where we can see it.

@han-so1omon
Author

@shashichilappagari Will share once it's fixed up

@han-so1omon
Author

han-so1omon commented Jun 7, 2024

@vlad-nn I may have it working with near real-time results at 12 fps using Picamera2:

import multiprocessing
import os
import time

import degirum as dg
import degirum_tools
from picamera2 import Picamera2

def video_source(fps = None):
    picam2 = Picamera2()
    picam2.configure()
    picam2.start()
    print('Establishing video source')
    if fps:
        # Throttle if camera feed
        minimum_elapsed_time = 1.0 / fps
        prev_time = time.time()

    i = 0
    while True:
        frame = picam2.capture_array()
        frame = frame[:,:,:3]
        #frame = np.zeros((640,480,3))
        i += 1
        frame_info = {"iteration": i, "timestamp": time.ctime()}
        if fps:
            curr_time = time.time()
            elapsed_time = curr_time - prev_time

            if elapsed_time < minimum_elapsed_time:
                continue

            prev_time = curr_time - (elapsed_time - minimum_elapsed_time)

            yield (frame, frame_info)
        else:
            yield (frame, frame_info)

def predict_stream(
    model: dg.model.Model,
    *,
    fps = None,
    analyzers = None,
):
    print('In stream prediction function')
    analyzing_postprocessor = degirum_tools.inference_support._create_analyzing_postprocessor_class(analyzers)

    for res in model.predict_batch(video_source(fps=fps)):
        if analyzers is not None:
            yield analyzing_postprocessor(result=res)
        else:
            yield res

def run_prediction_code():
    print('Connecting to Degirum...')

    #zoo = dg.connect('localhost', "https://cs.degirum.com/degirum/raspberry_pi_models", os.getenv('DEGIRUM_PRIVATE_KEY'))
    #model = zoo.load_model("yolov8n_silu_coco_pose--640x640_quant_tflite_cpu_1")

    zoo = dg.connect('localhost', "https://cs.degirum.com/degirum/edgetpu", os.getenv('DEGIRUM_PRIVATE_KEY'))
    model = zoo.load_model("yolov8n_relu6_coco_pose--640x640_quant_tflite_edgetpu_1")
    #model = zoo.load_model("yolov8n_silu_coco_pose--512x512_quant_tflite_edgetpu_1")

    print('Creating object tracker...')
    # create object tracker
    tracker = degirum_tools.ObjectTracker(
        track_thresh=0.35,
        track_buffer=100,
        match_thresh=0.9999,
        trail_depth=20,
        anchor_point=degirum_tools.AnchorPoint.BOTTOM_CENTER,
    )

    # create object selector to track only one hand
    selector = degirum_tools.ObjectSelector(
        top_k=1,
        use_tracking=True,
        selection_strategy=degirum_tools.ObjectSelectionStrategies.LARGEST_AREA,
    )

    # Attaching analyzers
    analyzers = [tracker]
    #analyzers = [tracker, selector]
    #degirum_tools.attach_analyzers(model, analyzers)

    print('Predicting stream...')
    next_res = predict_stream(
        model, fps=8, analyzers=analyzers
    )

    '''
    next_res = degirum_tools.predict_stream(
        model, 0, analyzers=[tracker], fps=15,
    )
    '''

    start_time = time.time()
    loop_count = 0

    fake_timestamp = 0.125
    increment_fake_time = lambda t: t + 0.125
    try:
        while True:
            res = next(next_res)
            #TODO is this timestamp correct?
            timestamp = time.time()
            print(res)
            if res.results:
                print(f'Prediction results: {res.results}')
                for r in res.results:
                    fake_timestamp = increment_fake_time(fake_timestamp)
                    add_analysis(r, fake_timestamp)
            loop_count += 1
            elapsed_time = time.time() - start_time
            loops_per_second = loop_count / elapsed_time
            if loop_count % 10 == 0:
                print(f'Loops per second: {loops_per_second:.2f}')

    except KeyboardInterrupt:
        print('Stopping prediction stream...')

if __name__ == "__main__":
    '''
    # Create and start the thread for the camera processes
    camera_server_thread = threading.Thread(target=run_camera_stream)
    camera_server_thread.start()
    time.sleep(4)

    # Create and start the thread for the Degirum server
    degirum_server_thread = threading.Thread(target=run_degirum_server)
    degirum_server_thread.start()
    time.sleep(3)
    '''
    _camera = Picamera2()
    _camera.configure()
    _camera.start()
    
    # Create and start the process for the prediction code
    prediction_process = multiprocessing.Process(target=run_prediction_code)
    prediction_process.start()

@han-so1omon
Author

I am still having the issue sometimes where the inference server hangs and then exits. I guess this is a result of the buffer building up in the server and crashing (#50 (comment)). Is there any way to clear the buffer? I only want the most recent frame anyway.
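
Something like this single-slot queue is the behavior I have in mind (a sketch of the pattern, not a DeGirum API):

import queue

# keep only the newest frame: a 1-slot queue that the capture loop overwrites;
# capture_loop is meant to run in its own thread alongside the inference loop
latest = queue.Queue(maxsize=1)

def capture_loop(cam):
    while True:
        frame = cam.capture_array()
        try:
            latest.get_nowait()  # drop the stale frame, if any
        except queue.Empty:
            pass
        latest.put(frame)

def newest_frames():
    while True:
        yield latest.get()  # blocks until a fresh frame is available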

@shashichilappagari
Contributor

@han-so1omon The inference server hang: does it happen after the application runs for a while, or is the time to hang random? From the code you shared, it appears that the camera captures the latest frame.

@han-so1omon
Author

@shashichilappagari It usually hangs after about 8 seconds, if it is going to happen. Sometimes it happens much later as well. Yes, I'm confused about it too. Here's the timeout error I get:

INFO:degirum._zoo_accessor:sending a request to https://cs.degirum.com/zoo/v1/public/models/degirum/edgetpu/yolov8n_relu6_coco_pose--640x640_quant_tflite_edgetpu_1/dictionary
Process Process-1:
degirum.exceptions.DegirumException: [ERROR]Timeout detected
Timeout 10000 ms waiting for response from AI server '127.0.0.1:8778
dg_client_asio.cpp: 351 [string&)::<lambda]


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/errc/p/coin-light-show-monorepo/witch/src/witch/__main__.py", line 270, in run_prediction_code
    res = next(next_res)
          ^^^^^^^^^^^^^^
  File "/home/errc/p/coin-light-show-monorepo/witch/src/witch/__main__.py", line 431, in predict_stream
    for res in model.predict_batch(video_source(fps=fps)):
  File "/home/errc/.local/lib/python3.11/site-packages/degirum/model.py", line 289, in predict_batch
    for res in self._predict_impl(source):
  File "/home/errc/.local/lib/python3.11/site-packages/degirum/model.py", line 1234, in _predict_impl
    raise DegirumException(
degirum.exceptions.DegirumException: Failed to perform model 'degirum/edgetpu/yolov8n_relu6_coco_pose--640x640_quant_tflite_edgetpu_1' inference: [ERROR]Timeout detected
Timeout 10000 ms waiting for response from AI server '127.0.0.1:8778
dg_client_asio.cpp: 351 [string&)::<lambda]

@shashichilappagari
Contributor

@kteodorovich Can you see if we can replicate this behavior on our side?

@han-so1omon
Author

@shashichilappagari Here is my service definition:

[Unit]
Description=DeGirum AI Service

[Service]
WorkingDirectory=/home/me/
ExecStart=/usr/bin/python -m degirum.server --zoo /home/me/v/degirum
Restart=always
Environment=PATH=/usr/bin:/usr/local/bin:/home/me/.local/bin
Environment="PYTHONPATH=/home/me/.local/lib/python3.11/site-packages"
RestartSec=10
SyslogIdentifier=degirum-ai-server
User=me

[Install]
WantedBy=multi-user.target

@han-so1omon
Author

han-so1omon commented Jun 7, 2024

For reference, fps=4 is close to stable for me (~10% chance of error, versus ~90% at higher frame rates). Maybe a race condition or a cache filling up.

@shashichilappagari
Contributor

@han-so1omon We will see if we can replicate this on our side and keep you posted. FYI, PySDK itself has been tested on webcam streams running for days at a stretch, so the most likely scenario is something specific to the Picamera2 setup that we are unaware of.

@han-so1omon
Author

@shashichilappagari That seems to make some sense, although why would it yield a timeout error from the DeGirum server? Is it receiving some array or other message that it is not expecting?

@kteodorovich
Contributor

@han-so1omon I haven't been able to replicate the same timeout error you're getting. Do you think there's anything in your setup that could contribute to extra delays?

I modified your code snippet to show a video preview (instead of printing), and it runs smoothly at ~15 fps. Could you try running it and let us know if the server still crashes?

import degirum as dg
import degirum_tools
from picamera2 import Picamera2
import time

def picam2_stream(fps=None):
    with Picamera2() as cam:
        config = cam.create_preview_configuration(main={"format": "RGB888"}) # capture frames in RGB
        cam.configure(config)
        cam.start()
        delay = 0 if fps is None else 1 / fps
        last_time = time.time()
        i = 0
    
        while True:
            t = time.time()
            if fps is None or t - last_time >= delay:
                last_time = t
                frame = cam.capture_array()
                frame_info = {"iteration": i, "timestamp": time.ctime()}
                i += 1
                yield (frame, frame_info)

if __name__ == '__main__':
    # create object tracker
    tracker = degirum_tools.ObjectTracker(
        track_thresh=0.35,
        track_buffer=100,
        match_thresh=0.9999,
        trail_depth=20,
        anchor_point=degirum_tools.AnchorPoint.BOTTOM_CENTER,
    )
    
    # create object selector to track only one person
    selector = degirum_tools.ObjectSelector(
        top_k=1,
        use_tracking=True,
        selection_strategy=degirum_tools.ObjectSelectionStrategies.LARGEST_AREA,
    )
    
    zoo = dg.connect('localhost', "https://cs.degirum.com/degirum/edgetpu", degirum_tools.get_token())
    model = zoo.load_model("yolov8n_relu6_coco_pose--640x640_quant_tflite_edgetpu_1")
    
    model = degirum_tools.attach_analyzers(model, [tracker, selector])
    
    # stream with tracker
    with degirum_tools.Display("Video") as display:
        for res in model.predict_batch(picam2_stream()):
            display.show(res.image_overlay)
