NEW - YOLOv8 🚀 Pose Models #1915

glenn-jocher opened this issue Apr 9, 2023 · 91 comments

glenn-jocher commented Apr 9, 2023

YOLOv8 Pose Models

Pose estimation is a task that involves identifying the location of specific points in an image, usually referred to as keypoints. The keypoints can represent various parts of the object such as joints, landmarks, or other distinctive features. The locations of the keypoints are usually represented as a set of 2D [x, y] or [x, y, visible] coordinates, where visible is a flag indicating whether the keypoint is visible.

The output of a pose estimation model is a set of points that represent the keypoints on an object in the image, usually
along with the confidence scores for each point. Pose estimation is a good choice when you need to identify specific
parts of an object in a scene, and their location in relation to each other.

Pro Tip: YOLOv8 pose models use the -pose suffix, e.g. yolov8n-pose.pt. These models are trained on the COCO keypoints dataset and are suitable for a variety of pose estimation tasks.

Models

YOLOv8 pretrained Pose models are shown here. Detect, Segment and Pose models are pretrained on the COCO dataset, while Classify models are pretrained on the ImageNet dataset.

Models download automatically from the latest Ultralytics release on first use.

| Model | size (pixels) | mAP^pose 50-95 | mAP^pose 50 | Speed CPU ONNX (ms) | Speed A100 TensorRT (ms) | params (M) | FLOPs (B) |
|---|---|---|---|---|---|---|---|
| YOLOv8n-pose | 640 | 49.7 | 79.7 | 131.8 | 1.18 | 3.3 | 9.2 |
| YOLOv8s-pose | 640 | 59.2 | 85.8 | 233.2 | 1.42 | 11.6 | 30.2 |
| YOLOv8m-pose | 640 | 63.6 | 88.8 | 456.3 | 2.00 | 26.4 | 81.0 |
| YOLOv8l-pose | 640 | 67.0 | 89.9 | 784.5 | 2.59 | 44.4 | 168.6 |
| YOLOv8x-pose | 640 | 68.9 | 90.4 | 1607.1 | 3.73 | 69.4 | 263.2 |
| YOLOv8x-pose-p6 | 1280 | 71.5 | 91.3 | 4088.7 | 10.04 | 99.1 | 1066.4 |
  • mAPval values are for single-model single-scale on the COCO Keypoints val2017 dataset. Reproduce by yolo val pose data=coco-pose.yaml device=0
  • Speed averaged over COCO val images using an Amazon EC2 P4d instance. Reproduce by yolo val pose data=coco8-pose.yaml batch=1 device=0|cpu

Train

Train a YOLOv8-pose model on the COCO8-pose dataset.

Python

from ultralytics import YOLO

# Load a model
model = YOLO('yolov8n-pose.yaml')  # build a new model from YAML
model = YOLO('yolov8n-pose.pt')  # load a pretrained model (recommended for training)
model = YOLO('yolov8n-pose.yaml').load('yolov8n-pose.pt')  # build from YAML and transfer weights

# Train the model
model.train(data='coco8-pose.yaml', epochs=100, imgsz=640)

CLI

# Build a new model from YAML and start training from scratch
yolo pose train data=coco8-pose.yaml model=yolov8n-pose.yaml epochs=100 imgsz=640

# Start training from a pretrained *.pt model
yolo pose train data=coco8-pose.yaml model=yolov8n-pose.pt epochs=100 imgsz=640

# Build a new model from YAML, transfer pretrained weights to it and start training
yolo pose train data=coco8-pose.yaml model=yolov8n-pose.yaml pretrained=yolov8n-pose.pt epochs=100 imgsz=640

Val

Validate trained YOLOv8n-pose model accuracy on the COCO8-pose dataset. No arguments need to be passed as the model retains its training data and arguments as model attributes.

Python

from ultralytics import YOLO

# Load a model
model = YOLO('yolov8n-pose.pt')  # load an official model
model = YOLO('path/to/best.pt')  # load a custom model

# Validate the model
metrics = model.val()  # no arguments needed, dataset and settings remembered
metrics.box.map    # map50-95
metrics.box.map50  # map50
metrics.box.map75  # map75
metrics.box.maps   # a list containing mAP50-95 for each category

CLI

yolo pose val model=yolov8n-pose.pt  # val official model
yolo pose val model=path/to/best.pt  # val custom model

Predict

Use a trained YOLOv8n-pose model to run predictions on images.

Python

from ultralytics import YOLO

# Load a model
model = YOLO('yolov8n-pose.pt')  # load an official model
model = YOLO('path/to/best.pt')  # load a custom model

# Predict with the model
results = model('https://ultralytics.com/images/bus.jpg')  # predict on an image
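
The returned Results objects also carry the predicted keypoints. As a minimal sketch, assuming a recent ultralytics release where the keypoints object exposes .xy and .conf attributes:

# Access predicted keypoints (attribute names assume a recent ultralytics version)
for r in results:
    print(r.keypoints.xy)    # [x, y] pixel coordinates per detected person
    print(r.keypoints.conf)  # per-keypoint confidence, when the model predicts visibility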

CLI

yolo pose predict model=yolov8n-pose.pt source='https://ultralytics.com/images/bus.jpg'  # predict with official model
yolo pose predict model=path/to/best.pt source='https://ultralytics.com/images/bus.jpg'  # predict with custom model

See full predict mode details in the Predict page.

Export

Export a YOLOv8n Pose model to a different format like ONNX, CoreML, etc.

Python

from ultralytics import YOLO

# Load a model
model = YOLO('yolov8n-pose.pt')  # load an official model
model = YOLO('path/to/best.pt')  # load a custom trained model

# Export the model
model.export(format='onnx')

CLI

yolo export model=yolov8n-pose.pt format=onnx  # export official model
yolo export model=path/to/best.pt format=onnx  # export custom trained model

Available YOLOv8-pose export formats are in the table below. You can predict or validate directly on exported models, i.e. yolo predict model=yolov8n-pose.onnx. Usage examples are shown for your model after export completes.

| Format | format Argument | Model | Metadata |
|---|---|---|---|
| PyTorch | - | yolov8n-pose.pt | ✅ |
| TorchScript | torchscript | yolov8n-pose.torchscript | ✅ |
| ONNX | onnx | yolov8n-pose.onnx | ✅ |
| OpenVINO | openvino | yolov8n-pose_openvino_model/ | ✅ |
| TensorRT | engine | yolov8n-pose.engine | ✅ |
| CoreML | coreml | yolov8n-pose.mlmodel | ✅ |
| TF SavedModel | saved_model | yolov8n-pose_saved_model/ | ✅ |
| TF GraphDef | pb | yolov8n-pose.pb | ❌ |
| TF Lite | tflite | yolov8n-pose.tflite | ✅ |
| TF Edge TPU | edgetpu | yolov8n-pose_edgetpu.tflite | ✅ |
| TF.js | tfjs | yolov8n-pose_web_model/ | ✅ |
| PaddlePaddle | paddle | yolov8n-pose_paddle_model/ | ✅ |

See full export details in the Export page.

glenn-jocher pinned this issue Apr 9, 2023
glenn-jocher added the documentation (Improvements or additions to documentation) label Apr 9, 2023
@lyhue1991

great job, congratulations!

@glenn-jocher
Member Author

@lyhue1991 thank you! It was a big team effort between @AyushExel, @Laughing-q and myself :)

@aiakash

aiakash commented Apr 13, 2023

@glenn-jocher 1) Can we train the model on custom poses?
2) For example, I want to do it for yoga poses. How should I create or pass data to this model? I only have images of the poses; do I need annotation files for them as well?

@glenn-jocher
Member Author

@aiakash you can run coco8-pose.yaml to see a small demo dataset with pose labels:

yolo train model=yolov8n-pose.pt data=coco8-pose.yaml
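
For a custom dataset such as yoga poses you will also need keypoint annotations, not just images: each image gets a label .txt file with one line per instance (class index, normalized box, then the keypoints), and the dataset is described by a small YAML file. Below is a minimal sketch of such a YAML mirroring the structure of coco8-pose.yaml; the paths, keypoint count and class name are placeholders for your own data, and if you use horizontal flip augmentation you would also add a flip_idx list as coco8-pose.yaml does:

# yoga-pose.yaml — hypothetical example following the coco8-pose.yaml structure
path: ../datasets/yoga-pose  # dataset root directory
train: images/train          # training images (relative to 'path')
val: images/val              # validation images (relative to 'path')

kpt_shape: [17, 3]  # number of keypoints, dims per keypoint (x, y, visibility)

names:
  0: person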

@albertofernandezvillan

If I train a model, let's say for detecting faces in images (an object detector), and then I train a pose estimation model with "rotated bounding boxes" of the faces (a 4-corner detector, for example, to simplify):

  • is it possible to compute the same validation metrics to see if the performance of the trained model when predicting rotated faces is better?
  • are mAPval 50-95 and mAPpose 50-95 comparable?
  • is it possible to compute mAPpose 50-95 for the predictions for the object detector?
  • is it possible to compute mAPval 50-95 for the predictions for the pose detector?

@glenn-jocher
Member Author

@albertofernandezvillan yes, it is possible to compute the same validation metrics for the trained model when predicting rotated faces. By using the rotated bounding boxes, you can evaluate the model's performance using meaningful metrics such as IoU (Intersection over Union) to check how well the predicted boxes overlap with the ground truth boxes.
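
As a concrete illustration of the IoU check mentioned above, a minimal sketch for axis-aligned boxes in [x1, y1, x2, y2] format (rotated boxes would need a polygon-intersection variant of the same idea):

def box_iou(a, b):
    """IoU of two axis-aligned boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (union + 1e-9)

print(box_iou([0, 0, 10, 10], [5, 5, 15, 15]))  # 25 / 175 ≈ 0.143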

mAPval 50-95 and mAPpose 50-95 may not be directly comparable because they measure different aspects of object detection and pose estimation respectively. However, they can be used as a rough indication of the model's performance on different evaluation criteria. The mAPval 50-95 measures the detection performance of the object detector with regard to the IoU threshold of the bounding box, while mAPpose 50-95 evaluates the keypoint prediction accuracy using similar criteria.

It is possible to compute mAPpose 50-95 for the prediction of the object detector if it provides the results of the bounding box along with the predicted keypoints. In such cases, you can use the same evaluation technique for pose estimation as you would for a dedicated pose estimation model.

Similarly, it is possible to compute mAPval 50-95 for the predictions of the pose detector by considering the predicted keypoints as the center points for evaluating the detected rotated bounding boxes. However, it is essential to keep in mind the performance metrics of object detection models and pose detection models can vary depending on the datasets, models, and evaluation methodologies used.

@zhupengjia

Hi, I tried to convert the model to Edge TPU using the command:
yolo export model=yolov8n-pose.pt format=edgetpu int8=True
I got the following error:

Edge TPU: running 'edgetpu_compiler -s -d -k 10 --out_dir yolov8n-pose_saved_model yolov8n-pose_saved_model/yolov8n-pose_full_integer_quant.tflite' Edge TPU Compiler version 16.0.384591198 Searching for valid delegate with step 10 Try to compile segment with 309 ops ERROR: Didn't find op for builtin opcode 'PLACEHOLDER_FOR_GREATER_OP_CODES' version '3'. An older version of this builtin might be supported. Are you using an old TFLite binary with a newer model?

I got no error while converting model yolov8n.pt

I ran the command inside docker ultralytics/ultralytics:latest-python

yolo version: 8.0.77
tensorflow version: 2.1.11
Edge TPU Compiler version: 16.0.384591198
python version: 3.10.11

Thanks!

@Laughing-q
Member

@glenn-jocher do we support edgetpu/int8 for pose models?

@Danzip

Danzip commented May 10, 2023

Hi, can you provide a sketch of the model?
I'm curious about the implementation details; is there a way for me to get access to something like this?
I'm asking specifically about pose.
Thank you!

@glenn-jocher
Member Author

@Danzip Thank you for your interest in YOLOv8! Our pose model implementation is based on a modified version of the YOLOv4 architecture with modifications to support pose estimation. Specifically, we predict 17 keypoints for each detected object using Spatial Pyramid Pooling (SPP) blocks at the end of the network.

Unfortunately, we do not have a detailed sketch available for the model at this time. However, you can take a look at the source code to understand how the network is constructed and the architectural modifications made. The pose model implementation can be found here: https://github.com/ultralytics/yolov5/tree/master/models/pose/yolo.

Please let us know if you have any other questions or concerns, we are happy to help!

@Danzip

Danzip commented May 11, 2023

Hey, the link is dead. Can you please tell me which branch to look at? I can't find anything in master.

@glenn-jocher
Member Author

@Danzip hello! I apologize for the confusion. Thank you for bringing the dead link to our attention. Regarding the 'yolov8n' model, it is currently not available in the YOLOv8 repository. However, YOLOv8 is a part of our private research software and thus, it is not open-sourced at this time. We appreciate your interest in YOLOv8 and hope to bring more YOLOv8 models to the public in the future. If you have any further questions or concerns, please do not hesitate to reach out to us.

@r3krut

r3krut commented May 26, 2023

@glenn-jocher Could you answer on a few questions?

  1. Where can I find information about the architecture of YOLOv8 (detect, pose, mask) and the training process?
  2. Can YOLOv8 detect person, mask and keypoints simultaneously in a single forward pass?
    Thanks!

@Drake1601

@glenn-jocher could you answer a question for me?
Can the YOLOv8 pose model predict only a single person in an image that contains many people (the person I want to predict is the largest one in the image)?
Thanks!

@glenn-jocher
Member Author

@Drake1601 yes, the YOLOv8 pose model can predict the pose and keypoints of the person who is largest in the image, even when there are multiple people present. However, the model may also detect other individuals present in the image. The YOLOv8 pose model operates on a per-detection basis, meaning the model predicts the pose as a set of keypoints for each person object detected in the image. Therefore, the largest person detected in the image will have the highest confidence score and would be the most likely candidate to be the person of interest.
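
If only the largest person is of interest, the results can be filtered after prediction. A minimal sketch, assuming a recent ultralytics version where boxes expose xywh and the keypoints object is indexable per detection:

from ultralytics import YOLO

model = YOLO('yolov8n-pose.pt')
r = model('https://ultralytics.com/images/bus.jpg')[0]  # Results for the first image

if len(r.boxes):
    areas = r.boxes.xywh[:, 2] * r.boxes.xywh[:, 3]  # box width * height per detection
    i = int(areas.argmax())                          # index of the largest person
    print(r.keypoints[i])                            # keypoints of that detection only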

@rbychn

rbychn commented Jun 1, 2023

Is there a way to use this to detect and discard images that have hands and fingers in them? (Arms up to the wrist are fine.)

@MaxTeselkin

Hi @glenn-jocher, thanks for YOLOv8. I am trying to get access to predicted keypoints using the following code:

results = self.model(input_image)
keypoints = results[0].keypoints

but I am unable to get access to predicted keypoints because of the following error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "my_folder/lib/python3.11/site-packages/ultralytics/yolo/utils/__init__.py", line 118, in __str__
    v = getattr(self, a)
        ^^^^^^^^^^^^^^^^
  File "my_folder/lib/python3.11/site-packages/ultralytics/yolo/engine/results.py", line 555, in conf
    return self.data[..., 3] if self.has_visible else None
           ~~~~~~~~~^^^^^^^^
IndexError: index 3 is out of bounds for dimension 1 with size 3

@wish44165

Hi @glenn-jocher, thanks for YOLOv8. I am trying to get access to predicted keypoints using the following code:

results = self.model(input_image)
keypoints = results[0].keypoints

but I am unable to get access to predicted keypoints because of the following error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "my_folder/lib/python3.11/site-packages/ultralytics/yolo/utils/__init__.py", line 118, in __str__
    v = getattr(self, a)
        ^^^^^^^^^^^^^^^^
  File "my_folder/lib/python3.11/site-packages/ultralytics/yolo/engine/results.py", line 555, in conf
    return self.data[..., 3] if self.has_visible else None
           ~~~~~~~~~^^^^^^^^
IndexError: index 3 is out of bounds for dimension 1 with size 3

I have also encountered this issue and would like to know how to fix it. Thank you.

@Laughing-q
Member

Laughing-q commented Jun 3, 2023

@TW-yuhsi @MaxTeselkin hi, this issue was introduced by recent updates. It should only happen when printing, though; I've fixed it in #2977. :)

@MaxTeselkin

@Laughing-q Hi, as far as I understand, the branch with the bug fix has not been merged into the main branch yet? I tried running pip install --upgrade ultralytics and ran my code again, but got the same error :(

@Laughing-q
Member

@MaxTeselkin yeah, it hasn't been merged yet. It could be later today or tomorrow. Before that you might need to git clone the repo, check out the fix_keypoint_conf branch and do pip install -e .

@Laughing-q
Member

@MaxTeselkin btw, only accessing keypoints.conf and print(keypoints) would trigger this error. Everything else should be fine, and you can access the original keypoints tensor via keypoints.data.

@wish44165

@TW-yuhsi @MaxTeselkin hi, this issue was introduced by recent updates. It should only happen when printing, though; I've fixed it in #2977. :)

Got it, thanks for your prompt reply.

@wish44165

@MaxTeselkin btw, only accessing keypoints.conf and print(keypoints) would trigger this error. Everything else should be fine, and you can access the original keypoints tensor via keypoints.data.

Thanks for your help, it worked =)

@glenn-jocher
Member Author

@TW-yuhsi you're welcome! I'm glad to hear that accessing keypoints.data worked for you. Just a heads up that this issue with accessing keypoints.conf when printing is due to a recent update to the repo, but it should be fixed soon in the main branch. Let us know if you have any further questions or concerns.

@glenn-jocher
Member Author

@hassanpasha5630 You're welcome! I'm glad to hear that you were successful in labeling the keypoints.

For action recognition, which involves classifying sequences of human poses into actions like walking, running, sitting, etc., using keypoints, you indeed have the right idea. YOLOv8 Pose model gives you the keypoints, but it does not inherently classify actions. Therefore, you would need a separate process or model to classify actions based on those keypoints.

The usual approach is to use temporal sequences of keypoints to capture the dynamics of human movement. The common steps to follow would be:

  1. Data Preparation: For each action that you want to classify, you will need a dataset containing sequences of keypoints and their corresponding labels (e.g., walking, running).

  2. Feature Extraction: The raw coordinates of keypoints may not be directly suitable for classification due to various factors, like noise and differences in person sizes and camera perspectives. Instead, you might extract features like changes in angles between limbs, relative distances over time, or the velocity of keypoint movements.

  3. Classifier Training: With your prepared features and labels, you’ll train a classifier. This could be a traditional machine learning classifier, such as a Random Forest or SVM, or it could be a more complex temporal model like an LSTM or a GRU that can handle sequential data.

  4. Sequence Labeling: For each sequence of keypoints (extracted from frames of video), you’ll use the trained classifier to predict the action label.

Since each action consists of a series of poses over time, you would generally need a dataset where these sequences are labeled with actions. Unfortunately, as you suspected, this may involve training a new classifier since YOLOv8 itself is focused on object detection and keypoint localization, not on understanding temporal sequences necessary for action classification.

For this task, it might be helpful to look into existing works on human action recognition using pose estimation. There's a substantial amount of research on this topic that may provide you with additional insights or even pre-trained models that you can use to bootstrap your solution.

Remember that action recognition is a more complex process due to its temporal nature and may require significant computational resources and data to achieve good accuracy.

If you need more guidance, I recommend checking additional materials and tutorials focused on human action recognition. Good luck with your project, and don't hesitate to reach out if you have more questions!
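
As a small illustration of the feature-extraction step (step 2 above), one common feature is the angle at a joint computed from three keypoints. A minimal sketch in plain NumPy; the COCO keypoint indices in the comment are only an example:

import numpy as np

def joint_angle(a, b, c):
    """Angle in degrees at keypoint b, formed by keypoints a-b-c, each given as [x, y]."""
    v1 = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    v2 = np.asarray(c, dtype=float) - np.asarray(b, dtype=float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# e.g. left elbow angle from COCO keypoints: left shoulder (5), left elbow (7), left wrist (9)
# elbow = joint_angle(kpts[5], kpts[7], kpts[9])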

@Lanson426

How can I merge the instance segmentation model and keypoint detection functionality together? According to the documentation, they are currently independent models (yolov8-seg.pt and yolov8-pose.pt). My project needs to extract mask data as well as keypoint data, and if the two are executed sequentially it takes too much time. Is there a way to process them in parallel? Looking forward to your reply.

@glenn-jocher
Member Author

@Lanson426 Hello,

Merging instance segmentation and keypoint detection functionalities into a single model would require significant architectural changes and is not supported out-of-the-box with the current YOLOv8 models. However, you can process instance segmentation and keypoint detection in parallel by running two separate models concurrently, each on a different thread or process.

To achieve this, you would set up a multi-threading or multi-processing system where yolov8-seg.pt runs on one thread to handle segmentation, and yolov8-pose.pt runs on another to handle keypoint detection. After processing, you can synchronize the threads to merge the results.

This approach allows you to utilize your computational resources more efficiently and reduce the overall processing time compared to a sequential execution.

For implementation details on multi-threading or multi-processing in Python, you may refer to the Python documentation or relevant Python concurrency libraries.
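
A minimal sketch of that threaded approach (the nano model weights, image path and result handling below are only illustrative):

from threading import Thread
from ultralytics import YOLO

seg_model = YOLO('yolov8n-seg.pt')
pose_model = YOLO('yolov8n-pose.pt')
out = {}

def run(name, model, source):
    out[name] = model(source)  # store the Results list for later merging

t1 = Thread(target=run, args=('seg', seg_model, 'image.jpg'))
t2 = Thread(target=run, args=('pose', pose_model, 'image.jpg'))
t1.start(); t2.start()
t1.join(); t2.join()

masks = out['seg'][0].masks           # segmentation masks
keypoints = out['pose'][0].keypoints  # pose keypoints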

If you have further questions or need more assistance, please let us know. Good luck with your project!

@Igor3990

Igor3990 commented Jan 9, 2024

Hello. I have annotated 20 photos of cows for a test (for myself). There are 3 keypoints on each cow, but when outputting video with the YOLOv8n-pose model these keypoints are not connected by lines. What could be the problem?

@glenn-jocher
Member Author

@Igor3990 hello,

For the keypoints to be connected by lines, the post-processing step after detection must include drawing lines between the relevant keypoints. If the lines are not appearing, it's possible that this step is missing or not correctly implemented in your current workflow.

Please ensure that your visualization code includes a function to draw lines between the keypoints based on their order or predefined skeletal structure. If you're using the YOLOv8 Pose model, you might need to add this functionality manually or adjust the existing one to match the keypoints of cows specifically.

If you continue to face issues, please refer to the Pose/Keypoint Estimation section in the documentation for more details on visualizing keypoints.
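
For reference, a minimal sketch of drawing a custom skeleton with OpenCV; the 3-keypoint skeleton definition and file names here are placeholders that you would adapt to your cow keypoints, and the keypoints .xy attribute assumes a recent ultralytics version:

import cv2
from ultralytics import YOLO

SKELETON = [(0, 1), (1, 2)]  # placeholder: which keypoint indices to connect

model = YOLO('path/to/best.pt')
r = model('cow.jpg')[0]
img = r.orig_img.copy()

for kpts in r.keypoints.xy:  # one (num_keypoints, 2) tensor per detected cow
    pts = [tuple(map(int, p)) for p in kpts.tolist()]
    for i, j in SKELETON:
        cv2.line(img, pts[i], pts[j], (0, 255, 0), 2)

cv2.imwrite('cow_skeleton.jpg', img)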

@DevPranjal

Hello @glenn-jocher. Thanks for the amazing work. Is there a way to change the Pose models to output Oriented Bounding Boxes instead of axis-aligned BBs? I have a custom dataset having Oriented BBs, Horizontal BBs and Pose Keypoints. I am able to train the Pose model with Horizontal BBs. I am also hoping that the orientation information will increase the Pose metrics. Thanks!

@glenn-jocher
Member Author

@DevPranjal YOLOv8 introduces pose models that are specifically trained for pose estimation tasks. These models are designed to identify keypoints on objects, which can be particularly useful in applications such as human pose estimation, where the model detects human joints and landmarks.

The YOLOv8 pose models are trained on the COCO keypoints dataset and are suitable for various pose estimation tasks. They are named with a -pose suffix, such as yolov8n-pose.pt.

The table provided lists different sizes of YOLOv8 pose models, along with their respective performance metrics, inference speeds on different hardware, and computational requirements in terms of parameters and FLOPs.

To train, validate, predict, or export a YOLOv8 pose model, you can use either the Python API or the command-line interface (CLI). The Python API involves importing the YOLO class from ultralytics and calling methods like train, val, predict, and export. The CLI provides a more straightforward way to perform these tasks using single-line commands.

For example, to train a YOLOv8n-pose model on the COCO128-pose dataset for 100 epochs at an image size of 640, you can use the following Python code:

from ultralytics import YOLO

# Load a model
model = YOLO('yolov8n-pose.yaml')  # build a new model from YAML
model = YOLO('yolov8n-pose.pt')  # load a pretrained model (recommended for training)
model = YOLO('yolov8n-pose.yaml').load('yolov8n-pose.pt')  # build from YAML and transfer weights

# Train the model
model.train(data='coco8-pose.yaml', epochs=100, imgsz=640)

Or, using the CLI:

# Build a new model from YAML and start training from scratch
yolo pose train data=coco8-pose.yaml model=yolov8n-pose.yaml epochs=100 imgsz=640

# Start training from a pretrained *.pt model
yolo pose train data=coco8-pose.yaml model=yolov8n-pose.pt epochs=100 imgsz=640

# Build a new model from YAML, transfer pretrained weights to it and start training
yolo pose train data=coco8-pose.yaml model=yolov8n-pose.yaml pretrained=yolov8n-pose.pt epochs=100 imgsz=640

The validation, prediction, and export processes are similarly straightforward, with commands provided for both Python and CLI usage.

Exporting models to different formats like ONNX, CoreML, TensorRT, etc., allows for deployment across various platforms and devices. The YOLOv8 pose models can be exported and then used directly for predictions or validations.

For more detailed information on each mode, you can refer to the provided documentation links for Predict and Export.

@AliasChenYi

There is a problem that has been bothering me. I have reproduced your YOLO-pose training on the COCO dataset, but the results you reported have not been reached. May I ask why? I hope you can read this message and give me some guidance. Thank you very much!

@glenn-jocher
Member Author

@AliasChenYi hello there! 👋

Thanks for reaching out and trying out the YOLOv8 Pose model. It's great to hear about your efforts! If you're not hitting the expected results, a few common areas to check include:

  • Dataset Integrity: Ensure the COCO dataset is correctly downloaded and accessible.
  • Training Configuration: Double-check your training configuration (yolov8n-pose.yaml) matches the one used for the reported results.
  • Environment: Make sure your training environment (software versions, hardware) is set up correctly.

Here's a quick snippet to ensure your training setup is correct:

from ultralytics import YOLO

model = YOLO('yolov8n-pose.yaml')
model.train(data='coco.yaml', epochs=100, imgsz=640)

If everything seems in order but you're still facing issues, could you share more details about your setup and the exact results you're getting? This could help in pinpointing the issue more effectively.

Thanks for your patience, and looking forward to helping you get the best out of YOLOv8 Pose! 🚀

@AliasChenYi

Hi there! 👋

Thanks for reaching out and trying out the YOLOv8 Pose model. It's great to hear about your efforts! If you're not hitting the expected results, a few common areas to check include:

  • Dataset Integrity: Ensure the COCO dataset is correctly downloaded and accessible.
  • Training Configuration: Double-check that your training configuration (yolov8n-pose.yaml) matches the one used for the reported results.
  • Environment: Make sure your training environment (software versions, hardware) is set up correctly.

Here's a quick snippet to ensure your training setup is correct:

from ultralytics import YOLO

model = YOLO('yolov8n-pose.yaml')
model.train(data='coco.yaml', epochs=100, imgsz=640)

If everything seems in order but you're still facing issues, could you share more details about your setup and the exact results you're getting? This would help pinpoint the issue more effectively.

Thanks for your patience, and looking forward to helping you get the best out of YOLOv8 Pose! 🚀

  1. Dataset: As for the dataset you mentioned, the one I used is the coco-pose.yaml file you officially provide. In addition, the relevant images and annotations were prepared as described, and training runs normally.
  2. The training configuration and environment are OK. So I am quite puzzled at the moment: everything seems to be in order, yet the results are very different from the ones you report.

I'm a little suspicious of my hyperparameter settings. I don't know how you set them for your runs, but here is my list of parameters:

task: pose
mode: train
model: ultralytics/cfg/models/v8pose/yolov8-pose.yaml
data: ultralytics/cfg/datasets/coco-pose.yaml
epochs: 300
time: null
patience: 50
batch: 300
imgsz: 640
save: true
save_period: -1
cache: true
device: '0'
workers: 8
project: runs/train
name: yolov8s-pose
exist_ok: false
pretrained: true
optimizer: SGD
verbose: false
seed: 0
deterministic: true
single_cls: true
rect: false
cos_lr: false
close_mosaic: 10
resume: false
amp: true
fraction: 1.0
profile: false
freeze: null
multi_scale: false
overlap_mask: true
mask_ratio: 4
dropout: 0.0
val: true
split: val
save_json: false
save_hybrid: false
conf: null
iou: 0.7
max_det: 300
half: false
dnn: false
plots: true
source: null
vid_stride: 1
stream_buffer: false
visualize: false
augment: false
agnostic_nms: false
classes: null
retina_masks: false
embed: null
show: false
save_frames: false
save_txt: false
save_conf: false
save_crop: false
show_labels: true
show_conf: true
show_boxes: true
line_width: null
format: torchscript
keras: false
optimize: false
int8: false
dynamic: false
simplify: false
opset: null
workspace: 4
nms: false
lr0: 0.01
lrf: 0.01
momentum: 0.937
weight_decay: 0.0005
warmup_epochs: 3.0
warmup_momentum: 0.8
warmup_bias_lr: 0.1
box: 7.5
cls: 0.5
dfl: 1.5
pose: 12.0
kobj: 1.0
label_smoothing: 0.0
nbs: 64
hsv_h: 0.015
hsv_s: 0.7
hsv_v: 0.4
degrees: 0.0
translate: 0.1
scale: 0.5
shear: 0.0
perspective: 0.0
flipud: 0.0
fliplr: 0.5
mosaic: 1.0
mixup: 0.0
copy_paste: 0.0
auto_augment: randaugment
erasing: 0.4
crop_fraction: 1.0
cfg: null
tracker: botsort.yaml
save_dir: runs/train/yolov8s-pose

The following is the code I used for training:

import warnings
warnings.filterwarnings('ignore')
from ultralytics import YOLO

if __name__ == '__main__':
    model = YOLO('ultralytics/cfg/models/v8pose/yolov8-pose.yaml')
    model.load('pre-training/yolov8s-pose.pt') 
    model.train(data=r'ultralytics/cfg/datasets/coco-pose.yaml',
                cache=True,
                imgsz=640,
                epochs=300,
                single_cls=True, 
                batch=300,
                close_mosaic=10,
                workers=8,
                device='0',
                optimizer='SGD',
                amp=True,  
                project='runs/train',
                name='yolov8s-pose',
                )

Recently I have been looking for the reason, but without results, so I am really troubled as to why there is such a big difference from the results you report. I hope you can give me some help; looking forward to your reply again.

@glenn-jocher
Member Author

Hello @AliasChenYi! 👋

Thank you for providing detailed information about your setup and the steps you've taken. Your configuration and training script look well-organized. If you're not achieving the expected results, here are a few suggestions that might help:

  1. Hyperparameters: Your hyperparameters seem reasonable, but small changes can sometimes lead to significant differences in performance. Comparing your settings with the defaults used in the original YOLOv8 Pose training might reveal some discrepancies. Pay special attention to lr0, lrf, and augmentation settings.

  2. Pretrained Weights: Ensure that the pretrained weights (yolov8s-pose.pt) are correctly loaded. Sometimes, issues during the loading process can affect training outcomes.

  3. Dataset: Verify the integrity of your coco-pose.yaml dataset. Ensure it's correctly pointing to your data and that the data format matches the expected structure.

  4. Version Compatibility: Check if you're using the same version of the Ultralytics YOLO library as the one used to produce the reported results. Updates or changes in the library might affect performance.

  5. Evaluation Metric: Ensure you're evaluating the model using the same metrics and dataset split as reported. Differences in evaluation methodology can lead to different results.

Since you've already checked most of these, consider experimenting with different learning rates or augmentation strategies. Sometimes, the model needs slight adjustments to hyperparameters based on the specific characteristics of the dataset.
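
For example, individual hyperparameters can be overridden directly in the train call while the rest stay at their defaults; the values below are only illustrative, not a recommended recipe:

from ultralytics import YOLO

model = YOLO('yolov8s-pose.pt')
model.train(data='coco-pose.yaml', epochs=300, imgsz=640,
            lr0=0.01, lrf=0.01, mosaic=1.0, mixup=0.0)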

If you continue to face challenges, sharing your findings and any specific discrepancies in results on the GitHub issues page or Ultralytics forums might provide more targeted insights from the community or the developers.

Keep experimenting, and don't hesitate to reach out for further assistance! 🚀

@AliasChenYi

Hello! 👋

Thank you for providing detailed information about your setup and the steps you've taken. Your configuration and training script look well-organized. If you're not achieving the expected results, here are a few suggestions that might help:

  1. Hyperparameters: Your hyperparameters seem reasonable, but small changes can sometimes lead to significant differences in performance. Comparing your settings with the defaults used in the original YOLOv8 Pose training might reveal some discrepancies. Pay special attention to lr0, lrf, and the augmentation settings.
  2. Pretrained Weights: Ensure that the pretrained weights (yolov8s-pose.pt) are correctly loaded. Sometimes, issues during the loading process can affect training outcomes.
  3. Dataset: Verify the integrity of your coco-pose.yaml dataset. Ensure it correctly points to your data and that the data format matches the expected structure.
  4. Version Compatibility: Check whether the version of the Ultralytics YOLO library you are using is the same as the one used to produce the reported results. Updates or changes in the library might affect performance.
  5. Evaluation Metric: Ensure you're evaluating the model using the same metrics and dataset split as reported. Differences in evaluation methodology can lead to different results.

Since you've already checked most of these, consider experimenting with different learning rates or augmentation strategies. Sometimes the model needs slight adjustments to hyperparameters based on the specific characteristics of the dataset.

If you continue to face challenges, sharing your findings and any specific discrepancies in results on the GitHub issues page or the Ultralytics forums might provide more targeted insights from the community or the developers.

Keep experimenting, and don't hesitate to reach out for further assistance! 🚀

Thank you very much for your answer and suggestions. I downloaded the dataset following your official download method, so I think there is no problem there, and the pretrained weights were also downloaded from your official release.
I will pay special attention to adjusting the lr0 and lrf hyperparameters you mentioned, and I hope the results improve.
Finally, thank you for your reply. If you have any other specific suggestions, I would be very grateful for your advice. Thank you very much.

@glenn-jocher
Member Author

@AliasChenYi Hello! 👋

Glad to see you actively pursuing a solution, and thank you for staying open to our suggestions. Indeed, small adjustments to lr0 and lrf can have an unexpectedly positive impact. You could also try tuning the data augmentation strategy, for example changing the mosaic and mixup ratios, which can sometimes help the model generalize better.

mosaic: 0.5
mixup: 0.2

If the results still fall short after these adjustments, we recommend keeping a detailed record of the settings and results of each experiment during training, so it is easier to see which changes are most effective.

Thanks again for your active participation and feedback. The strength of our community lies in enthusiastic members like you. If there is any progress or you need more help, feel free to post an update. Good luck with your training! 🚀

@AliasChenYi

AliasChenYi commented Mar 7, 2024 via email

@glenn-jocher
Member Author

Hey @AliasChenYi! 👋

Glad to hear you're going to try out the suggestions! Regarding the args.yaml for YOLOv8 Pose, we typically include all necessary configurations directly within the model's YAML file or the training script. However, for a detailed record of the training arguments used, you can refer to the terminal output or logs generated during training, as these often capture the effective configuration.

For replicating our official results, ensure you're using the same dataset, model version, and training setup as mentioned in our documentation. Sometimes, even minor differences in these can lead to variations in results.

If you're looking for a specific example of how configurations might be structured, here's a quick snippet:

# Example args.yaml snippet
lr0: 0.01  # initial learning rate
lrf: 0.1  # final learning rate factor
epochs: 100  # total epochs to train
imgsz: 640  # image size
batch: 16  # batch size

Remember, the key to matching official results often lies in mirroring the training environment as closely as possible. If there's anything more specific you need help with, feel free to ask. Happy training! 🚀

@AliasChenYi

AliasChenYi commented Mar 7, 2024 via email

@glenn-jocher
Member Author

@AliasChenYi hey there! 😊

Ah, I see what you're asking for now. My apologies for the confusion earlier. Unfortunately, we don't typically share the exact args.yaml used during training as our setup and hyperparameters might be integrated into the training scripts or model configuration files directly.

However, the key hyperparameters and their values are usually reflected in the model's YAML file or mentioned in our documentation. If you're looking to replicate our results, I'd recommend sticking closely to the parameters mentioned in our official guides or the model's YAML file.

For YOLOv8 Pose, the most critical settings often include learning rate (lr0), learning rate scheduler (lrf), epochs, batch size, and image size (imgsz). Adjusting these to match the documented settings should get you pretty close to our published results.

If there's a specific aspect you're struggling with, feel free to drop more questions. Always here to help! 🚀

@Thanhthanhthanhthanh1711

Thanks, but I have a question to ask you: do you know how to get the skeleton parameters and save them to a .txt file? I read the YOLO docs but didn't see any mention of getting this parameter.

@glenn-jocher
Member Author

@Thanhthanhthanhthanh1711 hey there! 😊 Sure, I can help you with that. To save the skeleton keypoints detected by YOLOv8 Pose models into a .txt file, you'd typically process the results to extract the keypoints attribute, which contains the coordinates for each detected keypoint. Here's a quick way to do it in Python:

from ultralytics import YOLO

# Load your trained model
model = YOLO('yolov8n-pose.pt') 

# Run prediction on an image
results = model('path/to/image.jpg')

# Extract keypoints and save to .txt file (one line per detected person)
with open('keypoints.txt', 'w') as f:
    for kp in results[0].keypoints.data.tolist():  # results is a list; take the first image
        f.write(f'{kp}\n')

This code snippet assumes you're working with a Pose model and will save the keypoints for each detection in 'keypoints.txt'. Each line in the file represents the keypoints for one detected object in the format provided by the results. 📝

Let me know if this helps or if you need further assistance!

@Thanhthanhthanhthanh1711

@Thanhthanhthanhthanh1711 hey there! 😊 Sure, I can help you with that. To save the skeleton keypoints detected by YOLOv8 Pose models into a .txt file, you'd typically process the results to extract the keypoints attribute, which contains the coordinates for each detected keypoint. Here's a quick way to do it in Python:

from ultralytics import YOLO

# Load your trained model
model = YOLO('yolov8n-pose.pt') 

# Run prediction on an image
results = model('path/to/image.jpg')

# Extract keypoints and save to .txt file (one line per detected person)
with open('keypoints.txt', 'w') as f:
    for kp in results[0].keypoints.data.tolist():  # results is a list; take the first image
        f.write(f'{kp}\n')

This code snippet assumes you're working with a Pose model and will save the keypoints for each detection in 'keypoints.txt'. Each line in the file represents the keypoints for one detected object in the format provided by the results. 📝

Let me know if this helps or if you need further assistance!

I got it, thank you very much!!!!

@glenn-jocher
Member Author

@Thanhthanhthanhthanh1711 you're welcome! I'm glad it helped. 😊 If you have any more questions while working with YOLOv8 or need further assistance, don't hesitate to reach out. Happy coding!

@Thanhthanhthanhthanh1711

@glenn-jocher
Member Author

Thanks for the reaction, @Thanhthanhthanhthanh1711! 🌟 If there's anything else on YOLOv8 you're curious about or need a hand with, just pop the question. Looking forward to seeing what you build next! 🚀

@JANARDHANAREDDYMS

@xuchanghua-88 hello,

Thank you for your query. You're correct, the YOLOv8 Pose model currently supports 2D pose estimation. It identifies and estimates the position of keypoints on objects in 2D images.

If you're looking to convert these 2D keypoints into a 3D representation, you would need to involve more sophisticated techniques or models that are specifically designed for 3D pose estimation, which is currently not available in YOLOv8.

3D pose estimation typically involves monocular depth estimation or stereo images and application of principles from multi-view geometry. This level of processing is beyond the scope of the YOLOv8 Pose model.

Let me know if you have any more questions!

I think you can use the automated tomography techniques mentioned in the "Reducing Viral Transmission through AI-based Crowd Monitoring and Social Distancing Analysis" paper by Fraser et al. to convert the 2D coordinates into 3D coordinates.

@glenn-jocher
Member Author

Hey @JANARDHANAREDDYMS! 😊

Great suggestion! While YOLOv8 focuses on 2D pose estimation, leveraging techniques from literature like the one you mentioned could indeed be a way to experiment with extending 2D keypoints to 3D. Implementing approaches from "Reducing Viral Transmission through AI-based Crowd Monitoring and Social Distancing Analysis" by Fraser et al. requires a bit of creativity and manual effort since it would not be a direct feature of YOLOv8.

You could consider using the 2D keypoints detected by YOLOv8 Pose as input to a separate model or algorithm designed for 3D reconstruction. Here's a simplified concept:

# Assume 'keypoints_2d' is a list of 2D keypoints extracted by YOLOv8 Pose
# Apply transformation technique to convert 2D keypoints to 3D
keypoints_3d = convert_2d_to_3d(keypoints_2d)

Remember, the success of such a conversion heavily depends on the specific use case, the quality of the 2D keypoints, and the effectiveness of the 3D estimation method.

Let me know if you dive into this experiment, would love to hear about your findings!
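
If calibrated cameras are available, one concrete way to realize a convert_2d_to_3d step is classical two-view triangulation. A minimal sketch with OpenCV, where the 3x4 projection matrices P1, P2 and the matched keypoint arrays are assumptions about your own setup:

import cv2
import numpy as np

def triangulate(P1, P2, pts1, pts2):
    """Triangulate matched 2D keypoints from two calibrated views into 3D points.

    P1, P2: 3x4 camera projection matrices; pts1, pts2: (N, 2) arrays of pixel coordinates.
    """
    pts1 = np.asarray(pts1, dtype=float).T           # -> (2, N) as expected by OpenCV
    pts2 = np.asarray(pts2, dtype=float).T
    X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)  # homogeneous coordinates, shape (4, N)
    return (X_h[:3] / X_h[3]).T                      # (N, 3) Euclidean 3D points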

@BDMChau

BDMChau commented Apr 21, 2024

Hello @glenn-jocher, I'm interested in this topic because I am doing a project on human activity recognition using pose estimation.
After reading some articles and finding this topic, I have a few questions that have confused me for a long time.
Our system has multiple cameras at many angles (almost all at the upper corners).
I intend to use YOLOv8 Pose 2D with an LSTM to classify activities. Is that OK, or do I need 3D pose for more accuracy?
Can you give me some advice?
Thanks!

@glenn-jocher
Member Author

Hey there! 😊

Using YOLOv8 Pose for 2D keypoints combined with LSTM for activity recognition sounds like a solid approach, especially given your setup with multiple camera angles. Integrating multiple 2D perspectives can somewhat offset the lack of depth information compared to 3D pose estimation, making it suitable for various activities.

Here's a brief example of how you might structure your data for the LSTM after extracting keypoints with YOLOv8 Pose:

# Example: Assuming 'keypoints' is your extracted keypoints from YOLOv8 Pose
# 'activities' could be your labels e.g., walking, running

# Prepare your sequences
sequences = ... # Your logic to create sequences from keypoints
labels = ... # Your corresponding activity labels for each sequence

# LSTM model for activity recognition
model = LSTMModel() # Define your LSTM model

# Train your model
model.fit(sequences, labels)
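
As one hedged, concrete version of this sketch, the LSTMModel above could be a small Keras network. This assumes sequences shaped (samples, timesteps, features) built from the keypoints and integer activity labels; the shapes and class names are placeholders:

import numpy as np
from tensorflow.keras import layers, models

num_classes = 3                                  # e.g. walking, running, sitting
sequences = np.random.rand(100, 30, 34)          # placeholder: 30 frames of 17 keypoints * 2 coords
labels = np.random.randint(0, num_classes, 100)  # placeholder activity labels

lstm = models.Sequential([
    layers.Input(shape=sequences.shape[1:]),
    layers.LSTM(64),
    layers.Dense(num_classes, activation='softmax'),
])
lstm.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
lstm.fit(sequences, labels, epochs=20, batch_size=32)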

While 3D pose offers depth perception that can enhance recognition in theory, starting with 2D keypoints is still very compelling due to its simplicity and lower computational requirements. Plus, your multi-camera setup should provide a good field of view! 📸

If the accuracy needs a boost or the activities are highly depth-dependent, considering a move to 3D might be beneficial. But for many cases, a well-trained 2D system does wonders. Feel free to experiment and see how it goes!

Happy coding, and best of luck with your project!
