Train Custom Data Tutorial 🌟 #1570

glenn-jocher · 2020-11-26T20:51:00Z

📚 This guide explains how to train your own custom dataset with YOLOv5 🚀. See YOLOv5 Docs for additional details. UPDATED 29 March 2023.

Before You Start

Clone repo and install requirements.txt in a Python>=3.7.0 environment, including PyTorch>=1.7. Models and datasets download automatically from the latest YOLOv5 release.

git clone https://github.com/ultralytics/yolov3  # clone
cd yolov3
pip install -r requirements.txt  # install

Train On Custom Data

Creating a custom model to detect your objects is an iterative process of collecting and organizing images, labeling your objects of interest, training a model, deploying it into the wild to make predictions, and then using that deployed model to collect examples of edge cases to repeat and improve.

1. Create Dataset

YOLOv5 models must be trained on labelled data in order to learn classes of objects in that data. Creating your dataset before you start training involves three steps: writing a dataset config file, labelling your images, and organizing your directories.

1.1 Create dataset.yaml

COCO128 is an example small tutorial dataset composed of the first 128 images in COCO train2017. These same 128 images are used for both training and validation to verify our training pipeline is capable of overfitting. data/coco128.yaml, shown below, is the dataset config file that defines 1) the dataset root directory path and relative paths to train / val / test image directories (or *.txt files with image paths) and 2) a class names dictionary:

# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../datasets/coco128  # dataset root dir
train: images/train2017  # train images (relative to 'path') 128 images
val: images/train2017  # val images (relative to 'path') 128 images
test:  # test images (optional)

# Classes (80 COCO classes)
names:
  0: person
  1: bicycle
  2: car
  ...
  77: teddy bear
  78: hair drier
  79: toothbrush
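
For a custom dataset, a config file with the same layout can be written by hand or generated. The following is a minimal sketch, not part of YOLOv5: the helper name build_dataset_yaml, the dataset path and the class names are hypothetical, and it assumes class names contain no characters that need YAML quoting.

```python
def build_dataset_yaml(root, train, val, names, test=""):
    """Return dataset-config text in the same layout as coco128.yaml."""
    lines = [
        f"path: {root}  # dataset root dir",
        f"train: {train}  # train images (relative to 'path')",
        f"val: {val}  # val images (relative to 'path')",
        f"test: {test}  # test images (optional)",
        "",
        "names:",
    ]
    # Class indices are zero-based and follow the order of `names`.
    lines += [f"  {i}: {name}" for i, name in enumerate(names)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical two-class dataset, used only for illustration.
    print(build_dataset_yaml("../datasets/my_dataset", "images/train", "images/val", ["cat", "dog"]))
```

The printed text can be saved as a *.yaml file and passed to train.py with --data.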

1.2 Create Labels

After using an annotation tool to label your images, export your labels to YOLO format, with one *.txt file per image (if no objects in image, no *.txt file is required). The *.txt file specifications are:

One row per object
Each row is class x_center y_center width height format.
Box coordinates must be in normalized xywh format (from 0 to 1). If your boxes are in pixels, divide x_center and width by image width, and y_center and height by image height.
Class numbers are zero-indexed (start from 0).
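
The normalization described above can be sketched as follows. This assumes the annotation tool exports pixel corner coordinates (x_min, y_min, x_max, y_max); the function name to_yolo_row is hypothetical and not part of YOLOv5.

```python
def to_yolo_row(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space corner box to one YOLO-format label row."""
    # Box centre and size, each divided by the matching image dimension.
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


if __name__ == "__main__":
    # A 320x240 px box centred in a 640x480 px image, class 0.
    print(to_yolo_row(0, 160, 120, 480, 360, 640, 480))
    # -> 0 0.500000 0.500000 0.500000 0.500000
```

Each returned string is one row of the image's *.txt label file.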

The label file corresponding to the above image contains 2 persons (class 0) and a tie (class 27):

1.3 Organize Directories

Organize your train and val images and labels according to the example below. YOLOv5 assumes /coco128 is inside a /datasets directory next to the /yolov5 directory. YOLOv5 locates labels automatically for each image by replacing the last instance of /images/ in each image path with /labels/. For example:

../datasets/coco128/images/im0.jpg  # image
../datasets/coco128/labels/im0.txt  # label
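
The path substitution can be illustrated with a short sketch. This is not the repository's own implementation: the function name image_to_label_path is hypothetical, and it assumes forward-slash path separators.

```python
def image_to_label_path(image_path):
    """Map an image path to its label path by swapping the last /images/ for /labels/."""
    # rpartition splits on the LAST occurrence of the separator.
    head, sep, tail = image_path.rpartition("/images/")
    if not sep:
        raise ValueError(f"no /images/ directory in path: {image_path}")
    stem = tail.rsplit(".", 1)[0]  # drop the image file extension
    return f"{head}/labels/{stem}.txt"


if __name__ == "__main__":
    print(image_to_label_path("../datasets/coco128/images/im0.jpg"))
    # -> ../datasets/coco128/labels/im0.txt
```

Because only the last /images/ is replaced, a dataset root that itself contains a directory named images is left untouched.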

2. Select a Model

Select a pretrained model to start training from. Here we select YOLOv5s, the second-smallest and fastest model available. See our README table for a full comparison of all models.

3. Train

Train a YOLOv5s model on COCO128 by specifying dataset, batch-size, image size and either pretrained --weights yolov5s.pt (recommended), or randomly initialized --weights '' --cfg yolov5s.yaml (not recommended). Pretrained weights are auto-downloaded from the latest YOLOv5 release.

# Train YOLOv5s on COCO128 for 3 epochs
python train.py --img 640 --batch 16 --epochs 3 --data coco128.yaml --weights yolov5s.pt

💡 ProTip: Add --cache ram or --cache disk to speed up training (requires significant RAM/disk resources).
💡 ProTip: Always train from a local dataset. Mounted or network drives like Google Drive will be very slow.

All training results are saved to runs/train/ with incrementing run directories, e.g. runs/train/exp2, runs/train/exp3 etc. For more details see the Training section of our tutorial notebook.
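
The incrementing naming scheme can be sketched roughly as below. This is an illustration only, not the repository's code: the function name next_run_name is hypothetical, and it assumes the first run is named exp with numeric suffixes starting at 2.

```python
def next_run_name(existing, base="exp"):
    """Return the first unused run-directory name: exp, exp2, exp3, ..."""
    if base not in existing:
        return base
    n = 2
    # Increase the numeric suffix until an unused name is found.
    while f"{base}{n}" in existing:
        n += 1
    return f"{base}{n}"


if __name__ == "__main__":
    print(next_run_name({"exp", "exp2"}))  # -> exp3
```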

4. Visualize

Comet Logging and Visualization 🌟 NEW

Comet is now fully integrated with YOLOv5. Track and visualize model metrics in real time, save your hyperparameters, datasets, and model checkpoints, and visualize your model predictions with Comet Custom Panels! Comet makes sure you never lose track of your work and makes it easy to share results and collaborate across teams of all sizes!

Getting started is easy:

pip install comet_ml  # 1. install
export COMET_API_KEY=<Your API Key>  # 2. paste API key
python train.py --img 640 --epochs 3 --data coco128.yaml --weights yolov5s.pt  # 3. train

To learn more about all of the supported Comet features for this integration, check out the Comet Tutorial. If you'd like to learn more about Comet, head over to our documentation. Get started by trying out the Comet Colab Notebook:

ClearML Logging and Automation 🌟 NEW

ClearML is completely integrated into YOLOv5 to track your experimentation, manage dataset versions and even remotely execute training runs. To enable ClearML:

pip install clearml
run clearml-init to connect to a ClearML server (deploy your own open-source server here, or use our free hosted server here)

You'll get all the features expected from an experiment manager: live updates, model upload, experiment comparison, etc. ClearML also tracks uncommitted changes and installed packages, which makes ClearML Tasks (which is what we call experiments) reproducible on different machines! With only 1 extra line, we can schedule a YOLOv5 training task on a queue to be executed by any number of ClearML Agents (workers).

You can use ClearML Data to version your dataset and then pass it to YOLOv5 simply using its unique ID. This will help you keep track of your data without adding extra hassle. Explore the ClearML Tutorial for details!

Local Logging

Training results are automatically logged with TensorBoard and CSV loggers to runs/train, with a new experiment directory created for each new training as runs/train/exp2, runs/train/exp3, etc.

This directory contains train and val statistics, mosaics, labels, predictions and augmented mosaics, as well as metrics and charts including precision-recall (PR) curves and confusion matrices.

Results file results.csv is updated after each epoch, and then plotted as results.png (below) after training completes. You can also plot any results.csv file manually:

from utils.plots import plot_results
plot_results('path/to/results.csv')  # plot 'results.csv' as 'results.png'
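
Since results.csv is a plain CSV file, it can also be inspected without plotting. The sketch below uses only the standard library to find the epoch with the highest value of a metric. The function name best_epoch is hypothetical, and the column names epoch and metrics/mAP_0.5 are assumptions about the file's header; check your own results.csv and adjust them as needed.

```python
import csv
import io


def best_epoch(csv_text, metric):
    """Return (epoch, value) for the row with the highest `metric`."""
    reader = csv.reader(io.StringIO(csv_text))
    # Header cells may be padded with spaces, so strip them before lookup.
    header = [h.strip() for h in next(reader)]
    epoch_col = header.index("epoch")
    metric_col = header.index(metric)
    best = None
    for row in reader:
        if not row:
            continue  # skip blank lines
        value = float(row[metric_col])
        if best is None or value > best[1]:
            best = (int(row[epoch_col]), value)
    return best


if __name__ == "__main__":
    # Made-up numbers, used only to show the call.
    sample = "epoch, metrics/mAP_0.5\n0, 0.10\n1, 0.35\n2, 0.30\n"
    print(best_epoch(sample, "metrics/mAP_0.5"))  # -> (1, 0.35)
```

For a real run, read the file contents first, for example with open('path/to/results.csv').read().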

Next Steps

Once your model is trained you can use your best checkpoint best.pt to:

Run CLI or Python inference on new images and videos
Validate accuracy on train, val and test splits
Export to TensorFlow, Keras, ONNX, TFLite, TF.js, CoreML and TensorRT formats
Evolve hyperparameters to improve performance
Improve your model by sampling real-world images and adding them to your dataset

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/cuDNN, Python and PyTorch preinstalled):

Notebooks with free GPU:
Google Cloud Deep Learning VM. See GCP Quickstart Guide
Amazon Deep Learning AMI. See AWS Quickstart Guide
Docker Image. See Docker Quickstart Guide

Status

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.


github-actions · 2020-12-27T00:42:46Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

notmatthancock · 2021-03-28T16:16:49Z

There is a small typo in this tutorial:

from utils.utils import plot_results should read from utils.plots import plot_results

glenn-jocher · 2021-03-28T17:49:56Z

@notmatthancock thanks for letting us know! This should be fixed now :)

mark375chen · 2022-02-03T17:13:21Z

Can clarification be added under local logging for detecting images based on custom trained data?
From runs/train/exp3, how can I apply detections on a test image?

Also, where/how is train_batch0.jpg supposed to be generated?

glenn-jocher · 2022-02-03T17:57:09Z

@mark375chen detection can be run on any trained model, e.g. python detect.py --weights path/to/best.pt

glenn-jocher added enhancement New feature or request tutorial Tutorial or example labels Nov 26, 2020

glenn-jocher self-assigned this Nov 26, 2020

glenn-jocher pinned this issue Nov 26, 2020

glenn-jocher changed the title ~~Train Custom Data Tutorial~~ Train Custom Data Tutorial 🌟 Nov 26, 2020

github-actions bot added the Stale label Dec 27, 2020

github-actions bot closed this as completed Jan 1, 2021

glenn-jocher reopened this Jan 7, 2021

glenn-jocher removed the Stale label Jan 7, 2021

selous123 mentioned this issue May 11, 2021

数据集划分问题 selous123/yolov3-pytorch-custom#1

Open

mark375chen mentioned this issue Feb 3, 2022

No best.pt after training #1893

Closed
