TensorRT10 with JetPack 6.0 Docs update #11779

Merged · 8 commits · May 17, 2024
78 changes: 47 additions & 31 deletions docs/en/integrations/tensorrt.md
@@ -145,27 +145,43 @@ Experimentation by NVIDIA led them to recommend using at least 500 calibration images…

!!! example

    === "Python"

        ```{ .py .annotate }
        from ultralytics import YOLO

        model = YOLO("yolov8n.pt")
        model.export(
            format="engine",
            dynamic=True, #(1)!
            batch=8, #(2)!
            workspace=4, #(3)!
            int8=True,
            data="coco.yaml", #(4)!
        )

        # Load the exported TensorRT INT8 model
        model = YOLO("yolov8n.engine", task="detect")

        # Run inference
        result = model.predict("https://ultralytics.com/images/bus.jpg")
        ```

        1. Exports with dynamic axes; this is enabled by default when exporting with `int8=True`, even when not explicitly set. See [export arguments](../modes/export.md#arguments) for additional information.
        2. Sets a max batch size of 8 for the exported model, which calibrates with `batch = 2 * 8` to avoid scaling errors during calibration.
        3. Allocates 4 GiB of memory for the conversion process instead of allocating the entire device.
        4. Uses the [COCO dataset](../datasets/detect/coco.md) for calibration, specifically the images used for [validation](../modes/val.md) (5,000 total).

    === "CLI"

        ```bash
        # Export a YOLOv8n PyTorch model to TensorRT format with INT8 quantization
        yolo export model=yolov8n.pt format=engine batch=8 workspace=4 int8=True data=coco.yaml  # creates 'yolov8n.engine'

        # Run inference with the exported TensorRT quantized model
        yolo predict model=yolov8n.engine source='https://ultralytics.com/images/bus.jpg'
        ```
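To sanity-check the INT8 export, it can help to validate the engine on the same dataset used for calibration and compare the resulting mAP against the FP32 baseline. A minimal sketch, assuming the export above has completed; `model.val()` returns a metrics object whose `box.map` and `box.map50` fields hold mAP50-95 and mAP50:

```python
from ultralytics import YOLO

# Load the exported TensorRT INT8 engine; task is set explicitly, as in the example above
model = YOLO("yolov8n.engine", task="detect")

# Validate on the same dataset used for calibration
metrics = model.val(data="coco.yaml", imgsz=640, batch=1)
print(f"mAP50-95: {metrics.box.map:.3f}  mAP50: {metrics.box.map50:.3f}")
```

If the INT8 numbers drop noticeably relative to FP16/FP32, re-exporting with more (or more representative) calibration images is usually the first thing to try.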


???+ warning "Calibration Cache"

@@ -240,12 +256,12 @@ Experimentation by NVIDIA led them to recommend using at least 500 calibration images…

| Precision | Eval test | mean<br>(ms) | min \| max<br>(ms) | top-1 | top-5 | `batch` | size<br><sup>(pixels) |
|-----------|------------------|--------------|--------------------|-------|-------|---------|-----------------------|
| FP32 | Predict | 0.26 | 0.25 \| 0.28 | | | 8 | 640 |
| FP32 | ImageNet<sup>val | 0.26 | | 0.35 | 0.61 | 1 | 640 |
| FP16 | Predict | 0.18 | 0.17 \| 0.19 | | | 8 | 640 |
| FP16 | ImageNet<sup>val | 0.18 | | 0.35 | 0.61 | 1 | 640 |
| INT8 | Predict | 0.16 | 0.15 \| 0.57 | | | 8 | 640 |
| INT8 | ImageNet<sup>val | 0.15 | | 0.32 | 0.59 | 1 | 640 |
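The `top-1` and `top-5` columns are produced by the validation runs rather than by `Predict`, which is why they only appear in the `ImageNet val` rows. A minimal sketch of how such accuracy figures can be reproduced, assuming a classification engine exported from `yolov8n-cls.pt` (the engine filename here is hypothetical) and the ImageNet validation split available locally:

```python
from ultralytics import YOLO

# Hypothetical classification engine, exported the same way as the detection example
model = YOLO("yolov8n-cls.engine", task="classify")

# Validate on ImageNet to obtain top-1 / top-5 accuracy
metrics = model.val(data="imagenet", imgsz=640, batch=1)
print(f"top-1: {metrics.top1:.2f}  top-5: {metrics.top5:.2f}")
```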

=== "Pose (COCO)"

@@ -338,19 +354,19 @@ Experimentation by NVIDIA led them to recommend using at least 500 calibration images…

=== "Jetson Orin NX 16GB"

Tested with JetPack 6.0 (L4T 36.3) Ubuntu 22.04.4 LTS, `python 3.10.12`, `ultralytics==8.2.16`, `tensorrt==10.0.1`

!!! note

    Inference times are shown for `mean`, `min` (fastest), and `max` (slowest) for each test using pre-trained weights `yolov8n.engine`.

| Precision | Eval test | mean<br>(ms) | min \| max<br>(ms) | mAP<sup>val<br>50(B) | mAP<sup>val<br>50-95(B) | `batch` | size<br><sup>(pixels) |
|-----------|--------------|--------------|--------------------|----------------------|-------------------------|---------|-----------------------|
| FP32 | Predict | 6.11 | 6.10 \| 6.29 | | | 8 | 640 |
| FP32 | COCO<sup>val | 6.17 | | 0.52 | 0.37 | 1 | 640 |
| FP16 | Predict | 3.18 | 3.18 \| 3.20 | | | 8 | 640 |
| FP16 | COCO<sup>val | 3.19 | | 0.52 | 0.37 | 1 | 640 |
| INT8 | Predict | 2.30 | 2.29 \| 2.35 | | | 8 | 640 |
| INT8 | COCO<sup>val | 2.32 | | 0.46 | 0.32 | 1 | 640 |
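The `mean`, `min`, and `max` columns above can be approximated by timing repeated predictions; each Ultralytics result carries a `speed` dict with the inference time in milliseconds. A minimal sketch, assuming the INT8 engine from the export example (absolute numbers will vary with device, JetPack version, and power mode):

```python
from ultralytics import YOLO

model = YOLO("yolov8n.engine", task="detect")

# Collect per-call inference times (ms); the first calls also serve as warm-up
times = []
for _ in range(60):
    results = model.predict("https://ultralytics.com/images/bus.jpg", verbose=False)
    times.append(results[0].speed["inference"])  # forward-pass time only, excludes pre/post-processing

times = times[10:]  # discard warm-up iterations
print(f"mean: {sum(times) / len(times):.2f} ms  min: {min(times):.2f}  max: {max(times):.2f}")
```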

!!! info
