TensorRT10 with JetPack 6.0 Docs update #11779

Merged · 8 commits · May 17, 2024
78 changes: 47 additions & 31 deletions docs/en/integrations/tensorrt.md
@@ -145,27 +145,43 @@ Experimentation by NVIDIA led them to recommend using at least 500 calibration images…

!!! example

    === "Python"

        ```{ .py .annotate }
        from ultralytics import YOLO

        model = YOLO("yolov8n.pt")
        model.export(
            format="engine",
            dynamic=True, #(1)!
            batch=8, #(2)!
            workspace=4, #(3)!
            int8=True,
            data="coco.yaml", #(4)!
        )

        # Load the exported TensorRT INT8 model
        model = YOLO("yolov8n.engine", task="detect")

        # Run inference
        result = model.predict("https://ultralytics.com/images/bus.jpg")
        ```

        1. Exports with dynamic axes; this is enabled by default when exporting with `int8=True`, even when not explicitly set. See [export arguments](../modes/export.md#arguments) for additional information.
        2. Sets a max batch size of 8 for the exported model, which calibrates with `batch = 2 * 8` to avoid scaling errors during calibration.
        3. Allocates 4 GiB of memory for the conversion process instead of allocating the entire device.
        4. Uses the [COCO dataset](../datasets/detect/coco.md) for calibration, specifically the images used for [validation](../modes/val.md) (5,000 total).

    === "CLI"

        ```bash
        # Export a YOLOv8n PyTorch model to TensorRT format with INT8 quantization
        yolo export model=yolov8n.pt format=engine batch=8 workspace=4 int8=True data=coco.yaml  # creates 'yolov8n.engine'

        # Run inference with the exported TensorRT quantized model
        yolo predict model=yolov8n.engine source='https://ultralytics.com/images/bus.jpg'
        ```
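To sanity-check the INT8 export, it can help to validate the engine on the same dataset used for calibration and compare the resulting mAP against the FP32 baseline. A minimal sketch, assuming the export above has completed; `model.val()` returns a metrics object whose `box.map` and `box.map50` fields hold mAP50-95 and mAP50:

```python
from ultralytics import YOLO

# Load the exported TensorRT INT8 engine; task is set explicitly, as in the example above
model = YOLO("yolov8n.engine", task="detect")

# Validate on the same dataset used for calibration
metrics = model.val(data="coco.yaml", imgsz=640, batch=1)
print(f"mAP50-95: {metrics.box.map:.3f}  mAP50: {metrics.box.map50:.3f}")
```

If the INT8 numbers drop noticeably relative to FP16/FP32, re-exporting with more (or more representative) calibration images is usually the first thing to try.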


???+ warning "Calibration Cache"

@@ -240,12 +256,12 @@ Experimentation by NVIDIA led them to recommend using at least 500 calibration images…

| Precision | Eval test | mean<br>(ms) | min \| max<br>(ms) | top-1 | top-5 | `batch` | size<br><sup>(pixels) |
|-----------|------------------|--------------|--------------------|-------|-------|---------|-----------------------|
| FP32 | Predict | 0.26 | 0.25 \| 0.28 | | | 8 | 640 |
| FP32 | ImageNet<sup>val | 0.26 | | 0.35 | 0.61 | 1 | 640 |
| FP16 | Predict | 0.18 | 0.17 \| 0.19 | | | 8 | 640 |
| FP16 | ImageNet<sup>val | 0.18 | | 0.35 | 0.61 | 1 | 640 |
| INT8 | Predict | 0.16 | 0.15 \| 0.57 | | | 8 | 640 |
| INT8 | ImageNet<sup>val | 0.15 | | 0.32 | 0.59 | 1 | 640 |
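The `top-1` and `top-5` columns are produced by the validation runs rather than by `Predict`, which is why they only appear in the `ImageNet val` rows. A minimal sketch of how such accuracy figures can be reproduced, assuming a classification engine exported from `yolov8n-cls.pt` (the engine filename here is hypothetical) and the ImageNet validation split available locally:

```python
from ultralytics import YOLO

# Hypothetical classification engine, exported the same way as the detection example
model = YOLO("yolov8n-cls.engine", task="classify")

# Validate on ImageNet to obtain top-1 / top-5 accuracy
metrics = model.val(data="imagenet", imgsz=640, batch=1)
print(f"top-1: {metrics.top1:.2f}  top-5: {metrics.top5:.2f}")
```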

=== "Pose (COCO)"

@@ -338,19 +354,19 @@ Experimentation by NVIDIA led them to recommend using at least 500 calibration images…

=== "Jetson Orin NX 16GB"

Tested with JetPack 6.0 (L4T 36.3) Ubuntu 22.04.4 LTS, `python 3.10.12`, `ultralytics==8.2.16`, `tensorrt==10.0.1`

!!! note

    Inference times are shown for `mean`, `min` (fastest), and `max` (slowest) for each test using pre-trained weights `yolov8n.engine`.

| Precision | Eval test | mean<br>(ms) | min \| max<br>(ms) | mAP<sup>val<br>50(B) | mAP<sup>val<br>50-95(B) | `batch` | size<br><sup>(pixels) |
|-----------|--------------|--------------|--------------------|----------------------|-------------------------|---------|-----------------------|
| FP32 | Predict | 6.11 | 6.10 \| 6.29 | | | 8 | 640 |
| FP32 | COCO<sup>val | 6.17 | | 0.52 | 0.37 | 1 | 640 |
| FP16 | Predict | 3.18 | 3.18 \| 3.20 | | | 8 | 640 |
| FP16 | COCO<sup>val | 3.19 | | 0.52 | 0.37 | 1 | 640 |
| INT8 | Predict | 2.30 | 2.29 \| 2.35 | | | 8 | 640 |
| INT8 | COCO<sup>val | 2.32 | | 0.46 | 0.32 | 1 | 640 |
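The `mean`, `min`, and `max` columns above can be approximated by timing repeated predictions; each Ultralytics result carries a `speed` dict with the inference time in milliseconds. A minimal sketch, assuming the INT8 engine from the export example (absolute numbers will vary with device, JetPack version, and power mode):

```python
from ultralytics import YOLO

model = YOLO("yolov8n.engine", task="detect")

# Collect per-call inference times (ms); the first calls also serve as warm-up
times = []
for _ in range(60):
    results = model.predict("https://ultralytics.com/images/bus.jpg", verbose=False)
    times.append(results[0].speed["inference"])  # forward-pass time only, excludes pre/post-processing

times = times[10:]  # discard warm-up iterations
print(f"mean: {sum(times) / len(times):.2f} ms  min: {min(times):.2f}  max: {max(times):.2f}")
```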

!!! info
