
New Plugin YoloNMS #3859

Closed · wants to merge 6 commits

Conversation

@levipereira commented May 13, 2024

YoloNMS Plugin

https://github.com/levipereira/TensorRT/tree/release/8.6/plugin/yoloNMSPlugin

Description

The YOLO model is one of the most widely used architectures for object detection, and it is continually evolving. However, there has been no performant TensorRT plugin for efficient Non-Maximum Suppression (NMS) that covers all of its use cases. For a long time, the EfficientNMS plugin served YOLO's needs well. However, when the YOLO architecture is used for segmentation, the EfficientNMS plugin falls short due to the absence of a detection-index output. As a result, users have resorted to various plugins/methods to address this issue.

To address this gap, I have modified the EfficientNMS plugin to create a YoloNMS plugin that adds an output of detection indices, which can be utilized for various purposes. Currently, this modification is available only in the release/8.6 branch; however, it is straightforward to port it to 8.5 and later versions.

This modification not only improves performance but also enhances the plugin's versatility for multiple use cases. Moreover, this plugin can be easily implemented in any version of YOLO.

The main goal is to simplify the implementation of YOLO series models on Deepstream/TritonServer while delivering maximum performance.
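
To make the idea concrete, here is a minimal graph-surgery sketch (mine, not part of the PR) of how such a node could be attached to an exported YOLO ONNX model with onnx_graphsurgeon. The tensor names and attribute values are illustrative assumptions; the plugin's creator fields are assumed to mirror EfficientNMS_TRT, with one extra indices output.

# Hypothetical sketch: attach a YOLO_NMS_TRT node to an exported YOLO ONNX model.
# Tensor names and attribute values are illustrative; the attributes are assumed to
# follow the EfficientNMS_TRT layout, plus an extra indices output.
import numpy as np
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("yolo.onnx"))
boxes, scores = graph.outputs[:2]  # assumes the head already exports raw boxes and scores

batch, max_det = "batch", 100
outputs = [
    gs.Variable("num_dets", dtype=np.int32, shape=[batch, 1]),
    gs.Variable("det_boxes", dtype=np.float32, shape=[batch, max_det, 4]),
    gs.Variable("det_scores", dtype=np.float32, shape=[batch, max_det]),
    gs.Variable("det_classes", dtype=np.int32, shape=[batch, max_det]),
    gs.Variable("det_indices", dtype=np.int32, shape=[batch, max_det]),  # the new output
]

graph.nodes.append(gs.Node(
    op="YOLO_NMS_TRT",
    attrs={
        "plugin_version": "1",
        "max_output_boxes": max_det,
        "iou_threshold": 0.45,
        "score_threshold": 0.25,
        "background_class": -1,
        "score_activation": 0,
        "box_coding": 1,  # 1 = center/size (xywh) box encoding
    },
    inputs=[boxes, scores],
    outputs=outputs,
))
graph.outputs = outputs
graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "yolo-end2end.onnx")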

All changes can be tracked from the point where EfficientNMS was cloned:
release/8.6...levipereira:TensorRT:release/8.6

Example of End2End Implementation:

Signed-off-by: Levi Pereira <[email protected]>
@levipereira (Author)

Implementation using this plugin with DeepStream
https://github.com/levipereira/deepstream-yolov9/

@johnnynunez commented May 22, 2024

Is it compatible with TensorRT 10? TRT 10 is now available on all platforms, including Jetson AGX Orin. It would be interesting to use.
@levipereira

@johnnynunez

@asfiyab-nvidia Can you check this? Super interesting for demos on Jetson AGX Orin!
Maybe also useful with VILA 1.5.

@asfiyab-nvidia (Collaborator)

Adding @samurdhikaru for new plugin review

@levipereira (Author)

Is it compatible with TensorRT 10? TRT 10 is now available on all platforms, including Jetson AGX Orin. It would be interesting to use. @levipereira

release/10.0...levipereira:TensorRT:release/10.0

I have implemented the plugin in the release/10.0 branch, and everything is working correctly. The plugin is compatible with all versions from release/8.6 to release/10.0. I have conducted some tests on x86_64 and did not encounter any issues.
If you could test it on aarch64, I would appreciate it.

wget https://github.com/levipereira/deepstream-yolov9/releases/download/v1.0/yolov9-c-relu-qat-trt.onnx
export batch_size=1
export filepath_no_ext=yolov9-c-relu-qat-trt
./trtexec \
	--onnx=${filepath_no_ext}.onnx \
	--fp16 --int8 \
	--saveEngine=${filepath_no_ext}.engine \
	--timingCacheFile=${filepath_no_ext}.engine.timing.cache \
	--warmUp=500 \
	--duration=10  \
	--useCudaGraph \
	--useSpinWait \
	--noDataTransfers \
	--minShapes=images:1x3x640x640 \
	--optShapes=images:${batch_size}x3x640x640 \
	--maxShapes=images:${batch_size}x3x640x640
	
...
TensorRT version: 10.0.1
Loading standard plugins
[TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 19, GPU 400 (MiB)
[TRT] [MemUsageChange] Init builder kernel library: CPU +1772, GPU +314, now: CPU 1927, GPU 717 (MiB)
Start parsing network model.
[TRT] ----------------------------------------------------------------
[TRT] Input filename:   yolov9-c-converted-trt.onnx
[TRT] ONNX IR version:  0.0.7
[TRT] Opset version:    14
[TRT] Producer name:    pytorch
[TRT] Producer version: 1.14.0
[TRT] Domain:           
[TRT] Model version:    0
[TRT] Doc string:       
[TRT] ----------------------------------------------------------------
[TRT] No checker registered for op: YOLO_NMS_TRT. Attempting to check as plugin.
[TRT] No importer registered for op: YOLO_NMS_TRT. Attempting to import as plugin.
[TRT] Searching for plugin: YOLO_NMS_TRT, plugin_version: 1, plugin_namespace: 
[TRT] Successfully created plugin: YOLO_NMS_TRT
Finished parsing network model. Parse time: 0.0983361
Set shape of input tensor images for optimization profile 0 to: MIN=1x3x640x640 OPT=1x3x640x640 MAX=1x3x640x640
[TRT] Could not read timing cache from: yolov9-c-converted-trt.engine.timing.cache. A new timing cache will be generated and written.
[TRT] Global timing cache in use. Profiling results in this builder pass will be stored.
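
As a rough illustration (not taken from the PR), the padded outputs of such an end-to-end engine can be consumed on the host with a few lines of NumPy. The output names num_dets/det_boxes/det_scores/det_classes/det_indices are assumptions following the naming used elsewhere in this thread, not an official contract.

# Hypothetical post-processing sketch: slice the fixed-size plugin outputs down to
# the valid detections reported by num_dets. Output names are assumptions based on
# the discussion above.
import numpy as np

def decode_batch(num_dets, det_boxes, det_scores, det_classes, det_indices):
    """Yield per-image detections from padded [batch, max_det, ...] host buffers."""
    for b in range(num_dets.shape[0]):
        n = int(num_dets[b, 0])
        yield {
            "boxes": det_boxes[b, :n],      # [n, 4]
            "scores": det_scores[b, :n],    # [n]
            "classes": det_classes[b, :n],  # [n]
            "indices": det_indices[b, :n],  # [n] -> rows of the raw prediction tensor,
                                            # e.g. to gather mask coefficients for segmentation
        }

# Example with dummy data shaped like a batch of 1 and max_det = 100:
dets = next(decode_batch(
    np.array([[3]], dtype=np.int32),
    np.zeros((1, 100, 4), np.float32),
    np.zeros((1, 100), np.float32),
    np.zeros((1, 100), np.int32),
    np.zeros((1, 100), np.int32),
))
print(dets["boxes"].shape)  # (3, 4)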

@johnnynunez

Amazing, @levipereira!

@johnnynunez

Is it possible to merge this ASAP? @samurdhikaru Thank you!

@samurdhikaru (Collaborator)

@levipereira Thank you very much for the contribution.

@levipereira @johnnynunez To start off, note that IPluginV2DynamicExt is deprecated as of TRT 10.0, and we no longer accept plugins that implement any of the IPluginV2-descendant interfaces. Two things need to change for the PR to be in a mergeable state:

  1. The plugin needs to implement IPluginV3. Please see sampleNonZeroPlugin for an example, and the TRT Developer Guide's plugins chapter for more information.
  2. The PR needs to target release/10.0 branch. Depending on timing, you may be able to target release/10.1 when it becomes available.

Beyond that, a plugin also needs to meet a general usability criterion, i.e. its use case should not be niche, and its functionality should not be easily achievable with existing plugins and/or TRT built-in ops. Note that TRT's built-in INMSLayer outputs the selected indices. Does this address your use case?

If not, can you motivate the use case in which the boxes and the indices are needed at the same time? The former is provided by the EfficientNMS plugin, and the latter by INMSLayer. Thanks.

@levipereira (Author) commented May 28, 2024

Hi @samurdhikaru

  1. The plugin needs to implement IPluginV3. Please see sampleNonZeroPlugin for an example, and the TRT Developer Guide's plugins chapter for more information.

I have looked at the IPluginV3 interface, and a significant amount of code would need to be adjusted and tested to upgrade from IPluginV2 to IPluginV3. There are relatively few examples using IPluginV3, so this would require a significant amount of development time.

Beyond that, a plugin also needs to meet a general usability criterion, i.e. its use case should not be niche, and its functionality should not be easily achievable with existing plugins and/or TRT built-in ops. Note that TRT's built-in INMSLayer outputs the selected indices. Does this address your use case?

If not, can you motivate the use case in which the boxes and the indices are needed at the same time? The former is provided by the EfficientNMS plugin, and the latter by INMSLayer. Thanks.

Regarding the use of INMSLayer instead of the plugin, it's important to note that INMSLayer returns only the indices of candidate detections and must be integrated into the model through dedicated nodes. In contrast, the plugin offers an end-to-end solution for NMS, handling standard inputs and outputs.

Implementing INMSLayer is not universally applicable because INMSLayer/NonMaxSuppression requires adding an extra layer and further processing to identify the selected data. This adds complexity and reduces its suitability for generic use cases compared to the plugin.
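
To make that extra processing concrete, here is a small NumPy sketch (mine, not from the PR) of the gather step a model has to add around the standard NonMaxSuppression/INMSLayer output, whose selected indices come back as (batch, class, box) triplets, before it yields EfficientNMS-style boxes/scores/classes tensors:

# Sketch of the gather step a model must add when it relies on
# NonMaxSuppression/INMSLayer, which only returns (batch_idx, class_idx, box_idx)
# triplets, instead of an end-to-end NMS plugin.
import numpy as np

def gather_selected(boxes, scores, selected_indices, max_det=100):
    """boxes: [N, num_boxes, 4], scores: [N, num_classes, num_boxes],
    selected_indices: [num_selected, 3] as produced by ONNX NonMaxSuppression."""
    batch_size = boxes.shape[0]
    det_boxes = np.zeros((batch_size, max_det, 4), np.float32)
    det_scores = np.zeros((batch_size, max_det), np.float32)
    det_classes = np.zeros((batch_size, max_det), np.int32)
    num_dets = np.zeros((batch_size, 1), np.int32)
    for b_idx, c_idx, box_idx in selected_indices:
        n = num_dets[b_idx, 0]
        if n >= max_det:
            continue
        det_boxes[b_idx, n] = boxes[b_idx, box_idx]
        det_scores[b_idx, n] = scores[b_idx, c_idx, box_idx]
        det_classes[b_idx, n] = c_idx
        num_dets[b_idx, 0] = n + 1
    return num_dets, det_boxes, det_scores, det_classes

Inside an ONNX graph this logic becomes a handful of Gather/Slice/Pad nodes per output, which is exactly the boilerplate the end-to-end plugin hides.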

In fact, the YoloNMS plugin is very similar to EfficientNMS; the only difference is that YoloNMS also returns indices. This might seem like a small change, but it is very important when integrating other plugins into the pipeline. EfficientNMS has to be used as a terminal layer, whereas YoloNMS can be used as an intermediate layer, a terminal layer, or both.

Thus, while INMSLayer/NonMaxSuppression provides indices of selected detections, it lacks the ease of integration and general usability offered by the plugin, which efficiently handles both boxes and indices within various models by adhering to a standardized input-output format.

INMSLayer is more flexible but not generic, whereas the plugin is more generic but less flexible.

Examples of implementation

YoloNMS Plugin Used in Conjunction with Other Plugins

In this case, det_indices is consumed by the ROIAlign_TRT plugin, while the (boxes, scores, classes) outputs are already final outputs.
[image]

Same as EfficientNMS (no change)

[image]

This is an example of the same implementation as EfficientNMS, but using INMSLayer.

[image]

@levipereira (Author)

@samurdhikaru
Since the indices are already available in EfficientNMS, we could also create a version 2 of the plugin that returns these indices. This approach would also solve the problem. However, using INMSLayer instead of EfficientNMS doesn't make much sense to me because I would need to implement all the logic of EfficientNMS within my model. From what I understand, INMSLayer is merely a layer that returns the indices, whereas EfficientNMS provides the final results ready to use.

@samurdhikaru (Collaborator)

@levipereira I understand that you want the flexibility for a single plugin (layer) to return selected data and indices, which is reasonable.

In fact, the YoloNMS plugin is very similar to EfficientNMS; the only difference is that YoloNMS also returns indices.

For this same reason, I would suggest that this plugin become a new version of EfficientNMS rather than being given a different name. Note that new versions of the same plugin are free to implement newer interfaces (i.e. IPluginV3).

I am sure you can understand that adding features based off a deprecated interface is inadvisable. While I understand that there is more work required to move the plugin to IPluginV3, I believe the migration cost is not as significant as you might think; most new methods in IPluginV3 are not pure virtual (hence can be left unimplemented), and the changes to existing methods could be accommodated with simple changes to the existing code. Please see Migrating V2 Plugins to IPluginV3 from our developer guide for more information.

  • If you still believe the work to migrate to IPluginV3 is unmanageable, we may still consider the PR if your changes can be refactored as a new version of EfficientNMS (with indices output) with a minimal set of changes to the existing EfficientNMS plugin. But please note that this would leave the plugin open to being dropped in a future release if it does not fit into the IPluginV2->IPluginV3 migration plan. Therefore, I urge you to attempt the conversion to IPluginV3 -- you can ask for any support on this thread and I will be happy to help.

Kindly note again that TRT release/8.6 is not open for new features and hence the PR needs to target release/10.0 regardless of the above discussion.

Thanks again for your valuable contribution.

@levipereira (Author)

I am sure you can understand that adding features based off a deprecated interface is inadvisable. While I understand that there is more work required to move the plugin to IPluginV3, I believe the migration cost is not as significant as you might think; most new methods in IPluginV3 are not pure virtual (hence can be left unimplemented), and the changes to existing methods could be accommodated with simple changes to the existing code.

I completely agree. I was bothered by the name YoloNMS, so I believe we should stick with EfficientNMS but modify it to return indices in a way that allows us to easily incorporate other plugins. I haven't had the time to analyze IPluginV3, but based on what you've mentioned, it makes me feel more confident since the implementation can be done with simple changes.

Implementing this in the release/8.6 branch is out of the question. I did it mainly as a demonstration, because all the latest NVIDIA SDKs (DeepStream/Triton Server) were released against 8.6, and it could, in a way, help the community.

Thank you.

@levipereira (Author)

In the coming days, I will implement a new version of the EfficientNMS plugin for release/10.0, which will also return indices.
