FastDeploy 1.0.0

1.0.0 Release Note

We are excited to announce the release of ⚡️FastDeploy 1.0.0! 🎉 FastDeploy delivers high-performance, end-to-end deployment for over 150 AI models from PaddlePaddle and the open source community across multiple hardware platforms.

Multiple Inference Backend and Hardware Support

FastDeploy supports inference deployment on multiple hardware platforms through different backends. Each backend module can be flexibly compiled and integrated according to the developer's needs; see the FastDeploy compilation documentation for building it yourself.

| Backend | Platform | Supported Model Format | Supported Hardware |
|---|---|---|---|
| Paddle Inference | Linux(x64)/Windows(x64) | Paddle | x86 CPU/NVIDIA GPU/Jetson/GraphCore IPU |
| Paddle Lite | Linux(aarch64/armhf)/Android | Paddle | Arm CPU/Kunlun R200/RV1126 |
| Poros | Linux(x64) | TorchScript | x86 CPU/NVIDIA GPU |
| OpenVINO | Linux(x64)/Windows(x64)/OSX(x86) | Paddle/ONNX | x86 CPU/Intel GPU |
| TensorRT | Linux(x64/aarch64)/Windows(x64) | Paddle/ONNX | NVIDIA GPU/Jetson |
| ONNX Runtime | Linux(x64/aarch64)/Windows(x64)/OSX(x86/arm64) | Paddle/ONNX | x86 CPU/Arm CPU/NVIDIA GPU |
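
The backend is chosen at runtime through `RuntimeOption`. Below is a minimal sketch using the FastDeploy Python API, assuming a PP-YOLOE model exported from PaddleDetection; the file names are placeholders.

```python
import fastdeploy as fd

option = fd.RuntimeOption()
option.use_gpu(0)            # run on GPU 0; omit this call to stay on CPU
option.use_trt_backend()     # pick TensorRT; use_ort_backend()/use_openvino_backend()/
                             # use_paddle_backend() select the other backends

# placeholder paths: a PP-YOLOE model exported from PaddleDetection
model = fd.vision.detection.PPYOLOE(
    "model.pdmodel", "model.pdiparams", "infer_cfg.yml",
    runtime_option=option)
```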

In addition, FastDeploy supports deploying models in web pages and mini programs based on Paddle.js; see Web Deployment for more details.

AI Model End-to-end Inference Support

FastDeploy supports end-to-end deployment of models from PaddlePaddle's model development toolkits.

In addition, FastDeploy supports the deployment of popular deep learning models from the open source community; over 150 models are supported in release 1.0. The table below lists some of the key models; refer to the deployment examples for more details.

| Task | Supported Models |
|---|---|
| Image Classification | ResNet/MobileNet/PP-LCNet/YOLOv5-Clas and other series |
| Object Detection | PP-YOLOE/PicoDet/RCNN/YOLOv5/YOLOv6/YOLOv7/YOLOX/NanoDet and other series |
| Semantic Segmentation | PP-LiteSeg/PP-HumanSeg/DeepLabv3p/UNet and other series |
| Image/Video Matting | PP-Matting/PP-Mattingv2/ModNet/RobustVideoMatting |
| OCR | PP-OCRv2/PP-OCRv3 |
| Video Super-Resolution | PP-MSVSR/BasicVSR/EDVR |
| Object Tracking | PP-Tracking |
| Pose/Keypoint Detection | PP-TinyPose/HeadPose-FSANet |
| Face Alignment | PFLD/FaceLandmark1000/PIPNet and other series |
| Face Detection | RetinaFace/UltraFace/YOLOv5-Face/SCRFD and other series |
| Face Recognition | ArcFace/CosFace/PartialFC/VPL/AdaFace and other series |
| Text-to-Speech | PaddleSpeech streaming speech synthesis models |
| Semantic Representation | PaddleNLP ERNIE 3.0 Tiny series models |
| Information Extraction | PaddleNLP Universal Information Extraction (UIE) models |
| Text-to-Image Generation | Stable Diffusion |
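
As a concrete example of end-to-end inference, the sketch below runs one of the detection models listed above with the Python API; the model directory and test image are assumptions.

```python
import cv2
import fastdeploy as fd

# placeholder paths: a PP-YOLOE inference model directory and a local image
model = fd.vision.detection.PPYOLOE(
    "ppyoloe/model.pdmodel", "ppyoloe/model.pdiparams", "ppyoloe/infer_cfg.yml")

im = cv2.imread("test.jpg")
result = model.predict(im)   # pre-processing, inference and post-processing in one call
vis = fd.vision.vis_detection(im, result, score_threshold=0.5)
cv2.imwrite("result.jpg", vis)
```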

High Performance Serving Deployment

FastDeploy provides a high-performance serving system for AI models based on Triton Inference Server, supporting fast, service-based deployment of Paddle/ONNX models on different hardware and backends.
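
A service deployed this way speaks the standard Triton protocol, so any Triton client can query it. The sketch below uses the `tritonclient` Python package; the server address, model name, and tensor names are placeholders that depend on the model repository's `config.pbtxt`.

```python
import numpy as np
import tritonclient.http as httpclient

# assumes a FastDeploy serving instance is already running on localhost:8000
client = httpclient.InferenceServerClient(url="localhost:8000")

data = np.random.rand(1, 3, 224, 224).astype(np.float32)  # dummy input
inputs = [httpclient.InferInput("INPUT_0", list(data.shape), "FP32")]
inputs[0].set_data_from_numpy(data)

# "my_model" and the tensor names are placeholders from the model repository
response = client.infer(model_name="my_model", inputs=inputs)
print(response.as_numpy("OUTPUT_0"))
```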

Tool Components

PaddleSlim Auto Compression Toolkit

FastDeploy provides a one-click quantization tool based on PaddleSlim; the following command compresses and accelerates a model with virtually no loss of accuracy.

```
fastdeploy compress --config_path=./configs/detection/yolov5s_quant.yaml \
                    --method='PTQ' --save_dir='./yolov5s_ptq_model/'
```

FastDeploy has verified quantized models against the following backends:

| Hardware / Inference Backend | ONNX Runtime | Paddle Inference | TensorRT | Paddle Inference + TensorRT | Paddle Lite |
|---|---|---|---|---|---|
| CPU | Supported | Supported | - | - | Supported |
| GPU | - | - | Supported | Supported | - |
| RV1126 | - | - | - | - | Supported |
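
A quantized model produced by `fastdeploy compress` is loaded like any other model; the backend just needs to be one of the validated combinations above. A minimal sketch, assuming the `./yolov5s_ptq_model/` directory from the command above and ONNX Runtime on CPU:

```python
import fastdeploy as fd

option = fd.RuntimeOption()
option.use_cpu()
option.use_ort_backend()     # ONNX Runtime: one of the CPU backends validated above

# assumes the PTQ output directory produced by the compress command above
model = fd.vision.detection.YOLOv5(
    "yolov5s_ptq_model/model.pdmodel",
    "yolov5s_ptq_model/model.pdiparams",
    runtime_option=option,
    model_format=fd.ModelFormat.PADDLE)
```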

The comparison below shows accuracy and performance before and after auto compression: accuracy is virtually lossless, while performance improves by 100%~400%.

[Image: accuracy/performance comparison of auto-compressed models]

For more details and usage of the one-click quantization tool, see FastDeploy one-click quantization.

Model Conversion

To support models from other frameworks, FastDeploy bundles the X2Paddle converter. After installing FastDeploy, the following command converts a model so that it can then be deployed with FastDeploy.

```
fastdeploy convert --framework onnx --model yolov5s.onnx --save_dir yolov5s_paddle_model
```
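
The converted model can then be run either through the task-level APIs or the generic `fd.Runtime` API. A hedged sketch of the latter, assuming X2Paddle wrote the inference model under `yolov5s_paddle_model/inference_model/`; the 640x640 input shape is an assumption for yolov5s:

```python
import numpy as np
import fastdeploy as fd

option = fd.RuntimeOption()
# assumed X2Paddle output layout; adjust to where the converter saved the model
option.set_model_path("yolov5s_paddle_model/inference_model/model.pdmodel",
                      "yolov5s_paddle_model/inference_model/model.pdiparams")

runtime = fd.Runtime(option)
name = runtime.get_input_info(0).name   # query the real input tensor name
outputs = runtime.infer({name: np.random.rand(1, 3, 640, 640).astype(np.float32)})
print(outputs[0].shape)
```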

For more information on usage, see FastDeploy Model Conversion.

End-to-end Deployment Performance Optimisation

FastDeploy focuses on end-to-end deployment experience and performance for every model. Version 1.0 introduces the following end-to-end optimisations:

- On the server side, pre-processing steps are fused to reduce memory-allocation overhead and computation.
- On mobile, FastDeploy integrates FlyCV, a high-performance image processing library developed by Baidu's vision team (see the sketch below).
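
A minimal sketch of switching pre-processing to FlyCV at runtime, assuming the installed FastDeploy package was compiled with FlyCV support (`ENABLE_FLYCV=ON`):

```python
import fastdeploy as fd

# assumption: only available in builds compiled with ENABLE_FLYCV=ON;
# routes vision pre-processing through FlyCV instead of OpenCV
fd.vision.enable_flycv()
```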

Combined with FastDeploy's multi-backend support, the end-to-end inference performance of all models improves significantly over the original deployment code. The image below shows test data for some of the models.

[Image: end-to-end performance comparison for selected models]

Thanks to the following developers for their contributions to FastDeploy! Contributors List
@leiqing1 @jiangjiajun @DefTruth @joey12300 @felixhjh @ziqi-jin @yunyaoXYY @wjj19950828 @heliqi @ZeyuChen @ChaoII @Zheng-Bicheng @wang-xinyu @HexToString @yeliang2258 @WinterGeng @LDOUBLEV @rainyfly @czr-gc @chenqianhe @kiddyjinjin @Zeref996 @TrellixVulnTeam @D-DanielYang @totorolin @hguandl @ChrisKong93 @Xiue233 @jm12138 @triple-Mu @yingshengBD @GodIsBoom @PatchTester @onecatcn