This PR makes the following contributions:
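1. Add an Attention fuse pass; it is supported for ORT only.
2. Add the enable_extra_ort_opt switch, which controls whether extra ORT-only optimizations are enabled. It defaults to false.
To use it, run the following command: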
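Performance tests
1. Unpruned model
With the paddle2onnx Attention fuse pass added, the exported ONNX model contains Attention nodes.
GPU:
100 warmup runs, then the average over 1000 predictions. The prediction time covers only run + data copy and is reported as (mean) in ms. "w/o" and "w" indicate whether the attention pass was hit:
CPU:
100 warmup runs, then the average over 2000 predictions, with 1 thread. The prediction time covers run + data copy and is reported as (mean) in ms. "w/o" and "w" indicate whether the attention pass was hit: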
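2. Pruned model
With the paddle2onnx Attention fuse pass added, the exported ONNX model contains Attention nodes.
GPU:
100 warmup runs, then the average over 1000 predictions. The prediction time covers only run + data copy and is reported as (mean) in ms. "w/o" and "w" indicate whether the attention pass was hit:
CPU:
100 warmup runs, then the average over 2000 predictions, with 1 thread. The prediction time covers run + data copy and is reported as (mean) in ms. "w/o" and "w" indicate whether the attention pass was hit: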
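
The measurement protocol used in the tests above (untimed warmup runs followed by a timed loop whose mean is reported in ms) can be sketched roughly as follows. This is only an illustration, not the script used for this PR: `mean_latency_ms` and `fake_predict` are hypothetical names, and `predict` is assumed to be a blocking callable that performs one run plus the data copy.

```python
import time
from typing import Callable


def mean_latency_ms(predict: Callable[[], object],
                    warmup: int = 100,
                    runs: int = 1000) -> float:
    """Return the mean wall-clock latency of predict() in milliseconds.

    Assumption: predict() blocks until the run and the data copy are done,
    so the measured time covers both.
    """
    # Warmup iterations are executed but not timed.
    for _ in range(warmup):
        predict()

    # Time the whole loop once and divide by the number of runs.
    start = time.perf_counter()
    for _ in range(runs):
        predict()
    elapsed_s = time.perf_counter() - start
    return elapsed_s / runs * 1000.0


if __name__ == "__main__":
    # Hypothetical stand-in workload; replace with the real inference call.
    def fake_predict() -> int:
        return sum(i * i for i in range(1000))

    # GPU setting from the description: 100 warmup runs, 1000 timed runs.
    print(f"1000 runs: {mean_latency_ms(fake_predict, 100, 1000):.4f} ms (mean)")
    # CPU setting from the description: 100 warmup runs, 2000 timed runs.
    print(f"2000 runs: {mean_latency_ms(fake_predict, 100, 2000):.4f} ms (mean)")
```

Running the same function once on the model exported without the fuse pass and once on the model exported with it would give the "w/o" and "w" columns. The single-thread CPU setting is a property of the inference session and is not modeled here.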