
Whisper does not convert using onnxruntime-directml #813

Open
DimQ1 opened this issue Dec 13, 2023 · 2 comments

DimQ1 commented Dec 13, 2023

I prepared a configuration file for converting Whisper using DirectML, but the process fails with an error.

To Reproduce

Expected behavior
It would be great to use Whisper with DirectML.

Olive config
Use the following configuration file to convert the model:
whisper_gpu_int8_dml.json
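
For reference, the config is kicked off the same way as the other whisper configs in the example (a minimal sketch, assuming the standard Olive whisper example layout; the filename is the attached config):

```python
# Sketch of running the attached config (assumes the standard Olive whisper example layout).
from olive.workflows import run as olive_run

# Runs the pass flow shown in the logs below (OnnxConversion -> OnnxDynamicQuantization ->
# InsertBeamSearch -> AppendPrePostProcessingOps) on the gpu-dml accelerator.
olive_run("whisper_gpu_int8_dml.json")
```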

Olive logs
[2023-12-13 09:32:13,697] [DEBUG] [resource_path.py:156:create_resource_path] Resource path D:\Learnig\AI\Olive\examples\whisper\cache\models\0_OnnxConversion-386174a033bcd76f8941e56a22420503-0f2f01796d1fdfcd7c7058df3febec4e\output_model\decoder\model.onnx is inferred to be of type file.
[2023-12-13 09:32:13,816] [INFO] [quantization.py:354:_run_for_config] Preprocessing model for quantization
[2023-12-13 09:32:58,789] [INFO] [quantization.py:354:_run_for_config] Preprocessing model for quantization
[2023-12-13 09:33:24,424] [INFO] [engine.py:931:_run_pass] Running pass insert_beam_search:InsertBeamSearch
[2023-12-13 09:33:24,426] [DEBUG] [resource_path.py:156:create_resource_path] Resource path D:\Learnig\AI\Olive\examples\whisper\cache\models\1_OnnxDynamicQuantization-0-81443df774677d62399dbb62abc7a493\output_model\encoder_decoder_init\model.onnx is inferred to be of type file.
[2023-12-13 09:33:24,428] [DEBUG] [resource_path.py:156:create_resource_path] Resource path D:\Learnig\AI\Olive\examples\whisper\cache\models\1_OnnxDynamicQuantization-0-81443df774677d62399dbb62abc7a493\output_model\decoder\model.onnx is inferred to be of type file.
[2023-12-13 09:33:25,604] [WARNING] [insert_beam_search.py:171:chain_model] DecoderMaskedMultiHeadAttention could not be applied to whisper decoder subgraph
Removed 203 initializers with duplicated value
Removed 101 initializers with duplicated value
[2023-12-13 09:33:29,278] [DEBUG] [insert_beam_search.py:192:chain_model] Using IR version 8 for chained model
[2023-12-13 09:33:33,548] [INFO] [engine.py:931:_run_pass] Running pass prepost:AppendPrePostProcessingOps
[2023-12-13 09:33:33,550] [DEBUG] [resource_path.py:156:create_resource_path] Resource path D:\Learnig\AI\Olive\examples\whisper\cache\models\2_InsertBeamSearch-1-51b19e895c1591ef53a44fb74c8eac16\output_model\model_with_beam_search.onnx is inferred to be of type file.
[2023-12-13 09:33:33,551] [DEBUG] [resource_path.py:156:create_resource_path] Resource path D:\Learnig\AI\Olive\examples\whisper\cache\models\2_InsertBeamSearch-1-51b19e895c1591ef53a44fb74c8eac16\output_model\model_with_beam_search.onnx is inferred to be of type file.
[W shape_type_inference.cpp:1978] Warning: The shape inference of ai.onnx.contrib::StftNorm type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
[2023-12-13 09:33:37,374] [DEBUG] [engine.py:1071:_evaluate_model] Evaluating model ...
[2023-12-13 09:33:37,374] [DEBUG] [resource_path.py:156:create_resource_path] Resource path D:\Learnig\AI\Olive\examples\whisper\cache\models\3_AppendPrePostProcessingOps-2-4d9a9990e2391432dff23d272724f7c8\output_model\model_with_beam_search.onnx is inferred to be of type file.
[2023-12-13 09:33:37,376] [DEBUG] [resource_path.py:156:create_resource_path] Resource path D:\Learnig\AI\Olive\examples\whisper\cache\models\3_AppendPrePostProcessingOps-2-4d9a9990e2391432dff23d272724f7c8\output_model\model_with_beam_search.onnx is inferred to be of type file.
[2023-12-13 09:33:37,976] [DEBUG] [olive_evaluator.py:244:generate_metric_user_config_with_model_io] Model input shapes are not static. Cannot use inferred input shapes for creating dummy data. This will cause an error when creating dummy data for tuning.
[2023-12-13 09:33:37,979] [DEBUG] [resource_path.py:156:create_resource_path] Resource path D:\Learnig\AI\Olive\examples\whisper\data is inferred to be of type folder.
2023-12-13 09:33:50.4631945 [E:onnxruntime:, sequential_executor.cc:514 onnxruntime::ExecuteKernel] Non-zero status code returned while running Conv node. Name:'/whisper_encoder/encoder/conv1/Conv' Status Message: D:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2538)\onnxruntime_pybind11_state.pyd!00007FFB2E960DA9: (caller: 00007FFB2F07EDDF) Exception(3) tid(3580) 80070057 The parameter is incorrect.

2023-12-13 09:33:50.4741269 [E:onnxruntime:, sequential_executor.cc:514 onnxruntime::ExecuteKernel] Non-zero status code returned while running BeamSearch node. Name:'BeamSearch_node' Status Message: Non-zero status code returned while running Conv node. Name:'/whisper_encoder/encoder/conv1/Conv' Status Message: D:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2538)\onnxruntime_pybind11_state.pyd!00007FFB2E960DA9: (caller: 00007FFB2F07EDDF) Exception(3) tid(3580) 80070057 The parameter is incorrect.

[2023-12-13 09:33:50,485] [WARNING] [engine.py:438:run_accelerator] Failed to run Olive on gpu-dml: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running BeamSearch node. Name:'BeamSearch_node' Status Message: Non-zero status code returned while running Conv node. Name:'/whisper_encoder/encoder/conv1/Conv' Status Message: D:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2538)\onnxruntime_pybind11_state.pyd!00007FFB2E960DA9: (caller: 00007FFB2F07EDDF) Exception(3) tid(3580) 80070057 The parameter is incorrect.
Traceback (most recent call last):
File "C:\Program Files\Python311\Lib\site-packages\olive\engine\engine.py", line 418, in run_accelerator
return self.run_no_search(
^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python311\Lib\site-packages\olive\engine\engine.py", line 489, in run_no_search
should_prune, signal, model_ids = self._run_passes(
^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python311\Lib\site-packages\olive\engine\engine.py", line 910, in _run_passes
signal = self._evaluate_model(model_config, model_id, data_root, evaluator_config, accelerator_spec)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python311\Lib\site-packages\olive\engine\engine.py", line 1097, in _evaluate_model
signal = self.target.evaluate_model(model_config, data_root, metrics, accelerator_spec)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python311\Lib\site-packages\olive\systems\local.py", line 49, in evaluate_model
return evaluator.evaluate(model, data_root, metrics, device=device, execution_providers=execution_providers)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python311\Lib\site-packages\olive\evaluator\olive_evaluator.py", line 215, in evaluate
metrics_res[metric.name] = self._evaluate_latency(
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python311\Lib\site-packages\olive\evaluator\olive_evaluator.py", line 132, in _evaluate_latency latencies = self._evaluate_raw_latency(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python311\Lib\site-packages\olive\evaluator\olive_evaluator.py", line 784, in _evaluate_raw_latency
return self._evaluate_onnx_latency(model, metric, dataloader, post_func, device, execution_providers)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python311\Lib\site-packages\olive\evaluator\olive_evaluator.py", line 559, in _evaluate_onnx_latency
session.run(input_feed=input_dict, output_names=None)
File "C:\Program Files\Python311\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 220, in run
return self._sess.run(output_names, input_feed, run_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running BeamSearch node. Name:'BeamSearch_node' Status Message: Non-zero status code returned while running Conv node. Name:'/whisper_encoder/encoder/conv1/Conv' Status Message: D:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2538)\onnxruntime_pybind11_state.pyd!00007FFB2E960DA9: (caller: 00007FFB2F07EDDF) Exception(3) tid(3580) 80070057 The parameter is incorrect.

[2023-12-13 09:33:50,810] [INFO] [engine.py:359:run] Run history for gpu-dml:
[2023-12-13 09:33:50,823] [INFO] [engine.py:636:dump_run_history] run history:
+------------------------------------------------------------------------------------+------------------------------------------------------------------------------------+----------------------------+----------------+-----------+
| model_id | parent_model_id | from_pass | duration_sec | metrics |
+====================================================================================+====================================================================================+============================+================+===========+
| 386174a033bcd76f8941e56a22420503 | | | | |
+------------------------------------------------------------------------------------+------------------------------------------------------------------------------------+----------------------------+----------------+-----------+
| 0_OnnxConversion-386174a033bcd76f8941e56a22420503-0f2f01796d1fdfcd7c7058df3febec4e | 386174a033bcd76f8941e56a22420503 | OnnxConversion | 55.8453 | |
+------------------------------------------------------------------------------------+------------------------------------------------------------------------------------+----------------------------+----------------+-----------+
| 1_OnnxDynamicQuantization-0-81443df774677d62399dbb62abc7a493 | 0_OnnxConversion-386174a033bcd76f8941e56a22420503-0f2f01796d1fdfcd7c7058df3febec4e | OnnxDynamicQuantization | 70.7202 | |
+------------------------------------------------------------------------------------+------------------------------------------------------------------------------------+----------------------------+----------------+-----------+
| 2_InsertBeamSearch-1-51b19e895c1591ef53a44fb74c8eac16 | 1_OnnxDynamicQuantization-0-81443df774677d62399dbb62abc7a493 | InsertBeamSearch | 9.11838 | |
+------------------------------------------------------------------------------------+------------------------------------------------------------------------------------+----------------------------+----------------+-----------+
| 3_AppendPrePostProcessingOps-2-4d9a9990e2391432dff23d272724f7c8 | 2_InsertBeamSearch-1-51b19e895c1591ef53a44fb74c8eac16 | AppendPrePostProcessingOps | 3.81862 | |
+------------------------------------------------------------------------------------+------------------------------------------------------------------------------------+----------------------------+----------------+-----------+
[2023-12-13 09:33:50,826] [INFO] [engine.py:374:run] No packaging config provided, skip packaging artifacts

Other information

  • OS: Windows
  • Olive version: 0.5.0 (latest from git source)
  • ONNXRuntime version: onnxruntime-directml 1.16.3
trajepl commented Dec 15, 2023

microsoft/onnxruntime#18805
It seems the beam search node for Whisper is not available in the DML EP.
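
One way to check that the failure is DML-specific (a minimal sketch, not a verified repro; the model path is illustrative, copied from the run history above) is to load the final chained model with the CPU EP and with the DML EP. The model contains custom ops such as ai.onnx.contrib::StftNorm, so onnxruntime-extensions has to be registered first:

```python
import onnxruntime as ort
from onnxruntime_extensions import get_library_path

# Illustrative path, taken from the cache layout shown in the log above.
model_path = (
    r"cache\models\3_AppendPrePostProcessingOps-2-4d9a9990e2391432dff23d272724f7c8"
    r"\output_model\model_with_beam_search.onnx"
)

# Register the onnxruntime-extensions custom ops used by the pre/post processing graph.
so = ort.SessionOptions()
so.register_custom_ops_library(get_library_path())

# Session creation succeeds with either provider; the 80070057 error in the log only
# surfaces at run time inside the BeamSearch/Conv nodes when DmlExecutionProvider is used.
sess_dml = ort.InferenceSession(model_path, so, providers=["DmlExecutionProvider", "CPUExecutionProvider"])
sess_cpu = ort.InferenceSession(model_path, so, providers=["CPUExecutionProvider"])
```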

guotuofeng (Collaborator) commented:

@PatriceVignola, is there any plan to add beam search op support for DirectML?
