
I don't have models/optimized/llama_v2 folder after I've run python llama_v2.py --optimize #905

Open
KarpovVolodymyr opened this issue Jan 28, 2024 · 8 comments

Comments

@KarpovVolodymyr

KarpovVolodymyr commented Jan 28, 2024

Describe the bug
Hello. I was following the steps from this guide https://community.amd.com/t5/ai/how-to-running-optimized-llama2-with-microsoft-directml-on-amd/ba-p/645190

At the end of step 2, when I run the python llama_v2.py --optimize command, the run just stops, and it stops at the same place each time I re-run it.
It generates a model.onnx file at Olive\examples\directml\llama_v2\cache\models\0_OnnxConversion-14dc7b7c3125d3ad1222f0b9e2e5b807-dc5fbbbe422d406cc8fcef71d99251a4\output_model\model.onnx,
but there is no models/optimized/llama_v2 folder.

As I understand it, these files should be there:
[Screenshot 2024-01-28 193022, showing the expected files]

I don't get any errors, so it is difficult to tell whether something is wrong or not.

Could you please give me some advice on what I am doing wrong?

To Reproduce
Steps to reproduce the behavior.

Expected behavior
From the description, I expect this:
Once the script successfully completes, the optimized ONNX pipeline will be stored under models/optimized/llama_v2.

Olive config
Add Olive configurations here.

Olive logs
(llama2_Optimize) C:\Users\proxi\Olive\examples\directml\llama_v2>python llama_v2.py --model_type=7b-chat

Optimizing argmax_sampling
[2024-01-28 19:05:45,570] [INFO] [accelerator.py:205:create_accelerators] Running workflow on accelerator specs: gpu-dml
[2024-01-28 19:05:45,578] [INFO] [engine.py:851:_run_pass] Running pass convert:OnnxConversion
[2024-01-28 19:05:45,590] [INFO] [footprint.py:101:create_pareto_frontier] Output all 2 models
[2024-01-28 19:05:45,590] [INFO] [footprint.py:120:_create_pareto_frontier_from_nodes] pareto frontier points: 0_OnnxConversion-14dc7b7c3125d3ad1222f0b9e2e5b807-dc5fbbbe422d406cc8fcef71d99251a4
{
"latency-avg": 0.23591
}
[2024-01-28 19:05:45,591] [INFO] [engine.py:282:run] Run history for gpu-dml:
[2024-01-28 19:05:45,595] [INFO] [engine.py:557:dump_run_history] run history:
+------------------------------------------------------------------------------------+----------------------------------+----------------+----------------+--------------------------+
| model_id | parent_model_id | from_pass | duration_sec | metrics |
+====================================================================================+==================================+================+================+==========================+
| 14dc7b7c3125d3ad1222f0b9e2e5b807 | | | | |
+------------------------------------------------------------------------------------+----------------------------------+----------------+----------------+--------------------------+
| 0_OnnxConversion-14dc7b7c3125d3ad1222f0b9e2e5b807-dc5fbbbe422d406cc8fcef71d99251a4 | 14dc7b7c3125d3ad1222f0b9e2e5b807 | OnnxConversion | 0.135661 | { |
| | | | | "latency-avg": 0.23591 |
| | | | | } |
+------------------------------------------------------------------------------------+----------------------------------+----------------+----------------+--------------------------+
[2024-01-28 19:05:45,595] [INFO] [engine.py:296:run] No packaging config provided, skip packaging artifacts
Optimized Model : C:\Users\proxi\Olive\examples\directml\llama_v2\cache\models\0_OnnxConversion-14dc7b7c3125d3ad1222f0b9e2e5b807-dc5fbbbe422d406cc8fcef71d99251a4\output_model\model.onnx

Optimizing llama_v2
[2024-01-28 19:05:45,607] [INFO] [accelerator.py:205:create_accelerators] Running workflow on accelerator specs: gpu-dml
[2024-01-28 19:05:45,635] [INFO] [engine.py:851:_run_pass] Running pass convert:OnnxConversion
C:\Users\proxi\miniconda3\envs\llama2_Optimize\lib\site-packages\torch\onnx_internal\jit_utils.py:307: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\jit\passes\onnx\constant_fold.cpp:181.)
_C._jit_pass_onnx_node_shape_type_inference(node, params_dict, opset_version)
C:\Users\proxi\miniconda3\envs\llama2_Optimize\lib\site-packages\torch\onnx\utils.py:702: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\jit\passes\onnx\constant_fold.cpp:181.)
_C._jit_pass_onnx_graph_shape_type_inference(
C:\Users\proxi\miniconda3\envs\llama2_Optimize\lib\site-packages\torch\onnx\utils.py:1209: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\jit\passes\onnx\constant_fold.cpp:181.)
_C._jit_pass_onnx_graph_shape_type_inference(

Other information

  • OS: Windows, AMD GPU
  • Olive version: [e.g. 0.4.0 or main]
  • ONNXRuntime package and version: ONNXRuntime-directml 1.16.3

Additional context
I have an AMD videocard. So I was looking for ways how to run llama-2 with AMD GPU. I found this guide and was following the steps. https://community.amd.com/t5/ai/how-to-running-optimized-llama2-with-microsoft-directml-on-amd/ba-p/645190

@trajepl
Contributor

trajepl commented Jan 29, 2024

[image]
Check this; it seems the optimization did not run successfully.
In your case, only argmax_sampling got optimized and llama_v2 did not, right?

From the log I cannot tell why the optimization failed. Could you please attach the complete log?
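
If you want to see how far the run actually got, here is a minimal sketch (the paths are taken from your log; adjust the example directory to your checkout):

from pathlib import Path

# Path taken from the log above; adjust to your own checkout.
example_dir = Path(r"C:\Users\proxi\Olive\examples\directml\llama_v2")

# llama_v2.py reads this file at the end of a successful run, so if it is
# missing, the llama_v2 workflow never finished.
footprints = example_dir / "footprints" / "llama_v2_gpu-dml_footprints.json"
print("llama_v2 footprints written:", footprints.exists())

# Every pass that finishes leaves a folder under cache\models
# (e.g. 0_OnnxConversion-...), so the listing shows where the run stopped.
for model_dir in sorted((example_dir / "cache" / "models").iterdir()):
    print(model_dir.name)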

@KarpovVolodymyr KarpovVolodymyr changed the title I don't have models/optimized/llama_v2 folder after I've ran python llama_v2.py --optimize I don't have models/optimized/llama_v2 folder after I've run python llama_v2.py --optimize Jan 29, 2024
@Karjhan

Karjhan commented Mar 7, 2024

I also have an AMD GPU and I have a similar issue following the same steps: https://community.amd.com/t5/ai/how-to-running-optimized-llama2-with-microsoft-directml-on-amd/ba-p/645190
Was this issue solved?
If yes, how?

My Olive Logs:
Optimizing llama_v2
[2024-03-07 15:56:48,646] [INFO] [accelerator.py:208:create_accelerators] Running workflow on accelerator specs: gpu-dml
[2024-03-07 15:56:48,647] [INFO] [engine.py:116:initialize] Using cache directory: cache
[2024-03-07 15:56:48,647] [INFO] [engine.py:272:run] Running Olive on accelerator: gpu-dml
[2024-03-07 15:56:48,669] [INFO] [engine.py:862:_run_pass] Running pass convert:OnnxConversion
C:\Users\anaconda3\envs\llama2_Optimize\lib\site-packages\torch\onnx_internal\jit_utils.py:307: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\jit\passes\onnx\constant_fold.cpp:181.)
_C._jit_pass_onnx_node_shape_type_inference(node, params_dict, opset_version)
C:\Users\anaconda3\envs\llama2_Optimize\lib\site-packages\torch\onnx\utils.py:702: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\jit\passes\onnx\constant_fold.cpp:181.)
_C._jit_pass_onnx_graph_shape_type_inference(
C:\Users\anaconda3\envs\llama2_Optimize\lib\site-packages\torch\onnx\utils.py:1209: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\jit\passes\onnx\constant_fold.cpp:181.)
_C._jit_pass_onnx_graph_shape_type_inference(
[2024-03-07 16:02:23,506] [WARNING] [common.py:108:model_proto_to_file] Model is too large to save as a single file but 'save_as_external_data' is False. Saved tensors as external data regardless.
[2024-03-07 16:06:51,058] [WARNING] [common.py:108:model_proto_to_file] Model is too large to save as a single file but 'save_as_external_data' is False. Saved tensors as external data regardless.
[2024-03-07 16:06:51,127] [INFO] [engine.py:952:_run_pass] Pass convert:OnnxConversion finished in 602.456794 seconds
[2024-03-07 16:06:51,139] [INFO] [engine.py:862:_run_pass] Running pass optimize:OrtTransformersOptimization

@jamesalster

I have the same issue, following the same instructions as Karjhan.

Windows 11, AMD CPU, AMD 6700 XT. Using olive-ai-0.50.

Olive logs:
Optimizing llama_v2
[2024-03-07 19:56:03,993] [INFO] [accelerator.py:208:create_accelerators] Running workflow on accelerator specs: gpu-dml
[2024-03-07 19:56:03,993] [INFO] [engine.py:116:initialize] Using cache directory: cache
[2024-03-07 19:56:03,993] [INFO] [engine.py:272:run] Running Olive on accelerator: gpu-dml
[2024-03-07 19:56:04,026] [INFO] [engine.py:862:_run_pass] Running pass convert:OnnxConversion
[2024-03-07 19:56:04,034] [INFO] [engine.py:896:_run_pass] Loaded model from cache: 1_OnnxConversion-9c3612d31e59051b1903b377f456134d-dc5fbbbe422d406cc8fcef71d99251a4 from cache\runs
[2024-03-07 19:56:04,034] [INFO] [engine.py:862:_run_pass] Running pass optimize:OrtTransformersOptimization

Sometimes, I get the following message, as well:

[2024-03-07 19:36:13,073] [WARNING] [common.py:108:model_proto_to_file] Model is too large to save as a single file but 'save_as_external_data' is False. Saved tensors as external data regardless.

@trajepl
Contributor

trajepl commented Mar 11, 2024

@PatriceVignola Could you help take a look?

@PatriceVignola
Contributor

These are classic OOM symptoms when running the script. Unfortunately, the ORT optimizer that Olive uses needs way more memory than it should, which results in those OOM crashes without error messages.

Usually, I would recommend between 200 GB and 300 GB of RAM (which can include your pagefile). On my machine, I have 128 GB of RAM and a pagefile of about 150 GB, and it takes around 30 minutes to go through the conversion and optimization process. It can also be done with less physical memory (and a bigger pagefile), but it might take longer.
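
If you want to check how much headroom you have before launching the script, a minimal sketch (assuming the psutil package is installed) is:

import psutil

# Physical RAM plus pagefile/swap, in GB; the sum is roughly how much the
# optimization can use before it starts failing with silent OOM crashes.
ram_gb = psutil.virtual_memory().total / 1024**3
pagefile_gb = psutil.swap_memory().total / 1024**3
print(f"RAM: {ram_gb:.0f} GB, pagefile: {pagefile_gb:.0f} GB, total: {ram_gb + pagefile_gb:.0f} GB")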

@jamesalster

That makes sense, thank you. I don't have that much memory + pagefile available. I'll keep an eye on the repository for any future updates that reduce the memory required.

@andresmejia10-int

Tested on two different state-of-the-art systems and I am stuck on this error:

onnxruntime.capi.onnxruntime_pybind11_state.InvalidProtobuf: [ONNXRuntimeError] : 7 : INVALID_PROTOBUF : Load model from
C:\Olive\examples\directml\llama_v2\cache\models\3_OptimumMerging-2-29928aa56000b48c6135423fa1102e45-gpu-dml\output_model\decoder_model_merged.onnx failed:Protobuf parsing failed.

I wonder if this is the cause?

[2024-04-11 10:11:58,163] [INFO] [engine.py:873:_run_pass] Running pass merge:OptimumMerging
Merged ONNX model exceeds 2GB, the model will not be checked without save_path given.
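
One way to narrow it down (a sketch, assuming the onnx Python package is installed; the path is the one from the error) is to look at what the merge pass actually wrote:

import onnx
from pathlib import Path

# Path copied from the error message above.
merged = Path(r"C:\Olive\examples\directml\llama_v2\cache\models\3_OptimumMerging-2-29928aa56000b48c6135423fa1102e45-gpu-dml\output_model\decoder_model_merged.onnx")

# A single .onnx protobuf cannot exceed 2GB. If decoder_model_merged.onnx is
# larger than that with no external-data files beside it, or if it was only
# partially written (e.g. the process died mid-save), ONNX Runtime will fail
# to parse it exactly as in the traceback.
for f in sorted(merged.parent.iterdir()):
    print(f.name, f.stat().st_size, "bytes")

# onnx.load resolves external data relative to the model's directory and will
# raise a similar protobuf error if the file itself is broken.
model = onnx.load(str(merged))
print("parsed OK, graph:", model.graph.name)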

LOG:

C:\Olive\examples\directml\llama_v2>python llama_v2.py

Optimizing argmax_sampling
[2024-04-11 09:06:46,186] [INFO] [run.py:246:run] Loading Olive module configuration from: C:\Olive\olive\olive_config.json
[2024-04-11 09:06:46,186] [INFO] [accelerator.py:324:create_accelerators] Running workflow on accelerator specs: gpu-dml
[2024-04-11 09:06:46,186] [INFO] [run.py:199:run_engine] Importing pass module OnnxConversion
[2024-04-11 09:06:46,197] [INFO] [engine.py:115:initialize] Using cache directory: cache
[2024-04-11 09:06:46,197] [INFO] [engine.py:271:run] Running Olive on accelerator: gpu-dml
[2024-04-11 09:06:46,202] [INFO] [engine.py:873:_run_pass] Running pass convert:OnnxConversion
[2024-04-11 09:06:47,720] [INFO] [engine.py:960:_run_pass] Pass convert:OnnxConversion finished in 1.518627 seconds
[2024-04-11 09:06:47,727] [INFO] [engine.py:851:_run_passes] Run model evaluation for the final model...
[2024-04-11 09:06:48,244] [INFO] [engine.py:370:run_accelerator] Save footprint to footprints\argmax_sampling_gpu-dml_footprints.json.
[2024-04-11 09:06:48,244] [INFO] [engine.py:288:run] Run history for gpu-dml:
[2024-04-11 09:06:48,244] [INFO] [engine.py:576:dump_run_history] run history:
+------------------------------------------------------------------------------------+----------------------------------+------+
| model_id | parent_model_id | from_pass | duration_sec | metrics |
+====================================================================================+==================================+======+
| 14dc | | | | |
+------------------------------------------------------------------------------------+----------------------------------+------+
| 0_OnnxConversion-14dc-818 | 14dc | OnnxConversion | 1.51863 | "latency-avg": 0.4114|
+------------------------------------------------------------------------------------+----------------------------------+------+
[2024-04-11 09:06:48,244] [INFO] [engine.py:303:run] No packaging config provided, skip packaging artifacts
Optimized Model : C:\Olive\examples\directml\llama_v2\cache\models\0_OnnxConversion-14dc7b7c3125d3ad1222f0b9e2e5b807-8183e3a10c90bb4a9507d579143be30e\output_model\model.onnx

Optimizing llama_v2
[2024-04-11 09:06:48,260] [INFO] [run.py:246:run] Loading Olive module configuration from: C:\Olive\olive\olive_config.json
[2024-04-11 09:06:48,260] [INFO] [accelerator.py:324:create_accelerators] Running workflow on accelerator specs: gpu-dml
[2024-04-11 09:06:48,260] [INFO] [run.py:199:run_engine] Importing pass module OnnxConversion
[2024-04-11 09:06:48,260] [INFO] [run.py:199:run_engine] Importing pass module OrtTransformersOptimization
[2024-04-11 09:06:48,260] [INFO] [run.py:199:run_engine] Importing pass module OptimumMerging
[2024-04-11 09:06:48,260] [INFO] [engine.py:115:initialize] Using cache directory: cache
[2024-04-11 09:06:48,260] [INFO] [engine.py:271:run] Running Olive on accelerator: gpu-dml
[2024-04-11 09:06:48,292] [INFO] [engine.py:873:_run_pass] Running pass convert:OnnxConversion
C:\Users\t\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\onnx_internal\jit_utils.py:307: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at ..\torch\csrc\jit\passes\onnx\constant_fold.cpp:181.)
_C._jit_pass_onnx_node_shape_type_inference(node, params_dict, opset_version)
C:\Users\t\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\onnx\utils.py:702: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at ..\torch\csrc\jit\passes\onnx\constant_fold.cpp:181.)
_C._jit_pass_onnx_graph_shape_type_inference(
C:\Users\t\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\onnx\utils.py:1209: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at ..\torch\csrc\jit\passes\onnx\constant_fold.cpp:181.)
_C._jit_pass_onnx_graph_shape_type_inference(
[2024-04-11 09:39:36,710] [INFO] [engine.py:960:_run_pass] Pass convert:OnnxConversion finished in 1968.418042 seconds
[2024-04-11 09:39:36,757] [INFO] [engine.py:873:_run_pass] Running pass optimize:OrtTransformersOptimization
[2024-04-11 10:11:58,085] [INFO] [engine.py:960:_run_pass] Pass optimize:OrtTransformersOptimization finished in 1941.281035 seconds

[2024-04-11 10:11:58,163] [INFO] [engine.py:873:_run_pass] Running pass merge:OptimumMerging
Merged ONNX model exceeds 2GB, the model will not be checked without save_path given.
[2024-04-11 10:17:38,660] [ERROR] [engine.py:955:_run_pass] Pass run failed.
Traceback (most recent call last):
File "C:\Olive\olive\engine\engine.py", line 943, in _run_pass
output_model_config = host.run_pass(p, input_model_config, data_root, output_model_path, pass_search_point)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Olive\olive\systems\local.py", line 31, in run_pass
output_model = the_pass.run(model, data_root, output_model_path, point)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Olive\olive\passes\olive_pass.py", line 216, in run
output_model = self._run_for_config(model, data_root, config, output_model_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Olive\olive\passes\onnx\optimum_merging.py", line 85, in _run_for_config
onnxruntime.InferenceSession(output_model_path, sess_options, providers=[execution_provider])
File "C:\Users\t\AppData\Local\Programs\Python\Python311\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 419, in __init__
self._create_inference_session(providers, provider_options, disabled_optimizers)
File "C:\Users\t\AppData\Local\Programs\Python\Python311\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 472, in _create_inference_session
sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
onnxruntime.capi.onnxruntime_pybind11_state.InvalidProtobuf: [ONNXRuntimeError] : 7 : INVALID_PROTOBUF : Load model from
C:\Olive\examples\directml\llama_v2\cache\models\3_OptimumMerging-2-29928aa56000b48c6135423fa1102e45-gpu-dml\output_model\decoder_model_merged.onnx failed:Protobuf parsing failed.

@josephyuzb

I ran python llama_v2.py.
The error message is here:

RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
[2024-04-25 19:58:01,158] [INFO] [engine.py:279:run] Run history for gpu-dml:
[2024-04-25 19:58:01,159] [INFO] [engine.py:567:dump_run_history] run history:
+----------------------------------+-------------------+-------------+----------------+-----------+
| model_id | parent_model_id | from_pass | duration_sec | metrics |
+==================================+===================+=============+================+===========+
| 9c3612d31e59051b1903b377f456134d | | | | |
+----------------------------------+-------------------+-------------+----------------+-----------+
[2024-04-25 19:58:01,159] [INFO] [engine.py:294:run] No packaging config provided, skip packaging artifacts
Traceback (most recent call last):
File "C:\Users\Administrator\olive\examples\directml\llama_v2\llama_v2.py", line 217, in <module>
optimize(optimized_model_dir, args.model_type)
File "C:\Users\Administrator\olive\examples\directml\llama_v2\llama_v2.py", line 76, in optimize
with footprints_file_path.open("r") as footprint_file:
File "C:\ProgramData\Anaconda3\envs\llama2_Optimize\lib\pathlib.py", line 1252, in open
return io.open(self, mode, buffering, encoding, errors, newline,
File "C:\ProgramData\Anaconda3\envs\llama2_Optimize\lib\pathlib.py", line 1120, in _opener
return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\Administrator\olive\examples\directml\llama_v2\footprints\llama_v2_gpu-dml_footprints.json'

(C:\ProgramData\Anaconda3\envs\llama2_Optimize) C:\Users\Administrator\olive\examples\directml\llama_v2>
