I don't have models/optimized/llama_v2 folder after I've run python llama_v2.py --optimize #905
Comments
I also have an AMD GPU and I have a similar issue following the same steps: https://community.amd.com/t5/ai/how-to-running-optimized-llama2-with-microsoft-directml-on-amd/ba-p/645190 My Olive logs:
I have the same issue, following the same instructions as Karjhan. Windows 11, AMD CPU, AMD 6700 XT, using olive-ai-0.50. Sometimes, I get the following message as well:
[2024-03-07 19:36:13,073] [WARNING] [common.py:108:model_proto_to_file] Model is too large to save as a single file but 'save_as_external_data' is False. Saved tensors as external data regardless.
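For what it's worth, that warning looks benign: protobuf files are capped at 2 GB, so the tensors of a model this size have to be written as external data, and the log says that happened anyway. A rough sketch of the same pattern using the onnx Python API (file names here are made up for illustration; this is not Olive's internal code):

```python
import onnx

# Hypothetical paths, for illustration only.
model = onnx.load("model.onnx")

# Protobuf files are limited to 2 GB, so large models must store
# their weights outside the .onnx file itself.
onnx.save_model(
    model,
    "model_external.onnx",
    save_as_external_data=True,    # write tensors to a separate file
    all_tensors_to_one_file=True,  # keep them in one .data file
    location="model_external.onnx.data",
)
```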
@PatriceVignola Could you help take a look?
These are classic OOM symptoms when running the script. Unfortunately, the ORT optimizer that Olive uses needs way more memory than it should, which results in those OOM crashes without error messages. Usually, I would recommend between 200 GB and 300 GB of RAM (which can include your pagefile). On my machine, I have 128 GB of RAM and a pagefile of about 150 GB, and it takes around 30 minutes to go through the conversion and optimization process. It can also be done with less physical memory (and a bigger pagefile), but it might take longer.
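A quick way to check whether a machine clears that bar before kicking off the script is to sum physical RAM and pagefile. A minimal sketch using psutil; the 250 GB threshold is just the midpoint of the range suggested above, not an official requirement:

```python
import psutil

GIB = 1024 ** 3
ram = psutil.virtual_memory().total   # physical RAM in bytes
pagefile = psutil.swap_memory().total # pagefile/swap in bytes
total_gib = (ram + pagefile) / GIB

print(f"RAM + pagefile: {total_gib:.0f} GiB")
if total_gib < 250:  # rough threshold from the comment above
    print("Likely not enough memory for the Llama 2 optimization pass.")
```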
That makes sense, thank you. I don't have that much memory + pagefile available. I'll keep an eye on the repository for any future updates that reduce the memory required.
Tested on two different state-of-the-art systems, and I am stuck on this error:
onnxruntime.capi.onnxruntime_pybind11_state.InvalidProtobuf: [ONNXRuntimeError] : 7 : INVALID_PROTOBUF : Load model from
I wonder if this is the cause?
[2024-04-11 10:11:58,163] [INFO] [engine.py:873:_run_pass] Running pass merge:OptimumMerging
LOG:
C:\Olive\examples\directml\llama_v2>python llama_v2.py
Optimizing argmax_sampling
Optimizing llama_v2
[2024-04-11 10:11:58,163] [INFO] [engine.py:873:_run_pass] Running pass merge:OptimumMerging
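One way to tell whether the converted model file itself is truncated or corrupt (a plausible outcome if an earlier run died from OOM mid-write) is to try parsing it directly. A minimal sketch; the path is a placeholder for the actual file under your llama_v2 cache:

```python
import onnx

# Placeholder: point this at the model.onnx produced under
# cache\models\<conversion_hash>\output_model\ in the llama_v2 directory.
model_path = r"cache\models\<conversion_hash>\output_model\model.onnx"

try:
    # onnx.load parses the protobuf; a truncated file fails here with
    # the same kind of protobuf error that ORT reports.
    model = onnx.load(model_path)
    print("Model parsed OK:", model.graph.name)
except Exception as exc:
    print("Model file looks corrupt or incomplete:", exc)
```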
I ran python llama_v2.py and got:
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
(C:\ProgramData\Anaconda3\envs\llama2_Optimize) C:\Users\Administrator\olive\examples\directml\llama_v2>
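That PytorchStreamReader error usually points to an incomplete or corrupt checkpoint file (for example, an interrupted download) rather than to the script itself. Since torch.save checkpoints are zip archives, a quick hedged check is to verify each weight file is a valid zip before re-running; the directory and file pattern below are assumptions, adjust to wherever your weights live:

```python
import zipfile
from pathlib import Path

# Assumption: the downloaded Llama 2 weights live somewhere like this.
weights_dir = Path("path/to/llama-2-7b-chat")

for ckpt in weights_dir.glob("*.bin"):
    # torch.save produces a zip archive, so a truncated download fails
    # this check ("failed finding central directory" means the same thing).
    if not zipfile.is_zipfile(ckpt):
        print(f"Corrupt or incomplete checkpoint: {ckpt}")
```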
Describe the bug
Hello. I was following the steps from this guide: https://community.amd.com/t5/ai/how-to-running-optimized-llama2-with-microsoft-directml-on-amd/ba-p/645190
At the end of step 2, when I run the
python llama_v2.py --optimize
command, the run just stops, at the same place each time I re-run it. It generates a model.onnx file in
Olive\examples\directml\llama_v2\cache\models\0_OnnxConversion-14dc7b7c3125d3ad1222f0b9e2e5b807-dc5fbbbe422d406cc8fcef71d99251a4\output_model\model.onnx
and there is no
models/optimized/llama_v2
folder. As I understand it, these files should be there.
I don't get any errors, so it is difficult to tell whether something is wrong or not.
Could you please give me some advice on what I am doing wrong?
To Reproduce
Steps to reproduce the behavior.
Expected behavior
From the description, I expect this:
Once the script successfully completes, the optimized ONNX pipeline will be stored under models/optimized/llama_v2.
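As a small sanity check after the script finishes, one can confirm whether that folder was actually created and populated. A sketch, meant to be run from Olive\examples\directml\llama_v2:

```python
from pathlib import Path

# Expected output location per the quoted description above.
out_dir = Path("models/optimized/llama_v2")

if out_dir.is_dir() and any(out_dir.iterdir()):
    print("Optimized pipeline found:",
          *sorted(p.name for p in out_dir.iterdir()))
else:
    print("No optimized output yet; the script likely died before the "
          "optimization passes (see the OOM discussion above).")
```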
Olive config
Add Olive configurations here.
Olive logs
(llama2_Optimize) C:\Users\proxi\Olive\examples\directml\llama_v2>python llama_v2.py --model_type=7b-chat
Optimizing argmax_sampling
[2024-01-28 19:05:45,570] [INFO] [accelerator.py:205:create_accelerators] Running workflow on accelerator specs: gpu-dml
[2024-01-28 19:05:45,578] [INFO] [engine.py:851:_run_pass] Running pass convert:OnnxConversion
[2024-01-28 19:05:45,590] [INFO] [footprint.py:101:create_pareto_frontier] Output all 2 models
[2024-01-28 19:05:45,590] [INFO] [footprint.py:120:_create_pareto_frontier_from_nodes] pareto frontier points: 0_OnnxConversion-14dc7b7c3125d3ad1222f0b9e2e5b807-dc5fbbbe422d406cc8fcef71d99251a4
{
"latency-avg": 0.23591
}
[2024-01-28 19:05:45,591] [INFO] [engine.py:282:run] Run history for gpu-dml:
[2024-01-28 19:05:45,595] [INFO] [engine.py:557:dump_run_history] run history:
+------------------------------------------------------------------------------------+----------------------------------+----------------+----------------+--------------------------+
| model_id | parent_model_id | from_pass | duration_sec | metrics |
+====================================================================================+==================================+================+================+==========================+
| 14dc7b7c3125d3ad1222f0b9e2e5b807 | | | | |
+------------------------------------------------------------------------------------+----------------------------------+----------------+----------------+--------------------------+
| 0_OnnxConversion-14dc7b7c3125d3ad1222f0b9e2e5b807-dc5fbbbe422d406cc8fcef71d99251a4 | 14dc7b7c3125d3ad1222f0b9e2e5b807 | OnnxConversion | 0.135661 | { |
| | | | | "latency-avg": 0.23591 |
| | | | | } |
+------------------------------------------------------------------------------------+----------------------------------+----------------+----------------+--------------------------+
[2024-01-28 19:05:45,595] [INFO] [engine.py:296:run] No packaging config provided, skip packaging artifacts
Optimized Model : C:\Users\proxi\Olive\examples\directml\llama_v2\cache\models\0_OnnxConversion-14dc7b7c3125d3ad1222f0b9e2e5b807-dc5fbbbe422d406cc8fcef71d99251a4\output_model\model.onnx
Optimizing llama_v2
[2024-01-28 19:05:45,607] [INFO] [accelerator.py:205:create_accelerators] Running workflow on accelerator specs: gpu-dml
[2024-01-28 19:05:45,635] [INFO] [engine.py:851:_run_pass] Running pass convert:OnnxConversion
C:\Users\proxi\miniconda3\envs\llama2_Optimize\lib\site-packages\torch\onnx\_internal\jit_utils.py:307: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\jit\passes\onnx\constant_fold.cpp:181.)
_C._jit_pass_onnx_node_shape_type_inference(node, params_dict, opset_version)
C:\Users\proxi\miniconda3\envs\llama2_Optimize\lib\site-packages\torch\onnx\utils.py:702: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\jit\passes\onnx\constant_fold.cpp:181.)
_C._jit_pass_onnx_graph_shape_type_inference(
C:\Users\proxi\miniconda3\envs\llama2_Optimize\lib\site-packages\torch\onnx\utils.py:1209: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\jit\passes\onnx\constant_fold.cpp:181.)
_C._jit_pass_onnx_graph_shape_type_inference(
Other information
Additional context
I have an AMD video card, so I was looking for ways to run Llama 2 with an AMD GPU. I found this guide and was following its steps: https://community.amd.com/t5/ai/how-to-running-optimized-llama2-with-microsoft-directml-on-amd/ba-p/645190