ggml: add new members to GGML's internal data structures #2073
Purpose
This PR serves multiple purposes:
(1) Borrow an idea from Qualcomm's QNN (Qualcomm Neural Network, also known as Qualcomm AI Engine Direct) SDK, a well-designed and concise SDK whose API closely matches GGML's existing design (including the existing backend design). This PR borrows the "rank" member from the QNN SDK, reuses the existing code as much as possible, and brings no side effects or extra complexity to the existing code.
(2) Borrow an idea from PyTorch: the user can specify whether a GGML op (such as mul_mat) is accelerated by a specific backend. This PR borrows the "use_hwaccel" member for this, again reusing the existing code as much as possible without side effects or added complexity.
(3) Cover more scenarios from upper-layer code (see the "Explanation" section). Using multiple backends simultaneously is not included, since that is a separate topic/scenario.
(4) (The main purpose) Prepare to submit Qualcomm's QNN (Qualcomm Neural Network, aka Qualcomm AI Engine Direct) backend to the upstream GGML community (first in whisper.cpp, then in llama.cpp): PoC: Add Qualcomm mobile SoC native backend for GGML
Status
This PR has been verified/validated in whisper.cpp with the QNN backend on different Android phones (both a low-end and a high-end Qualcomm SoC based phone).
Explanation
This PR is useful/helpful/meaningful in several scenarios.
In fact, the "gpu_device" member in struct whisper_context_params is semantically similar to "use_hwaccel". There are 2 * n combinations here: 2 (use_gpu: true/false) * n (backend_device: 1..n). Note that a DSP backend is also treated as a "gpu_device", because we should reuse the existing code as much as possible without bringing side effects or complexity to it.
So a special value of "gpu_device" can be treated as "no hardware acceleration", falling back to the original default GGML CPU backend.
Accordingly, we can reuse the existing "backend" member in the core data structure together with a new "use_hwaccel" member in struct ggml_context. By the way, I personally think we should not remove the existing "backend" member from struct ggml_tensor, even though there is a plan to remove it.
The reason is that there are some scenarios (not including using multiple backends simultaneously, which is a separate topic/scenario) that "use_gpu & gpu_device" alone cannot cover.
I personally think the new members ("use_hwaccel" in struct ggml_context and "rank" in struct ggml_tensor) are not redundant, and they will NOT bring side effects to the existing code. Of course, I understand we should not add too much code to GGML's internals and should keep GGML as compact and clean as possible, so this PR reuses existing code to the maximum extent:
https://github.com/zhouwg/whisper.cpp/blob/add_hwaccel_in_data_structure/ggml.c#L2995
PR approval request
@hey-shashikant, thanks for your time and for approving #2054 (which is the same as this PR); could you help take a look at this PR? Thanks so much.
@slaren, I'm sorry to interrupt you; could you help take a look? Thanks.
@ggerganov, I'm sorry to interrupt you; could you help take a look? Thanks.