
Neural Engine, CoreML not utilized for Apple Silicon #2124

Closed
shiyuwang-jamk opened this issue May 6, 2024 · 2 comments
@shiyuwang-jamk

The GPU was used in spite of the `-ng` (no-GPU) parameter, and the Apple Neural Engine (ANE) showed zero usage.

I have followed the Core ML steps in the README, and the log looks like this:

make clean
WHISPER_COREML=1 make -j
I whisper.cpp build info: 
I UNAME_S:  Darwin
I UNAME_P:  arm
I UNAME_M:  arm64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -pthread -DGGML_USE_ACCELERATE -DACCELERATE_NEW_LAPACK -DACCELERATE_LAPACK_ILP64 -DGGML_USE_METAL
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -pthread -DWHISPER_USE_COREML -DGGML_USE_METAL
I LDFLAGS:   -framework Accelerate -framework Foundation -framework CoreML -framework Foundation -framework Metal -framework MetalKit
I CC:       Apple clang version 15.0.0 (clang-1500.3.9.4)
I CXX:      Apple clang version 15.0.0 (clang-1500.3.9.4)

cc  -I.              -O3 -DNDEBUG -std=c11   -fPIC -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -pthread -DGGML_USE_ACCELERATE -DACCELERATE_NEW_LAPACK -DACCELERATE_LAPACK_ILP64 -DGGML_USE_METAL   -c ggml.c -o ggml.o
cc  -I.              -O3 -DNDEBUG -std=c11   -fPIC -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -pthread -DGGML_USE_ACCELERATE -DACCELERATE_NEW_LAPACK -DACCELERATE_LAPACK_ILP64 -DGGML_USE_METAL   -c ggml-alloc.c -o ggml-alloc.o
cc  -I.              -O3 -DNDEBUG -std=c11   -fPIC -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -pthread -DGGML_USE_ACCELERATE -DACCELERATE_NEW_LAPACK -DACCELERATE_LAPACK_ILP64 -DGGML_USE_METAL   -c ggml-backend.c -o ggml-backend.o
cc  -I.              -O3 -DNDEBUG -std=c11   -fPIC -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -pthread -DGGML_USE_ACCELERATE -DACCELERATE_NEW_LAPACK -DACCELERATE_LAPACK_ILP64 -DGGML_USE_METAL   -c ggml-quants.c -o ggml-quants.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -pthread -DWHISPER_USE_COREML -DGGML_USE_METAL -c whisper.cpp -o whisper.o
c++ -O3 -I . -fobjc-arc -c coreml/whisper-encoder.mm -o whisper-encoder.o
c++ -O3 -I . -fobjc-arc -c coreml/whisper-encoder-impl.m -o whisper-encoder-impl.o
cc -I.              -O3 -DNDEBUG -std=c11   -fPIC -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -pthread -DGGML_USE_ACCELERATE -DACCELERATE_NEW_LAPACK -DACCELERATE_LAPACK_ILP64 -DGGML_USE_METAL -c ggml-metal.m -o ggml-metal.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -pthread -DWHISPER_USE_COREML -DGGML_USE_METAL examples/main/main.cpp examples/common.cpp examples/common-ggml.cpp examples/grammar-parser.cpp ggml.o ggml-alloc.o ggml-backend.o ggml-quants.o whisper.o whisper-encoder.o whisper-encoder-impl.o ggml-metal.o -o main  -framework Accelerate -framework Foundation -framework CoreML -framework Foundation -framework Metal -framework MetalKit
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -pthread -DWHISPER_USE_COREML -DGGML_USE_METAL examples/bench/bench.cpp ggml.o ggml-alloc.o ggml-backend.o ggml-quants.o whisper.o whisper-encoder.o whisper-encoder-impl.o ggml-metal.o -o bench  -framework Accelerate -framework Foundation -framework CoreML -framework Foundation -framework Metal -framework MetalKit
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -pthread -DWHISPER_USE_COREML -DGGML_USE_METAL examples/quantize/quantize.cpp examples/common.cpp examples/common-ggml.cpp examples/grammar-parser.cpp ggml.o ggml-alloc.o ggml-backend.o ggml-quants.o whisper.o whisper-encoder.o whisper-encoder-impl.o ggml-metal.o -o quantize  -framework Accelerate -framework Foundation -framework CoreML -framework Foundation -framework Metal -framework MetalKit
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -pthread -DWHISPER_USE_COREML -DGGML_USE_METAL examples/server/server.cpp examples/common.cpp examples/common-ggml.cpp examples/grammar-parser.cpp ggml.o ggml-alloc.o ggml-backend.o ggml-quants.o whisper.o whisper-encoder.o whisper-encoder-impl.o ggml-metal.o -o server  -framework Accelerate -framework Foundation -framework CoreML -framework Foundation -framework Metal -framework MetalKit 
./main -h
./main -ng -l fi -otxt -ovtt -osrt -olrc -m "./models/ggml-large-v3.bin" -pp -f "../Henkilöauton ajokoe.wav" -of "whcpp-ml"
whisper_init_from_file_with_params_no_state: loading model from './models/ggml-large-v3.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51866
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 32
whisper_model_load: n_mels        = 128
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 5 (large v3)
whisper_model_load: adding 1609 extra tokens
whisper_model_load: n_langs       = 100
whisper_model_load:      CPU total size =  3094.36 MB
whisper_model_load: model size    = 3094.36 MB
whisper_init_state: kv self size  =  220.20 MB
whisper_init_state: kv cross size =  245.76 MB
whisper_init_state: loading Core ML model from './models/ggml-large-v3-encoder.mlmodelc'
whisper_init_state: first run on a device may take a while ...
whisper_init_state: Core ML model loaded
whisper_init_state: compute buffer (conv)   =   10.92 MB
whisper_init_state: compute buffer (cross)  =    9.38 MB
whisper_init_state: compute buffer (decode) =  209.26 MB

system_info: n_threads = 4 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | METAL = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | CUDA = 0 | COREML = 1 | OPENVINO = 0

main: processing '../Henkilöauton ajokoe.wav' (20244723 samples, 1265.3 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = fi, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:02.000]   Terve. Moikka.
[00:00:02.000 --> 00:00:04.000]   Mä voin mennä sajelemaan. Joo.
[00:00:04.000 --> 00:00:06.000]   Joo, Mika. Petra, moi.
[00:00:06.000 --> 00:00:08.000]   Okei, mä voin ottaa sulta ne paperit sieltä.
[00:00:08.000 --> 00:00:10.000]   Siitä se ja se.
[00:00:10.000 --> 00:00:12.000]   All right, mennään tonne autoon. Joo.
[00:00:12.000 --> 00:00:14.000]   Katsotaan siellä loppuun.
[00:00:14.000 --> 00:00:23.000]   All right, sit mä tarkastan, et mul on oikee puski mukana.
@ggerganov
Owner

Don't add `-ng`. To run the Core ML encoder on the CPU and Neural Engine only, adjust the following parameter in the code:

// select which device to run the Core ML model on
MLModelConfiguration *config = [[MLModelConfiguration alloc] init];
//config.computeUnits = MLComputeUnitsCPUAndGPU;
config.computeUnits = MLComputeUnitsCPUAndNeuralEngine; // CPU + ANE
//config.computeUnits = MLComputeUnitsAll;              // default: lets Core ML choose
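For context, a minimal sketch of how a compiled Core ML model can be loaded with an explicit compute-unit preference (this is a hypothetical standalone example using Apple's documented `MLModel` API, not the exact whisper.cpp code; the model path is taken from the log above):

```objc
#import <CoreML/CoreML.h>

// restrict execution to CPU + Neural Engine (no GPU)
MLModelConfiguration *config = [[MLModelConfiguration alloc] init];
config.computeUnits = MLComputeUnitsCPUAndNeuralEngine;

NSURL *url = [NSURL fileURLWithPath:@"./models/ggml-large-v3-encoder.mlmodelc"];
NSError *error = nil;
MLModel *model = [MLModel modelWithContentsOfURL:url
                                   configuration:config
                                           error:&error];
if (model == nil) {
    NSLog(@"failed to load Core ML model: %@", error);
}
```

After editing the source, rebuild with `WHISPER_COREML=1 make -j` for the change to take effect. Note that the first run after a change may again take a while, since Core ML recompiles the model for the selected device.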

@shiyuwang-jamk
Author

Thanks for the compile-time tip. It seems I cannot monitor Neural Engine usage with `powermetrics` the way I can for the CPU and GPU.
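For what it's worth, on Apple Silicon `powermetrics` does report ANE power draw via a dedicated sampler. The sampler name below is an assumption based on common usage and may vary by macOS version; check `man powermetrics` or `powermetrics --help` on your machine:

```shell
# sample CPU, GPU, and Neural Engine power once per second (requires root);
# sampler names assumed - verify against `man powermetrics` on your OS version
sudo powermetrics --samplers cpu_power,gpu_power,ane_power -i 1000
```

A nonzero "ANE Power" reading while transcribing is an indirect sign that the Neural Engine is actually being used.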
