
How do I build edgen locally on Mac? #110

Open
prabirshrestha opened this issue Mar 6, 2024 · 18 comments

@prabirshrestha

What is the correct way to build edgen locally on a Mac with Metal?

git clone https://github.com/edgenai/edgen.git
cd edgen/edgen
npm run tauri build

This always crashes with a segfault, with or without the llama_metal feature. It used to work, but has been failing recently.

cargo run --release --features llama_metal -- serve
   Compiling edgen v0.1.3 (/Users/username/code/tmp/edgen/edgen/src-tauri)
    Finished release [optimized] target(s) in 3.10s
     Running `/Users/username/code/tmp/edgen/target/release/edgen serve`
Segmentation fault: 11
curl http://localhost:33322/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer no-key-required" -d '{
  "model": "default",
  "messages": [
    {
      "role": "system",
      "content": "You are EdgenChat, a helpful AI assistant."
    },
    {
      "role": "user",
      "content": "Hello!"
    }
  ]
}'

I'm using the default config and have also tried resetting it.

@opeolluwa
Contributor

@prabirshrestha when the build fails, does it output a specific error message?

@prabirshrestha
Author

The build works, but running the server fails with the error I mentioned. That is the only output I see, even with RUST_BACKTRACE=1.
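In case it helps, here's roughly what can be run to try to get a fuller trace (a sketch; lldb is just the standard debugger from the Xcode command line tools, nothing edgen-specific):

# full Rust backtrace, in case anything surfaces before the segfault
RUST_BACKTRACE=full cargo run --release --features llama_metal -- serve

# native stack trace from the debugger: type `run`, then `bt` after the crash
cargo build --features llama_metal
lldb -- target/debug/edgen serve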

@opeolluwa
Contributor

@prabirshrestha this one: "Segmentation fault: 11"?

@prabirshrestha
Author

Yes. The official release version also seems to fail on Mac now. Probably some change in master is causing the issue.

@prabirshrestha
Author

Now I'm getting this error.

    Finished dev [unoptimized + debuginfo] target(s) in 8.58s
     Running `target/debug/edgen serve`
Assertion failed: (ne % ggml_blck_size(type) == 0), function ggml_row_size, file ggml.c, line 2126.
Abort trap: 6

@opeolluwa
Contributor

> Yes. The official release version also seems to fail on Mac now. Probably some change in master is causing the issue.

Most likely. I'll inspect the CI build; it might be a system dependency or something.

@prabirshrestha
Author

Here are the new logs, as of 45f2a7d:
/Users/prabirshrestha/code/tmp/edgen$ cargo run --release
   Compiling edgen v0.1.5 (/Users/prabirshrestha/code/tmp/edgen/edgen/src-tauri)
    Finished release [optimized] target(s) in 2.99s
     Running `target/release/edgen`
2024-03-27T02:34:21.218710Z  INFO edgen_core::settings: Loading existing settings file: /Users/prabirshrestha/Library/Application Support/com.EdgenAI.Edgen/edgen.conf.yaml
2024-03-27T02:34:21.221257Z  INFO edgen_server: Using default URI
2024-03-27T02:34:21.221333Z  INFO edgen_server: Listening in on: http://127.0.0.1:33322
2024-03-27T02:34:33.235666Z  INFO edgen_server::model: Loading existing model patterns file
2024-03-27T02:34:33.235867Z  INFO hf_hub: Token file not found "/Users/prabirshrestha/.cache/huggingface/token"
2024-03-27T02:34:33.236960Z  INFO edgen_server::status: progress observer: no download necessary, file is already there
2024-03-27T02:34:33.237134Z  INFO edgen_core::perishable: (Re)Creating a new llama_cpp::model::LlamaModel
2024-03-27T02:34:33.237180Z  INFO edgen_rt_llama_cpp: Loading /Users/prabirshrestha/Library/Application Support/com.EdgenAI.Edgen/models/chat/completions/models--TheBloke--neural-chat-7B-v3-3-GGUF/snapshots/5a354dacb2b2e2014cd239755920b2362be64d13/neural-chat-7b-v3-3.Q4_K_M.gguf into memory
2024-03-27T02:34:33.238119Z  INFO llama_cpp::model: Loading model "/Users/prabirshrestha/Library/Application Support/com.EdgenAI.Edgen/models/chat/completions/models--TheBloke--neural-chat-7B-v3-3-GGUF/snapshots/5a354dacb2b2e2014cd239755920b2362be64d13/neural-chat-7b-v3-3.Q4_K_M.gguf"
2024-03-27T02:34:33.242906Z  INFO llama.cpp: llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from /Users/prabirshrestha/Library/Application Support/com.EdgenAI.Edgen/models/chat/completions/models--TheBloke--neural-chat-7B-v3-3-GGUF/snapshots/5a354dacb2b2e2014cd239755920b2362be64d13/neural-chat-7b-v3-3.Q4_K_M.gguf (version GGUF V3 (latest))
2024-03-27T02:34:33.242920Z  INFO llama.cpp: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
2024-03-27T02:34:33.242926Z  INFO llama.cpp: llama_model_loader: - kv   0:                       general.architecture str              = llama
2024-03-27T02:34:33.242929Z  INFO llama.cpp: llama_model_loader: - kv   1:                               general.name str              = intel_neural-chat-7b-v3-3
2024-03-27T02:34:33.242932Z  INFO llama.cpp: llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
2024-03-27T02:34:33.242934Z  INFO llama.cpp: llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
2024-03-27T02:34:33.242936Z  INFO llama.cpp: llama_model_loader: - kv   4:                          llama.block_count u32              = 32
2024-03-27T02:34:33.242939Z  INFO llama.cpp: llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
2024-03-27T02:34:33.242941Z  INFO llama.cpp: llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
2024-03-27T02:34:33.242943Z  INFO llama.cpp: llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
2024-03-27T02:34:33.242946Z  INFO llama.cpp: llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 8
2024-03-27T02:34:33.242950Z  INFO llama.cpp: llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
2024-03-27T02:34:33.242954Z  INFO llama.cpp: llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 10000.000000
2024-03-27T02:34:33.242956Z  INFO llama.cpp: llama_model_loader: - kv  11:                          general.file_type u32              = 15
2024-03-27T02:34:33.242958Z  INFO llama.cpp: llama_model_loader: - kv  12:                       tokenizer.ggml.model str              = llama
2024-03-27T02:34:33.247335Z  INFO llama.cpp: llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,32000]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
2024-03-27T02:34:33.255357Z  INFO llama.cpp: llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,32000]   = [0.000000, 0.000000, 0.000000, 0.0000...
2024-03-27T02:34:33.256454Z  INFO llama.cpp: llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
2024-03-27T02:34:33.256457Z  INFO llama.cpp: llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32              = 1
2024-03-27T02:34:33.256459Z  INFO llama.cpp: llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32              = 2
2024-03-27T02:34:33.256461Z  INFO llama.cpp: llama_model_loader: - kv  18:            tokenizer.ggml.unknown_token_id u32              = 0
2024-03-27T02:34:33.256462Z  INFO llama.cpp: llama_model_loader: - kv  19:            tokenizer.ggml.padding_token_id u32              = 0
2024-03-27T02:34:33.256464Z  INFO llama.cpp: llama_model_loader: - kv  20:               general.quantization_version u32              = 2
2024-03-27T02:34:33.256466Z  INFO llama.cpp: llama_model_loader: - type  f32:   65 tensors
2024-03-27T02:34:33.256468Z  INFO llama.cpp: llama_model_loader: - type q4_K:  193 tensors
2024-03-27T02:34:33.256470Z  INFO llama.cpp: llama_model_loader: - type q6_K:   33 tensors
2024-03-27T02:34:33.266441Z  INFO llama.cpp: llm_load_vocab: special tokens definition check successful ( 259/32000 ).
2024-03-27T02:34:33.266445Z  INFO llama.cpp: llm_load_print_meta: format           = GGUF V3 (latest)
2024-03-27T02:34:33.266447Z  INFO llama.cpp: llm_load_print_meta: arch             = llama
2024-03-27T02:34:33.266448Z  INFO llama.cpp: llm_load_print_meta: vocab type       = SPM
2024-03-27T02:34:33.266450Z  INFO llama.cpp: llm_load_print_meta: n_vocab          = 32000
2024-03-27T02:34:33.266451Z  INFO llama.cpp: llm_load_print_meta: n_merges         = 0
2024-03-27T02:34:33.266453Z  INFO llama.cpp: llm_load_print_meta: n_ctx_train      = 32768
2024-03-27T02:34:33.266454Z  INFO llama.cpp: llm_load_print_meta: n_embd           = 4096
2024-03-27T02:34:33.266456Z  INFO llama.cpp: llm_load_print_meta: n_head           = 32
2024-03-27T02:34:33.266458Z  INFO llama.cpp: llm_load_print_meta: n_head_kv        = 8
2024-03-27T02:34:33.266459Z  INFO llama.cpp: llm_load_print_meta: n_layer          = 32
2024-03-27T02:34:33.266460Z  INFO llama.cpp: llm_load_print_meta: n_rot            = 128
2024-03-27T02:34:33.266462Z  INFO llama.cpp: llm_load_print_meta: n_embd_head_k    = 128
2024-03-27T02:34:33.266463Z  INFO llama.cpp: llm_load_print_meta: n_embd_head_v    = 128
2024-03-27T02:34:33.266465Z  INFO llama.cpp: llm_load_print_meta: n_gqa            = 4
2024-03-27T02:34:33.266466Z  INFO llama.cpp: llm_load_print_meta: n_embd_k_gqa     = 1024
2024-03-27T02:34:33.266468Z  INFO llama.cpp: llm_load_print_meta: n_embd_v_gqa     = 1024
2024-03-27T02:34:33.266469Z  INFO llama.cpp: llm_load_print_meta: f_norm_eps       = 0.0e+00
2024-03-27T02:34:33.266471Z  INFO llama.cpp: llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
2024-03-27T02:34:33.266473Z  INFO llama.cpp: llm_load_print_meta: f_clamp_kqv      = 0.0e+00
2024-03-27T02:34:33.266474Z  INFO llama.cpp: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
2024-03-27T02:34:33.266475Z  INFO llama.cpp: llm_load_print_meta: f_logit_scale    = 0.0e+00
2024-03-27T02:34:33.266477Z  INFO llama.cpp: llm_load_print_meta: n_ff             = 14336
2024-03-27T02:34:33.266478Z  INFO llama.cpp: llm_load_print_meta: n_expert         = 0
2024-03-27T02:34:33.266480Z  INFO llama.cpp: llm_load_print_meta: n_expert_used    = 0
2024-03-27T02:34:33.266481Z  INFO llama.cpp: llm_load_print_meta: causal attn      = 1
2024-03-27T02:34:33.266483Z  INFO llama.cpp: llm_load_print_meta: pooling type     = 0
2024-03-27T02:34:33.266484Z  INFO llama.cpp: llm_load_print_meta: rope type        = 0
2024-03-27T02:34:33.266485Z  INFO llama.cpp: llm_load_print_meta: rope scaling     = linear
2024-03-27T02:34:33.266487Z  INFO llama.cpp: llm_load_print_meta: freq_base_train  = 10000.0
2024-03-27T02:34:33.266489Z  INFO llama.cpp: llm_load_print_meta: freq_scale_train = 1
2024-03-27T02:34:33.266490Z  INFO llama.cpp: llm_load_print_meta: n_yarn_orig_ctx  = 32768
2024-03-27T02:34:33.266492Z  INFO llama.cpp: llm_load_print_meta: rope_finetuned   = unknown
2024-03-27T02:34:33.266493Z  INFO llama.cpp: llm_load_print_meta: ssm_d_conv       = 0
2024-03-27T02:34:33.266495Z  INFO llama.cpp: llm_load_print_meta: ssm_d_inner      = 0
2024-03-27T02:34:33.266496Z  INFO llama.cpp: llm_load_print_meta: ssm_d_state      = 0
2024-03-27T02:34:33.266497Z  INFO llama.cpp: llm_load_print_meta: ssm_dt_rank      = 0
2024-03-27T02:34:33.266499Z  INFO llama.cpp: llm_load_print_meta: model type       = 7B
2024-03-27T02:34:33.266521Z  INFO llama.cpp: llm_load_print_meta: model ftype      = Q4_K - Medium
2024-03-27T02:34:33.266523Z  INFO llama.cpp: llm_load_print_meta: model params     = 7.24 B
2024-03-27T02:34:33.266525Z  INFO llama.cpp: llm_load_print_meta: model size       = 4.07 GiB (4.83 BPW)
2024-03-27T02:34:33.266526Z  INFO llama.cpp: llm_load_print_meta: general.name     = intel_neural-chat-7b-v3-3
2024-03-27T02:34:33.266528Z  INFO llama.cpp: llm_load_print_meta: BOS token        = 1 '<s>'
2024-03-27T02:34:33.266529Z  INFO llama.cpp: llm_load_print_meta: EOS token        = 2 '</s>'
2024-03-27T02:34:33.266531Z  INFO llama.cpp: llm_load_print_meta: UNK token        = 0 '<unk>'
2024-03-27T02:34:33.266533Z  INFO llama.cpp: llm_load_print_meta: PAD token        = 0 '<unk>'
2024-03-27T02:34:33.266534Z  INFO llama.cpp: llm_load_print_meta: LF token         = 13 '<0x0A>'
2024-03-27T02:34:33.266550Z  INFO llama.cpp: llm_load_tensors: ggml ctx size =    0.11 MiB
2024-03-27T02:34:33.267130Z  INFO llama.cpp: llm_load_tensors:        CPU buffer size =  4165.37 MiB
2024-03-27T02:34:33.267526Z  WARN llama_cpp::model: Could not find metadata key="%s.attention.key_length"
2024-03-27T02:34:33.267530Z  WARN llama_cpp::model: Could not find metadata key="%s.attention.value_length"
2024-03-27T02:34:33.267533Z  WARN llama_cpp::model: Could not find metadata key="%s.ssm.conv_kernel"
2024-03-27T02:34:33.267535Z  WARN llama_cpp::model: Could not find metadata key="%s.ssm.inner_size"
2024-03-27T02:34:33.267536Z  WARN llama_cpp::model: Could not find metadata key="%s.ssm.state_size"
2024-03-27T02:34:33.267556Z  INFO edgen_rt_llama_cpp: No matching session found, creating new one
2024-03-27T02:34:33.267567Z  INFO edgen_core::perishable: (Re)Creating a new llama_cpp::session::LlamaSession
2024-03-27T02:34:33.267569Z  INFO edgen_rt_llama_cpp: Allocating new LLM session
2024-03-27T02:34:33.267581Z  INFO llama.cpp: llama_new_context_with_model: n_ctx      = 4096
2024-03-27T02:34:33.267584Z  INFO llama.cpp: llama_new_context_with_model: n_batch    = 2048
2024-03-27T02:34:33.267585Z  INFO llama.cpp: llama_new_context_with_model: n_ubatch   = 512
2024-03-27T02:34:33.267587Z  INFO llama.cpp: llama_new_context_with_model: freq_base  = 10000.0
2024-03-27T02:34:33.267589Z  INFO llama.cpp: llama_new_context_with_model: freq_scale = 1
2024-03-27T02:34:33.304810Z  INFO llama.cpp: llama_kv_cache_init:        CPU KV buffer size =   512.00 MiB
2024-03-27T02:34:33.304822Z  INFO llama.cpp: llama_new_context_with_model: KV self size  =  512.00 MiB, K (f16):  256.00 MiB, V (f16):  256.00 MiB
2024-03-27T02:34:33.321556Z  INFO llama.cpp: llama_new_context_with_model:        CPU  output buffer size =   250.00 MiB
GGML_ASSERT: /Users/prabirshrestha/.cargo/git/checkouts/whisper_cpp-rs-bf1f9509542b2c2d/fa76538/crates/whisper_cpp_sys/thirdparty/whisper.cpp/ggml.c:4906: b->type == GGML_TYPE_I32
Abort trap: 6

@opeolluwa
Contributor

Would you like to share your environment: OS version, Rust and Node.js toolchain versions, and so on?
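For example, something like this should capture everything requested (a sketch; adjust for your setup):

sw_vers            # macOS version
uname -m           # arm64 or x86_64
rustup show        # active Rust toolchain
rustc --version
node --version
npm --version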

@opeolluwa
Contributor

I'll build this on my Mac and see where we stand

@opeolluwa
Contributor

@francis2tm see this

@opeolluwa
Contributor

opeolluwa commented Mar 27, 2024

@prabirshrestha I tried building it on a Mac; I think there might be some missing system deps.

I made a fork and added a README: https://github.com/opeolluwa/edgen/tree/main/edgen. Follow the instructions and let's see where we go from there. Here's the relevant build output:

Looking for "nm" or an equivalent tool
  NM_PATH not set, looking for ["nm", "llvm-nm"] in PATH
  Valid tool found:
  llvm-nm, compatible with GNU nm
  Apple LLVM version 14.0.3 (clang-1403.0.22.14.1)
    Optimized build.
    Default target: arm64-apple-darwin22.6.0
    Host CPU: apple-m1

  cargo:rerun-if-env-changed=OBJCOPY_PATH
  Looking for "objcopy" or an equivalent tool..
  OBJCOPY_PATH not set, looking for ["llvm-objcopy"] in PATH

  --- stderr
  CMake Warning:
    Manually-specified variables were not used by the project:

      CMAKE_ASM_COMPILER
      CMAKE_ASM_FLAGS


  make: warning: jobserver unavailable: using -j1.  Add `+' to parent make rule.
  /Users/USER/.cargo/git/checkouts/whisper_cpp-rs-bf1f9509542b2c2d/fa76538/crates/whisper_cpp_sys/thirdparty/whisper.cpp/whisper.cpp:1026:75: warning: unused parameter 'params' [-Wunused-parameter]
  static ggml_backend_t whisper_backend_init(const whisper_context_params & params) {
                                                                            ^
  /Users/USER/.cargo/git/checkouts/whisper_cpp-rs-bf1f9509542b2c2d/fa76538/crates/whisper_cpp_sys/thirdparty/whisper.cpp/whisper.cpp:1620:27: warning: unused parameter 'mel_offset' [-Wunused-parameter]
                const int   mel_offset) {
                            ^
  /Users/USER/.cargo/git/checkouts/whisper_cpp-rs-bf1f9509542b2c2d/fa76538/crates/whisper_cpp_sys/thirdparty/whisper.cpp/whisper.cpp:202:29: warning: unused function 'ggml_mul_mat_pad' [-Wunused-function]
  static struct ggml_tensor * ggml_mul_mat_pad(struct ggml_context * ctx, struct ggml_tensor * x, struct ggml_tensor * y, int pad = 32) {
                              ^
  3 warnings generated.
  thread 'main' panicked at /Users/USER/.cargo/git/checkouts/whisper_cpp-rs-bf1f9509542b2c2d/fa76538/crates/whisper_cpp_sys/build.rs:295:9:
  No suitable tool equivalent to "objcopy" has been found in PATH, if one is already installed, either add its directory to PATH or set OBJCOPY_PATH to its full path. For your Operating System we recommend:
  "llvm-objcopy" from LLVM 17
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
warning: build failed, waiting for other jobs to finish...
       Error failed to build app: failed to build app

@opeolluwa
Contributor

You can also check out https://docs.edgen.co

@prabirshrestha
Author

I can build, but when I call the completions API it crashes.

Assertion failed: (ne % ggml_blck_size(type) == 0), function ggml_row_size, file ggml.c, line 2126.
 ELIFECYCLE  Command failed.

It seems v0.1.2 ran fine, but it started crashing with v0.1.3.
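Since v0.1.2 works and v0.1.3 doesn't, a git bisect between the two might pin down the offending commit. A sketch, assuming the releases are tagged v0.1.2 and v0.1.3:

git bisect start
git bisect bad v0.1.3
git bisect good v0.1.2
# at each step: build, hit /v1/chat/completions, then mark the commit
#   git bisect good    # if it responds
#   git bisect bad     # if it crashes
git bisect reset       # when finished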

@opeolluwa
Contributor

That's after the instructions above, correct?

@prabirshrestha
Author

Yes, that is after following the instructions. I uninstalled all Rust toolchains too. It's also probably worth adding this line to the doc, in case you have multiple toolchains:

rustup override set beta-2023-11-21

One thing I did was add this to my profile after `brew install llvm` to get past that error:

export PATH="/opt/homebrew/opt/llvm/bin:$PATH"

If I remove it, I get the same error as you do.
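If reordering PATH is too invasive, the build script's own error message suggests pointing OBJCOPY_PATH (and presumably NM_PATH) straight at the tools should also work; untested on my end:

brew install llvm
# point whisper_cpp_sys's build script directly at the LLVM tools
export OBJCOPY_PATH="/opt/homebrew/opt/llvm/bin/llvm-objcopy"
export NM_PATH="/opt/homebrew/opt/llvm/bin/llvm-nm"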

@opeolluwa
Contributor

Let me get this straight: the application now builds, after you've installed llvm and removed the existing Rust toolchain?

@prabirshrestha
Author

The application builds once I run the following commands. The Rust toolchain didn't have much impact, as I was able to build and run with other toolchains too; just to be sure, I removed all Rust toolchains and kept only beta-2023-11-21.

brew install llvm
export PATH="/opt/homebrew/opt/llvm/bin:$PATH"

I'm also able to run the edgen app: I can see it in the taskbar and its window opens. But as soon as I make a request to http://localhost:33322/v1/chat/completions, it crashes.

@opeolluwa
Contributor

OK, good! 👍
We're making some progress. Let's pick up again tomorrow; it's midnight my time.
