bug: Jan fails to load local models because it mis-detects an Nvidia GPU, and/or fails to run inference with Vulkan #2888

Open
Andydna2 opened this issue May 11, 2024 · 3 comments
Labels
status: needs info · type: bug

Comments

@Andydna2

Describe the bug
I have tried the stable and nightly versions, each with a clean install, and Jan either fails to load models (standard settings) or fails to run inference with Vulkan (you can tell the model loads because RAM usage goes up, but no joy).

Steps to reproduce
Steps to reproduce the behavior:

  1. Install Jan
  2. Use a Lenovo laptop with an AMD Ryzen 7735HS CPU

Expected behavior
Jan should at least report a clear error message, and perhaps suggest changing the relevant settings.

Environment details

  • Operating System: Windows 11
  • Jan Version: 0.4.12 nightly
  • Processor: AMD Ryzen 7 7735HS with built-in Radeon 680M iGPU
  • RAM: 16GB
  • Any additional relevant hardware specifics: no Nvidia card, built-in or dedicated

EXAMPLE 1: mis-detects an Nvidia GPU
Logs

2024-05-11T08:17:17.733Z [SPECS]::OS Version: Windows 10 Home
2024-05-11T08:17:17.734Z [SPECS]::OS Platform: win32
2024-05-11T08:17:17.734Z [SPECS]::OS Release: 10.0.22631
2024-05-11T08:17:17.734Z [APP]::{"notify":true,"run_mode":"gpu","nvidia_driver":{"exist":false,"version":""},"cuda":{"exist":true,"version":"11"},"gpus":[],"gpu_highest_vram":"","gpus_in_use":[""],"is_initial":false,"vulkan":false}
2024-05-11T08:29:38.408Z [NITRO]::Debug: Request to kill Nitro
2024-05-11T08:29:38.408Z [NITRO]::CPU information - 9
2024-05-11T08:29:38.459Z [NITRO]::Debug: Nitro process is terminated
2024-05-11T08:29:38.460Z [NITRO]::Debug: Spawning Nitro subprocess...
2024-05-11T08:29:38.461Z [NITRO]::Debug: Spawn nitro at path: C:\Users\vojevoda\jan\extensions\@janhq\inference-nitro-extension\dist\bin\win-cuda-11-7\nitro.exe, and args: 1,127.0.0.1,3928
2024-05-11T08:29:45.900Z [NITRO]::Debug: Nitro exited with code: 3221225781
2024-05-11T08:29:45.900Z [NITRO]::Error: child process exited with code 3221225781
2024-05-11T08:33:57.847Z [SPECS]::Version: 0.4.12-413
2024-05-11T08:33:57.848Z [SPECS]::CPUs: [{"model":"AMD Ryzen 7 7735HS with Radeon 
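
Side note: converting that Nitro exit code to hex points at a missing DLL, which would be consistent with Jan launching the win-cuda-11-7 build of nitro.exe on a machine that has no CUDA runtime installed. A quick check in Python (illustrative only):

    >>> hex(3221225781)
    '0xc0000135'

0xC0000135 is the Windows NTSTATUS code STATUS_DLL_NOT_FOUND.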

EXAMPLE 2: fails to run inference after loading, when using Vulkan

{"timestamp":1715416730,"level":"INFO","function":"LoadModelImpl","line":708,"message":"system info","n_threads":9,"total_threads":16,"system_info":"AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | "}

2024-05-11T08:38:50.484Z [NITRO]::Error: llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from C:\Users\vojevoda\jan\models\llama3-8b-instruct\Meta-Llama-3-8B-Instruct-Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = Meta-Llama-3-8B-Instruct-imatrix
llama_model_loader: - kv   2:                          llama.block_count u32              = 32
llama_model_loader: - kv   3:                       llama.context_length u32              = 8192
llama_model_loader: - kv   4:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   7:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv   8:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                          general.file_type u32              = 15
llama_model_loader: - kv  11:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  12:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  13:                       tokenizer.ggml.model str              = gpt2

2024-05-11T08:38:50.518Z [NITRO]::Error: llama_model_loader: - kv  14:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...

2024-05-11T08:38:50.533Z [NITRO]::Error: llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...

2024-05-11T08:38:50.606Z [NITRO]::Error: llama_model_loader: - kv  16:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  17:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  18:                tokenizer.ggml.eos_token_id u32              = 128001
llama_model_loader: - kv  19:                    tokenizer.chat_template str              = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv  20:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q4_K:  193 tensors
llama_model_loader: - type q6_K:   33 tensors

2024-05-11T08:38:51.577Z [NITRO]::Error: llm_load_vocab: special tokens definition check successful ( 256/128256 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 128256
llm_load_print_meta: n_merges         = 280147
llm_load_print_meta: n_ctx_train      = 8192
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 4
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 14336
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 8192
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = Q4_K - Medium
llm_load_print_meta: model params     = 8.03 B
llm_load_print_meta: model size       = 4.58 GiB (4.89 BPW) 
llm_load_print_meta: general.name     = Meta-Llama-3-8B-Instruct-imatrix
llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token        = 128001 '<|end_of_text|>'
llm_load_print_meta: LF token         = 128 'Ä'

2024-05-11T08:38:51.645Z [NITRO]::Error: ggml_vulkan: Found 1 Vulkan devices:

2024-05-11T08:38:51.647Z [NITRO]::Error: Vulkan0: AMD Radeon(TM) 680M | uma: 1 | fp16: 1 | warp size: 64

2024-05-11T08:38:51.682Z [NITRO]::Error: llm_load_tensors: ggml ctx size =    0.22 MiB

2024-05-11T08:38:59.410Z [NITRO]::Error: llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors:        CPU buffer size =   281.81 MiB
llm_load_tensors:    Vulkan0 buffer size =  4403.49 MiB

app.log
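
To help isolate whether this is a Jan/Nitro problem or an upstream llama.cpp Vulkan problem, one option would be to run the same GGUF directly with a Vulkan-enabled llama.cpp build. A sketch only (assuming a build of roughly this vintage, where the example binary was still called main.exe; -m, -p and -ngl are standard llama.cpp options):

    main.exe -m C:\Users\vojevoda\jan\models\llama3-8b-instruct\Meta-Llama-3-8B-Instruct-Q4_K_M.gguf -p "Hello" -ngl 33

If that also stalls after loading, the bug is likely upstream; if it generates text, the problem is in how Jan drives Nitro.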

Van-QA (Contributor) commented May 11, 2024

Hi @Andydna2, from your log the Nvidia driver shows as false ("nvidia_driver":{"exist":false}). Can you try this: https://jan.ai/docs/troubleshooting#troubleshooting-nvidia-gpu to install the Nvidia driver and see if it helps?

@Andydna2 (Author)

Hi @Andydna2, from your log the Nvidia driver shows as false ("nvidia_driver":{"exist":false}). Can you try this: https://jan.ai/docs/troubleshooting#troubleshooting-nvidia-gpu to install the Nvidia driver and see if it helps?

I am not sure what you mean by "the Nvidia driver shows as false": my laptop does NOT have a dedicated graphics card, only the AMD iGPU, so I can't really install Nvidia drivers.
It seems to me that Jan/Nitro hardware detection is not working properly in my case.
Perhaps there is some override switch in the JSON settings?
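
For example (a sketch only: the [APP] line in Example 1 looks like a dump of Jan's settings, so I assume these keys live in a settings file somewhere under the jan data folder; I don't know the exact path or whether Jan picks up hand edits), forcing CPU mode might look like:

    {"notify":true,"run_mode":"cpu","nvidia_driver":{"exist":false,"version":""},"cuda":{"exist":false,"version":""},"gpus":[],"gpu_highest_vram":"","gpus_in_use":[""],"is_initial":false,"vulkan":false}

Only run_mode ("gpu" -> "cpu") and cuda.exist (true -> false, since there is no CUDA here) are changed from the logged values.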

Van-QA (Contributor) commented May 14, 2024

Hi @Andydna2, when you turn on Vulkan, does it list your AMD GPU in the dropdown?
[screenshot: Vulkan GPU dropdown]

On the other hand, Jan can also run on the CPU, and that should work just fine; would you mind trying it as well? 🙏
