Segmentation fault Windows 11 Docker #254

Open
jak6jak opened this issue Mar 24, 2023 · 6 comments

@jak6jak

jak6jak commented Mar 24, 2023

I tried installing dalai with Docker on Windows. Currently I am getting the following error when I try to generate a response with debug mode on:

root@7788cdbedf9c:~/dalai/alpaca# /root/dalai/alpaca/main --seed -1 --threads 4 --n_predict 200 --model models/30B/ggml-model-q4_0.bin --top_k 40 --top_p 0.9 --temp 0.8 --repeat_last_n 64 --repeat_penalty 1.3 -p "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
> 
> ### Instruction:
> >PROMPT
> 
> ### Response:
> "
main: seed = 1679656530
llama_model_load: loading model from 'models/30B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: ggml ctx size = 25631.50 MB
Segmentation fault
root@7788cdbedf9c:~/dalai/alpaca# exit
exit

Looking at the llama.cpp project, it seems that they have tried to fix some segmentation faults but were unsuccessful. Perhaps this is the issue I am facing, but I do not know. ggerganov/llama.cpp@3cd8dde

Any tips on how to debug this or to get a better error would be appreciated.
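One generic way to get a more informative error than a bare "Segmentation fault" (a sketch, assuming gdb can be installed inside the container; the paths and arguments are taken from the command above) is to re-run the loader under gdb and capture a backtrace:

apt-get update && apt-get install -y gdb
gdb --args /root/dalai/alpaca/main --seed -1 --threads 4 --n_predict 200 --model models/30B/ggml-model-q4_0.bin
(gdb) run
(gdb) bt

The backtrace should at least show whether the crash happens inside ggml's model loading or somewhere else.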

@christopherorea

I have the exact same problem.
I tried running it in the terminal via Docker, and also cloning alpaca.cpp and running make chat, but without success. If I learn anything I will post it here.

mirroredkube pushed a commit to mirroredkube/dalai that referenced this issue Mar 26, 2023
This causes long prompts to parse very slowly.
@FrancescoGrazioso

Just downloaded the repo and installed the 30B model, having the same issue.
Here's the debug output:

root@81743ba9c2e2:~/dalai/alpaca# /root/dalai/alpaca/main --seed -1 --threads 4 --n_predict 200 --model models/30B/ggml-model-q4_0.bin --top_k 40 --top_p 0.9 --temp 0.8 --repeat_last_n 64 --repeat_penalty 1.3 -p "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
>PROMPT

### Response:
"
main: seed = 1680109480
llama_model_load: loading model from 'models/30B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: ggml ctx size = 25631.50 MB
Segmentation fault
root@81743ba9c2e2:~/dalai/alpaca# exit
exit

@glozachmeur

glozachmeur commented Mar 30, 2023

I also have this issue with alpaca 30B and llama 30B, exactly the same error (but the ggml ctx size is about 21000 MB for me).

I have 32 GB of RAM; Docker sometimes consumes a lot of it (via the vmmem process), so I sometimes don't have the ~22 GB needed, but even when I do have enough free RAM I still can't run the model...

So I suspect 32 GB of RAM is not enough for running the 30B model under Docker? 🤔 How much do you have?
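If the cap is coming from Docker Desktop's WSL 2 backend rather than from physical RAM, the limit can be raised in %UserProfile%\.wslconfig (a sketch; the values are illustrative and assume Docker Desktop is running on the WSL 2 backend):

# %UserProfile%\.wslconfig
[wsl2]
memory=28GB   # upper bound for the WSL 2 VM that Docker Desktop runs in
swap=16GB     # optional; swap may let a model that barely fits finish loading

After editing, run wsl --shutdown and restart Docker Desktop so the new limit takes effect.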

@toolchild

Here I described my experience running models on Windows 10:
#330 (comment)

@christopherorea

My assumption is that the issue comes from the fact that these models require a lot of RAM on your machine. Can anybody confirm or dismiss this? I believe the model is loaded entirely into RAM, and that is why it breaks.
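A quick way to check this on a given machine (standard Linux/Docker tooling, nothing dalai-specific) is to compare the memory visible inside the container against the ggml ctx size printed by the loader (~25 GB for the 30B q4_0 model in the logs above):

# inside the container: total / free memory visible to the process
free -h

# on the host: live memory usage of the running container
docker stats --no-stream

# follow the kernel log while reproducing; segfaults and OOM kills are usually logged here
dmesg | tail -n 20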

@pratyushtiwary

In my case the context size was causing this issue; I fixed it by adding a new config option to the UI that lets me adjust the context size.

I was using a server with 6 GB of RAM to try it, and in my case a context size below 1024 seems to work without any errors.

PR for the same: #424
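For anyone who wants to test the same idea without the UI change, the context size can also be reduced directly on the command line, assuming the bundled main binary accepts llama.cpp's context-size flag from that era (--ctx_size; the value 512 here is just an example):

/root/dalai/alpaca/main --seed -1 --threads 4 --n_predict 200 \
  --model models/30B/ggml-model-q4_0.bin \
  --ctx_size 512 \
  -p "Below is an instruction that describes a task. Write a response that appropriately completes the request."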
