This repository has been archived by the owner on Sep 30, 2023. It is now read-only.

OOM - Segmentation fault (core dumped) #50

Open
juzkev opened this issue Aug 14, 2023 · 3 comments
Comments


juzkev commented Aug 14, 2023

I'm using the following docker-compose file on ARM Ubuntu 22.04 (Oracle Cloud), running stablecode. I've tried starcoder and Wizardcoder too; both yielded the same error.

docker-compose.yml:

version: '3.3'
services:
  turbopilot:
    container_name: turbopilot
    image: 'ghcr.io/ravenscroftj/turbopilot:nightly-8be7171573ddd3e1c2fc83c2576e4e40621adf31'
    environment:
        - THREADS=3
        - MODEL_TYPE=stablecode
        - MODEL=/models/stablecode-instruct-alpha-3b.ggmlv1.q4_0.bin
        # - MODEL_TYPE=starcoder
        # - MODEL=/models/santacoder-q4_0.bin
    volumes:
        - /home/ubuntu/appdata/turbopilot/models:/models
    # ports:
    #     - '18080:18080'
    networks:
      - swag
    labels:
      - swag=enable
networks:
  swag:
    name: swag
    external: true
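
For what it's worth, the crash can be reproduced without going through an editor plugin by posting directly to the endpoint seen in the logs. This is a minimal sketch only: it assumes the 18080 port mapping above is uncommented, and the payload fields (prompt, max_tokens, temperature) are assumed to follow the fauxpilot-style OpenAI-compatible schema — check the turbopilot README if they differ.

```python
import json
from urllib import request

# Sketch of the POST /v1/engines/codegen/completions call seen in the logs.
# Payload field names are assumptions based on the fauxpilot-style API.
def build_completion_request(base_url: str, prompt: str,
                             max_tokens: int = 16) -> request.Request:
    body = json.dumps({
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.1,
    }).encode("utf-8")
    return request.Request(
        f"{base_url}/v1/engines/codegen/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_completion_request("http://localhost:18080", "def hello_world():")
# request.urlopen(req)  # fire the request once the port mapping is enabled
```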

logs:

[2023-08-14 01:55:04.526] [info] Initializing StableLM type model for 'stablecode' model type
[2023-08-14 01:55:04.527] [info] Attempt to load model from stablecode
load_model: loading model from '/models/stablecode-instruct-alpha-3b.ggmlv1.q4_0.bin' - please wait ...
load_model: n_vocab = 49152
load_model: n_ctx   = 4096
load_model: n_embd  = 2560
load_model: n_head  = 32
load_model: n_layer = 32
load_model: n_rot   = 20
load_model: par_res = 1
load_model: ftype   = 2002
load_model: qntvr   = 2
load_model: ggml ctx size = 4849.28 MB
load_model: memory_size =  1280.00 MB, n_mem = 131072
load_model: ................................................ done
load_model: model size =  1489.08 MB / num tensors = 388
[2023-08-14 01:55:05.713] [info] Loaded model in 1185.33ms
(2023-08-14 01:55:05) [INFO    ] Crow/1.0 server is running at http://0.0.0.0:18080 using 4 threads
(2023-08-14 01:55:05) [INFO    ] Call `app.loglevel(crow::LogLevel::Warning)` to hide Info level logs.
(2023-08-14 01:57:42) [INFO    ] Request: 172.20.0.4:59486 0xfffe5e5fb020 HTTP/1.1 POST /v1/engines/codegen/completions
Segmentation fault (core dumped)
@ravenscroftj
Owner

ok thanks! How much memory is available on the system? Is it possible that it is just running out of RAM?

Also, how long is the file that you're trying to autocomplete? Is it hundreds of lines long? Does the system work on a short or new file? There might be an issue with how much context is being fed through via vscode-fauxpilot.
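
The context theory can be checked from the client side with a rough pre-truncation before the prompt is sent — a sketch only: the 4-characters-per-token ratio is a crude heuristic, not the model's real tokenizer, and n_ctx = 4096 is taken from the stablecode log in this issue.

```python
# Rough client-side guard against overlong prompts: keep only the tail of the
# file so the estimated token count stays under the model's context window
# (n_ctx = 4096 in the stablecode log above). 4 chars/token is a crude estimate.
def truncate_prompt(prompt: str, n_ctx: int = 4096, chars_per_token: int = 4) -> str:
    budget = n_ctx * chars_per_token
    return prompt if len(prompt) <= budget else prompt[-budget:]
```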


juzkev commented Aug 15, 2023

I believe it is highly unlikely that it was running out of memory. The system has 24 GB of RAM, and the test was simply run on a single-line Python file containing def hello_world():

@quanha-0878

I'm using MODEL="/models/santacoder-q4_0.bin" and have the same issue:

[2023-08-21 10:01:54.792] [info] Initializing Starcoder/Wizardcoder type model for 'starcoder' model type
[2023-08-21 10:01:54.792] [info] Attempt to load model from starcoder
load_model: loading model from '/models/santacoder-q4_0.bin'
load_model: n_vocab = 49280
load_model: n_ctx   = 2048
load_model: n_embd  = 2048
load_model: n_head  = 16
load_model: n_layer = 24
load_model: ftype   = 2002
load_model: qntvr   = 2
load_model: ggml ctx size = 1542.88 MB
load_model: memory size =   768.00 MB, n_mem = 49152

load_model: model size  =   774.73 MB
[2023-08-21 10:01:56.353] [info] Loaded model in 1561.73ms
(2023-08-21 10:01:56) [INFO    ] Crow/1.0 server is running at http://0.0.0.0:18080 using 2 threads
(2023-08-21 10:01:56) [INFO    ] Call `app.loglevel(crow::LogLevel::Warning)` to hide Info level logs.
(2023-08-21 10:03:55) [INFO    ] Request: 192.168.0.1:53945 0x7f27b5a5f040 HTTP/1.1 POST /v1/engines/codegen/completions
Segmentation fault (core dumped)
