
🐛 [Bug]: New install - response keeps repeating the last line #1182

Open · 2 tasks done
DeadEnded opened this issue Mar 5, 2024 · 7 comments

@DeadEnded

Bug description

I just pulled the image and spun up a container with default settings. I downloaded the Mistral-7B model and left everything at the defaults. I've tried a few short questions, and each answer repeats its last line over and over until I stop the container.

Steps to reproduce

  1. Spin up a new container with default settings (from the repo)
  2. Download Mistral-7B
  3. Start a new chat and ask "what is the square root of nine"

Environment Information

Docker version: 25.0.3
OS: Ubuntu 22.04.4 LTS on kernel 5.15.0-97
CPU: AMD Ryzen 5 2400G
Browser: Firefox version 123.0

Screenshots

[screenshot: the chat response repeating its last line]

Relevant log output

llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.11 MiB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/33 layers to GPU
llm_load_tensors:        CPU buffer size =  4165.37 MiB
...............................................................................................
llama_new_context_with_model: n_ctx      = 2153
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:        CPU KV buffer size =   269.13 MiB
llama_new_context_with_model: KV self size  =  269.12 MiB, K (f16):  134.56 MiB, V (f16):  134.56 MiB
llama_new_context_with_model:        CPU input buffer size   =    12.22 MiB
llama_new_context_with_model:        CPU compute buffer size =   174.42 MiB
llama_new_context_with_model: graph splits (measure): 1
AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
Model metadata: {'tokenizer.ggml.unknown_token_id': '0', 'tokenizer.ggml.eos_token_id': '2', 'general.architecture': 'llama', 'llama.rope.freq_base': '10000.000000', 'llama.context_length': '32768', 'general.name': 'mistralai_mistral-7b-v0.1', 'llama.embedding_length': '4096', 'llama.feed_forward_length': '14336', 'llama.attention.layer_norm_rms_epsilon': '0.000010', 'llama.rope.dimension_count': '128', 'tokenizer.ggml.bos_token_id': '1', 'llama.attention.head_count': '32', 'llama.block_count': '32', 'llama.attention.head_count_kv': '8', 'general.quantization_version': '2', 'tokenizer.ggml.model': 'llama', 'general.file_type': '15'}
18:signal-handler (1709671894) Received SIGTERM scheduling shutdown...
Received termination signal!
++ _term
++ echo 'Received termination signal!'
++ kill -TERM 18
++ kill -TERM 19
18:signal-handler (1709671894) Received SIGTERM scheduling shutdown...
18:signal-handler (1709671894) Received SIGTERM scheduling shutdown...
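
For what it's worth, the same GGUF can be exercised outside the Serge UI with a few lines of llama-cpp-python. A minimal sketch (the model path and sampling values below are assumptions for illustration, not the exact settings Serge uses):

# Minimal repro sketch using llama-cpp-python directly, outside Serge.
# The model path and sampling parameters are placeholders/assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="weights/mistral-7b-v0.1.Q4_K_M.gguf",  # hypothetical local GGUF path
    n_ctx=2048,        # roughly matches the n_ctx reported in the log above
    n_gpu_layers=0,    # CPU only, as in the log (0/33 layers offloaded)
)

out = llm.create_completion(
    "What is the square root of nine?",
    max_tokens=128,
    temperature=0.7,
    repeat_penalty=1.1,  # default-ish value
    stop=["</s>"],       # extra guard in case the EOS token is not honored
)
print(out["choices"][0]["text"])

If the repetition shows up here too, the problem sits below Serge (llama-cpp-python or the model itself); if not, it points at how Serge drives the library.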

Confirmations

  • I'm running the latest version of the main branch.
  • I checked existing issues to see if this has already been described.
@SolutionsKrezus

Hello, I have the same bug when using Mistral or Mixtral for text generation. It keeps repeating the last sentence over and over until I restart the container. I tried increasing the repeat penalty, but it does nothing.
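
For reference, a llama.cpp-style repeat penalty works roughly like the sketch below (an illustration of the general scheme, not Serge's or llama-cpp-python's actual code), which may help explain why raising it doesn't break the loop:

def apply_repeat_penalty(logits, recent_token_ids, penalty=1.1):
    # Tokens seen in the recent window get their logits pushed down.
    for tok in set(recent_token_ids):
        if logits[tok] > 0:
            logits[tok] /= penalty   # likely repeats become less likely
        else:
            logits[tok] *= penalty   # unlikely repeats become even less likely
    return logits

If the loop is caused by something other than sampling, for example the EOS token never being emitted or a prompt-template mismatch, raising this value would not be expected to help.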

@fishscene

fishscene commented Apr 15, 2024

I've noticed this with most, if not all, of the models I can test. This bug essentially makes Serge useless.
Update: Reverting to "ghcr.io/serge-chat/serge:0.8.2" appears to vastly improve or eliminate the repeating issue altogether. Still testing.

@gaby
Member

gaby commented Apr 16, 2024

This is probably a bug in llama-cpp-python. I will update it this week and do a new release.

Which specific model are you all using? @SolutionsKrezus @fishscene
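
If it helps with debugging, the llama-cpp-python build inside a running image can be checked directly. A quick sketch (the container name `serge` is an assumption on my part; adjust to whatever your container is called):

# Run inside the container, e.g. `docker exec -it serge python3`.
# llama-cpp-python exposes its version as a module attribute.
import llama_cpp
print(llama_cpp.__version__)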

@SolutionsKrezus

I'm currently using Mistral 7B and Mixtral, @gaby.
I reverted to 0.8.0 and it works like a charm.

@fishscene

> This is probably a bug in llama-cpp-python. I will update it this week and do a new release.
>
> Which specific model are you all using? @SolutionsKrezus @fishscene

Apologies, I’m at work at the moment.
All models I tested were affected to some degree. Some more than others.

Off the top of my head:
All current Mixtral models, at least two Mistral models, Neural Chat, one of the medical ones, and definitely a few more as well. I did not test anything above 13B, as those are beyond my hardware.

I would see random replies marked/flagged as code snippets… and if the model started repeating itself, that was the end of anything useful as all subsequent replies would only repeat.

Of all the testing I did, getting 10 coherent replies was a major milestone, and even then it sometimes took multiple re-prompts (deleting my query and asking it slightly differently) to get to 10. A couple of models started spewing nonsense and repeats on the very first response.

All this to say, testing should be very easy to do.
When I reverted to the previous Serge release, I immediately saw an improvement.

Curious, though: OP is using a Ryzen, and so am I: Ryzen 1700X, 32GB RAM, no CUDA GPU in use (an NVIDIA T400, I think). Using the CPU for inference.

Maybe this is isolated to Ryzen CPUs?

Another behavior to note:
When asking some censored models a question, they straight up have no reply at all, and no detectable CPU was used either. It was as if some pre-processing step said "nope" and never passed my query along to the model itself. There's a name for that pre-processing stage, but it escapes me at the moment. Not sure if it's a clue either.

@SolutionsKrezus

I don't think it is a Ryzen-related issue, @fishscene.
I have the same problem with an Intel Xeon D-1540 with 32GB RAM and no GPU.

@JuniperChris929

Same issue here. This pretty much renders the software completely useless :(
