
selects too many cores by default on orange pi 5 (2x slower) #7176

Closed
calculatortamer opened this issue May 9, 2024 · 3 comments
@calculatortamer

calculatortamer commented May 9, 2024

help, i'm on an orange pi 5 and i use phi-3 and now it's slow :'(

the model is from PrunaAI/Phi-3-mini-4k-instruct-GGUF-Imatrix-smashed on Hugging Face

./main -m models/Phi-3-mini-4k-instruct.IQ4_XS.gguf -s 1 -p "Hi! My name is"

(it should give something like: Hi! My name is Emma, and I've been working as a full-time employee for the past five years at a marketing agency. My job involves managing campaigns, coordinating with clients, and overseeing the creative team. I have a passion for creativity and enjoy the fast-paced environment of my workplace.)

b2750:
prompt time: 11 t/s
eval time: 7.00–7.20 t/s

b2826:
prompt time: 4.40 t/s
eval time: 3.84 t/s

after bisecting, i found that b2787 is what slows it down very significantly (2x slower!)

b2785
prompt time: 11.19 tokens per second
eval time: 6.93 tokens per second

b2787
prompt time: 4.39 tokens per second
eval time: 3.81 tokens per second

update: i found the problem! after reading the commit message, i realised i had to check htop to see which cores were being used

llama.cpp selects too many cores!! adding "-t 4" makes it go fast again (eval=6.92 tokens per second; prompt=10.77 tokens per second)

the rk3588s has 4 big and 4 little cores, so choosing only the 4 cores of the same (big) type seems to be the ideal thing
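For anyone hitting the same thing, this is a sketch of how the workaround looks on the command line. It assumes cores 4–7 are the big Cortex-A76 cores on the rk3588s (that's the usual numbering, but check `lscpu` or `/proc/cpuinfo` on your board before pinning):

```shell
# Limit llama.cpp to 4 threads and (optionally) pin them to the
# big cores so the scheduler can't migrate work onto the A55s.
# Core numbering 4-7 = A76 is an assumption; verify with lscpu.
taskset -c 4-7 ./main \
  -m models/Phi-3-mini-4k-instruct.IQ4_XS.gguf \
  -s 1 -t 4 -p "Hi! My name is"
```

`-t 4` alone is what fixed the regression here; the `taskset` pinning is an extra belt-and-braces step on big.LITTLE SoCs.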

@calculatortamer calculatortamer changed the title huge perf regression orange pi 5 + phi-3 (2x slower) llama.cpp selects too many cores by default on orange pi 5 (2x slower) May 9, 2024
@calculatortamer calculatortamer changed the title llama.cpp selects too many cores by default on orange pi 5 (2x slower) selects too many cores by default on orange pi 5 (2x slower) May 9, 2024
@slaren
Collaborator

slaren commented May 9, 2024

You might get better performance with all the 8 cores with #6915.

@calculatortamer
Author

calculatortamer commented May 9, 2024

You might get better performance with all the 8 cores with #6915.

that's so cool! thanks for letting me know

here are my benchmarks with that branch (kunnis:MMThreadingPerfChange):
-t 4
prompt: 11.22 tokens per second
eval: 7.58 tokens per second
(faster than before, which was 7 t/s, 7.2 t/s at best)

-t 8
prompt: 12.65 tokens per second
eval: 8.64 tokens per second

the resulting text is still exactly the same

should i close the issue or wait until it gets merged before closing it?

@calculatortamer
Author

#6915 merged

@calculatortamer calculatortamer closed this as not planned May 15, 2024