
selects too many cores by default on orange pi 5 (2x slower) #7176

Closed
calculatortamer opened this issue May 9, 2024 · 3 comments
@calculatortamer

calculatortamer commented May 9, 2024

help, i'm on an orange pi 5 and i use phi-3 and now it's slow :'(

the model is from PrunaAI/Phi-3-mini-4k-instruct-GGUF-Imatrix-smashed on Hugging Face

./main -m models/Phi-3-mini-4k-instruct.IQ4_XS.gguf -s 1 -p "Hi! My name is"

(it should give something like: Hi! My name is Emma, and I've been working as a full-time employee for the past five years at a marketing agency. My job involves managing campaigns, coordinating with clients, and overseeing the creative team. I have a passion for creativity and enjoy the fast-paced environment of my workplace.)

b2750:
prompt time: 11 t/s
eval time: 7.00–7.20 t/s

b2826:
prompt time: 4.40 t/s
eval time: 3.84 t/s

after bisecting, i found that b2787 is what slows it down very significantly (2x slower!)

b2785
prompt time: 11.19 tokens per second
eval time: 6.93 tokens per second

b2787
prompt time: 4.39 tokens per second
eval time: 3.81 tokens per second

update: i found the problem! after reading the commit message, i realised i had to check htop to see which cores were being used

llama.cpp selects too many cores!! adding "-t 4" makes it go fast again (eval=6.92 tokens per second; prompt=10.77 tokens per second)

the rk3588s has 4 big and 4 little cores, so choosing only the 4 cores of the same (big) type seems to be the ideal thing
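For anyone hitting the same thing, this is a sketch of how the workaround looks on the command line. It assumes cores 4–7 are the big Cortex-A76 cores on the rk3588s (that's the usual numbering, but check `lscpu` or `/proc/cpuinfo` on your board before pinning):

```shell
# Limit llama.cpp to 4 threads and (optionally) pin them to the
# big cores so the scheduler can't migrate work onto the A55s.
# Core numbering 4-7 = A76 is an assumption; verify with lscpu.
taskset -c 4-7 ./main \
  -m models/Phi-3-mini-4k-instruct.IQ4_XS.gguf \
  -s 1 -t 4 -p "Hi! My name is"
```

`-t 4` alone is what fixed the regression here; the `taskset` pinning is an extra belt-and-braces step on big.LITTLE SoCs.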

@calculatortamer calculatortamer changed the title huge perf regression orange pi 5 + phi-3 (2x slower) llama.cpp selects too many cores by default on orange pi 5 (2x slower) May 9, 2024
@calculatortamer calculatortamer changed the title llama.cpp selects too many cores by default on orange pi 5 (2x slower) selects too many cores by default on orange pi 5 (2x slower) May 9, 2024
@slaren
Collaborator

slaren commented May 9, 2024

You might get better performance with all the 8 cores with #6915.

@calculatortamer
Author

calculatortamer commented May 9, 2024

You might get better performance with all the 8 cores with #6915.

that's so cool! thanks for letting me know

here are my benchmarks with that branch (kunnis:MMThreadingPerfChange):
-t 4
prompt: 11.22 tokens per second
eval: 7.58 tokens per second
(faster than before, which was 7 t/s, 7.2 t/s at best)

-t 8
prompt: 12.65 tokens per second
eval: 8.64 tokens per second

the resulting text is still exactly the same

should i close the issue or wait until it gets merged before closing it?

@calculatortamer
Author

#6915 merged

@calculatortamer calculatortamer closed this as not planned May 15, 2024