selects too many cores by default on orange pi 5 (2x slower) #7176
Comments
You might get better performance with all 8 cores with #6915.
that's so cool! thanks for letting me know. here are my benchmarks with that branch (kunnis:MMThreadingPerfChange) at -t 8: the resulting text is still the exact same. should i close the issue or wait until it gets merged before closing it?
#6915 merged
help, i'm on orange pi 5 and i use phi-3 and now it's slow :'(((( T_T
model is from PrunaAI/Phi-3-mini-4k-instruct-GGUF-Imatrix-smashed on Hugging Face
./main -m models/Phi-3-mini-4k-instruct.IQ4_XS.gguf -s 1 -p "Hi! My name is"
(it should give something like: Hi! My name is Emma, and I've been working as a full-time employee for the past five years at a marketing agency. My job involves managing campaigns, coordinating with clients, and overseeing the creative team. I have a passion for creativity and enjoy the fast-paced environment of my workplace.)
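fwiw, an easy way to narrow this down is to sweep -t and compare the timing lines. a small sketch (the thread counts are just my guesses; the model path and prompt reuse the command above):

```shell
#!/bin/sh
# Sketch: print one benchmark command per thread count to try.
# Model path and prompt are taken from the command above; the sweep
# values (2, 4, 6, 8) are arbitrary and worth adjusting per SoC.
MODEL=models/Phi-3-mini-4k-instruct.IQ4_XS.gguf
for t in 2 4 6 8; do
  echo "./main -m $MODEL -s 1 -p 'Hi! My name is' -t $t"
done
# Run each printed command and compare the "eval time ... tokens per
# second" figure that llama.cpp prints at the end of a run.
```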
b2750:
prompt time: 11 t/s
eval time: 7.00–7.20 t/s
b2826:
prompt time: 4.40 t/s
eval time: 3.84 t/s
after bisecting, i found that b2787 slows it down very significantly (2x slower!)
b2785
prompt time: 11.19 tokens per second
eval time: 6.93 tokens per second
b2787
prompt time: 4.39 tokens per second
eval time: 3.81 tokens per second
update: i found the problem! after reading the commit message i realised i had to check htop to see which cores were being used
llama.cpp selects too many cores!! adding "-t 4" makes it go fast again (eval = 6.92 tokens per second; prompt = 10.77 tokens per second)
rk3588s has 4 big and 4 LITTLE cores, so sticking to the 4 big cores seems to be the ideal thing
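for anyone else on a big.LITTLE board: the two clusters report different cpuinfo_max_freq values in sysfs, so you can find the big cores automatically instead of hardcoding them. a sketch (pick_big_cores and the sample frequencies are mine, not anything from llama.cpp; the RK3588S A55s run at ~1.8 GHz and the A76s at ~2.4 GHz):

```shell
#!/bin/sh
# Sketch: choose the "big" cores on a big.LITTLE SoC by max clock.
# pick_big_cores reads "<cpu_index> <max_freq_khz>" lines on stdin and
# prints a comma-separated list of the indices at the highest frequency.
pick_big_cores() {
  awk '{ freq[$1] = $2
         if ($1 > maxc) maxc = $1
         if ($2 > maxf) maxf = $2 }
       END { sep = ""
             for (c = 0; c <= maxc; c++)
               if (freq[c] == maxf) { printf "%s%s", sep, c; sep = "," }
             print "" }'
}

# Demo with an RK3588S-like layout (4x A55 @ 1.8 GHz, 4x A76 @ 2.4 GHz):
printf '0 1800000\n1 1800000\n2 1800000\n3 1800000\n4 2400000\n5 2400000\n6 2400000\n7 2400000\n' \
  | pick_big_cores   # prints 4,5,6,7

# On a real board you could feed it sysfs and pin llama.cpp, e.g.:
#   for f in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/cpuinfo_max_freq; do
#     c=${f#/sys/devices/system/cpu/cpu}; echo "${c%%/*} $(cat "$f")"
#   done | pick_big_cores
#   taskset -c 4,5,6,7 ./main -m models/Phi-3-mini-4k-instruct.IQ4_XS.gguf -t 4 ...
```

pinning with taskset matters in addition to -t 4, since the scheduler can otherwise still place some of the 4 threads on LITTLE cores.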