
llamafile as LLM server. #277

Open
amonpaike opened this issue May 12, 2024 · 0 comments

Comments


amonpaike commented May 12, 2024

Unfortunately, koboldcpp with CUDA crashes on my PC because my processor doesn't support AVX2, and the other BLAS backends are too slow. So as an alternative I use llamafile, which works nicely: it is very light and performs very well on my 3060 with 12 GB.

The only problem is that every time I start a conversation, in order for the LLM to generate the response I have to briefly alt+tab out of and back into the game; only then does llamafile generate the response and trigger the speech loop. This also works across multiple exchanges, but once it asks a new question I have to alt+tab again to trigger the LLM. I was wondering what causes this and whether there is a way to overcome the problem.
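
For context, here is a minimal sketch of how llamafile can be queried as an LLM server over its OpenAI-compatible HTTP API. It assumes llamafile is running in server mode on its default address (http://localhost:8080); the model name is just a placeholder, and the `requests` library must be installed.

```python
# Minimal sketch: query a running llamafile server through its
# OpenAI-compatible chat completions endpoint.
# Assumptions: llamafile is in server mode on the default
# http://localhost:8080, and "LLaMA_CPP" is a placeholder model name.
import requests

response = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "LLaMA_CPP",
        "messages": [
            {"role": "user", "content": "Hello, can you hear me?"},
        ],
    },
    timeout=120,
)
response.raise_for_status()

# Print the assistant's reply from the first completion choice.
print(response.json()["choices"][0]["message"]["content"])
```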
