llamafile as LLM server. #277
Unfortunately koboldcpp with CUDA crashes on my PC because my processor doesn't support AVX2, and the other BLAS backends are too slow. As an alternative I use llamafile, which works nicely: it is very light and performs very well on my 3060 with 12 GB. The only problem is that every time I start a conversation, in order for the LLM to generate the response I have to briefly alt+tab out of the game and back in; only then does llamafile generate the response and trigger the speech loop. This also works for several follow-up comments, but after it asks a new question I have to alt+tab again to trigger the LLM. I was wondering what could cause this and whether there is a way to overcome the problem.
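For context, here is a minimal sketch of how I launch llamafile in server mode. The model filename and the layer-offload count are placeholders for my own setup; the flags come from llamafile's llama.cpp-derived CLI, so double-check them against your llamafile version:

```shell
# Launch llamafile as a local HTTP server (OpenAI-compatible API, default port 8080).
# model.gguf is a placeholder for the actual model file;
# -ngl 35 is an example layer count for offloading to the 3060's 12 GB of VRAM.
./llamafile -m model.gguf --server --nobrowser --port 8080 -ngl 35
```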