Run on koboldcpp... #513
I copied the example Python script and ran it. The only change I made was commenting out the OpenAI API settings and enabling the local server settings:
It is a koboldcpp server running a Meta Llama 3 70B Q4_XS GGUF model.
After running the program, I get this answer:
Yes, it stops after generating 100 tokens every time. Clearly, the program doesn't submit a max_tokens parameter to the koboldcpp server, so it never generates more than 100 tokens. Koboldcpp has no startup option to configure the response length; it has to be submitted with each request, and I can't find where in the code to add the max_tokens parameter to the request.
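For reference, a minimal sketch of where the parameter would need to appear, assuming the library talks to koboldcpp through its OpenAI-compatible /v1/chat/completions endpoint. The function and field names below are illustrative, not the project's actual code:

```python
import json

# Hypothetical sketch (not the project's actual code): koboldcpp's
# OpenAI-compatible endpoint reads the response-length limit per request
# from the "max_tokens" field of the JSON body. If the client never sets
# it, the server falls back to its own default (~100 tokens here).
def build_chat_payload(messages, max_tokens=2048, temperature=0.7):
    """Return a chat-completion request body with an explicit max_tokens."""
    return {
        "model": "koboldcpp",      # koboldcpp serves whichever model it loaded
        "messages": messages,
        "max_tokens": max_tokens,  # without this, the server default applies
        "temperature": temperature,
    }

payload = build_chat_payload([{"role": "user", "content": "Write a long answer."}])
print(json.dumps(payload, indent=2))
```

If the project uses the official `openai` Python client, the same fix would be passing `max_tokens=...` to its `chat.completions.create(...)` call wherever the request is issued.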