Llama3 model not generating/taking far too long to generate simple answer. Anyone else? #505
Comments
Hi there! Nice debugging. I have also tried llama3 without good results, but it works okay for me in terms of speed (RTX 3090). Just a quick thing, and I don't think it applies in your case: double-check the variables in your config; you may still have one pointing at the old value instead of the new one.
Thanks for reaching out! I tried changing the variables as you suggested, but the behavior was the same.
Maybe you could use Task Manager (the tool that appears on Windows 11 when you press Ctrl+Alt+Del) and check the VRAM usage there, comparing llama2 and llama3.
I checked the RAM usage, and it is the same when using Llama2 and Llama3. The Llama3 Task Manager screenshot was in the original post, and here is what Llama2 looks like at runtime:

[Task Manager screenshot while running Llama2]

Pretty similar; I just get an answer way faster with Llama2 than with Llama3. I'll leave this here in case others have issues with similar hardware, but thanks for the suggestions so far!
Did you ever get any response back on the llama3 model setup? I'm having a similar experience, only I've never been patient enough to wait and see whether an answer was coming. I'm using the langchain community Ollama library, but everything else is the same:
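The snippet itself isn't preserved above; here is a minimal sketch of the variation being described, assuming the usual CrewAI agent wiring (the role, goal, and backstory below are placeholders):

```python
from crewai import Agent
from langchain_community.llms import Ollama

# The langchain_community wrapper around the local Ollama server,
# in place of CrewAI's default OpenAI-compatible client.
llm = Ollama(model="llama3")

agent = Agent(
    role="Researcher",
    goal="Answer a simple question",
    backstory="A minimal local test agent.",
    llm=llm,  # CrewAI accepts a LangChain LLM here
)
```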
Whether with the base model or one with the ModelFile modification applied, I spin forever after printing "> Entering new CrewAgentExecutor chain..." to the terminal. I did get it to work once with llama2 (plus ModelFile), but when I tried again I ran into similar problems there too and have been unable to reproduce my success. When I Ctrl+C kill the script, the stack trace shows it is always stuck waiting on line 186 of the same file. As a final data point, in another simple test script I am able to interact with the llama3 model directly, and it is fast and responsive for both blocking invoke and streaming interactions.
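A minimal sketch of such a direct test, assuming the same langchain_community wrapper (the actual test script is not shown above):

```python
from langchain_community.llms import Ollama

llm = Ollama(model="llama3")

# Blocking call: returns the full completion at once.
print(llm.invoke("Say hello in one sentence."))

# Streaming call: yields the completion chunk by chunk.
for chunk in llm.stream("Say hello in one sentence."):
    print(chunk, end="", flush=True)
```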
So we just need to wait for the dev team to fix it for running llama3?
It's also completely possible I have something borked with my setup somehow, since I'm pretty new to all this. If anybody has it working with llama3 and can post how they set it up, that would be great.
I think I found a solution that is working for me and wanted to post an update. By using the 'dolphin-llama3' model, which is also available through Ollama, and applying the ModelFile with the extra parameters (targeted at dolphin-llama3), I appear to be getting CrewAI fired up and returning reliably on at least a trivial local example.
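For reference, a hedged guess at what that ModelFile might contain: dolphin-llama3 uses the ChatML prompt format, so the extra stop parameters would target its end-of-turn tokens (this is an assumption, not the commenter's actual file):

```
FROM dolphin-llama3
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
```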
Quick update: I got it working using the langchain_community chat model by setting the stop parameter, as per ollama/ollama#3760. It works with the default llama3 model. Update: it works on the langchain_community Ollama LLM as well.
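A minimal sketch of that workaround, assuming the token discussed in ollama/ollama#3760 (llama3's `<|eot_id|>` end-of-turn marker, which early Ollama builds did not always honor):

```python
from langchain_community.chat_models import ChatOllama
from langchain_community.llms import Ollama

# Tell the client to cut generation at llama3's end-of-turn token.
chat_llm = ChatOllama(model="llama3", stop=["<|eot_id|>"])  # chat model
llm = Ollama(model="llama3", stop=["<|eot_id|>"])           # plain LLM
```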
Not sure what the issue is with the ollama llama3 model and CrewAI; it seems to have a hard time stopping generation.
With the introduction of the Llama3 model, I wanted to start testing it out with CrewAI! I recreated the following simple program from the documentation (side note: the now-mandatory `expected_output` parameter of the `Task` object should be added to the documentation):
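The program itself is not preserved above; what follows is a minimal sketch of the documented example, assuming a single agent and task wired to a local Ollama model (agent and task details are placeholders):

```python
from crewai import Agent, Task, Crew
from langchain_community.llms import Ollama

# Local llama3 served by Ollama; the tag may instead be a custom one
# built from the Modelfile below.
llm = Ollama(model="llama3")

researcher = Agent(
    role="Researcher",
    goal="Uncover interesting findings about open LLMs",
    backstory="You are a curious AI researcher.",
    llm=llm,
)

task = Task(
    description="Write one short paragraph about the state of open LLMs.",
    expected_output="A single paragraph of plain text.",  # now mandatory
    agent=researcher,
)

crew = Crew(agents=[researcher], tasks=[task])
print(crew.kickoff())
```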
The `.env` file looks like this:
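(The file contents are missing above; this is a plausible reconstruction following the CrewAI-with-Ollama setup guides of the time, with the variable names as assumptions:)

```
OPENAI_API_BASE=http://localhost:11434/v1
OPENAI_MODEL_NAME=llama3
OPENAI_API_KEY=NA
```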
Modelfile:
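(Likewise missing; a common community Modelfile for llama3 at the time pinned the stop tokens that early Ollama builds did not apply. This is an assumption, not necessarily the original file:)

```
FROM llama3
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"
```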
...and I ran this PowerShell script ahead of time:
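(The script is not preserved; presumably it registered the Modelfile with Ollama before the Python program ran, roughly like this, with the model tag as a placeholder:)

```powershell
# Build a custom model from the Modelfile and confirm it is available.
ollama create llama3-custom -f .\Modelfile
ollama list
```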
Running the code, I don't get a generated answer even after 15 minutes. (The GPU isn't being fully utilized compared to the CPU, but I'm leaving that aside for now.)
When I run the `llama2` model however, by changing all the necessary variables from `llama3` to `llama2` in all files, I get the answer in seconds. Is this really because the llama3 model is so much bigger than llama2, or does anyone else have this issue yet?