
Llama3 model not generating/taking far too long to generate simple answer. Anyone else? #505

Open
windowshopr opened this issue Apr 24, 2024 · 10 comments

@windowshopr

With the introduction of the Llama3 model, I wanted to start testing it out with CrewAI! I recreated the following simple program from the documentation (side note: the documentation should add the now-mandatory expected_output parameter to the Task object):

# Windows 10
# Python 3.11
# Device 0: NVIDIA GeForce GTX 1050 Ti, compute capability 6.1, VMM: yes
# 6 Core, 12 Thread AMD CPU
# 48 Gb of RAM

from crewai import Agent, Task, Crew
from langchain_openai import ChatOpenAI
import os
os.environ["OPENAI_API_KEY"] = "NA"

llm = ChatOpenAI(
    model = "crewai-llama3",
    base_url = "http://localhost:11434/v1")

general_agent = Agent(role = "Math Professor",
                      goal = """Provide the solution to the students that are asking mathematical questions and give them the answer.""",
                      backstory = """You are an excellent math professor that likes to solve math questions in a way that everyone can understand your solution""",
                      allow_delegation = False,
                      verbose = True,
                      llm = llm)
task = Task(description="""what is 3 + 5""",
            agent = general_agent,
            expected_output = "The correct answer to my question")

crew = Crew(
            agents=[general_agent],
            tasks=[task],
            verbose=2
        )

result = crew.kickoff()

print(result)

.env file looks like this:

OPENAI_API_BASE='http://localhost:11434/v1'
OPENAI_MODEL_NAME='llama3' #'openhermes'  # Adjust based on available model
OPENAI_API_KEY=''

Modelfile

FROM llama3

# Set parameters

PARAMETER temperature 0.2
PARAMETER stop Result

# Sets a custom system message to specify the behavior of the chat assistant

# Leaving it blank for now.

SYSTEM """"""

...and I ran this powershell script ahead of time:

# Variables
$model_name = "llama3"
$custom_model_name = "crewai-llama3"
$modelfile_path = "I:\nasty\Python_Projects\LLM\CrewAI\Modelfile"

# Get the base model
ollama pull $model_name

# Create the custom model from the Modelfile
ollama create $custom_model_name -f "$modelfile_path"

Running the code, I don't get a generated answer after 15 minutes of running it:

[screenshot]

(GPU isn't being fully utilized compared to the CPU, but leaving that for now).

When I run the llama2 model, however (changing all the necessary variables from llama3 to llama2 in all files), I get the answer in seconds:

[screenshot]

Is this really because the llama3 model is so much bigger than llama2, or does anyone else have this issue yet?

@kyuumeitai

Hi there! Nice debugging. I have also tried llama3 without good results, but it runs at an OK speed for me (RTX 3090).

Just a quick thing (I don't think it's the cause in your case): in your .env you have:

OPENAI_MODEL_NAME='llama3' #'openhermes'  # Adjust based on available model

instead of crewai-llama3

@windowshopr
Author

Thanks for reaching out!

Changing the variables between llama3 and crewai-llama3 results in the same thing. I think we're just calling the llama3 model crewai-llama3 to keep the model names separate, but the issue persists. My GPU isn't the greatest, so that's likely the issue, but with such a big difference between the llama2 and llama3 models it sure seems suspicious...

@kyuumeitai

Maybe you could use the Task Manager (the thing that appears on Win11 when you Ctrl-Alt-Del) and check the VRAM usage there, comparing llama2 and llama3.

@windowshopr
Author

I checked the RAM usage and it is the same when using Llama2 and Llama3. The Llama3 screenshot of Task Manager was in the original post, and here is what Llama2 looks like at runtime:

[screenshot]

Pretty similar; I just get an answer way faster with Llama2 than Llama3. I'll leave this here in case others have issues with similar hardware, but thanks for the suggestions so far!

@JeremyJass

Did you ever get any response back from the llama3 model setup? I'm having a similar experience, only I've never been patient enough to get an answer if one was coming.

I'm using the langchain community Ollama library, but everything else is the same:

from crewai import Agent, Task, Crew, Process
from langchain_community.llms import Ollama

llm = Ollama(model='llama3')

Whether with the base model or one with the ModelFile modification applied, it spins forever after printing "> Entering new CrewAgentExecutor chain..." to the terminal. I did get it to work once with llama2 (plus ModelFile), but when I tried again I ran into similar problems there and have been unable to reproduce my success.

When I CTRL+C kill the script, the stack trace shows it is always stuck on line 186 of agents/executor.py, waiting for a call to self.agent.plan to return. Adding a print statement, I can see that the first argument intermediate_steps passed into agent.plan is an empty array, but I am not familiar enough with the base AgentExecutor class to know if that is a problem or not.
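
One low-effort way to see where it is stuck without killing the script is Python's built-in faulthandler, which can periodically dump every thread's stack; a minimal sketch (the 60-second interval is arbitrary):

import faulthandler
import sys

# Dump the stack of every thread to stderr every 60 seconds until cancelled,
# so you can watch whether the run is still parked inside self.agent.plan.
faulthandler.dump_traceback_later(timeout=60, repeat=True, file=sys.stderr)

# ... kick off the crew here, e.g. result = crew.kickoff() ...

# faulthandler.cancel_dump_traceback_later()  # stop the periodic dumps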

As a final datapoint, in another simple test script I am able to interact with the llama3 model and it is fast and responsive, both to blocking invoke or streaming interactions.


from langchain_community.llms import Ollama

llm = Ollama(model='llama3')

query = "Tell me a joke"

response = llm.invoke(query)
print(response)

for chunks in llm.stream(query):
    print(chunks)

@thanayut1750

So do we just need to wait for the dev team to fix it so llama3 can run?

@JeremyJass

It's also completely possible I have something borked with my setup somehow, since I am pretty new to all this. If anybody has it working with llama3 and can post how they set it up, that would be great.

@JeremyJass

I think I found a solution that is working for me and wanted to post an update.

By using the 'dolphin-llama3' model, which is also available through Ollama, and applying the ModelFile with the extra parameters (targeted at dolphin-llama3), I appear to be getting crewai fired up and returning reliably on at least a trivial local example.
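
For reference, a minimal sketch of how that setup might look, assuming dolphin-llama3 has already been pulled with ollama pull dolphin-llama3 (the agent/task details below just mirror the example from the original post, not my exact code):

from crewai import Agent, Task, Crew
from langchain_community.llms import Ollama

# Point the agent at the locally served dolphin-llama3 model (or a custom
# variant built from the Modelfile, e.g. "crewai-dolphin-llama3").
llm = Ollama(model="dolphin-llama3")

general_agent = Agent(role="Math Professor",
                      goal="Provide solutions to students asking mathematical questions.",
                      backstory="You are an excellent math professor who explains solutions clearly.",
                      allow_delegation=False,
                      verbose=True,
                      llm=llm)

task = Task(description="what is 3 + 5",
            agent=general_agent,
            expected_output="The correct answer to my question")

crew = Crew(agents=[general_agent], tasks=[task], verbose=2)
print(crew.kickoff())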

@jacoverster

jacoverster commented Apr 26, 2024

Quick update: I got it working using the langchain_community chat model by setting the num_predict param.

As per ollama/ollama#3760

from langchain_community.chat_models.ollama import ChatOllama

ChatOllama(model="llama3", temperature=0.7, num_predict=128)

It works with the default llama3 or crewai-llama3 models as far as I can see, but both models keep adding "<|eot_id|><|start_header_id|>assistant<|end_header_id|>" to the answer.

Update: works on langchain_community Ollama llm as well.
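
If the trailing "<|eot_id|>..." text is a nuisance, one untested idea is to also pass the llama3 special tokens as stop sequences on the ChatOllama side; a rough sketch:

from langchain_community.chat_models.ollama import ChatOllama

# Same model/params as above, plus stop sequences so generation halts at the
# llama3 special tokens instead of emitting them in the answer.
llm = ChatOllama(
    model="llama3",
    temperature=0.7,
    num_predict=128,
    stop=["<|eot_id|>", "<|start_header_id|>", "<|end_header_id|>"],
)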

@danielgen

danielgen commented May 4, 2024

Not sure what the issue is with the ollama llama3 model and CrewAI; it seems to have a hard time stopping generation.

  • Llama3 and Llama2 work correctly from the Ollama CLI
  • Llama2 works correctly in CrewAI with a Modelfile as per the CrewAI documentation, i.e. PARAMETER stop Result
  • Llama3 does not work correctly in CrewAI with PARAMETER stop Result
  • I also tried using the Ollama-generated Modelfile (below) in CrewAI, but it does not work:
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this one, replace the FROM line with:
# FROM llama3:latest

FROM llama3:latest
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""
PARAMETER num_keep 24
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"
