
Issue: Inconsistent token counting in AzureChatOpenAI Agent with Tools #510

Open
Phobos97 opened this issue Mar 7, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@Phobos97

Phobos97 commented Mar 7, 2024

Issue you'd like to raise.

I am using an OpenAI agent with tools, with AzureChatOpenAI as the chat model. The model can invoke a tool to search for additional context, which in my use case often returns 1k+ tokens. The issue is that these ToolMessage tokens are not counted in the LangSmith trace, even though they are part of the context for the next agent step.

The inconsistent part is that when setting model='gpt-35' on the AzureChatOpenAI constructor, it DOES count the tokens (model is an optional parameter here), while both model='gpt-4' and model='gpt-35-turbo-16k' do not work. So with AzureChatOpenAI I can actually set azure_deployment='gpt-4' with model='gpt-35' in order to get the proper token count (but then my cost is way off, since it is computed at gpt-35 rates).
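For illustration, the workaround looks like this (a minimal sketch; the endpoint and API key are assumed to come from the usual AZURE_OPENAI_* environment variables):

from langchain_openai import AzureChatOpenAI

workaround_model = AzureChatOpenAI(
    azure_deployment="gpt-4",  # the real Azure deployment
    model="gpt-35",            # deliberately mismatched: tool tokens get counted,
                               # but cost is then computed at gpt-35 rates
    api_version="2023-08-01-preview",
)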

I have created an example below, in which the agent is prompted into using its CustomTool. When setting model_name = "gpt-4", the trace in LangSmith reports ~55 tokens, even though the agent uses its tool, which adds another ~350 tokens. When setting model_name = "gpt-35", the LangSmith trace reports ~400 tokens, which is the expected behaviour (except that the deployment and model names then no longer match, which is obviously wrong, but that would just be a user error).

TL;DR: The ToolMessage in the AgentExecutor trace is sometimes counted in the token count and sometimes not, depending on the model parameter.

See this example (does not include setting environment variables):

from typing import Type, Optional

import httpx
from langchain.agents import create_openai_tools_agent, AgentExecutor
from langchain_core.callbacks import CallbackManagerForToolRun
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnableConfig
from langchain_core.tools import BaseTool
from langchain_openai import AzureChatOpenAI
from langsmith import Client
from pydantic.v1 import BaseModel, Field

# Local helper modules (not included in this snippet)
from constants import LANGCHAIN_API_KEY, set_environment_variables
from utility.tracer import NewLangChainTracer

deployment_name = "gpt-4"

model_name = "gpt-4"  # does not work, does not count tokens from tool output
# model_name = "gpt-35"  # works, counts tokens from tool output, but with gpt-35 costs which are obviously wrong with a gpt-4 deployment

set_environment_variables()

http_client = httpx.Client(verify=False)
chat_model = AzureChatOpenAI(
    azure_endpoint="https://localhost:8080/",
    model=model_name,
    azure_deployment=deployment_name,
    api_version="2023-08-01-preview",
    http_client=http_client,
    tags=["gpt-4-agent"],
    temperature=0,
    max_tokens=50,
)


class ToolInput(BaseModel):
    question: str = Field(description="Please input the word 'Birthday'.")


class CustomTool(BaseTool):
    name = "birthday_search_tool"
    description = "useful for when you need to know a users birthday."
    args_schema: Type[BaseModel] = ToolInput

    def _run(
            self, question: str, run_manager: Optional[CallbackManagerForToolRun] = None
    ) -> str:
        """Use the tool."""
        return "The users birthday is yesterday!!!!!!!!!!\n" * 50


agent_prompt = ChatPromptTemplate.from_messages([
    MessagesPlaceholder(variable_name="instructions"),
    MessagesPlaceholder(variable_name="input"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

tool = CustomTool()

agent = create_openai_tools_agent(chat_model, [tool], agent_prompt)
agent_executor = AgentExecutor(agent=agent, tools=[tool], verbose=True,
                               return_intermediate_steps=True)

instructions = ["You are a helpful assistant"]
test_input = ["What is my birthday?"]

tracer = NewLangChainTracer(
    project_name="reproduce-tracer-error",
    tags=["tag1"],
    client=Client(
        api_url="https://api.smith.langchain.com",
        api_key=LANGCHAIN_API_KEY
    )
)

config = RunnableConfig(
    callbacks=[tracer]
)

result = agent_executor.invoke(
    input={"input": test_input, "instructions": instructions},
    config=config
)
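
As a sanity check, the chat model's own counter can be pointed at the tool output directly; a minimal sketch, assuming the same chat_model as above (ToolMessage and get_num_tokens_from_messages are standard langchain-core / langchain-openai APIs, though I have not verified that this path matches what LangSmith does):

from langchain_core.messages import ToolMessage

tool_output = "The users birthday is yesterday!!!!!!!!!!\n" * 50
tool_msg = ToolMessage(content=tool_output, tool_call_id="call_1")

# Roughly what the tool output should add to the next step's prompt tokens.
print(chat_model.get_num_tokens_from_messages([tool_msg]))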
Versions:

langchain 0.1.11
langchain-core 0.1.30
langchain-openai 0.0.8
langsmith 0.1.22

Image from LangSmith below: the AzureChatOpenAI step claims 34 tokens, while on the right it is obvious that the tool added many more than 34 tokens to the context.

[Screenshot 2024-03-07 at 19 58 28]

Suggestion:

Agent tool output that is added to the context should count towards the token count. It seems the code parsing ToolMessages only works with model='gpt-35' specified, possibly related to the model name being optional on AzureChatOpenAI. One quick client-side probe is to check which of these model names tiktoken recognizes at all, since langchain-openai's local token counting resolves the encoding from the model name; whether LangSmith's trace counting takes the same path is an assumption on my part:
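
import tiktoken

# Which of the model names involved does tiktoken recognize?
for name in ("gpt-4", "gpt-35", "gpt-35-turbo-16k"):
    try:
        enc = tiktoken.encoding_for_model(name)
        print(f"{name}: resolves to encoding {enc.name}")
    except KeyError:
        print(f"{name}: not recognized by tiktoken")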

Thank you for your help.

@hinthornw
Collaborator

Thanks for flagging, I'll coordinate with the team

hinthornw added the bug label Apr 10, 2024