[FunctionTool] Use docstring_parser to infer description for FunctionTool #12864

Open: wants to merge 4 commits into base: main

BeautyyuYanli
Contributor

@BeautyyuYanli BeautyyuYanli commented Apr 16, 2024

Description

In this PR, I updated the llama_index.core.tools package to use docstring_parser to parse function.__doc__, so that FunctionTool complies with the general (OpenAI) format.

Originally, the description provided by FunctionTool was simply the signature() followed by __doc__. With docstring_parser, we can parse __doc__ to get the function description and each parameter's description, so that the FunctionTool.metadata passed to the LLM API matches OpenAI's example.

Taking OpenAI's example:

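import json
from typing import Literal

from llama_index.core.tools import FunctionTool


# NOTE: the default is spelled "celcius" as posted; it is not one of the Literal values.
def get_current_weather(
    location: str, unit: Literal["celsius", "fahrenheit"] = "celcius"
):
    """Get the current weather in a given location

    Args:
        location (str): The city and state, e.g. San Francisco, CA
    """
    pass


# Print the tool metadata in the OpenAI tool format.
print(
    json.dumps(
        FunctionTool.from_defaults(fn=get_current_weather).metadata.to_openai_tool(),
        indent=2,
    ),
)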

Diff between the old and new output:

     "type": "function",
     "function": {
         "name": "get_current_weather",
-        "description": "get_current_weather(location: str, unit: Literal['celsius', 'fahrenheit'] = 'celcius')\nGet the current weather in a given location\n\n        Args:\n            location (str): The city and state, e.g. San Francisco, CA\n        ",
+        "description": "Get the current weather in a given location\n",
         "parameters": {
-            "type": "object",
             "properties": {
                 "location": {
+                    "description": "The city and state, e.g. San Francisco, CA",
                     "title": "Location",
                     "type": "string"
                 },
                 "unit": {
-                    "title": "Unit",
                     "default": "celcius",
                     "enum": [
                         "celsius",
                         "fahrenheit"
                     ],
+                    "title": "Unit",
                     "type": "string"
                 }
             },
             "required": [
                 "location"
-            ]
+            ],
+            "type": "object"
         }
     }
 }
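
To make the idea in the description concrete, here is a minimal standard-library sketch of splitting a Google-style docstring into a summary and per-parameter descriptions, then attaching those descriptions to a schema. It is not the PR's implementation and does not use docstring_parser; the helper name split_google_docstring, the simplified function signature, and the hand-written schema_properties dict are assumptions made for illustration, and only a plain "Args:" section is handled.

```python
import inspect
import re
from typing import Dict, Tuple


def split_google_docstring(doc: str) -> Tuple[str, Dict[str, str]]:
    """Return (summary, {param_name: description}) for a Google-style docstring.

    Hypothetical helper: handles only a plain "Args:" section.
    """
    text = inspect.cleandoc(doc or "")
    summary_part, _, args_part = text.partition("Args:")
    params: Dict[str, str] = {}
    for line in args_part.splitlines():
        # Matches lines such as "location (str): The city and state"
        match = re.match(r"\s*(\w+)\s*(?:\([^)]*\))?\s*:\s*(.+)", line)
        if match:
            params[match.group(1)] = match.group(2).strip()
    return summary_part.strip(), params


# Simplified stand-in for the function in the PR description.
def get_current_weather(location: str, unit: str = "celsius"):
    """Get the current weather in a given location

    Args:
        location (str): The city and state, e.g. San Francisco, CA
    """


summary, params = split_google_docstring(get_current_weather.__doc__)

# Hand-written stand-in for the JSON schema properties (an assumption).
schema_properties = {
    "location": {"title": "Location", "type": "string"},
    "unit": {"title": "Unit", "type": "string"},
}
# Attach each parsed description to the matching schema property.
for name, description in params.items():
    if name in schema_properties:
        schema_properties[name]["description"] = description

print(summary)
print(schema_properties["location"]["description"])
```

Parameters without a docstring entry, such as unit here, simply get no description, which matches the shape of the new output in the diff above.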

Type of Change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

  • Added new unit/integration tests
  • Added new notebook (that tests end-to-end)
  • I stared at the code and made sure it makes sense

Suggested Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added Google Colab support for the newly added notebooks.
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I ran make format; make lint to appease the lint gods

@dosubot (bot) added the size:M label (This PR changes 30-99 lines, ignoring generated files.) Apr 16, 2024
@BeautyyuYanli changed the title from "Use docstring_parser to infer description for tool" to "[FunctionTool] Use docstring_parser to infer description for FunctionTool" Apr 16, 2024
@logan-markewich
Collaborator

It was already in OpenAI format, wasn't it? I see your diff changes more than just the docstring. What is the issue with the original code?

It works fine for me:

from llama_index.core.tools import FunctionTool
from llama_index.agent.openai import OpenAIAgent

from typing import Literal


def get_current_weather(
    location: str, unit: Literal["celsius", "fahrenheit"] = "celcius"
) -> str:
    """Get the current weather in a given location

    Args:
        location (str): The city and state, e.g. San Francisco, CA
    """
    return "-10C"

tool = FunctionTool.from_defaults(fn=get_current_weather)

agent = OpenAIAgent.from_tools([tool], verbose=True)

response = agent.chat("What is the current weather in Toronto, Canada?")

print(str(response))

@logan-markewich
Collaborator

Calling with current code

Added user message to memory: What is the current weather in Toronto, Canada?
=== Calling Function ===
Calling function: get_current_weather with args: {"location":"Toronto, Canada","unit":"celsius"}
Got output: -10C
========================

Calling with this PR

Added user message to memory: What is the current weather in Toronto, Canada?
=== Calling Function ===
Calling function: get_current_weather with args: {"location":"Toronto, Canada"}
Got output: -10C
========================

@BeautyyuYanli
Contributor Author

@logan-markewich The change is focused on the descriptions. The original code built the function description from the function signature plus the docstring, with no descriptions on the individual params. The OpenAI example in their documentation, by contrast, provides both a function description and per-param descriptions (the JSON Schema way). I believe the change can help the LLM perform better on complex tasks, because it is less ambiguous and matches the format of their training data (although it makes no difference in the simple get_weather task).

@wey-gu
Contributor

wey-gu commented Apr 17, 2024

Calling with current code

Added user message to memory: What is the current weather in Toronto, Canada?
=== Calling Function ===
Calling function: get_current_weather with args: {"location":"Toronto, Canada","unit":"celsius"}
Got output: -10C
========================

Calling with this PR

Added user message to memory: What is the current weather in Toronto, Canada?
=== Calling Function ===
Calling function: get_current_weather with args: {"location":"Toronto, Canada"}
Got output: -10C
========================

replying to @logan-markewich ♥️:

Seems legit (better now?), as unit is an enum with a default value (celsius), so omitting it is equivalent to "unit":"celsius"?
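
One detail worth checking in the snippet as posted: the default is spelled "celcius", which is not one of the Literal values, so omitting unit may not be strictly identical to passing "celsius". A small standard-library check, assuming the signature from the thread with a stub body:

```python
import inspect
from typing import Literal, get_args


def get_current_weather(
    location: str, unit: Literal["celsius", "fahrenheit"] = "celcius"
) -> str:
    return "-10C"  # stub body, as in the thread


unit_param = inspect.signature(get_current_weather).parameters["unit"]
allowed_values = get_args(unit_param.annotation)  # values listed in the Literal
default_value = unit_param.default  # value used when the LLM omits "unit"

print(allowed_values)
print(default_value)
print(default_value in allowed_values)
```

With the spelling as posted, the last line prints False; correcting the default to "celsius" would make omitting the argument equivalent to passing it explicitly.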
