Not returning all the required entities defined in the schema #113

Open
shima-khoshraftar opened this issue Mar 27, 2024 · 8 comments

@shima-khoshraftar

Hi,
I ran into an issue with langchain-extract. I defined a schema with some required (non-optional) entities, for instance:

class ExtractedValues(BaseModel):
    syndication_agent: str = Field(description='who is the syndication agent?')
    agreement_date: str = Field(description='what is the agreement date?')
    administrative_agent: str = Field(description='what is the administrative agent?')

Then I ran the following lines (after launching the app):
runnable = RemoteRunnable("http://localhost:8000/extract_text/")
response = runnable.invoke({"text": text, "schema": ExtractedValues.schema()})

However, the response does not contain all the entities defined in the ExtractedValues. Have you ever faced this issue? I am wondering if you can help me with that. Thanks.

@eyurtsev
Collaborator

What is it returning?

@shima-khoshraftar
Author

Instead of returning three entities, it sometimes returns only two, or even one.

@eyurtsev
Collaborator

Do you mean entity or attribute? Usually one would refer to ExtractedValues as an entity, and to its fields (e.g., syndication_agent) as attributes.

Are you observing that you're getting instances of extracted values with some attributes missing? (e.g., for syndication_agent?)

@shima-khoshraftar
Author

Yes, some attributes (as you define them), such as syndication_agent, are sometimes missing from the output. I'll put an example here shortly.

@shima-khoshraftar
Author

For instance, this is the ExtractedValues class:

class ExtractedValues(BaseModel):
    agreement_date: str = Field(description='What is the agreement_date?')
    agreement_name: str = Field(description='What is the agreement_name?')
    governing_law: str = Field(description='What is the governing_law?')
    effective_date: str = Field(description='What is the effective_date?')
    termination_date: str = Field(description='What is the termination_date ?')

this is print(ExtractedValues.schema()):

{'properties': {'agreement_date': {'description': 'What is the agreement_date?', 'title': 'agreement_date', 'type': 'string'},
                'agreement_name': {'description': 'What is the agreement_name?', 'title': 'agreement_name', 'type': 'string'},
                'governing_law': {'description': 'What is the governing_law?', 'title': 'governing_law', 'type': 'string'},
                'effective_date': {'description': 'What is the effective_date?', 'title': 'effective_date', 'type': 'string'},
                'termination_date': {'description': 'What is the termination_date ?', 'title': 'termination_date', 'type': 'string'}},
 'required': ['agreement_date', 'agreement_name', 'governing_law', 'effective_date', 'termination_date'],
 'title': 'ExtractedValues',
 'type': 'object'}

and this is the output of:
response = runnable.invoke({"text": text, "schema": ExtractedValues.schema()})
print(response)

{'data': [{'agreement_date': 'September 30, 2020', 'agreement_name': 'Restated Credit Facility Agreement', 'governing_law': 'not specified'}]}

effective_date and termination_date are missing from the response.
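A quick way to spot this on the client side is to compare each returned record against the `required` list in the schema. The snippet below is a minimal sketch using only the standard library; it assumes the response has the `{'data': [...]}` shape shown above, and `find_missing_fields` is a hypothetical helper name, not part of langchain-extract.

```python
# Hypothetical helper (not part of langchain-extract): for each extracted
# record, list the required attributes that are absent or null.
def find_missing_fields(schema: dict, response: dict) -> list:
    required = schema.get("required", [])
    return [
        [name for name in required if record.get(name) is None]
        for record in response.get("data", [])
    ]


# Only the 'required' key of ExtractedValues.schema() is needed here.
schema = {
    "required": [
        "agreement_date",
        "agreement_name",
        "governing_law",
        "effective_date",
        "termination_date",
    ]
}

# The response reported in this thread.
response = {
    "data": [
        {
            "agreement_date": "September 30, 2020",
            "agreement_name": "Restated Credit Facility Agreement",
            "governing_law": "not specified",
        }
    ]
}

missing = find_missing_fields(schema, response)
print(missing)  # [['effective_date', 'termination_date']]
```

Note that this only catches absent or null attributes; a placeholder string such as 'not specified' still counts as present.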

@ccurme ccurme self-assigned this Mar 27, 2024
@ccurme
Collaborator

ccurme commented Mar 27, 2024

Hi @shima-khoshraftar, thanks for flagging this. I'm having trouble reproducing the case where the attributes are not present at all. Do you think you could create a minimal example?

FWIW, I am finding that OpenAI will occasionally return null for required fields, e.g., a required integer, as in the example below:

from pydantic import BaseModel, Field

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI


openai_function_schema = {
  "type": "function",
  "function": {
    "name": "extractor",
    "description": "Extract information matching the given schema.",
    "parameters": {
      "type": "object",
      "properties": {
        "data": {
          "type": "array",
          "items": {
            "title": "Person",
            "type": "object",
            "properties": {
              "age": {
                "title": "Age",
                "description": "The age of the person in years.",
                "type": "integer"
              },
              "name": {
                "title": "Name",
                "description": "The name of the person.",
                "type": "string"
              }
            },
            "required": [
              "age",
              "name"
            ]
          }
        }
      },
      "required": [
        "data"
      ]
    }
  }
}

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "Please extract."),
        ("human", "Extract from the following text: {text}"),
    ]
)

model = ChatOpenAI(temperature=0, model_kwargs={"tools": [openai_function_schema]})
(prompt | model).invoke({"text": "My name is Chester."}).additional_kwargs["tool_calls"][0]["function"]["arguments"]
'{"data":[{"age":null,"name":"Chester"}]}'

@shima-khoshraftar
Author

Hi @ccurme. Thanks for looking into this. The issue is that it does not happen all the time, as is usual with LLMs. In fact, most of the time it works correctly. But because it can happen, it needs some checking (maybe I can do it myself as a post-processing step). I'm not sure you would hit it even if I send an example, but I will send one. Thanks.
Note: I am using Azure OpenAI (gpt-35-turbo).

@eyurtsev
Collaborator

You can definitely do a post-processing step on the client side using the pydantic schema that you defined! (Check the pydantic docs, but it shouldn't be too hard to validate client side.)
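As a rough illustration of that post-processing step, here is a minimal sketch that re-validates each returned record with the same pydantic model. It assumes the `{'data': [...]}` response shape reported earlier in this thread; `validate_records` is a hypothetical helper name, and what to do with the rejected records (retry, log, fill defaults) is left open.

```python
from pydantic import BaseModel, Field, ValidationError


class ExtractedValues(BaseModel):
    agreement_date: str = Field(description="What is the agreement_date?")
    agreement_name: str = Field(description="What is the agreement_name?")
    governing_law: str = Field(description="What is the governing_law?")
    effective_date: str = Field(description="What is the effective_date?")
    termination_date: str = Field(description="What is the termination_date ?")


def validate_records(response: dict):
    """Hypothetical helper: split records into validated models and
    (record, offending field names) pairs."""
    valid, invalid = [], []
    for record in response.get("data", []):
        try:
            valid.append(ExtractedValues(**record))
        except ValidationError as exc:
            # Each error's "loc" starts with the name of the offending field;
            # this covers both missing attributes and null values.
            bad_fields = [str(err["loc"][0]) for err in exc.errors()]
            invalid.append((record, bad_fields))
    return valid, invalid


# The response reported earlier in this thread.
response = {
    "data": [
        {
            "agreement_date": "September 30, 2020",
            "agreement_name": "Restated Credit Facility Agreement",
            "governing_law": "not specified",
        }
    ]
}

valid, invalid = validate_records(response)
for record, bad_fields in invalid:
    print("rejected:", bad_fields)
```

Because the model declares every attribute as a plain `str` with no default, a record that omits an attribute or sets it to null fails validation instead of passing through silently.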

You can likely mitigate some of the issues simply by providing examples. Examples tend to help a lot in improving performance!
