Not returning all the required entities defined in the schema #113

shima-khoshraftar · 2024-03-27T00:34:38Z

Hi,
I ran into an issue with langchain-extract. I defined a schema with some required (not optional) entities, for instance:

class ExtractedValues(BaseModel):
    syndication_agent: str = Field(description='who is the syndication agent?')
    agreement_date: str = Field(description='what is the agreement date?')
    administrative_agent: str = Field(description='what is the administrative agent?')

Then I ran the following lines (after launching the app):
runnable = RemoteRunnable("http://localhost:8000/extract_text/")
response = runnable.invoke({"text": text, "schema": ExtractedValues.schema()})

However, the response does not contain all the entities defined in the ExtractedValues. Have you ever faced this issue? I am wondering if you can help me with that. Thanks.

eyurtsev · 2024-03-27T00:43:47Z

What is it returning?

shima-khoshraftar · 2024-03-27T13:20:26Z

Instead of returning three entities, it returns only two, for instance, or one.

eyurtsev · 2024-03-27T14:15:25Z

Do you mean entity or attribute? Usually one would refer to the ExtractedValues as an entity, and its attributes (e.g., syndication_agent) as an attribute.

Are you observing that you're getting instances of extracted values with some attributes missing? (e.g., for syndication_agent?)

shima-khoshraftar · 2024-03-27T14:42:15Z

Yes, some attributes (as you define them), such as syndication_agent, are sometimes missing from the output. I'll put an example here shortly.

shima-khoshraftar · 2024-03-27T15:33:31Z

For instance, this is the ExtractedValues class:

class ExtractedValues(BaseModel):
    agreement_date: str = Field(description='What is the agreement_date?')
    agreement_name: str = Field(description='What is the agreement_name?')
    governing_law: str = Field(description='What is the governing_law?')
    effective_date: str = Field(description='What is the effective_date?')
    termination_date: str = Field(description='What is the termination_date ?')

this is print(ExtractedValues.schema()):

{'properties': {'agreement_date': {'description': 'What is the agreement_date?', 'title': 'agreement_date', 'type': 'string'}, 'agreement_name': {'description': 'What is the agreement_name?', 'title': 'agreement_name', 'type': 'string'}, 'governing_law': {'description': 'What is the governing_law?', 'title': 'governing_law', 'type': 'string'}, 'effective_date': {'description': 'What is the effective_date?', 'title': 'effective_date', 'type': 'string'}, 'termination_date': {'description': 'What is the termination_date ?', 'title': 'termination_date', 'type': 'string'}}, 'required': ['agreement_date', 'agreement_name', 'governing_law', 'effective_date', 'termination_date'], 'title': 'ExtractedValues', 'type': 'object'}

this is response = runnable.invoke({"text": text, "schema": ExtractedValues.schema()})
print(response)

{'data': [{'agreement_date': 'September 30, 2020', 'agreement_name': 'Restated Credit Facility Agreement', 'governing_law': 'not specified'}]}

effective_date and termination_date are missing from the response.

ccurme · 2024-03-27T20:07:30Z

Hi @shima-khoshraftar, thanks for flagging this. I'm having trouble reproducing the case where the attributes are not present at all. Do you think you could create a minimal example?

fwiw I am finding that OpenAI will occasionally return null for required fields-- e.g., a required integer, as in the below example:

from pydantic import BaseModel, Field

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI


openai_function_schema = {
  "type": "function",
  "function": {
    "name": "extractor",
    "description": "Extract information matching the given schema.",
    "parameters": {
      "type": "object",
      "properties": {
        "data": {
          "type": "array",
          "items": {
            "title": "Person",
            "type": "object",
            "properties": {
              "age": {
                "title": "Age",
                "description": "The age of the person in years.",
                "type": "integer"
              },
              "name": {
                "title": "Name",
                "description": "The name of the person.",
                "type": "string"
              }
            },
            "required": [
              "age",
              "name"
            ]
          }
        }
      },
      "required": [
        "data"
      ]
    }
  }
}

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "Please extract."),
        ("human", "Extract from the following text: {text}"),
    ]
)

model = ChatOpenAI(temperature=0, model_kwargs={"tools": [openai_function_schema]})
(prompt | model).invoke({"text": "My name is Chester."}).additional_kwargs["tool_calls"][0]["function"]["arguments"]

'{"data":[{"age":null,"name":"Chester"}]}'
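A null like this can be caught before the result is used. The following is only a minimal sketch, not part of langchain-extract: the helper name `null_or_missing_required` is hypothetical, and it assumes the tool-call arguments arrive as a JSON string with a top-level "data" array, as in the output above. It treats a null value the same as an absent key.

```python
import json


def null_or_missing_required(arguments_json, required):
    """Map each item index to the required attributes that are null or absent.

    Hypothetical helper: assumes the payload has a top-level "data" list,
    as in the tool-call output shown above.
    """
    payload = json.loads(arguments_json)
    problems = {}
    for index, item in enumerate(payload.get("data", [])):
        # dict.get returns None both for an absent key and for a JSON null.
        bad = [name for name in required if item.get(name) is None]
        if bad:
            problems[index] = bad
    return problems


arguments = '{"data":[{"age":null,"name":"Chester"}]}'
print(null_or_missing_required(arguments, ["age", "name"]))
# prints {0: ['age']}
```

The `required` list here is copied by hand from the "required" array of the item schema; reading it from the schema dict would work equally well.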

shima-khoshraftar · 2024-03-27T20:53:19Z

Hi @ccurme. Thanks for looking into this. The issue is that it does not happen every time, as is often the case with LLMs. In fact, most of the time it works correctly. But because it can happen, the output needs some checking (maybe I can do it myself as a post-processing step). I'm not sure you would hit it even if I sent an example, but I will send one. Thanks.
Note: I am using Azure OpenAI (gpt-35-turbo).

eyurtsev · 2024-03-27T21:10:43Z

You can definitely do a post processing step on the client side using the pydantic schema that you defined! (check pydantic docs, but it shouldn't be too hard to validate client side).
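One way such a client-side check might look is sketched below. This is an assumption-laden example rather than an official recipe: the function name `validate_records` is made up for illustration, the response is assumed to have the {'data': [...]} shape shown earlier in this thread, and the sample response is hard-coded instead of coming from the running service. It relies only on the model constructor and `ValidationError`, which behave the same way in pydantic v1 and v2.

```python
from pydantic import BaseModel, Field, ValidationError


class ExtractedValues(BaseModel):
    agreement_date: str = Field(description="What is the agreement_date?")
    agreement_name: str = Field(description="What is the agreement_name?")
    governing_law: str = Field(description="What is the governing_law?")
    effective_date: str = Field(description="What is the effective_date?")
    termination_date: str = Field(description="What is the termination_date?")


def validate_records(response):
    """Split extracted records into valid models and (record, bad_fields) pairs.

    Hypothetical helper: assumes `response` looks like {"data": [ {...}, ... ]}.
    """
    valid, invalid = [], []
    for record in response.get("data", []):
        try:
            valid.append(ExtractedValues(**record))
        except ValidationError as exc:
            # Each error carries the offending field name as the first "loc" entry.
            bad_fields = sorted({str(err["loc"][0]) for err in exc.errors()})
            invalid.append((record, bad_fields))
    return valid, invalid


# Sample response copied from the report above (hard-coded for illustration).
response = {
    "data": [
        {
            "agreement_date": "September 30, 2020",
            "agreement_name": "Restated Credit Facility Agreement",
            "governing_law": "not specified",
        }
    ]
}

valid, invalid = validate_records(response)
for record, bad_fields in invalid:
    print("failed validation:", bad_fields)
# prints: failed validation: ['effective_date', 'termination_date']
```

Records that fail validation could then be retried, logged, or passed on with the gaps flagged, depending on how strict the downstream use is.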

You can likely mitigate some of the issues simply by providing examples. Examples tend to help a lot in improving performance!

ccurme self-assigned this Mar 27, 2024
