
feat: extend PromptBuilder and deprecate DynamicPromptBuilder #7655

Open
wants to merge 25 commits into main

Conversation

tstadel
Member

@tstadel tstadel commented May 6, 2024

Related Issues

Currently we cannot have both:

  • a default prompt template defined (PromptBuilder)
  • dynamically change prompt templates at runtime (DynamicPromptBuilder)

There are two options:

  • A: we extend DynamicPromptBuilder and leave PromptBuilder as is
  • B: we extend PromptBuilder and deprecate DynamicPromptBuilder

Edit 07.05.: We decided to go with B

This is Option B
See #7652 for Option A

Proposed Changes:

This extends PromptBuilder to change prompts at query time.

default_template = "This is the default prompt: \\n Query: {{query}}"
prompt_builder = PromptBuilder(template=default_template)

pipe = Pipeline()
pipe.add_component("prompt_builder", prompt_builder)

# using the default prompt
result = pipe.run(
    data={
        "prompt_builder": {
            "query": "Where does the speaker live?",
        },
    }
)
#  "This is the default prompt: \n Query: Where does the speaker live?"

# using the dynamic prompt
result = pipe.run(
    data={
        "prompt_builder": {
            "template": "This is the dynamic prompt:\\n Query: {{query}}",
            "query": "Where does the speaker live?",
        },
    }
)
#  "This is the dynamic prompt: \n Query: Where does the speaker live?"

How did you test it?

  • added tests

Notes for the reviewer

Checklist

@github-actions github-actions bot added the type:documentation label May 7, 2024
@tstadel tstadel marked this pull request as ready for review May 7, 2024 12:56
@tstadel tstadel requested review from a team as code owners May 7, 2024 12:56
@tstadel tstadel requested review from dfokina and davidsbatista and removed request for a team May 7, 2024 12:56
@tstadel
Member Author

tstadel commented May 7, 2024

We decided to go with this approach B.

@tstadel
Member Author

tstadel commented May 7, 2024

I've removed all breaking changes. PromptBuilder should behave the same as before, extended by the dynamic template functionality.

@coveralls
Collaborator

coveralls commented May 7, 2024

Pull Request Test Coverage Report for Build 9172968594

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.02%) to 90.575%

Totals Coverage Status
Change from base Build 9129529675: 0.02%
Covered Lines: 6602
Relevant Lines: 7289

💛 - Coveralls

@tstadel
Member Author

tstadel commented May 7, 2024

Chat counterpart is being implemented in #7663

@vblagoje
Member

vblagoje commented May 9, 2024

@tstadel the code is solid; my main concern is how to explain this to a user (cc @dfokina) so that everything is clear and easily digested. Here is my proposal; see if it is easier for you to comprehend as well, and adjust the class pydocs and the rest of the documentation accordingly:

The PromptBuilder component provides a flexible way to generate prompts using Jinja2 templates. It can be used either standalone or as a part of a pipeline, allowing for both static and dynamic prompt generation.

Using PromptBuilder Standalone

You can use PromptBuilder with a static template provided at initialization or override it at runtime:

  1. Static template usage:
    Define a template at initialization and pass in relevant variables directly to the run method.

    from haystack.components.builders import PromptBuilder
    
    template = "Translate the following context to {{ target_language }}. Context: {{ snippet }}; Translation:"
    builder = PromptBuilder(template=template)
    result = builder.run(target_language="spanish", snippet="I can't speak spanish.")
    print(result)
    
    # Output:
    # {'prompt': "Translate the following context to spanish. Context: I can't speak spanish.; Translation:"}
  2. Dynamic template usage:
    Override the static template by providing a new template directly to the run method.

    template = "Translate the following context to {{ target_language }}. Context: {{ snippet }}; Translation:"
    builder = PromptBuilder(template=template)
    
    summary_template = "Translate to {{ target_language }} and summarize the following context. Context: {{ snippet }}; Summary:"
    result = builder.run(target_language="spanish", snippet="I can't speak spanish.", template=summary_template)
    print(result)
    
    # Output:
    # {'prompt': "Translate to spanish and summarize the following context. Context: I can't speak spanish.; Summary:"}

Using PromptBuilder in a pipeline

Static template in a pipeline:

When using a static template in a pipeline, define the variables argument during initialization to create input slots through which other components can pass data into PromptBuilder, for example variables=["documents"] in the PromptBuilder initialization below.

from typing import List
from haystack import Pipeline, component, Document
from haystack.utils import Secret
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator

static_template = "Summarize the following context in {{ target_language }}: {{ documents[0].content }}"

prompt_builder = PromptBuilder(template=static_template, variables=["documents"])
llm = OpenAIGenerator(api_key=Secret.from_token("<your-api-key>"), model="gpt-3.5-turbo")

@component
class DocumentProducer:
    @component.output_types(documents=List[Document])
    def run(self, doc_input: str):
        return {"documents": [Document(content=doc_input)]}

pipe = Pipeline()
pipe.add_component("doc_producer", DocumentProducer())
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", llm)
pipe.connect("doc_producer.documents", "prompt_builder.documents")
pipe.connect("prompt_builder.prompt", "llm.prompt")

result = pipe.run(
    data={
        "doc_producer": {"doc_input": "This is a test document about Berlin."},
        "prompt_builder": {"template_variables": {"target_language": "Spanish"}},
    }
)
print(result)

# Output:
# {'llm': {'replies': ['Este es un documento de prueba sobre Berlín.'],
# 'meta': [{'model': 'gpt-3.5-turbo-0613',
# 'index': 0,
# 'finish_reason': 'stop',
# 'usage': {'prompt_tokens': 28,
# 'completion_tokens': 8,
# 'total_tokens': 36}}]}}

Note how, when PromptBuilder is used in a pipeline, some of its template variable values come from other components in the pipeline (e.g., documents coming from DocumentProducer), while others are passed directly by the user to the pipeline run invocation via template_variables (e.g., target_language).

Dynamic template in a pipeline:

For dynamic template usage, we also define the variables argument during initialization to create input slots through which other components can pass data into PromptBuilder, for example variables=["documents"] in the PromptBuilder initialization below.

from typing import List
from haystack import Pipeline, component, Document
from haystack.utils import Secret
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator

prompt_builder = PromptBuilder(variables=["documents"])
llm = OpenAIGenerator(api_key=Secret.from_token("<your-api-key>"), model="gpt-3.5-turbo")

@component
class DocumentProducer:
    @component.output_types(documents=List[Document])
    def run(self, doc_input: str):
        return {"documents": [Document(content=doc_input)]}

pipe = Pipeline()
pipe.add_component("doc_producer", DocumentProducer())
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", llm)
pipe.connect("doc_producer.documents", "prompt_builder.documents")
pipe.connect("prompt_builder.prompt", "llm.prompt")

dynamic_template = "Here is the document: {{documents[0].content}} \\n Answer: {{query}}"
result = pipe.run(
    data={
        "doc_producer": {"doc_input": "Hello world, I live in Berlin"},
        "prompt_builder": {
            "template": dynamic_template,
            "template_variables": {"query": "Where does the speaker live?"},
        },
    }
)
print(result)

# Output:
# {'llm': {'replies': ['The speaker lives in Berlin.'],
# 'meta': [{'model': 'gpt-3.5-turbo-0613',
# 'index': 0,
# 'finish_reason': 'stop',
# 'usage': {'prompt_tokens': 28,
# 'completion_tokens': 6,
# 'total_tokens': 34}}]}}

Note how dynamic PromptBuilder pipeline usage is very similar to static usage, except that in the pipeline run parameters we pass a new template via PromptBuilder's template parameter. Note, however, that this new template still uses the documents value coming from other components: input data slots connected to other components cannot be redefined at runtime, so the new template keeps a documents variable as well.

Important concepts to remember

  • Template variables vs. pipeline variables:

    • Template variables: Specified by the user directly via the template_variables argument to the run method.
    • Pipeline variables: Passed indirectly from other components through the pipeline graph and declared (their names only) via the variables argument during initialization.
  • Static vs dynamic templates:

    • Static templates are set during initialization.
    • Dynamic templates can override static templates at runtime through the run method's template parameter.
  • Variable precedence:

    • Variables provided by the user directly via template_variables take precedence over those coming from other components in the pipeline (kwargs); see the sketch below.
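
A minimal standalone sketch of this precedence rule (the template and values here are made up for illustration; the output shape follows PromptBuilder's usual {'prompt': ...} result):

from haystack.components.builders import PromptBuilder

builder = PromptBuilder(template="Answer the question in {{ target_language }}.")

# `target_language` arrives twice: once as a regular kwarg (the way a pipeline
# connection would deliver it) and once via `template_variables`.
result = builder.run(
    target_language="English",                          # pipeline-provided value (kwargs)
    template_variables={"target_language": "German"},   # user-provided override
)
print(result)

# The user-provided value wins:
# {'prompt': 'Answer the question in German.'}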

@bilgeyucel
Contributor

Based on the documentation provided, my questions are:

  • When using a static prompt, do we have to use the variables parameter with a PromptBuilder in a pipeline? If so, this is a breaking change
static_template = "Summarize the following context in {{ target_language }}: {{ documents[0].content }}"

prompt_builder = PromptBuilder(template=static_template, variables=["documents"])
  • Can we drop the "template_variables" key as we provide values for the variables? With the suggested version, this is a breaking change
result = pipe.run(
    data={
        "doc_producer": {"doc_input": "This is a test document about Berlin."},
        "prompt_builder": {"template_variables": {"target_language": "Spanish"}},
    }
)
  • Can we eliminate the variables parameter even in a dynamically prompted setting by compromising on the validation?

@vblagoje
Member

vblagoje commented May 10, 2024

@bilgeyucel let me try to answer but I'll leave it to @tstadel to confirm.

  • don't have to use it - it is optional, no variables -> no other components providing data (e.g. documents) to PB.
  • It is an additional feature, to me it doesn't seem like breaking, @tstadel will confirm
  • We need to use variables whenever other components provide PromptBuilder with template variables data/values. Without it PB is not that usable in pipeline settings.

@tstadel
Member Author

tstadel commented May 10, 2024

@bilgeyucel let me try to answer but I'll leave it to @tstadel to confirm.

  • don't have to use it - it is optional, no variables -> no other components providing data (e.g. documents) to PB.
  • It is an additional feature, to me it doesn't seem like breaking, @tstadel will confirm
  • We need to use variables whenever other components provide PromptBuilder with template variables data/values. Without it PB is not that usable in pipeline settings.

Almost:
If you pass template but not variables, input slots will be inferred from template as before (No breaking change!). So

  • you don't have to use variables at all if you are good with the input slots inferred from template
  • if you don't pass template, you have to pass variables in order to use input slots in dynamic templates, there is no other way to define them
  • template_variables is optional; you'll never be forced to define them (see the sketch below)
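
For illustration, a short sketch of these rules under the behavior proposed in this PR (templates and values are made up):

from haystack.components.builders import PromptBuilder

# 1) Default template, no `variables`: input slots are inferred from the template,
#    exactly as before -> this instance accepts `documents` and `query`.
inferred = PromptBuilder(template="{{ documents }} Question: {{ query }} Answer:")

# 2) No default template: declare the input slots explicitly via `variables`
#    and pass the template itself at runtime.
declared = PromptBuilder(variables=["documents", "query"])
result = declared.run(
    template="Question: {{ query }} Answer:",
    query="Where does Joe live?",   # `documents` is left unset; declared slots are assumed optional
)

# 3) `template_variables` stays optional in both cases.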

@TuanaCelik
Member

Ok I think I understand what's going on. Let me explain and you tell me if I'm correct, followed by some thoughts:
What's going on:

  • Solution B that @tstadel suggests extends the PromptBuilder to do the following:
  • Basically, template becomes not only an initialization argument but also a runtime variable for PromptBuilder
  • When user 'overrides' template at .run(), they may also change prompt input variables (like document, query) - this is inferred? Can I just override it with whatever variable and run it?

What I am worried about:

  • If I am correct that @vblagoje you're suggesting that the changed variables for the templates are not inferred, we have to provide variables separately to the .run() correct?
  • This would be quite complex to explain to users imo. If there's any way to avoid making it so that variables of any kind have to be provided separately, I would suggest we do that.

Please educate me here though, maybe I'm misunderstanding something

@tstadel
Member Author

tstadel commented May 10, 2024

  • If no other components can provide data otherwise, then the variables parameter becomes a must in most pipelines such as RAG
  • If I can eliminate "template_variables", and pass data={"prompt_builder": {"target_language": "Spanish"}} instead of data={"prompt_builder": {"template_variables": {"target_language": "Spanish"}}}, it's great. But the example code doesn't imply that.

Here's my understanding of how to use a static prompt with PromptBuilder in a pipeline. @tstadel please confirm 🙏

Before

The current implementation of a RAG pipeline:

from haystack import Pipeline

template = """
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{query}}
Answer:
"""

basic_rag_pipeline = Pipeline()
basic_rag_pipeline.add_component("retriever", InMemoryBM25Retriever(document_store))
basic_rag_pipeline.add_component("prompt_builder", prompt_builder = PromptBuilder(template=template))
basic_rag_pipeline.add_component("llm", OpenAIGenerator(model="gpt-3.5-turbo"))

basic_rag_pipeline.connect("retriever", "prompt_builder.documents")
basic_rag_pipeline.connect("prompt_builder", "llm")

query = "What does Rhodes Statue look like?"
response = basic_rag_pipeline.run({
                 "retriever": {"query": query}, 
                 "prompt_builder": {"query": query}
})

After

With this PR, the updated pipeline will look like this:

from haystack import Pipeline

template = """
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{query}}
Answer:
"""

prompt_builder = PromptBuilder(template=template, variables=["documents"])

basic_rag_pipeline = Pipeline()
basic_rag_pipeline.add_component("retriever", InMemoryBM25Retriever(document_store))
basic_rag_pipeline.add_component("prompt_builder", prompt_builder)
basic_rag_pipeline.add_component("llm", OpenAIGenerator(model="gpt-3.5-turbo"))

basic_rag_pipeline.connect("retriever", "prompt_builder.documents")
basic_rag_pipeline.connect("prompt_builder", "llm")

query = "What does Rhodes Statue look like?"
response = basic_rag_pipeline.run({
                 "retriever": {"query": query}, 
                 "prompt_builder": {"template_variables": {"query": query}}
})

1 - I added variables=["documents"] to my PromptBuilder because I'll inject documents coming from the retriever
2 - I added the "template_variables" key as I run the pipeline

Fortunately no :-)
It will work exactly as before.

@TuanaCelik
Member

Additionally, looking at the code, I also see some inconsistencies that we shouldn't have if we must have variables.
In initialization, we optionally provide variables (my understanding is that this is for when we override the template, yes?)
But then, in the run function, we need to provide template_variables? Wouldn't these two be the same thing?

@TuanaCelik
Member

Ok so:

  • I can use the PromptBuilder exactly the same as before, without providing variables/template variables at all, even if, say, a retriever is forwarding documents to it via pipeline.connect()
  • I will have to provide variables if I'm overriding template
  • One thing I just don't yet fully understand is when we would use template_variables vs variables and what the difference is (even if you say we don't need to use template_variables @tstadel). Thanks for the explanations!!! Really helps

@vblagoje
Member

No, we need variables so PB can accept data from other pipeline components. The other ones are provided by the user at run time.

@tstadel
Member Author

tstadel commented May 10, 2024

Ok I think I understand what's going on. Let me explain and you tell me if I'm correct, followed by some thoughts: What's going on:

  • Solution B that @tstadel suggests extends the PromptBuilder to do the following:
  • Basically, template becomes not only an initialization argument but also a runtime variable for PromptBuilder
  • When user 'overrides' template at .run(), they may also change prompt input variables (like document, query) - this is inferred? Can I just override it with whatever variable and run it?

What I am worried about:

  • If I am correct that @vblagoje you're suggesting that the changed variables for the templates are not inferred, we have to provide variables separately to the .run() correct?
  • This would be quite complex to explain to users imo. If there's any way to avoid making it so that variables of any kind have to be provided separately, I would suggest we do that.

Please educate me here though, maybe I'm misunderstanding something

@TuanaCelik @bilgeyucel @vblagoje
Ok here is an illustrative example that should help shed light on what's not obvious:

@bilgeyucel 's example

from haystack import Pipeline

template = """
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{query}}
Answer:
"""

basic_rag_pipeline = Pipeline()
basic_rag_pipeline.add_component("retriever", InMemoryBM25Retriever(document_store))
basic_rag_pipeline.add_component("prompt_builder", prompt_builder = PromptBuilder(template=template))
basic_rag_pipeline.add_component("llm", OpenAIGenerator(model="gpt-3.5-turbo"))

basic_rag_pipeline.connect("retriever", "prompt_builder.documents")
basic_rag_pipeline.connect("prompt_builder", "llm")

query = "What does Rhodes Statue look like?"
response = basic_rag_pipeline.run({
                 "retriever": {"query": query}, 
                 "prompt_builder": {"query": query}
})

Here the following input slots are inferred from template:

  • documents
  • query

Now let's change the template at runtime, keeping the same variables:

fancy_template = """
This is a super fancy dynamic template:

Documents:
{% for document in documents %}
    Document {{ document.id }}
    {{ document.content }}
{% endfor %}

Question: {{query}}
Answer:
"""

query = "What does Rhodes Statue look like?"
response = basic_rag_pipeline.run({
                 "retriever": {"query": query}, 
                 "prompt_builder": {"query": query, "template": fancy_template}
})

Then this will work seamlessly as we use the same input slots:

  • documents
  • query

Now there are two more cases for dynamic templates:
Case A)
We use fewer input slots than during init:

query_only_template = """
Question: {{query}}
Answer:
"""

query = "What does Rhodes Statue look like?"
response = basic_rag_pipeline.run({
                 "retriever": {"query": query}, 
                 "prompt_builder": {"query": query, "template": query_only_template}
})

This will also work seamlessly as all template variables (i.e. query) are covered by input slots.

Case B)
We use more input slots than during init:

even_fancier_template = """
{{ header }}
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{query}}
Answer:
"""

query = "What does Rhodes Statue look like?"
response = basic_rag_pipeline.run({
                 "retriever": {"query": query}, 
                 "prompt_builder": {"query": query, "template": even_fancier_template}
})

Note that the passed template now requires:

  • documents
  • query
  • header

The first two are covered by input slots, but the third, header, is not. That means there is no way to pass header through pipeline params. There are two options to set header now:

Case B1)
Set header via template_variables:

even_fancier_template = """
{{ header }}
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{query}}
Answer:
"""

query = "What does Rhodes Statue look like?"
header = "This is my header"
response = basic_rag_pipeline.run({
                 "retriever": {"query": query}, 
                 "prompt_builder": {"query": query, "template": even_fancier_template, "template_variables": {"header": header}}
})

Case B2)
Define header as an input slot via variables at init:

from haystack import Pipeline

template = """
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{query}}
Answer:
"""

basic_rag_pipeline = Pipeline()
basic_rag_pipeline.add_component("retriever", InMemoryBM25Retriever(document_store))
basic_rag_pipeline.add_component("prompt_builder", prompt_builder = PromptBuilder(template=template, variables=["query", "documents", "header"]))
basic_rag_pipeline.add_component("llm", OpenAIGenerator(model="gpt-3.5-turbo"))

basic_rag_pipeline.connect("retriever", "prompt_builder.documents")
basic_rag_pipeline.connect("prompt_builder", "llm")

even_fancier_template = """
{{ header }}
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{query}}
Answer:
"""

query = "What does Rhodes Statue look like?"
headers = "This is my header"
response = basic_rag_pipeline.run({
                 "retriever": {"query": query}, 
                 "prompt_builder": {"query": query, "template": even_fancier_template, "header": header}
})

Note that variables is set to:

  • documents
  • query
  • header

Hence, we can pass header to prompt_builder via pipeline.

@tstadel
Member Author

tstadel commented May 10, 2024

No, we need variables so PB can accept data from other pipeline components. The other ones are provided by the user in run time

@vblagoje please don't forget that variables are being inferred from template if template is set, but variables is not.

@tstadel
Member Author

tstadel commented May 10, 2024

Additionally, looking at the code, I also see some inconsistencies that we shouldn't have if we must have variables. In initialization, we optionally provide variables (my understanding is that this is for when we override the template, yes?) But then, in the run function, we need to provide template_variables? Wouldn't these two be the same thing?

@TuanaCelik
I wouldn't mix them up: variables just defines the variables that the prompt builder instance expects to receive from the pipeline, while template_variables overwrites or extends pipeline-provided variables with user-defined values.
Maybe we can find a better name for template_variables here.
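
For illustration, a compact contrast of the two parameters as a sketch (component and connection names reuse the examples above):

from haystack.components.builders import PromptBuilder

# `variables` (init): declares pipeline input slots, so other components can be
# connected to them, e.g. pipe.connect("retriever.documents", "prompt_builder.documents").
prompt_builder = PromptBuilder(
    template="{{ documents }} Question: {{ query }} Answer:",
    variables=["documents", "query"],
)

# `template_variables` (run): user-supplied values that overwrite or extend
# whatever the pipeline delivered into those slots for this particular run.
prompt_builder.run(
    documents=[],  # normally filled by a connected retriever
    template_variables={"query": "What does Rhodes Statue look like?"},
)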

@tstadel
Member Author

tstadel commented May 13, 2024

@vblagoje
The new documentation / explanation approach would look like this.
We start with
https://docs.haystack.deepset.ai/docs/promptbuilder and keep it the same*.
We add the following sections:

Changing the template at runtime (Prompt Engineering)

PromptBuilder allows you to switch the prompt template of an existing pipeline. The example below builds on top of the existing pipeline from the previous section; the existing pipeline is invoked with a new prompt template:

documents = [
    Document(content="Joe lives in Berlin", meta={"name": "doc1"}), 
    Document(content="Joe is a software engineer", meta={"name": "doc1"}),
]
new_template = """
    You are a helpful assistant.
    Given these documents, answer the question.
    Documents:
    {% for doc in documents %}
        Document {{ loop.index }}:
        Document name: {{ doc.meta['name'] }}
        {{ doc.content }}
    {% endfor %}

    Question: {{ query }}
    Answer:
    """
p.run({
      "prompt_builder": {
          "documents": documents, 
          "query": question, 
          "template": new_template,
      },
  })

If you want to use different variables during prompt engineering than in the default template, you can do so by setting PromptBuilder's variables init parameter accordingly.
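
For example, a minimal sketch (default_template stands for whatever template the pipeline already uses, and context is a made-up extra variable):

prompt_builder = PromptBuilder(
    template=default_template,
    # Declare every variable your experimental templates might use, so the
    # pipeline can still feed them in when the template is swapped at runtime:
    variables=["documents", "query", "context"],
)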

Overwriting variables at runtime

In case you want to overwrite the values of variables, you can use template_variables during runtime as illustrated below:

language_template = """
    You are a helpful assistant.
    Given these documents, answer the question.
    Documents:
    {% for doc in documents %}
        Document {{ loop.index }}:
        Document name: {{ doc.meta['name'] }}
        {{ doc.content }}
    {% endfor %}

    Question: {{ query }}
    Please provide your answer in {{ answer_language | default('English') }}
    Answer:
    """
p.run({
      "prompt_builder": {
          "documents": documents, 
          "query": question, 
          "template": language_template, 
          "template_variables": {"answer_language": "German"},
      },
  })

Note that language_template introduces the variable answer_language, which is not bound to any pipeline variable. If not set otherwise, it would evaluate to its default value 'English'. In this example we overwrite its value with 'German'.
template_variables allows you to overwrite pipeline variables (such as documents) as well, as shown in the sketch below.
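
For illustration, a sketch of overwriting a pipeline variable (reusing p, documents, question, language_template, and Document from the examples above; the replacement document is made up):

p.run({
      "prompt_builder": {
          "documents": documents,            # regular pipeline input
          "query": question,
          "template": language_template,
          "template_variables": {
              "answer_language": "German",
              # overwrites the pipeline-provided `documents` for this run:
              "documents": [Document(content="Joe moved to Munich", meta={"name": "doc3"})],
          },
      },
  })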

* except for the already broken examples

@dfokina
Contributor

dfokina commented May 15, 2024

Hey @vblagoje @tstadel this last message with the docs suggestions looks reasonable to me, and the idea is pretty easy to understand :) We can adjust the examples slightly to fit into the docs, and it would look good.

@vblagoje
Member

vblagoje commented May 15, 2024

I also very much like this user-perspective-driven documentation rather than what I first suggested, and would even merge this straight into main. But let's proceed with what we all agree on.

Shall we use the above written user perspective description in class pydocs as well @tstadel ?

@tstadel
Member Author

tstadel commented May 16, 2024

I also very much like this user-perspective-driven documentation rather than what I first suggested, and would even merge this straight into main. But let's proceed with what we all agree on.

Shall we use the above written user perspective description in class pydocs as well @tstadel ?

@vblagoje Yes, why not. I can update it.

@tstadel
Member Author

tstadel commented May 17, 2024

@vblagoje pydocs have been updated.

Labels
2.x (Related to Haystack v2.0), topic:tests, type:documentation (Improvements on the docs)

6 participants