
[Question]: Evaluating correctness of my RAG solution #13063

Open
nshern opened this issue Apr 23, 2024 · 1 comment
Labels
question Further information is requested

Comments

nshern commented Apr 23, 2024

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

I am trying to use the built-in capabilities of LlamaIndex to evaluate the correctness of my RAG solution.
Looking at the example offered here, it seems like we are evaluating the base gpt-4 model, and not our RAG solution.

So my question is: how do I evaluate the correctness of the answers from my RAG solution?

nshern added the question label Apr 23, 2024

dosubot bot commented Apr 23, 2024

To evaluate the correctness of your RAG solution with LlamaIndex, especially if it's multi-modal, follow these steps:

  1. Prepare Your Environment: Install necessary packages for multi-modal data handling:

    pip install llama-index-llms-openai llama-index-multi-modal-llms-openai llama-index-multi-modal-llms-replicate
  2. Set Up Your Multi-Modal RAG System: Initialize your system with MultiModalVectorStoreIndex and attach a generator, such as OpenAIMultiModal, with a PromptTemplate for querying.

    from llama_index.core.indices import MultiModalVectorStoreIndex
    from llama_index.multi_modal_llms.openai import OpenAIMultiModal
    from llama_index.core import PromptTemplate

    # Example setup: image_nodes, text_nodes, and your_template_string are
    # placeholders for your own parsed nodes and QA prompt template string.
    asl_index = MultiModalVectorStoreIndex(image_nodes + text_nodes)
    openai_mm_llm = OpenAIMultiModal(model="gpt-4-vision-preview", max_new_tokens=300)
    qa_template = PromptTemplate(your_template_string)
    rag_engine = asl_index.as_query_engine(
        multi_modal_llm=openai_mm_llm, text_qa_template=qa_template
    )
  3. Generate and Evaluate Predictions: Use your RAG system to generate answers, then score them with evaluators such as CorrectnessEvaluator, MultiModalRelevancyEvaluator, and MultiModalFaithfulnessEvaluator to assess correctness, relevancy, and faithfulness (a usage sketch follows this list).

    from llama_index.llms.openai import OpenAI  # needed for the GPT-4 judge below
    from llama_index.core.evaluation import CorrectnessEvaluator
    from llama_index.core.evaluation.multi_modal import MultiModalRelevancyEvaluator, MultiModalFaithfulnessEvaluator

    # GPT-4 judges text correctness; the multi-modal evaluators use a vision-capable judge.
    correctness_evaluator = CorrectnessEvaluator(llm=OpenAI(temperature=0, model="gpt-4"))
    relevancy_evaluator = MultiModalRelevancyEvaluator(multi_modal_llm=OpenAIMultiModal(model="gpt-4-vision-preview", max_new_tokens=300))
    faithfulness_evaluator = MultiModalFaithfulnessEvaluator(multi_modal_llm=OpenAIMultiModal(model="gpt-4-vision-preview", max_new_tokens=300))
  4. Analyze Results: Review the evaluation scores to understand your system's performance and identify improvement areas (a small aggregation sketch follows below).
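
To make step 3 concrete for the plain-text case the original question describes, here is a minimal sketch of scoring your own engine's answers rather than the base model's. It assumes rag_engine is the query engine built in step 2 (or any index.as_query_engine()), and the query and reference strings are illustrative placeholders for your own test pairs:

    from llama_index.llms.openai import OpenAI
    from llama_index.core.evaluation import CorrectnessEvaluator

    # temperature=0 keeps the judge's scoring as deterministic as possible.
    correctness_evaluator = CorrectnessEvaluator(llm=OpenAI(temperature=0, model="gpt-4"))

    # Illustrative query and gold answer; substitute your own test pairs.
    query = "What does our onboarding guide say about VPN access?"
    reference = "New hires request VPN access through the IT portal on day one."

    # Crucially, the response comes from *your* RAG engine, not the base model.
    response = rag_engine.query(query)

    eval_result = correctness_evaluator.evaluate(
        query=query,
        response=str(response),  # the answer generated by your pipeline
        reference=reference,     # the gold answer the judge compares against
    )
    print(eval_result.score)    # 1.0-5.0; >= 4.0 counts as passing by default
    print(eval_result.passing)
    print(eval_result.feedback)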
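
For step 4, one lightweight way to analyze results is to loop the same evaluator over a small test set and track the mean score and pass rate as you tune chunking, retrieval, and prompts. The test set below is made up; rag_engine and correctness_evaluator are the same assumed objects as in the sketch above:

    # Tiny illustrative test set: query -> gold answer.
    test_set = {
        "What does our onboarding guide say about VPN access?":
            "New hires request VPN access through the IT portal on day one.",
        "Who approves expense reports?":
            "The employee's direct manager approves expense reports.",
    }

    scores = []
    for query, reference in test_set.items():
        response = rag_engine.query(query)
        result = correctness_evaluator.evaluate(
            query=query, response=str(response), reference=reference
        )
        scores.append(result.score)

    print(f"mean correctness: {sum(scores) / len(scores):.2f}")
    print(f"pass rate: {sum(s >= 4.0 for s in scores) / len(scores):.0%}")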

For detailed guidance, refer to the multi-modal RAG evaluation guide and the RAG evaluator base code in the LlamaIndex repository. This approach will help you effectively assess the correctness of your RAG solution's answers.

