
[Question]: Evaluating correctness of my RAG solution #13063

Open
nshern opened this issue Apr 23, 2024 · 1 comment
Labels
question Further information is requested

Comments

nshern commented Apr 23, 2024

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

I am trying to use the built-in capabilities of LlamaIndex to evaluate the correctness of my RAG solution.
Looking at the example offered here, it seems like we are evaluating the base gpt-4 model, and not our RAG solution.

So my question is: how do I evaluate the correctness of the answers from my RAG solution?

nshern added the question label Apr 23, 2024

dosubot bot commented Apr 23, 2024

To evaluate the correctness of your RAG solution with LlamaIndex, especially if it's multi-modal, follow these steps:

  1. Prepare Your Environment: Install necessary packages for multi-modal data handling:

    pip install llama-index-llms-openai llama-index-multi-modal-llms-openai llama-index-multi-modal-llms-replicate
  2. Set Up Your Multi-Modal RAG System: Initialize your system with MultiModalVectorStoreIndex and attach a generator, such as OpenAIMultiModal, with a PromptTemplate for querying.

    from llama_index.core.indices import MultiModalVectorStoreIndex
    from llama_index.multi_modal_llms.openai import OpenAIMultiModal
    from llama_index.core import PromptTemplate

    # Example setup: image_nodes, text_nodes, and your_template_string are
    # placeholders for your own parsed nodes and QA prompt template string.
    asl_index = MultiModalVectorStoreIndex(image_nodes + text_nodes)
    openai_mm_llm = OpenAIMultiModal(model="gpt-4-vision-preview", max_new_tokens=300)
    qa_template = PromptTemplate(your_template_string)
    rag_engine = asl_index.as_query_engine(
        multi_modal_llm=openai_mm_llm, text_qa_template=qa_template
    )
  3. Generate and Evaluate Predictions: Use your RAG system to generate answers, then score them with evaluators such as CorrectnessEvaluator, MultiModalRelevancyEvaluator, and MultiModalFaithfulnessEvaluator to assess correctness, relevancy, and faithfulness (a usage sketch follows this list).

    from llama_index.llms.openai import OpenAI  # needed for the GPT-4 judge below
    from llama_index.core.evaluation import CorrectnessEvaluator
    from llama_index.core.evaluation.multi_modal import MultiModalRelevancyEvaluator, MultiModalFaithfulnessEvaluator

    # GPT-4 judges text correctness; the multi-modal evaluators use a vision-capable judge.
    correctness_evaluator = CorrectnessEvaluator(llm=OpenAI(temperature=0, model="gpt-4"))
    relevancy_evaluator = MultiModalRelevancyEvaluator(multi_modal_llm=OpenAIMultiModal(model="gpt-4-vision-preview", max_new_tokens=300))
    faithfulness_evaluator = MultiModalFaithfulnessEvaluator(multi_modal_llm=OpenAIMultiModal(model="gpt-4-vision-preview", max_new_tokens=300))
  4. Analyze Results: Review the evaluation scores to understand your system's performance and identify improvement areas (a small aggregation sketch follows below).
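
To make step 3 concrete for the plain-text case the original question describes, here is a minimal sketch of scoring your own engine's answers rather than the base model's. It assumes rag_engine is the query engine built in step 2 (or any index.as_query_engine()), and the query and reference strings are illustrative placeholders for your own test pairs:

    from llama_index.llms.openai import OpenAI
    from llama_index.core.evaluation import CorrectnessEvaluator

    # temperature=0 keeps the judge's scoring as deterministic as possible.
    correctness_evaluator = CorrectnessEvaluator(llm=OpenAI(temperature=0, model="gpt-4"))

    # Illustrative query and gold answer; substitute your own test pairs.
    query = "What does our onboarding guide say about VPN access?"
    reference = "New hires request VPN access through the IT portal on day one."

    # Crucially, the response comes from *your* RAG engine, not the base model.
    response = rag_engine.query(query)

    eval_result = correctness_evaluator.evaluate(
        query=query,
        response=str(response),  # the answer generated by your pipeline
        reference=reference,     # the gold answer the judge compares against
    )
    print(eval_result.score)    # 1.0-5.0; >= 4.0 counts as passing by default
    print(eval_result.passing)
    print(eval_result.feedback)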
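
For step 4, one lightweight way to analyze results is to loop the same evaluator over a small test set and track the mean score and pass rate as you tune chunking, retrieval, and prompts. The test set below is made up; rag_engine and correctness_evaluator are the same assumed objects as in the sketch above:

    # Tiny illustrative test set: query -> gold answer.
    test_set = {
        "What does our onboarding guide say about VPN access?":
            "New hires request VPN access through the IT portal on day one.",
        "Who approves expense reports?":
            "The employee's direct manager approves expense reports.",
    }

    scores = []
    for query, reference in test_set.items():
        response = rag_engine.query(query)
        result = correctness_evaluator.evaluate(
            query=query, response=str(response), reference=reference
        )
        scores.append(result.score)

    print(f"mean correctness: {sum(scores) / len(scores):.2f}")
    print(f"pass rate: {sum(s >= 4.0 for s in scores) / len(scores):.0%}")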

For detailed guidance, refer to the multi-modal RAG evaluation guide and the RAG evaluator base code in the LlamaIndex repository. This approach will help you effectively assess the correctness of your RAG solution's answers.

