[Question]: How do you correctly load MultiModalVectorStoreIndex from persist directory? #13102
It looks like the issue you're encountering with empty results from your
Make sure that the
@dosu that's exactly what I did, and for an index that stores text it works well. However, with this multimodal index I run into problems.
The issue with empty retrieval results when loading from a persist directory might be due to several nuanced factors related to the storage and loading process of your
Regarding the use of …

These steps are derived from practices in the LlamaIndex tests and should help diagnose and resolve the issue with empty retrieval results.
Hmm, I think it should be
@logan-markewich
@logan-markewich Bumping this up: for now I have to rebuild the index from scratch every time, so it would be very convenient to be able to load it :)
@emilia-szymanska one thing to note: if you use …

If I start the Qdrant Docker image, I can run this, and it works OK:

```python
from llama_index.core import SimpleDirectoryReader, StorageContext, load_index_from_storage
from llama_index.core.indices import MultiModalVectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore
import qdrant_client

# Connect to the locally running Qdrant Docker container.
client = qdrant_client.QdrantClient(host="localhost", port=6333)

# One Qdrant collection for text vectors, one for image vectors.
text_store = QdrantVectorStore("text_collection", client=client)
image_store = QdrantVectorStore("image_collection", client=client)
storage_context = StorageContext.from_defaults(
    vector_store=text_store, image_store=image_store
)

# Hypothetical input: "./data" stands in for the documents elided in the original.
documents = SimpleDirectoryReader("./data").load_data()
index = MultiModalVectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
nodes = index.as_retriever().retrieve("test")
print(len(nodes))

# Persist the local metadata; the vectors themselves live in Qdrant.
index.storage_context.persist(persist_dir="./storage")

# Reload: reattach the same Qdrant collections and point at the persist dir.
text_store = QdrantVectorStore("text_collection", client=client)
image_store = QdrantVectorStore("image_collection", client=client)
loaded_storage_context = StorageContext.from_defaults(
    persist_dir="./storage", vector_store=text_store, image_store=image_store
)
index = load_index_from_storage(loaded_storage_context)
nodes = index.as_retriever().retrieve("test")
print(len(nodes))
```
@logan-markewich Oh, fair point. Is there a specific reason why all the MultiModalVectorStoreIndex tutorials use Qdrant? Could I use something that doesn't require Docker or an API key (apart from the Azure OpenAI key) and simply runs? Keep in mind that I'm looking for a beginner's solution; the simpler, the better :)

Edit: I store only images, no text, so I cannot use the `from_vector_store` method to recreate a VectorStoreIndex, since it fails with "Cannot initialize from a vector store that does not store text."
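A setup that needs no Docker boils down to persisting the vectors to a local directory and reading them back on the next run. As a library-agnostic illustration (a toy, not LlamaIndex's API; `ToyDiskVectorStore`, `load_or_build`, and the embedding function are all hypothetical), the sketch below writes vectors to a JSON file on the first run and loads them on later runs:

```python
import json
import math
from pathlib import Path
from typing import Callable, Dict, List, Tuple

# Hypothetical embedding function type: maps an input string to a vector.
EmbedFn = Callable[[str], List[float]]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; assumes equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class ToyDiskVectorStore:
    """Hypothetical minimal vector store persisted as a single JSON file."""

    def __init__(self, embeddings: Dict[str, List[float]]):
        self.embeddings = embeddings

    @classmethod
    def build(cls, items: Dict[str, str], embed: EmbedFn) -> "ToyDiskVectorStore":
        # Embed every item once, at build time.
        return cls({key: embed(text) for key, text in items.items()})

    def persist(self, path: Path) -> None:
        # Persisting means writing the vectors themselves to disk.
        path.write_text(json.dumps(self.embeddings))

    @classmethod
    def load(cls, path: Path) -> "ToyDiskVectorStore":
        # Loading reads the same vectors back; nothing is re-embedded.
        return cls(json.loads(path.read_text()))

    def query(self, vector: List[float], top_k: int = 2) -> List[Tuple[str, float]]:
        scored = [(key, cosine(vector, emb)) for key, emb in self.embeddings.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


def load_or_build(path: Path, items: Dict[str, str], embed: EmbedFn) -> ToyDiskVectorStore:
    """Load the store if a previous run persisted it; otherwise build and persist it."""
    if path.is_file():
        return ToyDiskVectorStore.load(path)
    store = ToyDiskVectorStore.build(items, embed)
    store.persist(path)
    return store
```

The point to carry over is that queries against a reloaded store must be embedded with the same function that produced the stored vectors.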
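I'm including the solution that @logan-markewich suggested on Discord. It works for me, so I'm closing the issue! :)

```python
from llama_index.core import StorageContext, load_index_from_storage
from llama_index.core.indices import MultiModalVectorStoreIndex

# Excerpt from a method: `persist_dir` and `image_dir` are pathlib.Path objects,
# `self._embed_model` is the embedding model, and `load_image_documents`
# and `query` come from the surrounding code.
if persist_dir.is_dir():
    # Reload the persisted index with the same embedding model used to build it.
    storage_context: StorageContext = StorageContext.from_defaults(
        persist_dir=persist_dir
    )
    image_index = load_index_from_storage(
        storage_context, embed_model=self._embed_model
    )
elif image_dir and image_dir.is_dir():
    # First run: build the index from images only.
    img_documents = load_image_documents(image_dir)
    image_index = MultiModalVectorStoreIndex.from_documents(
        img_documents,
        embed_model=self._embed_model,
        is_text_vector_store_empty=True,
        show_progress=True,
    )
    if persist_dir:
        image_index.storage_context.persist(persist_dir=persist_dir)

# The embedding model is passed to the retriever as well.
retriever = image_index.as_retriever(embed_model=self._embed_model)
img_nodes = retriever.retrieve(query)
```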
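The thread does not spell out why passing `embed_model` on reload fixed the empty results. One plausible explanation, offered only as an assumption, is that stored image vectors and the query vector must come from the same embedding model to be comparable. The toy below uses two hypothetical "models" (`model_a`, `model_b`) with different output dimensions; its `retrieve` helper skips vectors whose dimension differs from the query's, which is one way an always-empty result can arise (real vector stores may instead raise an error or return poor matches):

```python
import math
from typing import Dict, List


def model_a(text: str) -> List[float]:
    """Hypothetical embedding model A: 3-dimensional output."""
    return [float(text.count(ch)) + 0.1 for ch in "abc"]


def model_b(text: str) -> List[float]:
    """Hypothetical embedding model B: 4-dimensional, a different vector space."""
    return [float(len(text)), 1.0, 0.0, 2.0]


def retrieve(store: Dict[str, List[float]], query_vec: List[float], top_k: int = 1) -> List[str]:
    """Return the top_k keys by cosine similarity, skipping vectors of another dimension."""

    def cos(a: List[float], b: List[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

    comparable = [
        (key, cos(query_vec, vec))
        for key, vec in store.items()
        if len(vec) == len(query_vec)
    ]
    comparable.sort(key=lambda pair: pair[1], reverse=True)
    return [key for key, _ in comparable[:top_k]]


# Vectors were stored with model A.
store = {name: model_a(text) for name, text in {"img1": "aaa", "img2": "bbb"}.items()}
print(retrieve(store, model_a("aa")))  # ['img1']: same model, comparable vectors
print(retrieve(store, model_b("aa")))  # []: query embedded with a different model
```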
Question
I am using the following code to create a MultiModalVectorStoreIndex only from images:
When the persist directory does not exist, everything works and the retrieved `img_nodes` list is non-empty. However, when I run the code with the same query after a previous run has created the persist directory, `img_nodes` is always empty. What am I doing wrong? I have gone through the documentation and the existing issues and cannot figure out what to change. I also tried passing a service context to `load_index_from_storage`, but it did not help.
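When a reloaded index retrieves nothing, a quick first check is whether the image vectors actually made it to disk. The sketch below assumes the default `SimpleVectorStore` layout, where the image store is written to `image__vector_store.json` and vectors sit under an `embedding_dict` key mapping node IDs to lists of floats; both names are assumptions about the persistence format, so compare against the files in your own persist directory. If vectors are present with the dimension your embedding model produces, the problem more likely lies on the query side:

```python
import json
from pathlib import Path
from typing import Dict, Set


def summarize_vector_store(path: Path) -> Dict[str, object]:
    """Count stored vectors and collect their dimensions in a persisted JSON store.

    Assumed layout (not confirmed by the thread):
    {"embedding_dict": {node_id: [float, ...], ...}, ...}
    """
    data = json.loads(path.read_text())
    embeddings = data.get("embedding_dict", {})
    dims: Set[int] = {len(vec) for vec in embeddings.values()}
    return {"num_vectors": len(embeddings), "dimensions": sorted(dims)}


if __name__ == "__main__":
    # Hypothetical location; adjust to your own persist_dir.
    store_file = Path("./storage") / "image__vector_store.json"
    if store_file.is_file():
        print(summarize_vector_store(store_file))
    else:
        print(f"{store_file} not found; list the persist directory to see what was written")
```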