[Bug]: IngestionPipeline 0.9.36 pipeline.arun to Redis raises NotImplementedError #10248

JoeSF49 · 2024-01-24T19:03:22Z

Bug Description

Using IngestionPipeline with docstore=RedisDocumentStore and then calling pipeline.arun raises NotImplementedError on llama-index 0.9.36. Downgrading to 0.9.30 lets pipeline.arun run. The documents come straight from SimpleDirectoryReader.load_data and are passed to run or arun.

Version

0.9.36 llama-index

Steps to Reproduce

Install llama-index 0.9.36.
Create an IngestionPipeline:

    pipeline = IngestionPipeline(
        transformations=[
            SentenceSplitter(),
            embed_model,
        ],
        docstore=RedisDocumentStore.from_host_and_port(
            "localhost", 6379, namespace="document_store"
        ),
        vector_store=RedisVectorStore(
            index_name="redis_vector_store",
            index_prefix="vectore_store",
            redis_url="redis://localhost:6379",
        ),
        cache=IngestionCache(
            cache=RedisCache.from_host_and_port("localhost", 6379),
            collection="redis_cache",
        ),
        docstore_strategy=DocstoreStrategy.UPSERTS,
    )

Call pipeline.arun against the Redis vector store.
The call fails with NotImplementedError.

Workaround:
Downgrade to llama-index 0.9.30; both pipeline.arun and pipeline.run work there.

Relevant Logs/Tracebacks

No response

dosubot · 2024-01-24T19:03:44Z

🤖

Hello @JoeSF49!

I'm Dosu, an AI here to assist you with your inquiries and issues relating to the LlamaIndex repository. I'm here to help you navigate through bugs, answer your questions, and guide you in contributing to the project while we're waiting for a human maintainer.

I've taken notice of the issue you've described and I'm currently looking into it. I'll get back to you with a thorough answer shortly.

logan-markewich · 2024-01-24T19:09:58Z

Ah yea, need to implement the async methods here

llama_index/llama_index/storage/kvstore/redis_kvstore.py

Line 118 in 773a2fd

async def aget(
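
For readers following along, here is a rough sketch of what async counterparts to the sync methods could look like. It is not the code that was later merged. It assumes the store keeps each collection in a Redis hash with JSON-encoded values, and that the client exposes awaitable hset, hget and hgetall methods returning bytes (redis-py's redis.asyncio.Redis has methods with these names). AsyncRedisKVStoreSketch and FakeAsyncRedis are hypothetical names; the fake client only exists so the example runs without a Redis server.

```python
import asyncio
import json
from typing import Dict, Optional


class FakeAsyncRedis:
    """In-memory stand-in for an async Redis client (illustration only)."""

    def __init__(self) -> None:
        self._hashes: Dict[str, Dict[bytes, bytes]] = {}

    async def hset(self, name: str, key: str, value: str) -> int:
        bucket = self._hashes.setdefault(name, {})
        is_new = key.encode() not in bucket
        bucket[key.encode()] = value.encode()
        return int(is_new)

    async def hget(self, name: str, key: str) -> Optional[bytes]:
        return self._hashes.get(name, {}).get(key.encode())

    async def hgetall(self, name: str) -> Dict[bytes, bytes]:
        return dict(self._hashes.get(name, {}))


class AsyncRedisKVStoreSketch:
    """Hypothetical async key-value store; not the upstream implementation."""

    def __init__(self, async_client) -> None:
        # Assumption: the client returns bytes (no decode_responses=True).
        self._client = async_client

    async def aput(self, key: str, val: dict, collection: str = "data") -> None:
        # One Redis hash per collection, values stored as JSON strings.
        await self._client.hset(name=collection, key=key, value=json.dumps(val))

    async def aget(self, key: str, collection: str = "data") -> Optional[dict]:
        raw = await self._client.hget(name=collection, key=key)
        return None if raw is None else json.loads(raw)

    async def aget_all(self, collection: str = "data") -> Dict[str, dict]:
        raw = await self._client.hgetall(name=collection)
        return {k.decode(): json.loads(v) for k, v in raw.items()}


async def demo() -> Dict[str, dict]:
    store = AsyncRedisKVStoreSketch(FakeAsyncRedis())
    await store.aput("doc-1", {"doc_hash": "abc"}, collection="metadata")
    await store.aput("doc-2", {"doc_hash": "def"}, collection="metadata")
    return await store.aget_all(collection="metadata")


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Swapping FakeAsyncRedis for a real async client should only require that the three hash methods behave the same way; that part is untested here.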

dosubot · 2024-01-24T19:30:58Z

🤖

Thank you for reporting this issue. It seems like there's a problem with the implementation of the async methods in the Redis key-value store in the LlamaIndex version 0.9.36.

To better understand and address this issue, could you please provide the following additional information:

The complete error message you're seeing when you attempt to run pipeline.arun.
The version of Redis you're using.
The operating system and its version on which you're running your LlamaIndex application.
Any other relevant configuration details that might help us understand the issue better.

This information will help us in diagnosing the problem more accurately. Thank you for your cooperation.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

If you want to continue the conversation, start your reply with @dosu-bot.

JoeSF49 · 2024-01-24T20:04:00Z

I believe it has to do with the docstore strategy in the pipeline. I also traced the failure to 0.9.33; it works up to llama-index 0.9.32.

ERROR:

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[18], line 2
      1 ## Async
----> 2 nodes = await pipeline.arun(documents=documents,show_progress=True)
      3 # nodes = pipeline.run(documents=documents,show_progress=True)
      4 print(f"Ingested {len(nodes)} Nodes")

File /opt/miniforge/envs/pytorch/lib/python3.10/site-packages/llama_index/ingestion/pipeline.py:570, in IngestionPipeline.arun(self, show_progress, documents, nodes, cache_collection, in_place, store_doc_text, num_workers, **kwargs)
    565 if self.docstore is not None and self.vector_store is not None:
    566     if self.docstore_strategy in (
    567         DocstoreStrategy.UPSERTS,
    568         DocstoreStrategy.UPSERTS_AND_DELETE,
    569     ):
--> 570         nodes_to_run = await self._ahandle_upserts(
    571             input_nodes, store_doc_text=store_doc_text
    572         )
    573     elif self.docstore_strategy == DocstoreStrategy.DUPLICATES_ONLY:
    574         nodes_to_run = await self._ahandle_duplicates(
    575             input_nodes, store_doc_text=store_doc_text
    576         )

File /opt/miniforge/envs/pytorch/lib/python3.10/site-packages/llama_index/ingestion/pipeline.py:513, in IngestionPipeline._ahandle_upserts(self, nodes, store_doc_text)
    509 """Handle docstore upserts by checking hashes and ids."""
    510 assert self.docstore is not None
    512 existing_doc_ids_before = set(
--> 513     (await self.docstore.aget_all_document_hashes()).values()
    514 )
    515 doc_ids_from_nodes = set()
    516 deduped_nodes_to_run = {}

File /opt/miniforge/envs/pytorch/lib/python3.10/site-packages/llama_index/storage/docstore/keyval_docstore.py:524, in KVDocumentStore.aget_all_document_hashes(self)
    522 """Get the stored hash for all documents."""
    523 hashes = {}
--> 524 for doc_id in await self._kvstore.aget_all(
    525     collection=self._metadata_collection
    526 ):
    527     hash = await self.aget_document_hash(doc_id)
    528     if hash is not None:

File /opt/miniforge/envs/pytorch/lib/python3.10/site-packages/llama_index/storage/kvstore/redis_kvstore.py:140, in RedisKVStore.aget_all(self, collection)
    138 async def aget_all(self, collection: str = DEFAULT_COLLECTION) -> Dict[str, dict]:
    139     """Get all values from the store."""
--> 140     raise NotImplementedError

NotImplementedError:
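
The traceback bottoms out in an aget_all that only raises NotImplementedError, so any async caller fails even though the sync path works. The sketch below reproduces that pattern in miniature and shows one possible stopgap: delegating the async method to the sync one in a worker thread with asyncio.to_thread. All class names here are hypothetical and the data is made up; this is an illustration of the failure mode, not a tested patch for llama-index, and a thread-based fallback does not give true async I/O.

```python
import asyncio
from typing import Dict


class BaseKVStoreSketch:
    """Hypothetical base class: both paths are unimplemented by default."""

    def get_all(self, collection: str = "data") -> Dict[str, dict]:
        raise NotImplementedError

    async def aget_all(self, collection: str = "data") -> Dict[str, dict]:
        raise NotImplementedError


class SyncOnlyStore(BaseKVStoreSketch):
    """Implements only the sync path, so awaiting aget_all still raises."""

    def __init__(self) -> None:
        # Made-up contents standing in for what a real store would hold.
        self._data = {"data": {"doc-1": {"doc_hash": "abc"}}}

    def get_all(self, collection: str = "data") -> Dict[str, dict]:
        return dict(self._data.get(collection, {}))


class ThreadedAsyncStore(SyncOnlyStore):
    """Stopgap: run the blocking sync method in a worker thread."""

    async def aget_all(self, collection: str = "data") -> Dict[str, dict]:
        return await asyncio.to_thread(self.get_all, collection)


async def call_aget_all(store: BaseKVStoreSketch) -> str:
    """Await aget_all and report whether it succeeded."""
    try:
        result = await store.aget_all()
        return f"ok: {result}"
    except NotImplementedError:
        return "NotImplementedError"


if __name__ == "__main__":
    print(asyncio.run(call_aget_all(SyncOnlyStore())))
    print(asyncio.run(call_aget_all(ThreadedAsyncStore())))
```

asyncio.to_thread requires Python 3.9 or newer, which matches the Python 3.10 environment visible in the traceback paths.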

JoeSF49 · 2024-01-24T20:33:36Z

As far as OSes go: I run Docker workloads on an ML hardware platform.
Redis runs from the redis-stack Docker image, Redis version 7.2.3.
The Python environment is basically Ubuntu 22.04; it actually uses an nvidia/cuda-12.x-ubuntu22.04 image, which has the NVIDIA drivers and toolkits for hardware (GPU) passthrough. I install a mambaforge env and libraries (llama-index, langchain, etc.) inside this workload "OS".

logan-markewich · 2024-01-24T22:41:42Z

Yea as mentioned above, async redis docstore operations just need to be implemented

JoeSF49 · 2024-02-01T15:45:40Z

@dosu-bot Hi, and sorry, I rarely file bugs so I'm new to this process. Do I have to provide any more information, or will this fix be implemented in an upcoming release? I cannot run the pipeline without async, as the regular ".run" fills the GPU memory. Thanks

dosubot · 2024-02-01T15:52:03Z

🤖

Thank you for the detailed report, JoeSF49. It seems like the issue is indeed related to the async methods not being implemented for the Redis docstore operations in the versions after 0.9.32.

To help us further investigate and address this issue, could you please provide the following additional information:

The exact version of Python you are using.
The specific versions of the dependencies you are using, especially for Redis and any related packages.
Any custom configurations or settings you are using for Redis or LlamaIndex.
If possible, a minimal code snippet that reproduces the error. This will help us to replicate the issue on our end.

This information will help us understand the context better and expedite the process of finding a solution. Thank you for your cooperation.

JoeSF49 added the bug (Something isn't working) and triage (Issue needs to be triaged/prioritized) labels on Jan 24, 2024

philipchung mentioned this issue on Apr 18, 2024 in "Implement async methods in RedisKVStore" #12943 (merged)

dosubot bot added the stale label (Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed) on May 2, 2024

dosubot bot closed this as not planned (won't fix, can't repro, duplicate, stale) on May 9, 2024

dosubot bot removed the stale label on May 9, 2024

logan-markewich closed this as completed in #12943 May 10, 2024
