[Bug]: IngestionPipeline 0.9.36 pipline.arun to Redis Not implemented Error #10248

Closed
JoeSF49 opened this issue Jan 24, 2024 · 8 comments · Fixed by #12943
Labels
bug Something isn't working triage Issue needs to be triaged/prioritized

Comments


JoeSF49 commented Jan 24, 2024

Bug Description

Using IngestionPipeline with docstore=RedisDocumentStore and then running pipeline.arun raises NotImplementedError on llama-index 0.9.36. Downgrading to 0.9.30 lets me run pipeline.arun. I'm just loading documents with SimpleDirectoryReader's load_data and passing them to run or arun.

Version

0.9.36 llama-index

Steps to Reproduce

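  1. Install llama-index 0.9.36.
  2. Create an IngestionPipeline:

    # Imports follow the llama-index 0.9.x package layout.
    from llama_index.ingestion import (
        DocstoreStrategy,
        IngestionCache,
        IngestionPipeline,
    )
    from llama_index.ingestion.cache import RedisCache
    from llama_index.storage.docstore import RedisDocumentStore
    from llama_index.text_splitter import SentenceSplitter
    from llama_index.vector_stores import RedisVectorStore

    # embed_model is an embedding model instance defined elsewhere.
    pipeline = IngestionPipeline(
        transformations=[
            SentenceSplitter(),
            embed_model,
        ],
        docstore=RedisDocumentStore.from_host_and_port(
            "localhost", 6379, namespace="document_store"
        ),
        vector_store=RedisVectorStore(
            index_name="redis_vector_store",
            index_prefix="vectore_store",
            redis_url="redis://localhost:6379",
        ),
        cache=IngestionCache(
            cache=RedisCache.from_host_and_port("localhost", 6379),
            collection="redis_cache",
        ),
        docstore_strategy=DocstoreStrategy.UPSERTS,
    )

  3. Call pipeline.arun to ingest into the Redis vector store.
  4. The call fails with a "Not implemented" error (NotImplementedError).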

Workaround:
Downgrade to llama-index 0.9.30; both pipeline.arun and pipeline.run work there.

Relevant Logs/Tracebacks

No response

JoeSF49 added the bug and triage labels Jan 24, 2024

dosubot bot commented Jan 24, 2024

🤖

Hello @JoeSF49!

I'm Dosu, an AI here to assist you with your inquiries and issues relating to the LlamaIndex repository. I'm here to help you navigate through bugs, answer your questions, and guide you in contributing to the project while we're waiting for a human maintainer.

I've taken notice of the issue you've described and I'm currently looking into it. I'll get back to you with a thorough answer shortly.

@logan-markewich
Collaborator

Ah yea, need to implement the async methods here


dosubot bot commented Jan 24, 2024

🤖

Thank you for reporting this issue. It seems like there's a problem with the implementation of the async methods in the Redis key-value store in the LlamaIndex version 0.9.36.

To better understand and address this issue, could you please provide the following additional information:

  1. The complete error message you're seeing when you attempt to run pipeline.arun.
  2. The version of Redis you're using.
  3. The operating system and its version on which you're running your LlamaIndex application.
  4. Any other relevant configuration details that might help us understand the issue better.

This information will help us in diagnosing the problem more accurately. Thank you for your cooperation.


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

JoeSF49 commented Jan 24, 2024

I believe it has to do with the docstore strategy in the pipeline. I also traced the failure to 0.9.33; it works up to llama-index 0.9.32.

ERROR:

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[18], line 2
      1 ## Async
----> 2 nodes = await pipeline.arun(documents=documents,show_progress=True)
      3 # nodes = pipeline.run(documents=documents,show_progress=True)
      4 print(f"Ingested {len(nodes)} Nodes")

File /opt/miniforge/envs/pytorch/lib/python3.10/site-packages/llama_index/ingestion/pipeline.py:570, in IngestionPipeline.arun(self, show_progress, documents, nodes, cache_collection, in_place, store_doc_text, num_workers, **kwargs)
    565 if self.docstore is not None and self.vector_store is not None:
    566     if self.docstore_strategy in (
    567         DocstoreStrategy.UPSERTS,
    568         DocstoreStrategy.UPSERTS_AND_DELETE,
    569     ):
--> 570         nodes_to_run = await self._ahandle_upserts(
    571             input_nodes, store_doc_text=store_doc_text
    572         )
    573     elif self.docstore_strategy == DocstoreStrategy.DUPLICATES_ONLY:
    574         nodes_to_run = await self._ahandle_duplicates(
    575             input_nodes, store_doc_text=store_doc_text
    576         )

File /opt/miniforge/envs/pytorch/lib/python3.10/site-packages/llama_index/ingestion/pipeline.py:513, in IngestionPipeline._ahandle_upserts(self, nodes, store_doc_text)
    509 """Handle docstore upserts by checking hashes and ids."""
    510 assert self.docstore is not None
    512 existing_doc_ids_before = set(
--> 513     (await self.docstore.aget_all_document_hashes()).values()
    514 )
    515 doc_ids_from_nodes = set()
    516 deduped_nodes_to_run = {}

File /opt/miniforge/envs/pytorch/lib/python3.10/site-packages/llama_index/storage/docstore/keyval_docstore.py:524, in KVDocumentStore.aget_all_document_hashes(self)
    522 """Get the stored hash for all documents."""
    523 hashes = {}
--> 524 for doc_id in await self._kvstore.aget_all(
    525     collection=self._metadata_collection
    526 ):
    527     hash = await self.aget_document_hash(doc_id)
    528     if hash is not None:

File /opt/miniforge/envs/pytorch/lib/python3.10/site-packages/llama_index/storage/kvstore/redis_kvstore.py:140, in RedisKVStore.aget_all(self, collection)
    138 async def aget_all(self, collection: str = DEFAULT_COLLECTION) -> Dict[str, dict]:
    139     """Get all values from the store."""
--> 140     raise NotImplementedError

NotImplementedError:
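
The traceback boils down to a short call chain: the async docstore method awaits the key-value store's `aget_all`, and that method is a stub that raises. The sketch below imitates that shape with standard-library code only. `StubRedisKVStore`, `StubDocStore`, and `reproduce` are hypothetical names for illustration, not llama-index classes, and the in-memory dict is an assumption standing in for Redis.

```python
import asyncio
from typing import Dict, Tuple


class StubRedisKVStore:
    """Hypothetical stand-in for a KV store whose async read is still a stub."""

    def __init__(self) -> None:
        # In-memory data replaces Redis for the purpose of this sketch.
        self._data: Dict[str, dict] = {"doc-1": {"doc_hash": "abc"}}

    def get_all(self) -> Dict[str, dict]:
        """Sync read: implemented."""
        return dict(self._data)

    async def aget_all(self) -> Dict[str, dict]:
        """Async read: not implemented, like the last frame of the traceback."""
        raise NotImplementedError


class StubDocStore:
    """Hypothetical stand-in for the docstore layer above the KV store."""

    def __init__(self, kvstore: StubRedisKVStore) -> None:
        self._kvstore = kvstore

    def get_all_document_hashes(self) -> Dict[str, str]:
        all_meta = self._kvstore.get_all()
        return {meta["doc_hash"]: doc_id for doc_id, meta in all_meta.items()}

    async def aget_all_document_hashes(self) -> Dict[str, str]:
        all_meta = await self._kvstore.aget_all()  # raises NotImplementedError
        return {meta["doc_hash"]: doc_id for doc_id, meta in all_meta.items()}


async def reproduce() -> Tuple[Dict[str, str], str]:
    """Return the sync result and the name of the error raised by the async path."""
    docstore = StubDocStore(StubRedisKVStore())
    sync_hashes = docstore.get_all_document_hashes()
    try:
        await docstore.aget_all_document_hashes()
        error_name = "none"
    except NotImplementedError as exc:
        error_name = type(exc).__name__
    return sync_hashes, error_name


if __name__ == "__main__":
    print(asyncio.run(reproduce()))
```

In this sketch the sync path succeeds while the async path fails on the same object, which would be consistent with the commented-out `pipeline.run` call in the notebook cell still working.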

JoeSF49 commented Jan 24, 2024

As for the OS: I run Docker workloads on an ML hardware platform.
Redis runs from the redis-stack Docker image, Redis version 7.2.3.
The Python environment is essentially Ubuntu 22.04; it's actually an nvidia/cuda-12.x-ubuntu22.04 image, which has the NVIDIA drivers and toolkits for hardware (GPU) passthrough. I install a mambaforge env within this workload "OS", along with the libraries (llama-index, langchain, etc.).

@logan-markewich
Collaborator

Yea as mentioned above, async redis docstore operations just need to be implemented
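
Until such async methods exist, one possible stopgap is to fill them in by running the existing sync methods in a worker thread. The sketch below shows that pattern with standard-library code only. `SyncOnlyKVStore` and `ThreadBridgedKVStore` are hypothetical names, the dict-backed store is an assumption standing in for Redis, and this is not necessarily how the maintainers implemented the eventual fix; a native implementation would more likely use an async Redis client.

```python
import asyncio
from typing import Dict, Optional


class SyncOnlyKVStore:
    """Hypothetical store: sync methods work, async methods are stubs."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def put(self, key: str, val: dict) -> None:
        self._data[key] = val

    def get(self, key: str) -> Optional[dict]:
        return self._data.get(key)

    def get_all(self) -> Dict[str, dict]:
        return dict(self._data)

    async def aput(self, key: str, val: dict) -> None:
        raise NotImplementedError

    async def aget(self, key: str) -> Optional[dict]:
        raise NotImplementedError

    async def aget_all(self) -> Dict[str, dict]:
        raise NotImplementedError


class ThreadBridgedKVStore(SyncOnlyKVStore):
    """Fill in the async methods by delegating to the sync ones in a thread.

    asyncio.to_thread (Python 3.9+) keeps the event loop responsive while
    the blocking call runs, but it does not make the I/O itself async.
    """

    async def aput(self, key: str, val: dict) -> None:
        await asyncio.to_thread(self.put, key, val)

    async def aget(self, key: str) -> Optional[dict]:
        return await asyncio.to_thread(self.get, key)

    async def aget_all(self) -> Dict[str, dict]:
        return await asyncio.to_thread(self.get_all)


async def demo() -> Dict[str, dict]:
    """Write one entry and read everything back through the async methods."""
    store = ThreadBridgedKVStore()
    await store.aput("doc-1", {"doc_hash": "abc"})
    return await store.aget_all()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The same idea could in principle be applied by subclassing the real store, but whether that works with a given llama-index version is untested here.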

JoeSF49 commented Feb 1, 2024

@dosu-bot Hi, and sorry, I rarely file bugs, so I'm new to this process. Do I have to provide any more information, or will this fix be implemented in an upcoming release? I cannot run the pipeline without async, as it fills the GPU memory during a regular ".run". Thanks


dosubot bot commented Feb 1, 2024

🤖

Thank you for the detailed report, JoeSF49. It seems like the issue is indeed related to the async methods not being implemented for the Redis docstore operations in the versions after 0.9.32.

To help us further investigate and address this issue, could you please provide the following additional information:

  1. The exact version of Python you are using.
  2. The specific versions of the dependencies you are using, especially for Redis and any related packages.
  3. Any custom configurations or settings you are using for Redis or LlamaIndex.
  4. If possible, a minimal code snippet that reproduces the error. This will help us to replicate the issue on our end.

This information will help us understand the context better and expedite the process of finding a solution. Thank you for your cooperation.



dosubot bot added the stale label May 2, 2024
dosubot bot closed this as not planned (won't fix, can't repro, duplicate, stale) May 9, 2024
dosubot bot removed the stale label May 9, 2024