
[Question]: Memory / sizing issues with KnowledgeGraphIndex and embedding #13104

Open
stdweird opened this issue Apr 25, 2024 · 3 comments
Labels
question Further information is requested

Comments

@stdweird

Question Validation

  • I have searched both the documentation and Discord for an answer.

Question

We are trying to build a KnowledgeGraphIndex with include_embeddings=True. The current setup uses a Postgres-backed KV store as the index store.
We are hitting sizing limits in two places, more or less at the same time, from the same source: index_struct.embedding_dict.

The first issue is internal and most likely manageable (we are working on a patch): llama_index uses List[float] to store embeddings, but a Python list of floats uses approximately 32 bytes per value. For testing we have ~10k nodes to index, resulting in 40-50k triplets, and with e.g. the OpenAI ada-002 embedding of dim 1536 this grows out of control. The fix here is to use numpy arrays with float32 or even float16 (which should be fine for embeddings), reducing the internal size by at least a factor of 8. That makes it manageable, though not ideal (with float16 it might be OK). An additional advantage is that regular nodes with embeddings would also use less space if the same representation were used there. (Our testbed has lots of memory, so this is more of an operational issue: in production we don't want to give chatbots tens of GB of RAM solely to store some embedding data; but again, float16 to the rescue.)
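
For a sense of scale, here is a minimal, self-contained sketch (plain Python + numpy, not llama_index code) comparing the per-embedding footprint of a List[float] against float32 and float16 arrays; the dim of 1536 assumes ada-002:

```python
# Compare the memory footprint of one embedding stored as a Python
# List[float] vs. numpy float32/float16 arrays.
import random
import sys

import numpy as np

DIM = 1536  # e.g. OpenAI text-embedding-ada-002
embedding = [random.random() for _ in range(DIM)]

# List[float]: the list object itself plus one ~24-byte float object
# per element (CPython).
list_bytes = sys.getsizeof(embedding) + sum(sys.getsizeof(x) for x in embedding)

f32_bytes = np.asarray(embedding, dtype=np.float32).nbytes  # 4 bytes per value
f16_bytes = np.asarray(embedding, dtype=np.float16).nbytes  # 2 bytes per value

print(f"list[float]: ~{list_bytes} B, float32: {f32_bytes} B, float16: {f16_bytes} B")
```

At dim 1536 that is roughly 50 KB as a list versus ~6 KB (float32) or ~3 KB (float16) per embedding, which is where the factor-of-8-or-more saving comes from.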

However, even if we address the internal memory issue, problem number two is the store: the embedding_dict is stored as part of a single index_struct in the index store, so it serialises one very large object to JSON and sends it to Postgres. Postgres has a 1GB row limit (we discovered; I still need to check the 256MB jsonb limit), but this is not enough. We are a bit in the dark about what to do with this one. Can the index_struct be sharded somehow? Should we store the embedding_dict in its own embeddings kvstore (I think we only need a get and a put)? The code also mentions "TBD, should support vector store", but that is beyond my patching skills ;)
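
To make the last question concrete, here is a rough sketch of what a dedicated embeddings kvstore could look like; EmbeddingKVStore is a hypothetical minimal interface, not an existing llama_index class, and the point is simply that one row per triplet key keeps each Postgres value small:

```python
# Hypothetical minimal get/put interface for a dedicated embeddings store.
from typing import Dict, List, Protocol


class EmbeddingKVStore(Protocol):
    def put(self, key: str, val: dict) -> None: ...
    def get(self, key: str) -> dict: ...


def put_embeddings(store: EmbeddingKVStore, embedding_dict: Dict[str, List[float]]) -> None:
    # One small row per triplet key instead of one giant JSON blob
    # for the entire index_struct.
    for key, vector in embedding_dict.items():
        store.put(key, {"embedding": vector})


def get_embedding(store: EmbeddingKVStore, key: str) -> List[float]:
    return store.get(key)["embedding"]
```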

@logan-markewich you mentioned you are working on new graphindex code, but any ideas what we could try in the meantime?

stdweird added the question (Further information is requested) label on Apr 25, 2024

dosubot bot commented Apr 25, 2024

To address the memory and sizing issues with the KnowledgeGraphIndex when include_embeddings=True, especially regarding the large size of index_struct.embedding_dict and its storage in a PostgreSQL database, consider the following strategies:

  1. Optimize Embedding Storage: Transition from using List[float] to numpy arrays with float32 or float16 data types for storing embeddings. This change can significantly reduce memory usage by at least a factor of 8. Using float16 can be particularly beneficial for embeddings, offering a good balance between precision and storage requirements.

  2. Sharding the Index Structure: Implement sharding for the index_struct to distribute its storage and processing. This approach can help manage large datasets by breaking them down into smaller, more manageable parts, each stored and processed independently.

  3. Separate Storage for Embeddings: Consider storing the embeddings_dict in a separate, specialized storage solution optimized for large numerical data. This can alleviate the load on the primary database and enhance performance.

  4. Database Optimization and Compression: Apply compression techniques to the embeddings before storage and optimize the database schema. This includes using efficient data types, optimizing queries, and indexing critical columns to improve storage and retrieval efficiency.

  5. Lazy Loading and In-memory Caching: Implement lazy loading to load embeddings into memory only when needed, and use in-memory caching for frequently accessed embeddings to reduce retrieval times and database load. (A combined sketch of points 4 and 5 follows this list.)
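
As a combined illustration of points 4 and 5, the following sketch compresses float16 bytes with zlib before storage and decompresses lazily behind an LRU cache; the helper names and the in-memory dict standing in for a database table are assumptions for illustration, not llama_index APIs:

```python
# Sketch: compressed at-rest storage (point 4) + lazy loading with an
# in-memory LRU cache (point 5). Not llama_index code.
import zlib
from functools import lru_cache
from typing import Dict, List

import numpy as np

_compressed: Dict[str, bytes] = {}  # stand-in for e.g. a Postgres bytea column


def store_embedding(key: str, vector: List[float]) -> None:
    # Quantize to float16, then compress the raw bytes before storage.
    raw = np.asarray(vector, dtype=np.float16).tobytes()
    _compressed[key] = zlib.compress(raw)


@lru_cache(maxsize=4096)  # keep frequently accessed embeddings in memory
def load_embedding(key: str) -> np.ndarray:
    raw = zlib.decompress(_compressed[key])
    return np.frombuffer(raw, dtype=np.float16)
```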

Regarding the serialization of index_struct.embedding_dict for PostgreSQL storage, the index_struct_to_json function in LlamaIndex demonstrates how IndexStruct objects, including those containing embedding_dict, are serialized. This serialized form is suitable for storage in PostgreSQL, leveraging its JSON support for efficient storage and querying.

For reducing the precision of embeddings from float32 to float16, be aware that while this can lead to faster computation and lower memory usage, it may also result in a decrease in the accuracy of the embeddings due to the reduced precision.
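
As a quick sanity check of that trade-off, the sketch below round-trips a random unit-norm vector through float16 and measures the error and cosine-similarity shift:

```python
# Measure what float16 round-tripping does to a typical unit-norm embedding.
import numpy as np

rng = np.random.default_rng(0)
v = rng.normal(size=1536).astype(np.float32)
v /= np.linalg.norm(v)

v16 = v.astype(np.float16).astype(np.float32)
print("max abs error:", float(np.max(np.abs(v - v16))))            # typically ~1e-5
print("cosine similarity:", float(v @ v16 / np.linalg.norm(v16)))  # very close to 1.0
```

For retrieval by cosine similarity, a shift this small is usually negligible, which is why float16 is often considered acceptable as a storage format for embeddings.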

The LlamaIndex codebase includes mechanisms for lazy loading of embeddings, allowing embeddings to be loaded on-demand. This approach provides a flexible and extensible mechanism for managing various embedding types within the framework.

By implementing these strategies, you can better manage the memory and sizing issues associated with storing and processing large embedding dictionaries in a KnowledgeGraphIndex, especially when backed by a PostgreSQL database.


@logan-markewich
Collaborator

@stdweird the real solution here is to not store the embeddings in memory, and instead use an actual vector db (although this would take some refactoring work)

This would separate out the embeddings from the hacky location they are in now

@logan-markewich
Collaborator

logan-markewich commented Apr 25, 2024

The storage context already has a vector store, so technically it should be storing and retrieving vectors from that. Then you could integrate any vector store (qdrant, chroma, etc.)
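
For readers following along, here is a sketch of what that integration could look like from the user side, assuming llama_index 0.10-style packages and Qdrant as the vector store; note that as of this thread KnowledgeGraphIndex still keeps embeddings in index_struct, so this shows the intended direction rather than current behavior:

```python
# Sketch: attaching a real vector store through the StorageContext, so that
# embeddings would live in the vector db instead of index_struct.
import qdrant_client
from llama_index.core import KnowledgeGraphIndex, SimpleDirectoryReader, StorageContext
from llama_index.vector_stores.qdrant import QdrantVectorStore

client = qdrant_client.QdrantClient(host="localhost", port=6333)
vector_store = QdrantVectorStore(client=client, collection_name="kg_embeddings")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

documents = SimpleDirectoryReader("./data").load_data()
index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    include_embeddings=True,
)
```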
