- [x] I searched existing ideas and did not find a similar one
- [x] I added a very descriptive title
- [x] I've clearly described the feature request and motivation for it
Feature request
I would like to propose multithreading when initializing a VectorStore or adding texts/documents to it.
Currently, the sync and async variants of `add_texts`, `add_documents`, `from_texts`, and `from_documents` all process texts sequentially. This does not fully utilize the Embeddings API throughput and becomes a bottleneck.
The following is my workaround: it splits the documents into N groups and runs `aadd_documents` on them in parallel, which speeds up the overall embedding step.
```python
import asyncio

from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
from langchain_core.vectorstores import VectorStore
from langchain_upstage import UpstageEmbeddings

db: VectorStore = Chroma(embedding_function=UpstageEmbeddings())

async def embed_group(group: list[Document]) -> None:
    await db.aadd_documents(group)

# `docs` is the full list of Documents to embed; max(1, ...) avoids a
# zero-sized group when there are fewer than 10 documents.
n = max(1, len(docs) // 10)
doc_groups = [docs[i:i + n] for i in range(0, len(docs), n)]
await asyncio.gather(*(embed_group(group) for group in doc_groups))
```
I thought it would be a great feature if `VectorStore` supported something like this internally, so users could get it with a one-liner. One option is to add a `concurrency` parameter to `VectorStore`, defaulting to 1:

```python
Chroma.afrom_documents(docs, concurrency=10)
```
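To illustrate, here is a rough sketch of what such an internal option could look like on the async path. The helper name `aadd_documents_concurrent`, the `batch_size` parameter, and the semaphore-based bound are all hypothetical, not existing LangChain API:

```python
import asyncio

# Hypothetical sketch: bound how many batches are embedded in parallel
# with a semaphore, so `concurrency` caps in-flight Embeddings API calls.
async def aadd_documents_concurrent(db, docs, concurrency=10, batch_size=100):
    """Add `docs` to `db` in parallel batches, at most `concurrency` at a time."""
    semaphore = asyncio.Semaphore(concurrency)

    async def add_batch(batch):
        async with semaphore:
            await db.aadd_documents(batch)

    batches = [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]
    await asyncio.gather(*(add_batch(b) for b in batches))
```

With `concurrency=1` this degrades to today's sequential behavior, so it could be a backward-compatible default.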
I also noticed that `ContextThreadPoolExecutor` already exists, so we could probably leverage it in `VectorStore`.
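For the sync methods, the idea could look like the sketch below, where a plain `ThreadPoolExecutor` stands in for the LangChain-internal `ContextThreadPoolExecutor`; `add_documents_concurrent` and its parameters are again hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch for the sync path: split `docs` into batches and
# embed them on a bounded thread pool instead of sequentially.
def add_documents_concurrent(db, docs, concurrency=10, batch_size=100):
    """Add `docs` to `db` using up to `concurrency` worker threads."""
    batches = [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # list(...) consumes the iterator, re-raising any worker exception
        list(pool.map(db.add_documents, batches))
```

This assumes the underlying store's `add_documents` is safe to call from multiple threads, which would need to be verified per integration.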
Let me know if there is already a better way to achieve this!
Motivation
Adding a large number of chunks to a VectorStore currently takes a very long time and easily becomes a bottleneck. There is a workaround, but it is cumbersome to hand-write the concurrent processing.