
[Question]: vector_store.add(nodes) gives me error = opensearchpy.exceptions.ConnectionTimeout: ConnectionTimeout caused by - TimeoutError() #13110

Open
Aekansh-Ak opened this issue Apr 25, 2024 · 1 comment
Labels
question Further information is requested

Comments

@Aekansh-Ak

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

I am trying to add nodes to an OpenSearch vector database, following this guide:

https://docs.llamaindex.ai/en/stable/examples/low_level/ingestion/

endpoint = f"https://admin:[email protected]:9200"
idx = "sample-index"
text_field = "text"
embedding_field = "vector_field"
client = OpensearchVectorClient(
    endpoint,
    idx,
    dim=384,
    embedding_field=embedding_field,
    text_field=text_field,
    use_ssl=False,
    verify_certs=False,
)

vector_store = OpensearchVectorStore(client)
# storage_context = StorageContext.from_defaults(vector_store=vector_store)

documents = SimpleDirectoryReader("/home/seceon/opensearch_lm_index/textdocs").load_data()

embeddings = HuggingFaceEmbedding()

splitter = SentenceSplitter(
    chunk_size=700,
    chunk_overlap=300,
)
nodes = splitter.get_nodes_from_documents(documents)

llm = HuggingFaceLLM(
    model_name=model_id,
    tokenizer_name=model_id,
    context_window=3900,
    max_new_tokens=256,
    model_kwargs={"torch_dtype": torch.float16},
    # tokenizer_kwargs={},
    generate_kwargs={"temperature": 0.3, "top_p": 0.95},
    device_map="auto",
)

Settings.llm = llm
Settings.embed_model = embeddings
Settings.context_window = 4096  # no trailing commas: they would turn these values into tuples
Settings.num_output = 1000
Settings.chunk_overlap_ratio = 0.3
Settings.chunk_size_limit = None

for node in nodes:
    node_embedding = embeddings.get_text_embedding(  # was `embed_model`, which is undefined in this snippet
        node.get_content(metadata_mode="all")
    )
    node.embedding = node_embedding
vector_store.add(nodes)  # called once, after all embeddings are attached

#######################################################
Traceback (most recent call last):
  File "/home/seceon/opensearch_lm_index/koshish.py", line 90, in <module>
    vector_store.add(nodes)
  File "/usr/local/lib/python3.10/site-packages/llama_index/vector_stores/opensearch/base.py", line 476, in add
    return asyncio.get_event_loop().run_until_complete(
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/site-packages/llama_index/vector_stores/opensearch/base.py", line 492, in async_add
    await self._client.index_results(nodes)
  File "/usr/local/lib/python3.10/site-packages/llama_index/vector_stores/opensearch/base.py", line 347, in index_results
    return await _bulk_ingest_embeddings(
  File "/usr/local/lib/python3.10/site-packages/llama_index/vector_stores/opensearch/base.py", line 110, in _bulk_ingest_embeddings
    await client.indices.refresh(index=index_name)
  File "/usr/local/lib/python3.10/site-packages/opensearchpy/_async/client/indices.py", line 92, in refresh
    return await self.transport.perform_request(
  File "/usr/local/lib/python3.10/site-packages/opensearchpy/_async/transport.py", line 428, in perform_request
    raise e
  File "/usr/local/lib/python3.10/site-packages/opensearchpy/_async/transport.py", line 390, in perform_request
    status, headers_response, data = await connection.perform_request(
  File "/usr/local/lib/python3.10/site-packages/opensearchpy/_async/http_aiohttp.py", line 329, in perform_request
    raise ConnectionTimeout("TIMEOUT", str(e), e)
opensearchpy.exceptions.ConnectionTimeout: ConnectionTimeout caused by - TimeoutError()

@Aekansh-Ak Aekansh-Ak added the question Further information is requested label Apr 25, 2024

dosubot bot commented Apr 25, 2024

The ConnectionTimeout error you're encountering is likely due to one of several factors, including network or configuration issues with your OpenSearch server, the overhead of bulk ingestion operations, server performance and configuration, or the client's timeout settings being too low for the operation. Here's how you can address these issues:

  1. Verify Server Endpoint and Accessibility: Ensure the OpenSearch server's endpoint is correct and that the server is accessible. Incorrect endpoint configuration or server unavailability can lead to timeout errors.

  2. Check Network Connectivity: Ensure there's stable network connectivity between your client and the OpenSearch server.

  3. Review Server Performance: The OpenSearch server's performance and configuration should be adequate to handle the workload. If the server is under heavy load, consider optimizing its resources.

  4. Adjust Bulk Operation Settings: If you're ingesting large amounts of data, consider adjusting the max_chunk_bytes parameter to better suit your data size and reduce the load on the server.

  5. Modify Client Timeout Settings: The timeout settings of the AsyncOpenSearch client may need to be increased to accommodate operations that take longer to complete, especially for large or complex ingestion tasks.
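For points 4 and 5, here is a minimal sketch of splitting the ingest into smaller bulk calls. The `batched` helper is self-contained; the batch size of 100 and the `timeout`/`max_retries`/`retry_on_timeout` keyword arguments are illustrative assumptions (it is assumed, not verified, that `OpensearchVectorClient` forwards extra kwargs to the underlying opensearch-py client):

```python
def batched(items, batch_size):
    """Yield successive fixed-size slices of a list."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Instead of one vector_store.add(nodes) over all nodes at once,
# ingest in smaller batches so each bulk request stays small:
# for batch in batched(nodes, 100):
#     vector_store.add(batch)

# If the client still times out, a larger timeout could be passed when
# constructing the client (assumption: extra kwargs reach opensearch-py):
# client = OpensearchVectorClient(
#     endpoint, idx, dim=384,
#     embedding_field=embedding_field, text_field=text_field,
#     timeout=120, max_retries=3, retry_on_timeout=True,
# )
```

Smaller batches also make retries cheaper: if one bulk call times out, only that slice of nodes needs to be resent.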

For more detailed troubleshooting, refer to the LlamaIndex repository, which might provide additional insights into the specific methods and configurations relevant to your issue.

If these steps do not resolve the issue, providing more details about your OpenSearch server configuration, the size of the data you're trying to ingest, and any specific network configurations could help in identifying a more precise solution.

About Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

