
[Question]: vector_store.add(nodes) gives me error = opensearchpy.exceptions.ConnectionTimeout: ConnectionTimeout caused by - TimeoutError() #13110

Open
Aekansh-Ak opened this issue Apr 25, 2024 · 1 comment
Labels
question Further information is requested

Comments

@Aekansh-Ak

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

I am trying to add nodes to an OpenSearch vector database, following this guide:

https://docs.llamaindex.ai/en/stable/examples/low_level/ingestion/

endpoint = f"https://admin:[email protected]:9200"
idx = "sample-index"
text_field = "text"
embedding_field = "vector_field"
client = OpensearchVectorClient(
    endpoint,
    idx,
    dim=384,
    embedding_field=embedding_field,
    text_field=text_field,
    use_ssl=False,
    verify_certs=False,
)

vector_store = OpensearchVectorStore(client)
# storage_context = StorageContext.from_defaults(vector_store=vector_store)

documents = SimpleDirectoryReader("/home/seceon/opensearch_lm_index/textdocs").load_data()

embeddings = HuggingFaceEmbedding()

splitter = SentenceSplitter(
    chunk_size=700,
    chunk_overlap=300,
)
nodes = splitter.get_nodes_from_documents(documents)

llm = HuggingFaceLLM(
    model_name=model_id,
    tokenizer_name=model_id,
    context_window=3900,
    max_new_tokens=256,
    model_kwargs={"torch_dtype": torch.float16},
    # tokenizer_kwargs={},
    generate_kwargs={"temperature": 0.3, "top_p": 0.95},
    device_map="auto",
)

Settings.llm = llm
Settings.embed_model = embeddings
Settings.context_window = 4096  # no trailing commas: they would turn these values into tuples
Settings.num_output = 1000
Settings.chunk_overlap_ratio = 0.3
Settings.chunk_size_limit = None

for node in nodes:
    node_embedding = embeddings.get_text_embedding(  # was `embed_model`, which is undefined in this snippet
        node.get_content(metadata_mode="all")
    )
    node.embedding = node_embedding
vector_store.add(nodes)  # called once, after all embeddings are attached

#######################################################
Traceback (most recent call last):
  File "/home/seceon/opensearch_lm_index/koshish.py", line 90, in <module>
    vector_store.add(nodes)
  File "/usr/local/lib/python3.10/site-packages/llama_index/vector_stores/opensearch/base.py", line 476, in add
    return asyncio.get_event_loop().run_until_complete(
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/site-packages/llama_index/vector_stores/opensearch/base.py", line 492, in async_add
    await self._client.index_results(nodes)
  File "/usr/local/lib/python3.10/site-packages/llama_index/vector_stores/opensearch/base.py", line 347, in index_results
    return await _bulk_ingest_embeddings(
  File "/usr/local/lib/python3.10/site-packages/llama_index/vector_stores/opensearch/base.py", line 110, in _bulk_ingest_embeddings
    await client.indices.refresh(index=index_name)
  File "/usr/local/lib/python3.10/site-packages/opensearchpy/_async/client/indices.py", line 92, in refresh
    return await self.transport.perform_request(
  File "/usr/local/lib/python3.10/site-packages/opensearchpy/_async/transport.py", line 428, in perform_request
    raise e
  File "/usr/local/lib/python3.10/site-packages/opensearchpy/_async/transport.py", line 390, in perform_request
    status, headers_response, data = await connection.perform_request(
  File "/usr/local/lib/python3.10/site-packages/opensearchpy/_async/http_aiohttp.py", line 329, in perform_request
    raise ConnectionTimeout("TIMEOUT", str(e), e)
opensearchpy.exceptions.ConnectionTimeout: ConnectionTimeout caused by - TimeoutError()

@Aekansh-Ak Aekansh-Ak added the question Further information is requested label Apr 25, 2024

dosubot bot commented Apr 25, 2024

The ConnectionTimeout error you're encountering is likely due to one of several factors, including network or configuration issues with your OpenSearch server, the overhead of bulk ingestion operations, server performance and configuration, or the client's timeout settings being too low for the operation. Here's how you can address these issues:

  1. Verify Server Endpoint and Accessibility: Ensure the OpenSearch server's endpoint is correct and that the server is accessible. Incorrect endpoint configuration or server unavailability can lead to timeout errors.

  2. Check Network Connectivity: Ensure there's stable network connectivity between your client and the OpenSearch server.

  3. Review Server Performance: The OpenSearch server's performance and configuration should be adequate to handle the workload. If the server is under heavy load, consider optimizing its resources.

  4. Adjust Bulk Operation Settings: If you're ingesting large amounts of data, consider adjusting the max_chunk_bytes parameter to better suit your data size and reduce the load on the server.

  5. Modify Client Timeout Settings: The timeout settings of the AsyncOpenSearch client may need to be increased to accommodate operations that take longer to complete, especially for large or complex ingestion tasks.
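For points 4 and 5, here is a minimal sketch of splitting the ingest into smaller bulk calls. The `batched` helper is self-contained; the batch size of 100 and the `timeout`/`max_retries`/`retry_on_timeout` keyword arguments are illustrative assumptions (it is assumed, not verified, that `OpensearchVectorClient` forwards extra kwargs to the underlying opensearch-py client):

```python
def batched(items, batch_size):
    """Yield successive fixed-size slices of a list."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Instead of one vector_store.add(nodes) over all nodes at once,
# ingest in smaller batches so each bulk request stays small:
# for batch in batched(nodes, 100):
#     vector_store.add(batch)

# If the client still times out, a larger timeout could be passed when
# constructing the client (assumption: extra kwargs reach opensearch-py):
# client = OpensearchVectorClient(
#     endpoint, idx, dim=384,
#     embedding_field=embedding_field, text_field=text_field,
#     timeout=120, max_retries=3, retry_on_timeout=True,
# )
```

Smaller batches also make retries cheaper: if one bulk call times out, only that slice of nodes needs to be resent.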

For more detailed troubleshooting, refer to the LlamaIndex repository, which might provide additional insights into the specific methods and configurations relevant to your issue.

If these steps do not resolve the issue, providing more details about your OpenSearch server configuration, the size of the data you're trying to ingest, and any specific network configurations could help in identifying a more precise solution.

About Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

