[Question]: Why is this llama-index query to Ollama (llama3) always timing out? #13106

Open · komal-SkyNET opened this issue Apr 25, 2024 · 2 comments
Labels: question (Further information is requested)

Comments

komal-SkyNET commented Apr 25, 2024

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

So far, everything I've tried with llama-index has failed due to timeouts. In contrast, I run multiple projects in LangChain where responses take under 10 seconds against the same LLM, Ollama (llama3).

Here's my llama-index trace.

**********
Trace: query
    |_query -> 120.166777 seconds
      |_templating -> 7.7e-05 seconds
      |_llm -> 0.0 seconds
**********

Running on an M1 MacBook Air.

Code:

import nest_asyncio, llama_index, chromadb
nest_asyncio.apply()
import logging
import sys
from llama_index.core import Settings
from llama_index.llms.ollama import Ollama
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
llama_index.core.set_global_handler("simple")
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import SubQuestionQueryEngine
llama_debug = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([llama_debug])


Settings.llm = Ollama(model="llama3", request_timeout=120)
Settings.chunk_overlap = 0
Settings.chunk_size = 128
Settings.callback_manager = callback_manager
# %%

# Load the 10-K filings (this step was omitted from the snippet as posted;
# the input paths below are illustrative, not from the original issue)
lyft_docs = SimpleDirectoryReader(input_files=["./data/10k/lyft_2021.pdf"]).load_data()
uber_docs = SimpleDirectoryReader(input_files=["./data/10k/uber_2021.pdf"]).load_data()

lyft_index = VectorStoreIndex.from_documents(lyft_docs, embed_model=HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"))
uber_index = VectorStoreIndex.from_documents(uber_docs, embed_model=HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"))

# %%
lyft_engine = lyft_index.as_query_engine(similarity_top_k=3)
uber_engine = uber_index.as_query_engine(similarity_top_k=3)


query_engine_tools = [
    QueryEngineTool(
        query_engine=lyft_engine,
        metadata=ToolMetadata(
            name="lyft_10k",
            description=(
                "Provides information about Lyft financials for year 2021"
            ),
        ),
    ),
    QueryEngineTool(
        query_engine=uber_engine,
        metadata=ToolMetadata(
            name="uber_10k",
            description=(
                "Provides information about Uber financials for year 2021"
            ),
        ),
    ),
]



s_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools
)

# %%
response = s_engine.query(
    "Compare and contrast the customer segments and geographies that grew the"
    " fastest"
)
komal-SkyNET added the question label on Apr 25, 2024

dosubot bot commented Apr 25, 2024

The timeout issue you're experiencing with llama-index queries to Ollama (llama3) might not be solely due to the request_timeout setting, especially since you've already set it to 120 seconds, which is quite generous. Given that your queries work fine in langchain with less than 10 seconds response time against the same LLM, it suggests that the problem might lie elsewhere.

Considering the complexity of your query and the operations involved in processing it, there are a few areas you might want to investigate:

  1. Query Complexity and Server Load: The complexity of your query or a high load on the Ollama server could be contributing to the timeouts. If possible, try simplifying your query or running it at a different time to see if the issue persists.

  2. Batch Size and Embedding Model Efficiency: The VectorStoreIndex.from_documents method you're using is computationally intensive. Adjusting the embedding batch size or opting for a more efficient embedding model could help; in particular, using ONNX models through the OptimumEmbedding class for faster inference might improve performance (see the sketch after this list).

  3. Network Latency: Although less likely, network latency between your machine and the Ollama server could also be a factor. This is harder to diagnose but worth considering if the above adjustments don't resolve the issue.
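
For point 2, here is a rough sketch of what the ONNX route could look like. This is not from the thread: it assumes the optional llama-index-embeddings-huggingface-optimum package and its OptimumEmbedding class, and the output folder name is illustrative.

# Hedged sketch: export the same BGE model used in the issue to ONNX, then embed with it.
# Assumes: pip install llama-index-embeddings-huggingface-optimum
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.embeddings.huggingface_optimum import OptimumEmbedding

# One-time export of the model to an ONNX folder ("./bge_onnx" is an illustrative path).
OptimumEmbedding.create_and_save_optimum_model("BAAI/bge-small-en-v1.5", "./bge_onnx")

# Use the exported ONNX model for indexing/querying.
embed_model = OptimumEmbedding(folder_name="./bge_onnx")

# Alternatively, keep HuggingFaceEmbedding and just tune the batch size
# (embed_batch_size is assumed from the base embedding interface):
# embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5", embed_batch_size=32)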

If after exploring these areas the problem remains, it might be worth reaching out to the Ollama support or community forums with specific details about your setup and the issues you're facing for more targeted assistance.


logan-markewich (Collaborator)

Set the request timeout larger; I usually do something like request_timeout=3000.0.
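
For concreteness, a minimal sketch of this suggestion applied to the snippet in the issue; only the timeout value changes, everything else is the same Ollama setup from the question.

from llama_index.core import Settings
from llama_index.llms.ollama import Ollama

# The trace above stops right around the original request_timeout=120 seconds,
# so give the local llama3 server far more headroom before the client gives up.
Settings.llm = Ollama(model="llama3", request_timeout=3000.0)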
