Indexing Error with codeqai on Conda Environment: Continuous Indexing Without Completion #38

Open
TeomanEgeSelcuk opened this issue Mar 25, 2024 · 1 comment

@TeomanEgeSelcuk

While using codeqai in a conda environment, I ran into a problem where indexing never completes. It happened when I ran codeqai's search in my project directory: building the vector index raised IndexError: list index out of range, but the indexing spinner kept running afterwards. The steps to reproduce and my environment setup are below.

Steps to Reproduce:

  1. Installed codeqai using pip within a conda environment.
  2. Ran codeqai configure and configured the tool with the following settings:
    • Selected "y" for using local embedding models.
    • Chose "Instructor-Large" for the local embedding model.
    • Selected "N" for using local chat models and chose "OpenAI" with "gpt-4" as the remote LLM.
  3. Navigated to my project directory (2-006), which contains .m, .mat, and .txt files, and ran codeqai search in the terminal.
  4. Received a message indicating no vector store was found for 2-006 and that initial indexing may take a few minutes. Shortly after, the indexing process started but then failed with an IndexError: list index out of range.

Expected Behavior:

Indexing should complete, so that the codebase can then be searched with codeqai.

Actual Behavior:

Indexing never completes: building the FAISS index fails with IndexError: list index out of range, yet the "Indexing vector store..." spinner keeps running after the traceback.
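In the terminal output below, the spinner line (⠴ 💾 Indexing vector store...) is printed again after the traceback, which suggests the spinner runs on a background thread that is not stopped when indexing raises. That is an inference from the output, not something confirmed in codeqai's source. A minimal, dependency-free sketch of the usual remedy, stopping the spinner in a `finally` block, follows; the `spinner` helper and `index_or_fail` function are hypothetical stand-ins, not codeqai's code.

```python
import itertools
import sys
import threading
import time
from contextlib import contextmanager


@contextmanager
def spinner(message: str, interval: float = 0.1):
    """Hypothetical helper: show a text spinner on a background thread."""
    stop = threading.Event()

    def spin() -> None:
        for frame in itertools.cycle("|/-\\"):
            if stop.is_set():
                break
            sys.stdout.write(f"\r{frame} {message}")
            sys.stdout.flush()
            time.sleep(interval)

    thread = threading.Thread(target=spin, daemon=True)
    thread.start()
    try:
        yield
    finally:
        # Runs on success AND when the body raises, so the spinner
        # cannot keep running after an error.
        stop.set()
        thread.join()
        sys.stdout.write("\n")


def index_or_fail(documents: list) -> None:
    """Hypothetical stand-in for an indexing step that rejects empty input."""
    if not documents:
        raise ValueError("no documents to index")


try:
    with spinner("Indexing vector store..."):
        index_or_fail([])
except ValueError as exc:
    print(f"Indexing failed: {exc}")
```

Because the stop signal is sent in `finally`, the error is reported once and the spinner thread has already exited by the time the `except` clause runs.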

Environment:

  • codeqai version: 0.0.14
  • langchain-community version: 0.0.17
  • sentence-transformers version: 2.3.1
  • Python version: 3.11
  • Conda version: 4.12.0
  • Operating System: Windows (with Conda environment)

Full Terminal Output and Error

{GenericDirectory>}conda activate condaqai-env

(condaqai-env) {GenericDirectory>}codeqai search
Not a git repository. Exiting.

(condaqai-env) {GenericDirectory>}ls
'ls' is not recognized as an internal or external command,
operable program or batch file.

(condaqai-env) {GenericDirectory>}cd 2-006

(condaqai-env) {GenericDirectory}\2-006>codeqai search
No vector store found for 2-006. Initial indexing may take a few minutes.
⠋ 💾 Indexing vector store...Traceback (most recent call last):
  File "C:\Users\Edge\anaconda3\envs\condaqai-env\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Edge\anaconda3\envs\condaqai-env\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\Edge\anaconda3\envs\condaqai-env\Scripts\codeqai.exe\__main__.py", line 7, in <module>
    sys.exit(main())
  File "C:\Users\Edge\anaconda3\envs\condaqai-env\lib\site-packages\codeqai\__main__.py", line 5, in main
    app.run()
  File "C:\Users\Edge\anaconda3\envs\condaqai-env\lib\site-packages\codeqai\app.py", line 146, in run
    vector_store.index_documents(documents)
  File "C:\Users\Edge\anaconda3\envs\condaqai-env\lib\site-packages\codeqai\vector_store.py", line 34, in index_documents
    self.db = FAISS.from_documents(documents, self.embeddings)
  File "C:\Users\Edge\anaconda3\envs\condaqai-env\lib\site-packages\langchain_core\vectorstores.py", line 508, in from_documents
    return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)
  File "C:\Users\Edge\anaconda3\envs\condaqai-env\lib\site-packages\langchain_community\vectorstores\faiss.py", line 960, in from_texts
    return cls.__from(
  File "C:\Users\Edge\anaconda3\envs\condaqai-env\lib\site-packages\langchain_community\vectorstores\faiss.py", line 919, in __from
    index = faiss.IndexFlatL2(len(embeddings[0]))
IndexError: list index out of range
⠴ 💾 Indexing vector store...
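The last frame of the traceback reads the vector dimension from the first embedding (`len(embeddings[0])`) before creating the index, so an empty embeddings list fails there with exactly this IndexError. The dependency-free sketch below mirrors that pattern; `FlatL2Index` and `build_index` are hypothetical toy names, not the langchain-community or faiss implementation.

```python
from typing import List, Sequence


class FlatL2Index:
    """Toy stand-in for a flat L2 index; like faiss.IndexFlatL2 it needs the dimension up front."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.vectors: List[List[float]] = []


def build_index(embeddings: Sequence[Sequence[float]]) -> FlatL2Index:
    # Mirrors the failing line `faiss.IndexFlatL2(len(embeddings[0]))`:
    # the dimension is read from the first vector, so an empty list raises
    # IndexError before any index is created.
    index = FlatL2Index(len(embeddings[0]))
    index.vectors.extend(list(v) for v in embeddings)
    return index


print(build_index([[0.1, 0.2, 0.3]]).dim)  # 3
try:
    build_index([])
except IndexError as exc:
    print(f"IndexError: {exc}")  # IndexError: list index out of range
```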

Additional Context:

The error is raised inside langchain-community's FAISS wrapper, at faiss.IndexFlatL2(len(embeddings[0])). That expression can only raise IndexError if the list of embeddings is empty, which suggests that no documents reached FAISS.from_documents, for example because the document set was empty or malformed. Given the configuration steps and the conda environment, specific dependencies or settings might also contribute.
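A minimal sketch of a pre-indexing check that would replace the opaque IndexError with an explicit message, assuming documents are LangChain-style objects with a `page_content` string. The `Document` dataclass and `prepare_for_indexing` helper are hypothetical stand-ins, not codeqai's or LangChain's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Document:
    """Minimal stand-in for a LangChain-style Document (text plus metadata)."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def prepare_for_indexing(documents: List[Document]) -> List[Document]:
    """Drop blank documents and fail with a readable message if nothing is left."""
    usable = [doc for doc in documents if doc.page_content.strip()]
    if not usable:
        # Raised before any embedding/FAISS call, instead of IndexError later.
        raise ValueError("No indexable documents were found in this directory.")
    return usable


docs = [Document("function y = f(x)\n  y = x;\nend", {"source": "f.m"}), Document("   ")]
print(len(prepare_for_indexing(docs)))  # 1
```

The check sits in front of the vector-store call, so the caller can report the problem and exit cleanly rather than crash inside FAISS.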

@fynnfluegge
Owner

Thanks for the detailed report! As you already suggested, I think the cause is probably an empty split set for a document.
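One way to handle an empty split set is to skip and report files that yield no chunks. The sketch below assumes naive fixed-size chunking; codeqai's real splitter is not shown in this thread, and `split_text` and `split_all` are hypothetical.

```python
from typing import Dict, List


def split_text(text: str, chunk_size: int = 200) -> List[str]:
    """Naive fixed-size splitter; blank text yields no chunks."""
    stripped = text.strip()
    return [stripped[i:i + chunk_size] for i in range(0, len(stripped), chunk_size)]


def split_all(files: Dict[str, str], chunk_size: int = 200) -> List[str]:
    """Split every file, skipping (and reporting) files whose split set is empty."""
    chunks: List[str] = []
    skipped: List[str] = []
    for name, text in files.items():
        pieces = split_text(text, chunk_size)
        if not pieces:
            skipped.append(name)  # empty split set: nothing to embed for this file
            continue
        chunks.extend(pieces)
    if skipped:
        print(f"Skipped {len(skipped)} file(s) with no content: {', '.join(skipped)}")
    return chunks


files = {"main.m": "x = 1;\ny = 2;", "empty.txt": "   \n"}
print(split_all(files, chunk_size=5))
```

If `split_all` returns an empty list for the whole project, indexing should stop with a clear message rather than pass nothing on to the vector store.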
