Integrate non-symbol documentation #26

emrgnt-cmplxty · 2023-06-21T14:32:11Z

Our current SymbolDocEmbeddingHandler class works well for embedding source code documentation tied to individual symbols. However, some relevant and important documentation does not correspond directly to any specific symbol, for instance the contents of a README.md file or similar documents. These non-symbol documents often contain valuable contextual information that can matter a great deal for certain coding tasks.

Problem:

We currently lack a strategy for including these non-symbol documents in our embedding and retrieval process. Leaving them out makes the system less context-aware and less informative, which can reduce the quality of our coding assistance.

Tentative Solution:

Please note that the following proposed solution is preliminary and subject to revision.

One possible solution is to introduce a new class, let's call it NonSymbolDocEmbeddingHandler. This class can be similar in structure to SymbolDocEmbeddingHandler but would be specifically designed to handle non-symbol documents.

We can treat each non-symbol document as an individual entity that has its own embeddings. We can store these embeddings in the database just like we do for symbols. The document's name (or path) can serve as its identifier.
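Using the path as the identifier works best if the key is stable across machines and checkouts. A minimal sketch, assuming keys are paths relative to the repository root written with forward slashes; `doc_key` is a hypothetical helper, not part of the existing codebase:

```python
from __future__ import annotations

from pathlib import Path


def doc_key(doc_path: str | Path, repo_root: str | Path) -> str:
    """Hypothetical helper: build a stable identifier for a non-symbol document.

    The key is the path relative to the repository root, written with forward
    slashes, so it does not depend on the OS or on where the repo is checked out.
    Raises ValueError if doc_path lies outside repo_root.
    """
    relative = Path(doc_path).resolve().relative_to(Path(repo_root).resolve())
    return relative.as_posix()


print(doc_key("/repo/docs/README.md", "/repo"))  # -> docs/README.md
```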

Here's a rough blueprint for the NonSymbolDocEmbeddingHandler class:

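```python
from __future__ import annotations  # keeps the annotations below from being evaluated at import time

import numpy as np

# VectorDatabaseProvider and EmbeddingProvider are the project's existing interfaces,
# imported the same way SymbolDocEmbeddingHandler imports them.


class NonSymbolDocEmbeddingHandler:
    """Embeds documents that do not map to a symbol (e.g. README.md), keyed by name or path."""

    def __init__(self, embedding_db: VectorDatabaseProvider, embedding_provider: EmbeddingProvider) -> None:
        self.embedding_db = embedding_db
        self.embedding_provider = embedding_provider

    def get_embedding(self, doc_name: str) -> np.ndarray:
        """Return the stored embedding for doc_name."""
        return self.embedding_db.get(doc_name)

    def update_embedding(self, doc_name: str, doc_content: str) -> None:
        """Rebuild doc_name's embedding, replacing any stale entry."""
        if self.embedding_db.contains(doc_name):
            self.embedding_db.discard(doc_name)

        doc_embedding = self.embedding_provider.build_embedding(doc_content)
        self.embedding_db.add(doc_name, doc_embedding)
```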

By incorporating such a system, we would be able to create and retrieve embeddings for non-symbol documents, integrating them into our current workflow.
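To illustrate the flow end to end without the real providers, here is a self-contained sketch. `InMemoryVectorDB` and `LetterCountEmbeddingProvider` are hypothetical stand-ins for `VectorDatabaseProvider` and `EmbeddingProvider` (the toy embedding just counts letters), and `search` is an assumed retrieval helper that ranks stored documents by cosine similarity; none of these reflect the project's actual API.

```python
from __future__ import annotations

import math
from collections import Counter


class InMemoryVectorDB:
    """Hypothetical dict-backed stand-in for VectorDatabaseProvider."""

    def __init__(self) -> None:
        self._store: dict[str, list[float]] = {}

    def contains(self, key: str) -> bool:
        return key in self._store

    def discard(self, key: str) -> None:
        self._store.pop(key, None)

    def add(self, key: str, vector: list[float]) -> None:
        self._store[key] = vector

    def get(self, key: str) -> list[float]:
        return self._store[key]

    def items(self):
        # Assumed extra capability: iterate over (key, vector) pairs for search.
        return self._store.items()


class LetterCountEmbeddingProvider:
    """Hypothetical toy stand-in for EmbeddingProvider: counts of a-z (26 dims)."""

    def build_embedding(self, text: str) -> list[float]:
        counts = Counter(ch for ch in text.lower() if "a" <= ch <= "z")
        return [float(counts[chr(ord("a") + i)]) for i in range(26)]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class NonSymbolDocEmbeddingHandler:
    def __init__(self, embedding_db, embedding_provider) -> None:
        self.embedding_db = embedding_db
        self.embedding_provider = embedding_provider

    def get_embedding(self, doc_name: str) -> list[float]:
        return self.embedding_db.get(doc_name)

    def update_embedding(self, doc_name: str, doc_content: str) -> None:
        # Same replace-then-add logic as the blueprint above.
        if self.embedding_db.contains(doc_name):
            self.embedding_db.discard(doc_name)
        self.embedding_db.add(doc_name, self.embedding_provider.build_embedding(doc_content))

    def search(self, query: str, top_k: int = 3) -> list[tuple[str, float]]:
        # Assumed retrieval helper: rank all stored docs by similarity to the query.
        query_vec = self.embedding_provider.build_embedding(query)
        scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in self.embedding_db.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


handler = NonSymbolDocEmbeddingHandler(InMemoryVectorDB(), LetterCountEmbeddingProvider())
handler.update_embedding("README.md", "Install with pip and run the tests")
handler.update_embedding("CONTRIBUTING.md", "Open a pull request against main")
print(handler.search("how do I install this?", top_k=1))
```

Swapping in the real database and embedding provider should only require that they expose the same `contains`/`discard`/`add`/`get` calls the blueprint uses, plus some way to iterate stored entries for `search`.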

Points to Consider:

A mechanism is needed to detect and handle updates to non-symbol documents, similar to how we handle changes to symbol source code.
A strategy is needed for deciding when to consider non-symbol documentation during retrieval. Perhaps certain queries or contexts could trigger the consideration of these documents.
Performance implications: adding non-symbol documents could significantly increase the amount of data stored in the database, which might require optimizations or changes to how we store and retrieve embeddings.
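For the first point, one option is to compare a content hash against the hash recorded when each document was last embedded. The sketch below assumes documents are supplied as a `{path: content}` dict; `DocChangeTracker` and its methods are hypothetical names. Hashing avoids relying on file modification times, which can change without the content changing (for example after a fresh checkout).

```python
from __future__ import annotations

import hashlib


class DocChangeTracker:
    """Hypothetical helper that tracks which non-symbol docs need re-embedding."""

    def __init__(self) -> None:
        # Maps document path -> SHA-256 hex digest of the content last embedded.
        self._digests: dict[str, str] = {}

    @staticmethod
    def _digest(content: str) -> str:
        return hashlib.sha256(content.encode("utf-8")).hexdigest()

    def diff(self, docs: dict[str, str]) -> tuple[list[str], list[str]]:
        """Return (stale, removed): docs that are new or changed, and docs that disappeared."""
        stale = [name for name, content in docs.items() if self._digests.get(name) != self._digest(content)]
        removed = [name for name in self._digests if name not in docs]
        return sorted(stale), sorted(removed)

    def mark_embedded(self, name: str, content: str) -> None:
        self._digests[name] = self._digest(content)

    def forget(self, name: str) -> None:
        self._digests.pop(name, None)


tracker = DocChangeTracker()
docs = {"README.md": "v1", "docs/setup.md": "steps"}
stale, removed = tracker.diff(docs)  # first pass: every document is stale
for name in stale:
    tracker.mark_embedded(name, docs[name])  # record the hash once the doc is embedded
```

On each indexing pass, documents in `stale` would be passed to `update_embedding` and then recorded with `mark_embedded`, while those in `removed` would be discarded from the database and passed to `forget`.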

Tasks:

Design and implement NonSymbolDocEmbeddingHandler.
Modify the database interface to handle non-symbol document embeddings.
Implement a mechanism to update non-symbol document embeddings when their contents change.
Test the new system to ensure it works as expected and does not introduce performance issues.
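For the database-interface task, one low-impact option is to leave the underlying store unchanged and wrap it so document keys live under their own prefix, which keeps them from colliding with symbol keys in a shared database. This is a sketch under that assumption: `EmbeddingStore`, `NamespacedStore`, `DictStore`, and the `"doc:"` prefix are all hypothetical, and the protocol simply lists the four calls the blueprint already uses.

```python
from __future__ import annotations

from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class EmbeddingStore(Protocol):
    """Hypothetical minimal interface: the four calls the blueprint relies on."""

    def contains(self, key: str) -> bool: ...
    def discard(self, key: str) -> None: ...
    def add(self, key: str, vector: Sequence[float]) -> None: ...
    def get(self, key: str) -> Sequence[float]: ...


class NamespacedStore:
    """Wraps an EmbeddingStore so every key is stored under a fixed prefix."""

    def __init__(self, inner: EmbeddingStore, prefix: str = "doc:") -> None:
        self._inner = inner
        self._prefix = prefix

    def _key(self, key: str) -> str:
        return self._prefix + key

    def contains(self, key: str) -> bool:
        return self._inner.contains(self._key(key))

    def discard(self, key: str) -> None:
        self._inner.discard(self._key(key))

    def add(self, key: str, vector: Sequence[float]) -> None:
        self._inner.add(self._key(key), vector)

    def get(self, key: str) -> Sequence[float]:
        return self._inner.get(self._key(key))


class DictStore:
    """Tiny dict-backed EmbeddingStore used only for demonstration."""

    def __init__(self) -> None:
        self.data: dict[str, Sequence[float]] = {}

    def contains(self, key: str) -> bool:
        return key in self.data

    def discard(self, key: str) -> None:
        self.data.pop(key, None)

    def add(self, key: str, vector: Sequence[float]) -> None:
        self.data[key] = vector

    def get(self, key: str) -> Sequence[float]:
        return self.data[key]


shared = DictStore()
doc_store = NamespacedStore(shared)
doc_store.add("README.md", [1.0, 0.0])  # stored in `shared` as "doc:README.md"
```

Because `NamespacedStore` exposes the same four methods, it could be handed to `NonSymbolDocEmbeddingHandler` in place of the raw database without changing the handler.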

Your feedback and suggestions are welcome to help refine this preliminary solution.

As always, don't hesitate to ask if you have any questions or need further clarification. Your contributions to this project are highly valued!


Exarchias · 2023-07-14T18:20:06Z

Hi @emrgnt-cmplxty!
If no one else shows interest in this issue, I would like to start working on it on Monday.

Exarchias · 2023-07-17T16:52:59Z

Hello everyone!
I still like this issue, but I have chosen to work on issue #28 instead, so I will set this one (#26) aside for now.
If someone else wishes to take this issue, that is totally fine with me.


emrgnt-cmplxty added a commit that referenced this issue Aug 28, 2023

Merge pull request #26 from emrgnt-cmplxty/feature/rename-eval

f1c4ec7

rename eval

Huntemall pushed a commit to Huntemall/automata-dev that referenced this issue Oct 30, 2023

Merge pull request emrgnt-cmplxty#26 from maks-ivanov/feature/refacto…

581c142

…r-cleanup Feature/refactor cleanup

Huntemall pushed a commit to Huntemall/automata-dev that referenced this issue Oct 30, 2023

Merge pull request emrgnt-cmplxty#26 from EmergentAGI/feature/refacto…

b4bebcc

…r-tool-layout Refactor agent tool layout
