Merge pull request #5 from theam/docs/refinements

Refinements after initial discussions with the team
theam · Aug 18, 2023 · 7a118f6
2 parents b4185cb + 7b0351d
commit 7a118f6
Showing 1 changed file with 89 additions and 15 deletions.
diff --git a/docs-site/docs/03_components/02_embeddings_space.md b/docs-site/docs/03_components/02_embeddings_space.md
@@ -11,6 +11,26 @@ Leveraging the power of embeddings models, this component allows you to represen
 [//]: # (TODO: Highlight this as a warning message in Docusaurus)
 > **Warning**: The current version supports the [OpenAI embeddings model](https://platform.openai.com/docs/guides/embeddings) and [Pinecone](https://www.pinecone.io) for storage. Please provide the necessary credentials for these services. Note: these services may involve costs; always review their pricing details before use.
 
+## `Embedding` object
+
+In eLLMental, embeddings are represented by the `Embedding` class, which has the following attributes:
+
+- `id`: The unique identifier of the embedding.
+- `embedding`: The numeric representation of the text.
+- `metadata`: Additional information associated with the embedding.
+- `createdAt`: The timestamp of the embedding creation.
+- `modelId`: The identifier of the embeddings model used to generate the embedding.
+
+```java
+public class Embedding {
+ private final String id;
+ private final float[] embedding;
+ private final Map<String, String> metadata;
+ private final Instant createdAt;
+ private final String modelId;
+}
+```
+
 The `EmbeddingsSpaceComponent` interface defines the following methods:
 
 ## Constructor
@@ -23,45 +43,74 @@ EmbeddingsStore pineconeStore = new PineconeEmbeddingsStore("YOUR_PINECONE_URL",
 EmbeddingsSpaceComponent embeddingsSpace = new EmbeddingsSpaceComponent(openAIModel, pineconeStore);
 ```
 
-## `create`
+## `generate`
 
 Generates an embedding from a text without persisting it.
 
 - **Parameters**:
  - `text`: The textual input for embedding.
  - `additionalMetadata`: Supplementary metadata associated with the text.
+- **Returns**: The generated embedding.
 
 ```java
 String sampleText = "Hello, eLLMental!";
 Map<String, String> additionalMetadata = new HashMap<>();
 additionalMetadata.put("key", "value");
-Embedding embedding = embeddingsSpace.create(sampleText, additionalMetadata);
+
+Embedding embedding = embeddingsSpace.generate(sampleText, additionalMetadata);
 ```
 
 ## `add`
 
-Generates and saves an embedding for a text.
+Generates and persists an embedding for a given text.
 
 - **Parameters**:
  - `text`: Text to be embedded.
  - `additionalMetadata`: Additional metadata.
+- **Returns**: The generated embedding.
 
 ```java
+Map<String, String> additionalMetadata = new HashMap<>();
+additionalMetadata.put("key", "value");
+
 String sampleText = "Hello, eLLMental!";
  Embedding embedding = embeddingsSpace.add(sampleText, additionalMetadata);
 ```
 
 ## `mostSimilarEmbeddings`
 
-Fetches embeddings semantically closest to a reference text.
+Fetches embeddings semantically closest to a reference text or embedding.
+
+With a reference text:
 
 - **Parameters**:
  - `referenceText`: The text for comparison.
  - `limit`: Maximum number of similar embeddings to return.
 
 ```java
+// First we add a few embeddings to the space
 embeddingsSpace.add("Hello, eLLMental!");
+embeddingsSpace.add("Hello, world!");
+
 List<Embedding> closestNeighbors = embeddingsSpace.mostSimilarEmbeddings("Greetings!", 3);
+// closestNeighbors will contain the embeddings for "Hello, eLLMental!" and "Hello, world!"
+```
+
+With a reference embedding:
+
+- **Parameters**:
+ - `referenceEmbedding`: The embedding for comparison.
+ - `limit`: Maximum number of similar embeddings to return.
+
+```java
+// First we add a few embeddings to the space
+embeddingsSpace.add("Hello, eLLMental!");
+embeddingsSpace.add("Hello, world!");
+
+Embedding embedding = embeddingsSpace.generate("Hello everyone!");
+
+List<Embedding> closestNeighbors = embeddingsSpace.mostSimilarEmbeddings(embedding, 3);
+// closestNeighbors will contain the embeddings for "Hello, eLLMental!" and "Hello, world!"
 ```
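
As a rough illustration of what "semantically closest" can mean, the sketch below ranks vectors by cosine similarity and keeps the best `limit` matches. It is not the eLLMental or Pinecone implementation: the `SimilaritySketch` class, its method names, and the choice of cosine similarity as the metric are all assumptions made for this example, and the vectors are toy 2-dimensional stand-ins for real embeddings.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of eLLMental: ranks vectors by cosine similarity.
final class SimilaritySketch {

    // Cosine similarity between two vectors of equal length.
    static double cosineSimilarity(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // Convention chosen here for zero vectors
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the indices of the `limit` candidates most similar to the reference, best first.
    static List<Integer> mostSimilar(float[] reference, List<float[]> candidates, int limit) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < candidates.size(); i++) {
            indices.add(i);
        }
        indices.sort(Comparator.comparingDouble(
                (Integer i) -> cosineSimilarity(reference, candidates.get(i))).reversed());
        return new ArrayList<>(indices.subList(0, Math.min(limit, indices.size())));
    }

    public static void main(String[] args) {
        // Toy vectors standing in for stored embeddings
        List<float[]> stored = List.of(
                new float[] {1.0f, 0.0f},
                new float[] {0.0f, 1.0f},
                new float[] {0.9f, 0.1f});
        System.out.println(mostSimilar(new float[] {1.0f, 0.2f}, stored, 2));
    }
}
```

In practice the vector store performs this ranking, so a helper like this is only useful for reasoning about results or for small in-memory experiments.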
 
 ## `calculateRelationshipVector`
@@ -83,38 +132,63 @@ For instance, with the following list of text pairs:
 
 The relationship vector for this group represents a translation in the embeddings space that, given a word that matches the ones in the left column, provides the location of a word that would likely appear in the right column for the given word. See the documentation for [`translateEmbedding`](#translateEmbedding) for more details.
 
+The `RelationshipVector` class is defined as follows:
+
+```java
+public class RelationshipVector {
+ public final String label;
+ public final float[] vector;
+}
+```
+
+And it can be calculated like this:
+
 ```java
 String[][] textPairs = {{"Man", "Woman"}, {"Boy", "Girl"}, {"King", "Queen"}, {"Prince", "Princess"}, {"Father", "Mother"}};
-float[] relationshipVector = embeddingsSpace.calculateRelationshipVector(textPairs);
+RelationshipVector relationshipVector = embeddingsSpace.calculateRelationshipVector(textPairs);
+```
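
One plausible way to obtain such a vector is to average the difference between the right-hand and left-hand embedding of each pair. The sketch below shows that idea on toy 2-dimensional vectors. The `RelationshipVectorSketch` class and its `meanDifference` method are hypothetical names, and averaging differences is an assumption made for illustration; the source does not state how eLLMental computes the vector internally.

```java
import java.util.Arrays;

// Hypothetical sketch, not the eLLMental implementation.
final class RelationshipVectorSketch {

    // Averages (right - left) over all pairs; each pair is {leftVector, rightVector}.
    static float[] meanDifference(float[][][] pairs) {
        if (pairs.length == 0) {
            throw new IllegalArgumentException("At least one pair is required");
        }
        int dimensions = pairs[0][0].length;
        float[] result = new float[dimensions];
        for (float[][] pair : pairs) {
            float[] left = pair[0];
            float[] right = pair[1];
            for (int i = 0; i < dimensions; i++) {
                result[i] += (right[i] - left[i]) / pairs.length;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy vectors standing in for the embeddings of each text pair
        float[][][] pairs = {
            {{1.0f, 0.0f}, {1.0f, 1.0f}},
            {{3.0f, 0.0f}, {3.0f, 1.0f}},
        };
        System.out.println(Arrays.toString(meanDifference(pairs)));
    }
}
```

Both toy pairs differ only in the second coordinate, so the averaged vector points purely along that axis, which is the "direction" the pairs have in common.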
+
+## `storeNamedRelationshipVector`
+
+Stores a relationship vector in the embeddings store and assigns it a label for later use.
+
+- **Parameters**:
+ - `label`: The label to assign to the relationship vector.
+ - `relationshipVector`: The relationship vector to store.
+
+```java
+// First we calculate a relationship vector
+RelationshipVector relationshipVector = embeddingsSpace.calculateRelationshipVector(textPairs);
+
+// Then we store it in the embeddings store for future use
+embeddingsSpace.storeNamedRelationshipVector("feminize", relationshipVector);
 ```
 
 ## `translateEmbedding`
 
-Shifts an embedding in the embeddings space to find the location of the text that would meet the relationship represented by the vector.
+Generates the embedding of a reference text and shifts it in the embeddings space to estimate the location of the text that would satisfy the relationship represented by the vector.
 
 - **Parameters**:
- - `embedding`: The primary embedding.
+ - `referenceText`: The text whose embedding will be translated.
  - `vector`: The vector determining translation.
 
 This is useful if you want to search for embeddings that are similar to a given one, but in a different context. For instance, let's say we start from the text "Bull".
 
-```java
-Embedding bullEmbedding = embeddingsSpace.create("Bull");
-```
-
-And a relationship vector calculated as follows:
+And we have a relationship vector calculated with the `calculateRelationshipVector` method as follows:
 
 ```java
 String[][] textPairs = {{"Man", "Woman"}, {"Boy", "Girl"}, {"King", "Queen"}, {"Prince", "Princess"}, {"Father", "Mother"}};
-float[] relationshipVector = embeddingsSpace.calculateRelationshipVector(textPairs);
+RelationshipVector relationshipVector = embeddingsSpace.calculateRelationshipVector(textPairs);
 ```
 
 We can apply the relationship vector to "Bull" to estimate where "Cow" would sit in the embeddings space. Notice that embeddings cannot be reversed into text, so we can't really know if the resulting embedding represents a cow, but it will give us a good approximation that can be used to refine search results later.
 
 ```java
-Embedding likelyACowEmbedding = embeddingsSpace.translateEmbedding(bullEmbedding, relationshipVector);
+// This will create an estimated embedding of the word "Cow"
+Embedding likelyACowEmbedding = embeddingsSpace.translateEmbedding("Bull", relationshipVector);
 
-// This will return embeddings that are similar to "Cow"
+// We use it as any other embedding to find stored texts that are similar to "Cow"
 List<Embedding> similarToCowEmbeddings = embeddingsSpace.mostSimilarEmbeddings(likelyACowEmbedding, 5);
 ```
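
Conceptually, the translation step can be thought of as adding the relationship vector to the reference embedding, element by element. The sketch below shows that arithmetic on toy vectors. `TranslateSketch` and `translate` are hypothetical names, and plain vector addition is an assumption made for illustration rather than a confirmed description of what `translateEmbedding` does internally.

```java
import java.util.Arrays;

// Hypothetical sketch, not the eLLMental implementation.
final class TranslateSketch {

    // Element-wise sum: moves `embedding` along `vector`.
    static float[] translate(float[] embedding, float[] vector) {
        if (embedding.length != vector.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        float[] shifted = new float[embedding.length];
        for (int i = 0; i < embedding.length; i++) {
            shifted[i] = embedding[i] + vector[i];
        }
        return shifted;
    }

    public static void main(String[] args) {
        float[] bull = {2.0f, 0.0f};         // Toy stand-in for the "Bull" embedding
        float[] relationship = {0.0f, 1.0f}; // Toy stand-in for the relationship vector
        System.out.println(Arrays.toString(translate(bull, relationship)));
    }
}
```

The shifted vector is only an estimate of where the target concept lies, which is why the documentation pairs it with a similarity search over stored embeddings.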