Merge pull request #5 from theam/docs/refinements
Refinements after initial discussions with the team
juanjoman committed Aug 18, 2023
2 parents b4185cb + 7b0351d commit 7a118f6
Showing 1 changed file with 89 additions and 15 deletions.
104 changes: 89 additions & 15 deletions docs-site/docs/03_components/02_embeddings_space.md
@@ -11,6 +11,26 @@ Leveraging the power of embeddings models, this component allows you to represen
[//]: # (TODO: Highlight this as a warning message in Docusaurus)
> **Warning**: The current version supports the [OpenAI embeddings model](https://platform.openai.com/docs/guides/embeddings) and [Pinecone](https://www.pinecone.io) for storage. Please provide the necessary credentials for these services. Note that these services may incur costs; always review their pricing details before use.
## `Embedding` object

In eLLMental, embeddings are represented by the `Embedding` class, which has the following attributes:

- `id`: The unique identifier of the embedding.
- `embedding`: The numeric representation of the text.
- `metadata`: Additional information associated with the embedding.
- `createdAt`: The timestamp of the embedding creation.
- `modelId`: The identifier of the embeddings model used to generate the embedding.

```java
public class Embedding {
    private final String id;
    private final float[] embedding;
    private final Map<String, String> metadata;
    private final Instant createdAt;
    private final String modelId;
}
```

The `EmbeddingsSpaceComponent` class provides the following methods:

## Constructor
@@ -23,45 +43,74 @@ EmbeddingsStore pineconeStore = new PineconeEmbeddingsStore("YOUR_PINECONE_URL",
EmbeddingsSpaceComponent embeddingsSpace = new EmbeddingsSpaceComponent(openAIModel, pineconeStore);
```

## `create`
## `generate`

Generates an embedding from a text without persisting it.

- **Parameters**:
  - `text`: The textual input for embedding.
  - `additionalMetadata`: Supplementary metadata associated with the text.
- **Returns**: The generated embedding.

```java
String sampleText = "Hello, eLLMental!";
Map<String, String> additionalMetadata = new HashMap<>();
additionalMetadata.put("key", "value");
Embedding embedding = embeddingsSpace.create(sampleText, additionalMetadata);

Embedding embedding = embeddingsSpace.generate(sampleText, additionalMetadata);
```

## `add`

Generates and saves an embedding for a text.
Generates and persists an embedding for a given text.

- **Parameters**:
  - `text`: Text to be embedded.
  - `additionalMetadata`: Additional metadata.
- **Returns**: The generated embedding.

```java
Map<String, String> additionalMetadata = new HashMap<>();
additionalMetadata.put("key", "value");

String sampleText = "Hello, eLLMental!";
Embedding embedding = embeddingsSpace.add(sampleText, additionalMetadata);
```

## `mostSimilarEmbeddings`

Fetches embeddings semantically closest to a reference text.
Fetches embeddings semantically closest to a reference text or embedding.

With a reference text:

- **Parameters**:
  - `referenceText`: The text for comparison.
  - `limit`: Maximum number of similar embeddings to return.

```java
// First we add a few embeddings to the space
embeddingsSpace.add("Hello, eLLMental!");
embeddingsSpace.add("Hello, world!");

List<Embedding> closestNeighbors = embeddingsSpace.mostSimilarEmbeddings("Greetings!", 3);
// closestNeighbors will contain the embeddings for "Hello, eLLMental!" and "Hello, world!"
```

With a reference embedding:

- **Parameters**:
  - `referenceEmbedding`: The embedding for comparison.
  - `limit`: Maximum number of similar embeddings to return.

```java
// First we add a few embeddings to the space
embeddingsSpace.add("Hello, eLLMental!");
embeddingsSpace.add("Hello, world!");

Embedding embedding = embeddingsSpace.generate("Hello everyone!");

List<Embedding> closestNeighbors = embeddingsSpace.mostSimilarEmbeddings(embedding, 3);
// closestNeighbors will contain the embeddings for "Hello, eLLMental!" and "Hello, world!"
```
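
The source does not say which similarity metric the underlying store uses to rank neighbors. As a rough illustration only, the sketch below ranks toy vectors by cosine similarity, a common choice for embeddings. The `SimilaritySketch` class and its methods are hypothetical names and are not part of the eLLMental API.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helper for illustration; not part of eLLMental.
class SimilaritySketch {

    // Cosine similarity of two equal-length vectors (0 if either has zero norm).
    static double cosineSimilarity(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Indices of the `limit` candidates most similar to the reference, best first.
    static List<Integer> mostSimilarIndices(float[] reference, List<float[]> candidates, int limit) {
        return IntStream.range(0, candidates.size())
                .boxed()
                .sorted(Comparator.comparingDouble(
                        (Integer i) -> cosineSimilarity(reference, candidates.get(i))).reversed())
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Toy 2-dimensional vectors standing in for real embeddings.
        List<float[]> stored = Arrays.asList(
                new float[] {0f, 1f},
                new float[] {1f, 0f},
                new float[] {1f, 1f});
        float[] reference = {1f, 0f};
        System.out.println(mostSimilarIndices(reference, stored, 2)); // prints [1, 2]
    }
}
```

Real embeddings have hundreds or thousands of dimensions, and a vector store such as Pinecone would typically perform this ranking on the server side rather than in application code.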

## `calculateRelationshipVector`
@@ -83,38 +132,63 @@ For instance, with the following list of text pairs:

The relationship vector for this group represents a translation in the embeddings space: given a word like those in the left column, it points to the location of a word that would likely appear in the right column. See the documentation for [`translateEmbedding`](#translateEmbedding) for more details.

The `RelationshipVector` class is defined as follows:

```java
public class RelationshipVector {
    public final String label;
    public final float[] vector;
}
```

And it can be calculated like this:

```java
String[][] textPairs = {{"Man", "Woman"}, {"Boy", "Girl"}, {"King", "Queen"}, {"Prince", "Princess"}, {"Father", "Mother"}};
float[] relationshipVector = embeddingsSpace.calculateRelationshipVector(textPairs);
RelationshipVector relationshipVector = embeddingsSpace.calculateRelationshipVector(textPairs);
```
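
How `calculateRelationshipVector` derives the vector is not shown here. One plausible reading of "a translation in the embeddings space" is the average of the differences between each right-hand and left-hand embedding. The sketch below illustrates that assumption with toy vectors; `RelationshipVectorSketch` and `meanDifference` are hypothetical names, not eLLMental API.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of eLLMental.
class RelationshipVectorSketch {

    // Averages (right[p] - left[p]) over all pairs, dimension by dimension.
    // Assumption: this mean-difference approach is one possible way to build
    // a relationship vector; the library may do something different.
    static float[] meanDifference(float[][] left, float[][] right) {
        if (left.length == 0 || left.length != right.length) {
            throw new IllegalArgumentException("Need the same non-zero number of left and right vectors");
        }
        int dimensions = left[0].length;
        float[] mean = new float[dimensions];
        for (int p = 0; p < left.length; p++) {
            if (left[p].length != dimensions || right[p].length != dimensions) {
                throw new IllegalArgumentException("All vectors must have the same length");
            }
            for (int d = 0; d < dimensions; d++) {
                mean[d] += right[p][d] - left[p][d];
            }
        }
        for (int d = 0; d < dimensions; d++) {
            mean[d] /= left.length;
        }
        return mean;
    }

    public static void main(String[] args) {
        // Toy 2-dimensional stand-ins for the embeddings of each text pair.
        float[][] left = {{1f, 0f}, {2f, 1f}};
        float[][] right = {{1f, 2f}, {2f, 3f}};
        System.out.println(Arrays.toString(meanDifference(left, right))); // prints [0.0, 2.0]
    }
}
```

Averaging over several pairs smooths out the quirks of any single pair, which is presumably why the example uses five pairs rather than one.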

## `storeNamedRelationshipVector`

Stores a relationship vector in the embeddings store and assigns it a label for later use.

- **Parameters**:
  - `label`: The label to assign to the relationship vector.
  - `relationshipVector`: The relationship vector to store.


```java
// First we calculate a relationship vector
RelationshipVector relationshipVector = embeddingsSpace.calculateRelationshipVector(textPairs);

// Then we store it in the embeddings store for future use
embeddingsSpace.storeNamedRelationshipVector("feminize", relationshipVector);
```

## `translateEmbedding`

Shifts an embedding in the embeddings space to find the location of the text that would meet the relationship represented by the vector.
Shifts the embedding of a reference text in the embeddings space to find the location of the text that would satisfy the relationship represented by the vector.

- **Parameters**:
  - `embedding`: The primary embedding.
  - `referenceText`: The reference text whose embedding is translated.
  - `vector`: The vector determining translation.

This is useful if you want to search for embeddings that are similar to a given one, but in a different context. For instance, let's say we have the following embedding:

```java
Embedding bullEmbedding = embeddingsSpace.create("Bull");
```

And a relationship vector calculated as follows:
And a relationship vector calculated with the `calculateRelationshipVector` method as follows:

```java
String[][] textPairs = {{"Man", "Woman"}, {"Boy", "Girl"}, {"King", "Queen"}, {"Prince", "Princess"}, {"Father", "Mother"}};
float[] relationshipVector = embeddingsSpace.calculateRelationshipVector(textPairs);
RelationshipVector relationshipVector = embeddingsSpace.calculateRelationshipVector(textPairs);
```

We can use the relationship vector to find the location of the words that would be similar to "Cow" instead of "Bull". Note that embeddings cannot be reversed into text, so we can't know for certain that this embedding represents a cow, but it gives us a good approximation that can be used to refine search results later.

```java
Embedding likelyACowEmbedding = embeddingsSpace.translateEmbedding(bullEmbedding, relationshipVector);
// This will create an estimated embedding of the word "Cow"
Embedding likelyACowEmbedding = embeddingsSpace.translateEmbedding("Bull", relationshipVector);

// This will return embeddings that are similar to "Cow"
// We use it as any other embedding to find stored texts that are similar to "Cow"
List<Embedding> similarToCowEmbeddings = embeddingsSpace.mostSimilarEmbeddings(likelyACowEmbedding, 5);
```
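
The description above says the method "shifts" an embedding by the relationship vector. A minimal sketch of that idea follows, assuming the shift is plain element-wise addition; the source does not confirm the actual implementation, and `TranslateSketch` and `translate` are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of eLLMental.
class TranslateSketch {

    // Returns a new vector equal to embedding + relationshipVector (element-wise).
    // The input arrays are left unchanged.
    static float[] translate(float[] embedding, float[] relationshipVector) {
        if (embedding.length != relationshipVector.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        float[] shifted = new float[embedding.length];
        for (int i = 0; i < embedding.length; i++) {
            shifted[i] = embedding[i] + relationshipVector[i];
        }
        return shifted;
    }

    public static void main(String[] args) {
        // Toy 2-dimensional stand-ins for the "Bull" embedding and the relationship vector.
        float[] bull = {1f, 0.5f};
        float[] relationship = {0f, 2f};
        System.out.println(Arrays.toString(translate(bull, relationship))); // prints [1.0, 2.5]
    }
}
```

The shifted vector is only an estimate of where "Cow" would sit, so it is best used as a query for `mostSimilarEmbeddings` rather than treated as the embedding of a specific word.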
