Add support for Dimension parameter for embeddings (#144)
marcominerva committed Jan 29, 2024
2 parents fcd246c + 1e911dd commit bb87861
Showing 30 changed files with 267 additions and 39 deletions.
52 changes: 50 additions & 2 deletions README.md
@@ -136,7 +136,7 @@ We can also set ChatGPT parameters for chat completion at startup. Check the [of

The configuration can be automatically read from [IConfiguration](https://learn.microsoft.com/en-us/dotnet/api/microsoft.extensions.configuration.iconfiguration), using for example a _ChatGPT_ section in the _appsettings.json_ file:

```
"ChatGPT": {
    "Provider": "OpenAI", // Optional. Allowed values: OpenAI (default) or Azure
    "ApiKey": "", // Required
@@ -159,6 +159,9 @@ The configuration can be automatically read from [IConfiguration](https://learn.
    // "FrequencyPenalty": 0,
    // "ResponseFormat": { "Type": "text" }, // Allowed values for Type: text (default) or json_object
    // "Seed": 42 // Optional (any integer value)
    //},
    //"DefaultEmbeddingParameters": {
    // "Dimensions": 1536
    //}
}
```
@@ -550,7 +553,52 @@ var response = await chatGptClient.GenerateEmbeddingAsync(message);
```csharp
var embeddings = response.GetEmbedding();
```

This code will give you a float array containing all the embeddings for the specified message. The length of the array depends on the model used. For example, if we use the _text-embedding-ada-002_ model, the array will contain 1536 elements.
This code will give you a float array containing all the embeddings for the specified message. The length of the array depends on the model used:

| Model | Output dimension |
| - | - |
| text-embedding-ada-002 | 1536 |
| text-embedding-3-small | 1536 |
| text-embedding-3-large | 3072 |

Newer models like _text-embedding-3-small_ and _text-embedding-3-large_ allow developers to trade off performance against the cost of using embeddings. Specifically, developers can shorten embeddings without the embedding losing its concept-representing properties.

As with the ChatGPT parameters, this setting can be configured in various ways:

- Via code:

```csharp
builder.Services.AddChatGpt(options =>
{
// ...
options.DefaultEmbeddingParameters = new EmbeddingParameters
{
Dimensions = 256
};
});
```

- Using the _appsettings.json_ file:

```
"ChatGPT": {
"DefaultEmbeddingParameters": {
"Dimensions": 256
}
}
```

Then, if you want to change the dimension for a particular request, you can specify the *EmbeddingParameters* argument in the **GenerateEmbeddingAsync** invocation:

```csharp
var response = await chatGptClient.GenerateEmbeddingAsync(request.Message, new EmbeddingParameters
{
Dimensions = 512
});

var embeddings = response.GetEmbedding(); // The length of the array is 512
```
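
The usual way to get shorter embeddings is the *Dimensions* parameter shown above. For vectors that were already generated at full length, a common client-side alternative is to keep the leading values and re-normalize the result to unit length. The following is only a minimal sketch of that idea: the `EmbeddingTruncationSketch` class and its `Shorten` method are hypothetical names and not part of ChatGptNet, and the approach assumes a model from the _text-embedding-3_ family.

```csharp
using System;

// Hypothetical helper for illustration only; it is not part of ChatGptNet.
public static class EmbeddingTruncationSketch
{
    // Keeps the first `dimensions` values and rescales them to unit (L2) length.
    public static float[] Shorten(float[] embedding, int dimensions)
    {
        if (dimensions <= 0 || dimensions > embedding.Length)
        {
            throw new ArgumentOutOfRangeException(nameof(dimensions));
        }

        var result = new float[dimensions];
        Array.Copy(embedding, result, dimensions);

        // Compute the L2 norm of the truncated vector.
        double sumOfSquares = 0;
        foreach (var value in result)
        {
            sumOfSquares += (double)value * value;
        }

        var norm = Math.Sqrt(sumOfSquares);
        if (norm == 0)
        {
            // A zero vector cannot be normalized; return it unchanged.
            return result;
        }

        for (var i = 0; i < result.Length; i++)
        {
            result[i] = (float)(result[i] / norm);
        }

        return result;
    }
}
```

Re-normalizing matters because similarity measures that assume unit-length vectors would otherwise be skewed by the truncation.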

If you need to calculate the [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity) between two embeddings, you can use the **EmbeddingUtility.CosineSimilarity** method.
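
To make the computation concrete, here is a minimal standalone sketch of cosine similarity, the dot product of two vectors divided by the product of their lengths. It is not the implementation behind **EmbeddingUtility.CosineSimilarity**; the `CosineSimilaritySketch` class and its `Compute` method are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical standalone helper; this is not ChatGptNet's EmbeddingUtility.
public static class CosineSimilaritySketch
{
    // Returns dot(a, b) / (|a| * |b|), a value in the range [-1, 1].
    public static float Compute(float[] a, float[] b)
    {
        if (a.Length != b.Length)
        {
            throw new ArgumentException("Both embeddings must have the same length.");
        }

        // Accumulate in double to limit rounding error on long vectors.
        double dot = 0, normA = 0, normB = 0;
        for (var i = 0; i < a.Length; i++)
        {
            dot += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }

        if (normA == 0 || normB == 0)
        {
            throw new ArgumentException("Cosine similarity is undefined for a zero vector.");
        }

        return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
    }
}
```

Two vectors pointing in the same direction score close to 1, while orthogonal vectors such as `{ 1, 0 }` and `{ 0, 1 }` score 0. Both embeddings must have the same length, so they should be generated with the same model and the same *Dimensions* value.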

25 changes: 25 additions & 0 deletions docs/ChatGptNet.Models.Embeddings/EmbeddingParameters.md
@@ -0,0 +1,25 @@
# EmbeddingParameters class

Represents embeddings parameters.

```csharp
public class EmbeddingParameters
```

## Public Members

| name | description |
| --- | --- |
| [EmbeddingParameters](EmbeddingParameters/EmbeddingParameters.md)() | The default constructor. |
| [Dimensions](EmbeddingParameters/Dimensions.md) { get; set; } | The number of dimensions the resulting output embeddings should have. Only supported in `text-embedding-3` and later models. |

## Remarks

See [Create embeddings](https://platform.openai.com/docs/api-reference/embeddings/create) for more information.

## See Also

* namespace [ChatGptNet.Models.Embeddings](../ChatGptNet.md)
* [EmbeddingParameters.cs](https://github.com/marcominerva/ChatGptNet/tree/master/src/ChatGptNet/Models/Embeddings/EmbeddingParameters.cs)
<!-- DO NOT EDIT: generated by xmldocmd for ChatGptNet.dll -->
@@ -0,0 +1,14 @@
# EmbeddingParameters.Dimensions property

The number of dimensions the resulting output embeddings should have. Only supported in `text-embedding-3` and later models.

```csharp
public int? Dimensions { get; set; }
```

## See Also

* class [EmbeddingParameters](../EmbeddingParameters.md)
* namespace [ChatGptNet.Models.Embeddings](../../ChatGptNet.md)

<!-- DO NOT EDIT: generated by xmldocmd for ChatGptNet.dll -->
@@ -0,0 +1,14 @@
# EmbeddingParameters constructor

The default constructor.

```csharp
public EmbeddingParameters()
```

## See Also

* class [EmbeddingParameters](../EmbeddingParameters.md)
* namespace [ChatGptNet.Models.Embeddings](../../ChatGptNet.md)

<!-- DO NOT EDIT: generated by xmldocmd for ChatGptNet.dll -->
4 changes: 3 additions & 1 deletion docs/ChatGptNet.Models.Embeddings/OpenAIEmbeddingModels.md
@@ -10,7 +10,9 @@ public static class OpenAIEmbeddingModels

| name | description |
| --- | --- |
| const [TextEmbeddingAda002](OpenAIEmbeddingModels/TextEmbeddingAda002.md) | The second generation embedding model provided by OpenAI. |
| const [TextEmbedding3Large](OpenAIEmbeddingModels/TextEmbedding3Large.md) | Most capable embedding model for both English and non-English tasks. It uses a 3072 output dimension. |
| const [TextEmbedding3Small](OpenAIEmbeddingModels/TextEmbedding3Small.md) | Increased performance over 2nd generation ada embedding model. It uses a 1536 output dimension. |
| const [TextEmbeddingAda002](OpenAIEmbeddingModels/TextEmbeddingAda002.md) | The second generation embedding model provided by OpenAI. It uses a 1536 output dimension. |

## Remarks

@@ -0,0 +1,14 @@
# OpenAIEmbeddingModels.TextEmbedding3Large field

Most capable embedding model for both English and non-English tasks. It uses a 3072 output dimension.

```csharp
public const string TextEmbedding3Large;
```

## See Also

* class [OpenAIEmbeddingModels](../OpenAIEmbeddingModels.md)
* namespace [ChatGptNet.Models.Embeddings](../../ChatGptNet.md)

<!-- DO NOT EDIT: generated by xmldocmd for ChatGptNet.dll -->
@@ -0,0 +1,14 @@
# OpenAIEmbeddingModels.TextEmbedding3Small field

Increased performance over 2nd generation ada embedding model. It uses a 1536 output dimension.

```csharp
public const string TextEmbedding3Small;
```

## See Also

* class [OpenAIEmbeddingModels](../OpenAIEmbeddingModels.md)
* namespace [ChatGptNet.Models.Embeddings](../../ChatGptNet.md)

<!-- DO NOT EDIT: generated by xmldocmd for ChatGptNet.dll -->
@@ -1,6 +1,6 @@
# OpenAIEmbeddingModels.TextEmbeddingAda002 field

The second generation embedding model provided by OpenAI.
The second generation embedding model provided by OpenAI. It uses a 1536 output dimension.

```csharp
public const string TextEmbeddingAda002;
```
1 change: 1 addition & 0 deletions docs/ChatGptNet/ChatGptOptions.md
@@ -12,6 +12,7 @@ public class ChatGptOptions
| --- | --- |
| [ChatGptOptions](ChatGptOptions/ChatGptOptions.md)() | The default constructor. |
| [DefaultEmbeddingModel](ChatGptOptions/DefaultEmbeddingModel.md) { get; set; } | Gets or sets the default model for embedding. (default: [`TextEmbeddingAda002`](../ChatGptNet.Models.Embeddings/OpenAIEmbeddingModels/TextEmbeddingAda002.md) when the provider is OpenAI). |
| [DefaultEmbeddingParameters](ChatGptOptions/DefaultEmbeddingParameters.md) { get; } | Gets or sets the default parameters for embeddings. |
| [DefaultModel](ChatGptOptions/DefaultModel.md) { get; set; } | Gets or sets the default model for chat completion. (default: [`Gpt35Turbo`](../ChatGptNet.Models/OpenAIChatGptModels/Gpt35Turbo.md) when the provider is OpenAI). |
| [DefaultParameters](ChatGptOptions/DefaultParameters.md) { get; } | Gets or sets the default parameters for chat completion. |
| [MessageExpiration](ChatGptOptions/MessageExpiration.md) { get; set; } | Gets or sets the expiration for cached conversation messages (default: 1 hour). |
15 changes: 15 additions & 0 deletions docs/ChatGptNet/ChatGptOptions/DefaultEmbeddingParameters.md
@@ -0,0 +1,15 @@
# ChatGptOptions.DefaultEmbeddingParameters property

Gets or sets the default parameters for embeddings.

```csharp
public EmbeddingParameters DefaultEmbeddingParameters { get; }
```

## See Also

* class [EmbeddingParameters](../../ChatGptNet.Models.Embeddings/EmbeddingParameters.md)
* class [ChatGptOptions](../ChatGptOptions.md)
* namespace [ChatGptNet](../../ChatGptNet.md)

<!-- DO NOT EDIT: generated by xmldocmd for ChatGptNet.dll -->
3 changes: 2 additions & 1 deletion docs/ChatGptNet/ChatGptOptionsBuilder.md
@@ -11,7 +11,8 @@ public class ChatGptOptionsBuilder
| name | description |
| --- | --- |
| [ChatGptOptionsBuilder](ChatGptOptionsBuilder/ChatGptOptionsBuilder.md)() | The default constructor. |
| [DefaultEmbeddingModel](ChatGptOptionsBuilder/DefaultEmbeddingModel.md) { get; set; } | Gets or sets the default model for embedding. (default: [`TextEmbeddingAda002`](../ChatGptNet.Models.Embeddings/OpenAIEmbeddingModels/TextEmbeddingAda002.md) when the provider is OpenAI). |
| [DefaultEmbeddingModel](ChatGptOptionsBuilder/DefaultEmbeddingModel.md) { get; set; } | Gets or sets the default model for embeddings. (default: [`TextEmbeddingAda002`](../ChatGptNet.Models.Embeddings/OpenAIEmbeddingModels/TextEmbeddingAda002.md) when the provider is OpenAI). |
| [DefaultEmbeddingParameters](ChatGptOptionsBuilder/DefaultEmbeddingParameters.md) { get; } | Gets or sets the default parameters for embeddings. |
| [DefaultModel](ChatGptOptionsBuilder/DefaultModel.md) { get; set; } | Gets or sets the default model for chat completion. (default: [`Gpt35Turbo`](../ChatGptNet.Models/OpenAIChatGptModels/Gpt35Turbo.md) when the provider is OpenAI). |
| [DefaultParameters](ChatGptOptionsBuilder/DefaultParameters.md) { get; set; } | Gets or sets the default parameters for chat completion. |
| [MessageExpiration](ChatGptOptionsBuilder/MessageExpiration.md) { get; set; } | Gets or sets the expiration for cached conversation messages (default: 1 hour). |
@@ -1,6 +1,6 @@
# ChatGptOptionsBuilder.DefaultEmbeddingModel property

Gets or sets the default model for embedding. (default: [`TextEmbeddingAda002`](../../ChatGptNet.Models.Embeddings/OpenAIEmbeddingModels/TextEmbeddingAda002.md) when the provider is OpenAI).
Gets or sets the default model for embeddings. (default: [`TextEmbeddingAda002`](../../ChatGptNet.Models.Embeddings/OpenAIEmbeddingModels/TextEmbeddingAda002.md) when the provider is OpenAI).

```csharp
public string? DefaultEmbeddingModel { get; set; }
```
@@ -0,0 +1,15 @@
# ChatGptOptionsBuilder.DefaultEmbeddingParameters property

Gets or sets the default parameters for embeddings.

```csharp
public EmbeddingParameters DefaultEmbeddingParameters { get; }
```

## See Also

* class [EmbeddingParameters](../../ChatGptNet.Models.Embeddings/EmbeddingParameters.md)
* class [ChatGptOptionsBuilder](../ChatGptOptionsBuilder.md)
* namespace [ChatGptNet](../../ChatGptNet.md)

<!-- DO NOT EDIT: generated by xmldocmd for ChatGptNet.dll -->
2 changes: 1 addition & 1 deletion docs/ChatGptNet/IChatGptClient.md
@@ -16,7 +16,7 @@ public interface IChatGptClient
| [AskStreamAsync](IChatGptClient/AskStreamAsync.md)(…) | Requests a new chat interaction with streaming response, like in ChatGPT. (2 methods) |
| [ConversationExistsAsync](IChatGptClient/ConversationExistsAsync.md)(…) | Checks if a chat conversation exists. |
| [DeleteConversationAsync](IChatGptClient/DeleteConversationAsync.md)(…) | Deletes a chat conversation, clearing all the history. |
| [GenerateEmbeddingAsync](IChatGptClient/GenerateEmbeddingAsync.md)(…) | Generates embeddings for a message. (2 methods) |
| [GenerateEmbeddingAsync](IChatGptClient/GenerateEmbeddingAsync.md)(…) | Generates embeddings for a text. (2 methods) |
| [GetConversationAsync](IChatGptClient/GetConversationAsync.md)(…) | Retrieves a chat conversation from the cache. |
| [LoadConversationAsync](IChatGptClient/LoadConversationAsync.md)(…) | Loads messages into a new conversation. (2 methods) |
| [SetupAsync](IChatGptClient/SetupAsync.md)(…) | Sets up a new conversation with a system message that is used to influence assistant behavior. (2 methods) |
20 changes: 13 additions & 7 deletions docs/ChatGptNet/IChatGptClient/GenerateEmbeddingAsync.md
@@ -1,15 +1,17 @@
# IChatGptClient.GenerateEmbeddingAsync method (1 of 2)

Generates embeddings for a list of messages.
Generates embeddings for a list of texts.

```csharp
public Task<EmbeddingResponse> GenerateEmbeddingAsync(IEnumerable<string> messages,
string? model = null, CancellationToken cancellationToken = default)
public Task<EmbeddingResponse> GenerateEmbeddingAsync(IEnumerable<string> texts,
EmbeddingParameters? parameters = null, string? model = null,
CancellationToken cancellationToken = default)
```

| parameter | description |
| --- | --- |
| messages | The messages to use for generating embeddings. |
| texts | The texts to use for generating embeddings. |
| parameters | An [`EmbeddingParameters`](../../ChatGptNet.Models.Embeddings/EmbeddingParameters.md) object used to override the default embedding parameters in the [`DefaultEmbeddingParameters`](../ChatGptOptions/DefaultEmbeddingParameters.md) property. |
| model | The name of the embedding model. If *model* is `null`, then the one specified in the [`DefaultEmbeddingModel`](../ChatGptOptions/DefaultEmbeddingModel.md) property will be used. |
| cancellationToken | The token to monitor for cancellation requests. |

@@ -26,23 +28,26 @@ The embeddings for the provided messages.
## See Also

* class [EmbeddingResponse](../../ChatGptNet.Models.Embeddings/EmbeddingResponse.md)
* class [EmbeddingParameters](../../ChatGptNet.Models.Embeddings/EmbeddingParameters.md)
* interface [IChatGptClient](../IChatGptClient.md)
* namespace [ChatGptNet](../../ChatGptNet.md)

---

# IChatGptClient.GenerateEmbeddingAsync method (2 of 2)

Generates embeddings for a message.
Generates embeddings for a text.

```csharp
public Task<EmbeddingResponse> GenerateEmbeddingAsync(string message, string? model = null,
public Task<EmbeddingResponse> GenerateEmbeddingAsync(string text,
EmbeddingParameters? parameters = null, string? model = null,
CancellationToken cancellationToken = default)
```

| parameter | description |
| --- | --- |
| message | The message to use for generating embeddings. |
| text | The text to use for generating embeddings. |
| parameters | An [`EmbeddingParameters`](../../ChatGptNet.Models.Embeddings/EmbeddingParameters.md) object used to override the default embedding parameters in the [`DefaultEmbeddingParameters`](../ChatGptOptions/DefaultEmbeddingParameters.md) property. |
| model | The name of the embedding model. If *model* is `null`, then the one specified in the [`DefaultEmbeddingModel`](../ChatGptOptions/DefaultEmbeddingModel.md) property will be used. |
| cancellationToken | The token to monitor for cancellation requests. |

@@ -59,6 +64,7 @@ The embeddings for the provided message.
## See Also

* class [EmbeddingResponse](../../ChatGptNet.Models.Embeddings/EmbeddingResponse.md)
* class [EmbeddingParameters](../../ChatGptNet.Models.Embeddings/EmbeddingParameters.md)
* interface [IChatGptClient](../IChatGptClient.md)
* namespace [ChatGptNet](../../ChatGptNet.md)

1 change: 1 addition & 0 deletions docs/README.md
@@ -71,6 +71,7 @@
| public type | description |
| --- | --- |
| class [EmbeddingData](./ChatGptNet.Models.Embeddings/EmbeddingData.md) | Represents an embedding. |
| class [EmbeddingParameters](./ChatGptNet.Models.Embeddings/EmbeddingParameters.md) | Represents embeddings parameters. |
| class [EmbeddingResponse](./ChatGptNet.Models.Embeddings/EmbeddingResponse.md) | Represents an embedding response. |
| static class [OpenAIEmbeddingModels](./ChatGptNet.Models.Embeddings/OpenAIEmbeddingModels.md) | Contains all the embedding models that are currently supported by OpenAI. |

2 changes: 1 addition & 1 deletion samples/ChatGptApi/Program.cs
@@ -117,7 +117,7 @@ await foreach (var delta in responseStream.AsDeltas())
})
.WithOpenApi();

app.MapPost("/api/embeddings/CosineSimilarity", async (CosineSimilarityRequest request, IChatGptClient chatGptClient) =>
app.MapPost("/api/embeddings/cosine-similarity", async (CosineSimilarityRequest request, IChatGptClient chatGptClient) =>
{
var firstEmbeddingResponse = await chatGptClient.GenerateEmbeddingAsync(request.FirstMessage);
var secondEmbeddingResponse = await chatGptClient.GenerateEmbeddingAsync(request.SecondMessage);
11 changes: 7 additions & 4 deletions samples/ChatGptApi/appsettings.json
@@ -1,17 +1,17 @@
{
"ChatGPT": {
"Provider": "OpenAI", // Optional. Allowed values: OpenAI (default) or Azure
"Provider": "OpenaI", // Optional. Allowed values: OpenAI (default) or Azure
"ApiKey": "", // Required
//"Organization": "", // Optional, used only by OpenAI
//"Organization": "", // Optional, used only by OpenAI
"ResourceName": "", // Required when using Azure OpenAI Service
"ApiVersion": "2023-12-01-preview", // Optional, used only by Azure OpenAI Service (default: 2023-12-01-preview)
"ApiVersion": "2023-12-01-preview", // Optional, used only by Azure OpenAI Service (default: 2023-08-01-preview)
"AuthenticationType": "ApiKey", // Optional, used only by Azure OpenAI Service. Allowed values: ApiKey (default) or ActiveDirectory

"DefaultModel": "my-model",
"DefaultEmbeddingModel": "text-embedding-ada-002", // Optional, set it if you want to use embeddings
"MessageLimit": 20,
"MessageExpiration": "00:30:00",
"ThrowExceptionOnError": true, // Optional, default: true
"ThrowExceptionOnError": true // Optional, default: true
//"User": "UserName",
//"DefaultParameters": {
// "Temperature": 0.8,
@@ -21,6 +21,9 @@
// "FrequencyPenalty": 0,
// "ResponseFormat": { "Type": "text" }, // Allowed values for Type: text (default) or json_object
// "Seed": 42 // Optional (any integer value)
//},
//"DefaultEmbeddingParameters": {
// "Dimensions": 1536
//}
},
"Logging": {
5 changes: 4 additions & 1 deletion samples/ChatGptConsole/appsettings.json
@@ -11,7 +11,7 @@
"DefaultEmbeddingModel": "text-embedding-ada-002", // Optional, set it if you want to use embeddings
"MessageLimit": 20,
"MessageExpiration": "00:30:00",
"ThrowExceptionOnError": true,
"ThrowExceptionOnError": true
//"User": "UserName",
//"DefaultParameters": {
// "Temperature": 0.8,
@@ -21,6 +21,9 @@
// "FrequencyPenalty": 0,
// "ResponseFormat": { "Type": "text" }, // Allowed values for Type: text (default) or json_object
// "Seed": 42 // Optional (any integer value)
//},
//"DefaultEmbeddingParameters": {
// "Dimensions": 1536
//}
},
"Logging": {
5 changes: 4 additions & 1 deletion samples/ChatGptFunctionCallingConsole/appsettings.json
@@ -11,7 +11,7 @@
"DefaultEmbeddingModel": "text-embedding-ada-002", // Optional, set it if you want to use embeddings
"MessageLimit": 20,
"MessageExpiration": "00:30:00",
"ThrowExceptionOnError": true, // Optional, default: true
"ThrowExceptionOnError": true // Optional, default: true
//"User": "UserName",
//"DefaultParameters": {
// "Temperature": 0.8,
@@ -21,6 +21,9 @@
// "FrequencyPenalty": 0,
// "ResponseFormat": { "Type": "text" }, // Allowed values for Type: text (default) or json_object
// "Seed": 42 // Optional (any integer value)
//},
//"DefaultEmbeddingParameters": {
// "Dimensions": 1536
//}
},
"Logging": {