
(obsolete - do not merge) [Inference API] Add Azure AI Studio Embeddings and Chat Completion Support #108310

Conversation

markjhoy
Contributor

@markjhoy markjhoy commented May 6, 2024

This PR adds support for Azure AI Studio integration into the Inference API. Currently this supports text_embedding and completion task types.

Prerequisites to Model Creation

(screenshot omitted)

Model Creation:

PUT _inference/{tasktype}/{model_id}
{
  "service": "azureaistudio",
  "service_settings": {
    "api_key": "{api_key}",
    "target": "{deployment_target}",
    "provider": "{model_provider}",
    "endpoint_type": "{endpoint_type}"
  }
}
  • Valid {tasktype} values are: [text_embedding, completion]

Required Service Settings:

  • api_key: The API key can be found on your Azure AI Studio deployment's overview page
  • target: The target URL can be found on your Azure AI Studio deployment's overview page
  • provider: Valid provider types are (case insensitive):
    • openai - available for embeddings and completion
    • mistral - available for completion only
    • meta - available for completion only
    • microsoft_phi - available for completion only
    • cohere - available for embeddings and completion
    • snowflake - available for completion only
    • databricks - available for completion only
  • endpoint_type: Valid endpoint types are:
    • token - a "pay as you go" endpoint (charged by token)
      • Available for OpenAI, Meta and Cohere
    • realtime - a realtime endpoint VM deployment (charged by the hour)
      • Available for Mistral, Meta, Microsoft Phi, Snowflake and Databricks
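
For example, a request creating an OpenAI embeddings model on a pay-as-you-go endpoint might look like the following (all values, including the shape of the target URL, are placeholders):

PUT _inference/text_embedding/azure_ai_studio_embeddings
{
  "service": "azureaistudio",
  "service_settings": {
    "api_key": "<your-api-key>",
    "target": "https://<your-deployment>.<your-region>.inference.ai.azure.com",
    "provider": "openai",
    "endpoint_type": "token"
  }
}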

Embeddings Service Settings

  • dimensions: (optional) the number of dimensions the resulting output embeddings should have.

Embeddings Task Settings

(this can also be overridden in the inference request, as shown below)

  • user: (optional) a string that uniquely identifies your end-user. This helps Azure AI Studio detect abuse and aids debugging.
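
For example, a per-request override of the user setting might look like this (assuming the Inference API's standard task_settings override mechanism; the value is illustrative):

POST _inference/text_embedding/{model_id}
{
    "input": "The answer to the universe is",
    "task_settings": {
        "user": "user-1234"
    }
}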

Completion Service Settings

(no additional service settings)

Completion Task Settings

(these are all optional and can be overridden in the inference request, as shown below)

  • temperature: What sampling temperature to use, between 0 and 2. Higher values mean the model takes more risks. Try 0.9 for more creative applications, and 0 (argmax sampling) for ones with a well-defined answer. Microsoft recommends altering this or top_p but not both.
  • top_p: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. Microsoft recommends altering this or temperature but not both.
  • do_sample: whether or not to perform sampling
  • max_new_tokens: the maximum number of new tokens the chat completion inference should produce in the output
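
For example, a completion request overriding two of these settings might look like this (values are illustrative):

POST _inference/completion/{model_id}
{
    "input": "The answer to the universe is",
    "task_settings": {
        "temperature": 0.7,
        "max_new_tokens": 64
    }
}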

Text Embedding Inference

POST _inference/text_embedding/{model_id}
{
    "input": "The answer to the universe is"
}
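
The response contains one embedding per input, roughly of this shape (values are illustrative):

{
    "text_embedding": [
        {
            "embedding": [0.0123, -0.0456, ...]
        }
    ]
}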

Chat Completion Inference

POST _inference/completion/{model_id}
{
    "input": "The answer to the universe is"
}
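
This returns a response of roughly the following shape (content is illustrative):

{
    "completion": [
        {
            "result": "42."
        }
    ]
}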

@markjhoy markjhoy force-pushed the markjhoy/azure_ai_studio_integration_inference branch from a5eba85 to ae6564b May 8, 2024 19:15
@markjhoy markjhoy changed the title Ad Azure AI Studio Embeddings and Chat Completion Support for Inference API [Inference API] Add Azure AI Studio Embeddings and Chat Completion Support May 8, 2024
@markjhoy markjhoy added >non-issue :ml Machine learning Team:ML Meta label for the ML team :EnterpriseSearch/Application Enterprise Search Team:Enterprise Search Meta label for Enterprise Search team labels May 8, 2024
@markjhoy markjhoy marked this pull request as ready for review May 8, 2024 21:14
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine
Collaborator

Pinging @elastic/ent-search-eng (Team:Enterprise Search)

Member

@davidkyle davidkyle left a comment


Looks great

It's a very large PR; I'd like to do some testing and then take another look.

@markjhoy
Contributor Author

markjhoy commented May 9, 2024

buildkite test this

@markjhoy markjhoy requested a review from davidkyle May 9, 2024 15:40
shainaraskas and others added 3 commits May 9, 2024 12:24
This moves the "skip" logic from our IT_test_only suffix into a new
feature - this one is historical `esql.enrich_load`. This feature is not
supported by `CsvTests` but is supported across all tests.
axw and others added 23 commits May 9, 2024 12:24
Linux systems with multiarch (e.g. i386 & x86_64) libraries
may have libsystemd.0 in two subdirectories of an entry in
java.library.path. For example, libsystemd.so.0 may be found
in both /usr/lib/i386-linux-gnu and /usr/lib/x86_64-linux-gnu.

Instead of attempting to load any library found, attempt all
and stop as soon as one is successfully loaded.
Today, we do not wait for remote sinks to stop before completing the 
main request. While this doesn't affect correctness, it's important that
we do not spawn child requests after the parent request is completed.

Closes elastic#105859
…lastic#107581)

Previously these were contained in the index template, however, Kibana
needs to be able to make overrides to only the settings, so factoring
these out would allow them to do this (in such a way that they can be
overridden by the `kibana-reporting@custom` component template as well).

Relates to elastic#97765
When the file watched by file settings is initially missing, a special
method in reserved state service is called to write a dummy cluster
state entry. In the case of tests, there is no real running master
service, so when the task is submitted, the file watcher thread actually
barfs and the watcher dies, silently. That then causes the test to
timeout as it waits indefinitely but the file watcher is no longer
watching for the test file that was written.

This commit mocks out writing this empty state in the reserved state
service. It also collapses the two tests that check stopping while
blocked in processing works since they were almost exactly the same.

closes elastic#106968
Adding and removing appenders in Log4j is not threadsafe. Yet some tests
rely on capturing logging by adding an in memory appender,
MockLogAppender. This commit makes the mock logging threadsafe by
creating a new, singular appender for mock logging that delegates, in a
threadsafe way, to the existing appenders created. Confusingly
MockLogAppender is no longer really an appender, but I'm leaving
clarifying that for a followup so as to limit the scope of this PR.

closes elastic#106425
Disable location quoting in FROM command before 8.14 release to allow
 more time to discuss options
This change introduces operator factories for time-series aggregations. 
A time-series aggregation executes in three stages, deviating from the
typical two-stage aggregation.

For example: `sum(rate(write_requests)), avg(cpu) BY cluster, time-bucket`

**1. Initial Stage:** In this stage, a standard hash aggregation is
executed, grouped by tsid and time-bucket. The `values` aggregations are
added to collect values of the grouping keys excluding the time-bucket,
which are then used for final result grouping.

```
rate[INITIAL](write_requests), avg[INITIAL](cpu), values[SINGLE](cluster) BY tsid, time-bucket
```

**2. Intermediate Stage:** Equivalent to the final mode of a standard
hash aggregation. This stage merges and reduces the result of the rate
aggregations, but merges without reducing the results of non-rate
aggregations. Certain aggregations, such as count_distinct, cannot have
their final results combined.

```
rate[FINAL](write_requests), avg[INTERMEDIATE](cpu), values[SINGLE](cluster) BY tsid, time-bucket
```

**3. Final Stage:** This extra stage performs outer aggregations over
the rate results and combines the intermediate results of non-rate 
aggregations using the specified user-defined grouping keys.

```
sum[SINGLE](rate_result), avg[FINAL](cpu) BY cluster, bucket
```
…ic#108444)

* apm-data: ignore_{malformed,dynamic_beyond_limit}

Enable ignore_malformed on all non-metrics APM data streams,
and enable ignore_dynamic_beyond_limit for all APM data streams.

We can enable ignore_malformed on metrics data streams when
elastic#90007 is fixed.

* Update docs/changelog/108444.yaml
This information is more discoverable as the class-level javadocs for
`ActionListener` itself rather than hidden away in a separate Markdown
file. Also this way the links all stay up to date.
* [DOCS] Fixes typo in Cohere ES tutorial.

* [DOCS] Fixes list.
This moves examples from files marked to run in integration tests only
to the files where they belong and disables this pattern matching. We
now use supported features.
Correct a small typo: one closing ">" was missing.
This PR logs tasks that are running after the disruption is cleared, 
allowing us to investigate why the disruption tests failed in elastic#107347.

Relates elastic#107347
* Add aggregation intermediate reduction level and estimatedRowSize
computed value
This wires up the "new" APM metrics integration to the existing Aggregations usage tracking system. It introduces one new metric, a LongCounter named es.search.query.aggregations.total, which has dimensions for the specific aggregation being run, and the values source type we resolved it to.

---------

Co-authored-by: Elastic Machine <[email protected]>
Prior to this PR, if a SignificantTerms aggregation targeted a field existing on two indices (that were included in the aggregation) but mapped to different field types, the query would fail at reduce time with a somewhat obscure ClassCastException. This change brings the behavior in line with the Terms aggregation, which returns a 400 class IllegalArgumentException with a useful message in this situation.

Resolves elastic#108427
@markjhoy markjhoy requested review from a team as code owners May 9, 2024 16:26
@markjhoy
Contributor Author

markjhoy commented May 9, 2024

Oh - I sometimes hate git and its ability to not allow a clean merge... going to close this and re-open a new one that's clean.

@markjhoy markjhoy closed this May 9, 2024
Contributor

@jonathan-buttner jonathan-buttner left a comment


I'm going to add these here so I don't lose them, haha. I'll copy them over to the new PR when it's up 👍

namedWriteables.add(
new NamedWriteableRegistry.Entry(InferenceServiceResults.class, RankedDocsResults.NAME, RankedDocsResults::new)
);
addInferenceResultsNamedWriteables(namedWriteables);
Contributor


Thanks for refactoring!

super(model);
this.input = Objects.requireNonNull(input);
this.completionModel = Objects.requireNonNull(model);
this.isRealtimeEndpoint = this.completionModel.endpointType() == AzureAiStudioEndpointType.REALTIME;
Contributor


I think we can get this field from AzureAiStudioRequest, right?

private final Truncator.TruncationResult truncationResult;
private final Truncator truncator;

public AzureAiStudioEmbeddingsRequest(Truncator truncator, Truncator.TruncationResult input, AzureAiStudioModel model) {
Contributor


To avoid the cast, I think we can change the model to be AzureAiStudioEmbeddingsModel. Seems like each place we initialize this class we already know it's an embedding model.
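
For illustration, the suggested constructor change would be something like this (a sketch; the body is unchanged and shown only as a comment):

public AzureAiStudioEmbeddingsRequest(Truncator truncator, Truncator.TruncationResult input, AzureAiStudioEmbeddingsModel model) {
    // ... same body as before, minus the cast to AzureAiStudioEmbeddingsModel
}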


package org.elasticsearch.xpack.inference.external.request.azureaistudio;

public final class AzureAiStudioRequestFields {
Contributor


How about we make the default constructor private?
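
Something like this (the constant is just for illustration; only the class declaration appears in the diff):

public final class AzureAiStudioRequestFields {

    public static final String API_KEY_HEADER = "api-key"; // illustrative constant, not from this PR

    private AzureAiStudioRequestFields() {
        // utility class holding request field names; no instances
    }
}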

* from a chat completion process. This is a start to abstract away from direct static methods
* for the "fromResponse" method in the apply function as used in other inference and provider types.
*/
public abstract class ChatCompletionResponseEntity implements ResponseParser {
Contributor


hmm, since this class and EmbeddingResponseEntity are pretty similar for now, how do you feel about collapsing them into a single class?

I think we can achieve that by doing something like:

protected abstract InferenceServiceResults fromResponse(Request request, HttpResult response) throws IOException;

...

Or I suppose we could omit this class and have the child classes implement ResponseParser directly?

My reasoning for using InferenceServiceResults is that parsers like cohere can return different types (TextEmbeddingResults and TextEmbeddingByteResults).
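
Something along these lines (the class name is hypothetical, and the apply() signature is assumed from the ResponseParser usage described above):

public abstract class AzureAiStudioResponseEntity implements ResponseParser {

    @Override
    public InferenceServiceResults apply(Request request, HttpResult response) throws IOException {
        // Delegate to the task-specific parser (chat completion or embeddings).
        return fromResponse(request, response);
    }

    protected abstract InferenceServiceResults fromResponse(Request request, HttpResult response) throws IOException;
}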

}
}

public static String unsupportedAzureAiStudioProviderErrorMsg(String endpointType, String serviceName) {
Contributor


Same here, do we need this function?

return valueOf(name.trim().toUpperCase(Locale.ROOT));
}

public static AzureAiStudioProvider fromStringOrStatusException(String name) {
Contributor


Do we need this since it's not used?

}
}

public static String unsupportedAzureAiStudioProviderErrorMsg(String provider, String serviceName) {
Contributor


Do we need this since it's not used?

public DefaultSecretSettings getSecretSettings() {
return (DefaultSecretSettings) super.getSecretSettings();
}
}
Contributor


How about we keep with the visitor pattern and add an accept() method here similar to azure openai: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/azureopenai/AzureOpenAiModel.java#L59

If we add a new task type, hopefully it'd limit the number of changes to the azure studio service class.

) {
var actionCreator = new AzureAiStudioActionCreator(getSender(), getServiceComponents());

if (model instanceof AzureAiStudioEmbeddingsModel embeddingsModel) {
Contributor


I mentioned this above, but the reason I chose the visitor pattern for the other services was to let the model dictate the action that is used. That way we don't need the instanceof checks here, except one to ensure that it's an AzureAiStudioModel with an accept() method.
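
Roughly what I mean (signatures are assumed by analogy with the AzureOpenAiModel class linked above, not taken from this PR):

// In the abstract AzureAiStudioModel base class:
public abstract ExecutableAction accept(AzureAiStudioActionVisitor creator, Map<String, Object> taskSettings);

// In AzureAiStudioEmbeddingsModel, each concrete model picks its own action:
@Override
public ExecutableAction accept(AzureAiStudioActionVisitor creator, Map<String, Object> taskSettings) {
    return creator.create(this, taskSettings); // resolves to the embeddings overload
}

// The service class then needs only a single instanceof check:
if (model instanceof AzureAiStudioModel azureAiStudioModel) {
    var action = azureAiStudioModel.accept(actionCreator, taskSettings);
    // ... execute the action as before
}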

@markjhoy markjhoy changed the title [Inference API] Add Azure AI Studio Embeddings and Chat Completion Support (obsolete - do not merge) [Inference API] Add Azure AI Studio Embeddings and Chat Completion Support May 9, 2024
@markjhoy
Copy link
Contributor Author

markjhoy commented May 9, 2024

Sorry @jonathan-buttner! :) -- the new PR is up here: #108472
