
(obsolete - do not merge) [Inference API] Add Azure AI Studio Embeddings and Chat Completion Support #108310

Conversation

markjhoy
Contributor

@markjhoy markjhoy commented May 6, 2024

This PR adds support for Azure AI Studio integration into the Inference API. Currently this supports text_embedding and completion task types.

Prerequisites to Model Creation

(screenshot omitted)

Model Creation:

PUT _inference/{tasktype}/{model_id}
{
  "service": "azureaistudio",
  "service_settings": {
    "api_key": "{api_key}",
    "target": "{deployment_target}",
    "provider": "{model_provider}",
    "endpoint_type": "{endpoint_type}"
  }
}
  • Valid {tasktype} values are: [text_embedding, completion]

Required Service Settings:

  • api_key: The API key can be found on your Azure AI Studio deployment's overview page
  • target: The target URL can be found on your Azure AI Studio deployment's overview page
  • provider: Valid provider types are (case insensitive):
    • openai - available for embeddings and completion
    • mistral - available for completion only
    • meta - available for completion only
    • microsoft_phi - available for completion only
    • cohere - available for embeddings and completion
    • snowflake - available for completion only
    • databricks - available for completion only
  • endpoint_type: Valid endpoint types are:
    • token - a "pay as you go" endpoint (charged by token)
      • Available for OpenAI, Meta and Cohere
    • realtime - a realtime endpoint VM deployment (charged by the hour)
      • Available for Mistral, Meta, Microsoft Phi, Snowflake and Databricks
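
For example, a request creating an OpenAI embeddings model on a pay-as-you-go endpoint might look like the following (all values, including the shape of the target URL, are placeholders):

PUT _inference/text_embedding/azure_ai_studio_embeddings
{
  "service": "azureaistudio",
  "service_settings": {
    "api_key": "<your-api-key>",
    "target": "https://<your-deployment>.<your-region>.inference.ai.azure.com",
    "provider": "openai",
    "endpoint_type": "token"
  }
}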

Embeddings Service Settings

  • dimensions: (optional) the number of dimensions the resulting output embeddings should have.

Embeddings Task Settings

(this can also be overridden in the inference request, as shown below)

  • user: (optional) a string that uniquely identifies your end-user. This helps Azure AI Studio detect abuse and aids debugging.
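
For example, a per-request override of the user setting might look like this (assuming the Inference API's standard task_settings override mechanism; the value is illustrative):

POST _inference/text_embedding/{model_id}
{
    "input": "The answer to the universe is",
    "task_settings": {
        "user": "user-1234"
    }
}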

Completion Service Settings

(no additional service settings)

Completion Task Settings

(these are all optional and can be overridden in the inference request, as shown below)

  • temperature: What sampling temperature to use, between 0 and 2. Higher values mean the model takes more risks. Try 0.9 for more creative applications, and 0 (argmax sampling) for ones with a well-defined answer. Microsoft recommends altering this or top_p but not both.
  • top_p: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. Microsoft recommends altering this or temperature but not both.
  • do_sample: whether or not to perform sampling
  • max_new_tokens: the maximum number of new tokens the chat completion inference should produce in the output
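
For example, a completion request overriding two of these settings might look like this (values are illustrative):

POST _inference/completion/{model_id}
{
    "input": "The answer to the universe is",
    "task_settings": {
        "temperature": 0.7,
        "max_new_tokens": 64
    }
}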

Text Embedding Inference

POST _inference/text_embedding/{model_id}
{
    "input": "The answer to the universe is"
}
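
The response contains one embedding per input, roughly of this shape (values are illustrative):

{
    "text_embedding": [
        {
            "embedding": [0.0123, -0.0456, ...]
        }
    ]
}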

Chat Completion Inference

POST _inference/completion/{model_id}
{
    "input": "The answer to the universe is"
}
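
This returns a response of roughly the following shape (content is illustrative):

{
    "completion": [
        {
            "result": "42."
        }
    ]
}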

@markjhoy markjhoy force-pushed the markjhoy/azure_ai_studio_integration_inference branch from a5eba85 to ae6564b May 8, 2024 19:15
@markjhoy markjhoy changed the title Ad Azure AI Studio Embeddings and Chat Completion Support for Inference API [Inference API] Add Azure AI Studio Embeddings and Chat Completion Support May 8, 2024
@markjhoy markjhoy added >non-issue :ml Machine learning Team:ML Meta label for the ML team :EnterpriseSearch/Application Enterprise Search Team:Enterprise Search Meta label for Enterprise Search team labels May 8, 2024
@markjhoy markjhoy marked this pull request as ready for review May 8, 2024 21:14
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine
Collaborator

Pinging @elastic/ent-search-eng (Team:Enterprise Search)

Member

@davidkyle davidkyle left a comment


Looks great

It's a very large PR; I'd like to do some testing and then take another look.

@markjhoy
Contributor Author

markjhoy commented May 9, 2024

buildkite test this

@markjhoy markjhoy requested a review from davidkyle May 9, 2024 15:40
shainaraskas and others added 3 commits May 9, 2024 12:24
This moves the "skip" logic from our IT_test_only suffix into a new
feature - this one is historical `esql.enrich_load`. This feature is not
supported by `CsvTests` but is supported across all tests.
axw and others added 23 commits May 9, 2024 12:24
Linux systems with multiarch (e.g. i386 & x86_64) libraries
may have libsystemd.0 in two subdirectories of an entry in
java.library.path. For example, libsystemd.so.0 may be found
in both /usr/lib/i386-linux-gnu and /usr/lib/x86_64-linux-gnu.

Instead of attempting to load any library found, attempt all
and stop as soon as one is successfully loaded.
Today, we do not wait for remote sinks to stop before completing the 
main request. While this doesn't affect correctness, it's important that
we do not spawn child requests after the parent request is completed.

Closes elastic#105859
…lastic#107581)

Previously these were contained in the index template, however, Kibana
needs to be able to make overrides to only the settings, so factoring
these out would allow them to do this (in such a way that they can be
overridden by the `kibana-reporting@custom` component template as well).

Relates to elastic#97765
When the file watched by file settings is initially missing, a special
method in reserved state service is called to write a dummy cluster
state entry. In the case of tests, there is no real running master
service, so when the task is submitted, the file watcher thread actually
barfs and the watcher dies, silently. That then causes the test to
timeout as it waits indefinitely but the file watcher is no longer
watching for the test file that was written.

This commit mocks out writing this empty state in the reserved state
service. It also collapses the two tests that check stopping while
blocked in processing works since they were almost exactly the same.

closes elastic#106968
Adding and removing appenders in Log4j is not threadsafe. Yet some tests
rely on capturing logging by adding an in memory appender,
MockLogAppender. This commit makes the mock logging threadsafe by
creating a new, singular appender for mock logging that delegates, in a
threadsafe way, to the existing appenders created. Confusingly
MockLogAppender is no longer really an appender, but I'm leaving
clarifying that for a followup so as to limit the scope of this PR.

closes elastic#106425
Disable location quoting in FROM command before 8.14 release to allow
 more time to discuss options
This change introduces operator factories for time-series aggregations. 
A time-series aggregation executes in three stages, deviating from the
typical two-stage aggregation.

For example: `sum(rate(write_requests)), avg(cpu) BY cluster, time-bucket`

**1. Initial Stage:** In this stage, a standard hash aggregation is
executed, grouped by tsid and time-bucket. The `values` aggregations are
added to collect values of the grouping keys excluding the time-bucket,
which are then used for final result grouping.

```
rate[INITIAL](write_requests), avg[INITIAL](cpu), values[SINGLE](cluster) BY tsid, time-bucket
```

**2. Intermediate Stage:** Equivalent to the final mode of a standard
hash aggregation. This stage merges and reduces the result of the rate
aggregations, but merges without reducing the results of non-rate
aggregations. Certain aggregations, such as count_distinct, cannot have
their final results combined.

```
rate[FINAL](write_requests), avg[INTERMEDIATE](cpu), values[SINGLE](cluster) BY tsid, time-bucket
```

**3. Final Stage:** This extra stage performs outer aggregations over
the rate results and combines the intermediate results of non-rate 
aggregations using the specified user-defined grouping keys.

```
sum[SINGLE](rate_result), avg[FINAL](cpu) BY cluster, bucket
```
…ic#108444)

* apm-data: ignore_{malformed,dynamic_beyond_limit}

Enable ignore_malformed on all non-metrics APM data streams,
and enable ignore_dynamic_beyond_limit for all APM data streams.

We can enable ignore_malformed on metrics data streams when
elastic#90007 is fixed.

* Update docs/changelog/108444.yaml
This information is more discoverable as the class-level javadocs for
`ActionListener` itself rather than hidden away in a separate Markdown
file. Also this way the links all stay up to date.
* [DOCS] Fixes typo in Cohere ES tutorial.

* [DOCS] Fixes list.
This moves examples from files marked to run in integration tests only
to the files where they belong and disables this pattern matching. We
now use supported features.
Correct a small typo: one closing ">" was missing.
This PR logs tasks that are running after the disruption is cleared, 
allowing us to investigate why the disruption tests failed in elastic#107347.

Relates elastic#107347
* Add aggregation intermediate reduction level and estimatedRowSize
computed value
This wires up the "new" APM metrics integration to the existing Aggregations usage tracking system. It introduces one new metric, a LongCounter named es.search.query.aggregations.total, which has dimensions for the specific aggregation being run, and the values source type we resolved it to.

---------

Co-authored-by: Elastic Machine <[email protected]>
Prior to this PR, if a SignificantTerms aggregation targeted a field existing on two indices (that were included in the aggregation) but mapped to different field types, the query would fail at reduce time with a somewhat obscure ClassCastException. This change brings the behavior in line with the Terms aggregation, which returns a 400 class IllegalArgumentException with a useful message in this situation.

Resolves elastic#108427
@markjhoy markjhoy requested review from a team as code owners May 9, 2024 16:26
@markjhoy
Contributor Author

markjhoy commented May 9, 2024

Oh - I sometimes hate git and its ability to not allow a clean merge... going to close this and re-open a new one that's clean.

@markjhoy markjhoy closed this May 9, 2024
Contributor

@jonathan-buttner jonathan-buttner left a comment


I'm going to add these here so I don't lose them, haha. I'll copy them over to the new PR when it's up 👍

namedWriteables.add(
new NamedWriteableRegistry.Entry(InferenceServiceResults.class, RankedDocsResults.NAME, RankedDocsResults::new)
);
addInferenceResultsNamedWriteables(namedWriteables);
Contributor


Thanks for refactoring!

super(model);
this.input = Objects.requireNonNull(input);
this.completionModel = Objects.requireNonNull(model);
this.isRealtimeEndpoint = this.completionModel.endpointType() == AzureAiStudioEndpointType.REALTIME;
Contributor


I think we can get this field from AzureAiStudioRequest, right?

private final Truncator.TruncationResult truncationResult;
private final Truncator truncator;

public AzureAiStudioEmbeddingsRequest(Truncator truncator, Truncator.TruncationResult input, AzureAiStudioModel model) {
Contributor


To avoid the cast, I think we can change the model to be AzureAiStudioEmbeddingsModel. Seems like each place we initialize this class we already know it's an embedding model.
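
For illustration, the suggested constructor change would be something like this (a sketch; the body is unchanged and shown only as a comment):

public AzureAiStudioEmbeddingsRequest(Truncator truncator, Truncator.TruncationResult input, AzureAiStudioEmbeddingsModel model) {
    // ... same body as before, minus the cast to AzureAiStudioEmbeddingsModel
}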


package org.elasticsearch.xpack.inference.external.request.azureaistudio;

public final class AzureAiStudioRequestFields {
Contributor


How about we make the default constructor private?
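
Something like this (the constant is just for illustration; only the class declaration appears in the diff):

public final class AzureAiStudioRequestFields {

    public static final String API_KEY_HEADER = "api-key"; // illustrative constant, not from this PR

    private AzureAiStudioRequestFields() {
        // utility class holding request field names; no instances
    }
}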

* from a chat completion process. This is a start to abstract away from direct static methods
* for the "fromResponse" method in the apply function as used in other inference and provider types.
*/
public abstract class ChatCompletionResponseEntity implements ResponseParser {
Contributor


hmm, since this class and EmbeddingResponseEntity are pretty similar for now, how do you feel about collapsing them into a single class?

I think we can achieve that by doing something like:

protected abstract InferenceServiceResults fromResponse(Request request, HttpResult response) throws IOException;

...

Or I suppose we could omit this class and have the child classes implement ResponseParser directly?

My reasoning for using InferenceServiceResults is that parsers like cohere can return different types (TextEmbeddingResults and TextEmbeddingByteResults).
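
Something along these lines (the class name is hypothetical, and the apply() signature is assumed from the ResponseParser usage described above):

public abstract class AzureAiStudioResponseEntity implements ResponseParser {

    @Override
    public InferenceServiceResults apply(Request request, HttpResult response) throws IOException {
        // Delegate to the task-specific parser (chat completion or embeddings).
        return fromResponse(request, response);
    }

    protected abstract InferenceServiceResults fromResponse(Request request, HttpResult response) throws IOException;
}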

}
}

public static String unsupportedAzureAiStudioProviderErrorMsg(String endpointType, String serviceName) {
Contributor


Same here, do we need this function?

return valueOf(name.trim().toUpperCase(Locale.ROOT));
}

public static AzureAiStudioProvider fromStringOrStatusException(String name) {
Contributor


Do we need this since it's not used?

}
}

public static String unsupportedAzureAiStudioProviderErrorMsg(String provider, String serviceName) {
Contributor


Do we need this since it's not used?

public DefaultSecretSettings getSecretSettings() {
return (DefaultSecretSettings) super.getSecretSettings();
}
}
Contributor


How about we keep with the visitor pattern and add an accept() method here similar to azure openai: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/azureopenai/AzureOpenAiModel.java#L59

If we add a new task type, hopefully it'd limit the number of changes to the azure studio service class.

) {
var actionCreator = new AzureAiStudioActionCreator(getSender(), getServiceComponents());

if (model instanceof AzureAiStudioEmbeddingsModel embeddingsModel) {
Contributor


I mentioned this above, but the reason I chose the visitor pattern for the other services was to let the model dictate the action that is used. That way we don't need the instanceof checks here, except one to ensure that it's an AzureAiStudioModel with an accept() method.
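
Roughly what I mean (signatures are assumed by analogy with the AzureOpenAiModel class linked above, not taken from this PR):

// In the abstract AzureAiStudioModel base class:
public abstract ExecutableAction accept(AzureAiStudioActionVisitor creator, Map<String, Object> taskSettings);

// In AzureAiStudioEmbeddingsModel, each concrete model picks its own action:
@Override
public ExecutableAction accept(AzureAiStudioActionVisitor creator, Map<String, Object> taskSettings) {
    return creator.create(this, taskSettings); // resolves to the embeddings overload
}

// The service class then needs only a single instanceof check:
if (model instanceof AzureAiStudioModel azureAiStudioModel) {
    var action = azureAiStudioModel.accept(actionCreator, taskSettings);
    // ... execute the action as before
}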

@markjhoy markjhoy changed the title [Inference API] Add Azure AI Studio Embeddings and Chat Completion Support (obsolete - do not merge) [Inference API] Add Azure AI Studio Embeddings and Chat Completion Support May 9, 2024
@markjhoy
Copy link
Contributor Author

markjhoy commented May 9, 2024

Sorry @jonathan-buttner! :) -- the new PR is up here: #108472
