(obsolete - do not merge) [Inference API] Add Azure AI Studio Embeddings and Chat Completion Support #108310
Conversation
Force-pushed from a5eba85 to ae6564b
Pinging @elastic/ml-core (Team:ML)
Pinging @elastic/ent-search-eng (Team:Enterprise Search)
Looks great
It's a very large PR; I'd like to do some testing and then take another look.
Review threads (now resolved) on:
- ...lasticsearch/xpack/inference/external/http/sender/AzureAiStudioEmbeddingsRequestManager.java
- ...n/java/org/elasticsearch/xpack/inference/external/response/ChatCompletionResponseEntity.java
- ...e/src/main/java/org/elasticsearch/xpack/inference/external/response/ErrorResponseEntity.java
- ...ack/inference/external/response/azureaistudio/AzureAiStudioChatCompletionResponseEntity.java
- .../plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/ServiceUtils.java
- ...java/org/elasticsearch/xpack/inference/services/azureaistudio/AzureAiStudioEndpointType.java
- ...main/java/org/elasticsearch/xpack/inference/services/azureaistudio/AzureAiStudioService.java
buildkite test this
This moves the "skip" logic from our IT_test_only suffix into a new feature - this one is historical `esql.enrich_load`. This feature is not supported by `CsvTests` but is supported across all tests.
Linux systems with multiarch (e.g. i386 & x86_64) libraries may have libsystemd.0 in two subdirectories of an entry in java.library.path. For example, libsystemd.so.0 may be found in both /usr/lib/i386-linux-gnu and /usr/lib/x86_64-linux-gnu. Instead of attempting to load any library found, attempt all and stop as soon as one is successfully loaded.
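The load-until-success strategy described above can be sketched as follows. This is a simplified illustration, not Elasticsearch's actual native-loading code; `tryLoad` is a hypothetical stand-in for a `System.load` attempt so the candidate-iteration logic can be shown in isolation.

```java
import java.util.List;
import java.util.function.Predicate;

public class MultiArchLoader {
    /**
     * Attempts each candidate library path in order and stops as soon as
     * one loads, instead of failing on the first unloadable architecture.
     */
    public static String loadFirstAvailable(List<String> candidates, Predicate<String> tryLoad) {
        for (String path : candidates) {
            if (tryLoad.test(path)) {
                return path; // first successful load wins
            }
        }
        throw new IllegalStateException("library not found in any candidate directory");
    }
}
```

On a multiarch system, the i386 copy would fail to load into a 64-bit JVM, and the loop simply continues to the x86_64 copy.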
Today, we do not wait for remote sinks to stop before completing the main request. While this doesn't affect correctness, it's important that we do not spawn child requests after the parent request is completed. Closes elastic#105859
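The invariant described here — the parent request must not complete while child requests can still be spawned — can be sketched with a completion latch. This is an illustrative stand-in, not the actual ESQL exchange-sink code; the class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class ParentRequest {
    private final CountDownLatch remoteSinksStopped;
    private final AtomicInteger stoppedSinks = new AtomicInteger();

    public ParentRequest(int remoteSinkCount) {
        this.remoteSinksStopped = new CountDownLatch(remoteSinkCount);
    }

    // Called by each remote sink once it has fully stopped.
    public void onRemoteSinkStopped() {
        stoppedSinks.incrementAndGet();
        remoteSinksStopped.countDown();
    }

    // The main request only completes after every remote sink has stopped,
    // so no child request can be spawned after the parent is done.
    public int awaitCompletion() {
        try {
            remoteSinksStopped.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return stoppedSinks.get();
    }
}
```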
…lastic#107581) Previously these were contained in the index template, however, Kibana needs to be able to make overrides to only the settings, so factoring these out would allow them to do this (in such a way that they can be overridden by the `kibana-reporting@custom` component template as well). Relates to elastic#97765
When the file watched by file settings is initially missing, a special method in reserved state service is called to write a dummy cluster state entry. In the case of tests, there is no real running master service, so when the task is submitted, the file watcher thread actually barfs and the watcher dies, silently. That then causes the test to timeout as it waits indefinitely but the file watcher is no longer watching for the test file that was written. This commit mocks out writing this empty state in the reserved state service. It also collapses the two tests that check stopping while blocked in processing works since they were almost exactly the same. closes elastic#106968
Adding and removing appenders in Log4j is not threadsafe. Yet some tests rely on capturing logging by adding an in memory appender, MockLogAppender. This commit makes the mock logging threadsafe by creating a new, singular appender for mock logging that delegates, in a threadsafe way, to the existing appenders created. Confusingly MockLogAppender is no longer really an appender, but I'm leaving clarifying that for a followup so as to limit the scope of this PR. closes elastic#106425
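The delegating idea can be sketched as below. This is not the real Log4j plumbing — appenders are modeled as plain `Consumer<String>` callbacks — but it shows the core trick: a single, permanently installed appender whose delegate list tolerates concurrent mutation, assuming `CopyOnWriteArrayList` for lock-free iteration.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

public class DelegatingAppender {
    // CopyOnWriteArrayList lets tests add/remove capture delegates while
    // other threads are iterating inside append(), without locking.
    private final List<Consumer<String>> delegates = new CopyOnWriteArrayList<>();

    public void addDelegate(Consumer<String> appender) {
        delegates.add(appender);
    }

    public void removeDelegate(Consumer<String> appender) {
        delegates.remove(appender);
    }

    // Fans each log event out to whatever delegates are registered;
    // iteration sees a stable snapshot even under concurrent changes.
    public void append(String event) {
        for (Consumer<String> delegate : delegates) {
            delegate.accept(event);
        }
    }
}
```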
Disable location quoting in FROM command before 8.14 release to allow more time to discuss options
This change introduces operator factories for time-series aggregations. A time-series aggregation executes in three stages, deviating from the typical two-stage aggregation. For example: `sum(rate(write_requests)), avg(cpu) BY cluster, time-bucket`

**1. Initial Stage:** In this stage, a standard hash aggregation is executed, grouped by tsid and time-bucket. The `values` aggregations are added to collect values of the grouping keys excluding the time-bucket, which are then used for final result grouping.

```
rate[INITIAL](write_requests), avg[INITIAL](cpu), values[SINGLE](cluster) BY tsid, time-bucket
```

**2. Intermediate Stage:** Equivalent to the final mode of a standard hash aggregation. This stage merges and reduces the result of the rate aggregations, but merges without reducing the results of non-rate aggregations. Certain aggregations, such as count_distinct, cannot have their final results combined.

```
rate[FINAL](write_requests), avg[INTERMEDIATE](cpu), values[SINGLE](cluster) BY tsid, time-bucket
```

**3. Final Stage:** This extra stage performs outer aggregations over the rate results and combines the intermediate results of non-rate aggregations using the specified user-defined grouping keys.

```
sum[SINGLE](rate_result), avg[FINAL](cpu) BY cluster, bucket
```
…ic#108445) (referenced from get and multi_get API docs) Closes elastic#98385
…ic#108444) * apm-data: ignore_{malformed,dynamic_beyond_limit} Enable ignore_malformed on all non-metrics APM data streams, and enable ignore_dynamic_beyond_limit for all APM data streams. We can enable ignore_malformed on metrics data streams when elastic#90007 is fixed. * Update docs/changelog/108444.yaml
This information is more discoverable as the class-level javadocs for `ActionListener` itself rather than hidden away in a separate Markdown file. Also this way the links all stay up to date.
* [DOCS] Fixes typo in Cohere ES tutorial. * [DOCS] Fixes list.
This moves examples from files marked to run in integration tests only to the files where they belong and disables this pattern matching. We now use supported features.
Correct a small typo: one closing ">" was missing.
This PR logs tasks that are running after the disruption is cleared, allowing us to investigate why the disruption tests failed in elastic#107347. Relates elastic#107347
* Add aggregation intermediate reduction level and estimatedRowSize computed value
This wires up the "new" APM metrics integration to the existing Aggregations usage tracking system. It introduces one new metric, a LongCounter named es.search.query.aggregations.total, which has dimensions for the specific aggregation being run, and the values source type we resolved it to. --------- Co-authored-by: Elastic Machine <[email protected]>
Prior to this PR, if a SignificantTerms aggregation targeted a field existing on two indices (that were included in the aggregation) but mapped to different field types, the query would fail at reduce time with a somewhat obscure ClassCastException. This change brings the behavior in line with the Terms aggregation, which returns a 400 class IllegalArgumentException with a useful message in this situation. Resolves elastic#108427
Oh - I sometimes hate git and its ability to not allow a clean merge... going to close this and re-open a new one that's clean.
I'm going to add these here so I don't lose them haha, I'll copy them over to the new PR when it's up 👍
```java
namedWriteables.add(
    new NamedWriteableRegistry.Entry(InferenceServiceResults.class, RankedDocsResults.NAME, RankedDocsResults::new)
);
addInferenceResultsNamedWriteables(namedWriteables);
```
Thanks for refactoring!
```java
super(model);
this.input = Objects.requireNonNull(input);
this.completionModel = Objects.requireNonNull(model);
this.isRealtimeEndpoint = this.completionModel.endpointType() == AzureAiStudioEndpointType.REALTIME;
```
I think we can get this field from `AzureAiStudioRequest`, right?
```java
private final Truncator.TruncationResult truncationResult;
private final Truncator truncator;

public AzureAiStudioEmbeddingsRequest(Truncator truncator, Truncator.TruncationResult input, AzureAiStudioModel model) {
```
To avoid the cast, I think we can change the model to be `AzureAiStudioEmbeddingsModel`. Seems like each place we initialize this class we already know it's an embedding model.
```java
package org.elasticsearch.xpack.inference.external.request.azureaistudio;

public final class AzureAiStudioRequestFields {
```
How about we make the default constructor private?
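The suggestion above — a constants holder with a private constructor so it can never be instantiated — looks roughly like this. The field names here are illustrative placeholders, not the actual constants from the PR.

```java
public final class AzureAiStudioRequestFields {
    // Hypothetical request-body field names, for illustration only.
    public static final String MESSAGES = "messages";
    public static final String INPUT = "input";

    private AzureAiStudioRequestFields() {
        // private: this class is a pure constants holder and
        // should never be instantiated
    }
}
```

Making the class `final` with a private constructor is the standard Java idiom for a non-instantiable utility/constants class.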
```java
 * from a chat completion process. This is a start to abstract away from direct static methods
 * for the "fromResponse" method in the apply function as used in other inference and provider types.
 */
public abstract class ChatCompletionResponseEntity implements ResponseParser {
```
hmm, since this class and `EmbeddingResponseEntity` are pretty similar for now, how do you feel about collapsing them into a single class? I think we can achieve that by doing something like:

```java
protected abstract InferenceServiceResults fromResponse(Request request, HttpResult response) throws IOException;
...
```

Or I suppose we could omit this class and have the child classes implement `ResponseParser` directly?

My reasoning for using `InferenceServiceResults` is because parsers like cohere can return different types (`TextEmbeddingResults` and `TextEmbeddingByteResults`).
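The collapse being suggested could look something like the sketch below. Note that `InferenceServiceResults`, `Request`, and `HttpResult` are simplified stand-in stubs here, not the real Elasticsearch classes, so the shape of the abstraction can be shown in isolation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

public abstract class BaseResponseEntity {
    // Stand-in stubs for the real Elasticsearch types.
    public interface InferenceServiceResults {}
    public record Request(String inferenceId) {}
    public record HttpResult(byte[] body) {}

    // Each provider-specific entity implements only the parsing step.
    protected abstract InferenceServiceResults fromResponse(Request request, HttpResult response) throws IOException;

    // Shared entry point: delegates to the subclass parser, so the
    // static-method-per-provider pattern is no longer needed.
    public InferenceServiceResults apply(Request request, HttpResult response) {
        try {
            return fromResponse(request, response);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Returning the broad `InferenceServiceResults` type keeps the door open for parsers that produce different concrete result types, per the reasoning above.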
```java
    }
}

public static String unsupportedAzureAiStudioProviderErrorMsg(String endpointType, String serviceName) {
```
Same here, do we need this function?
```java
    return valueOf(name.trim().toUpperCase(Locale.ROOT));
}

public static AzureAiStudioProvider fromStringOrStatusException(String name) {
```
Do we need this since it's not used?
```java
    }
}

public static String unsupportedAzureAiStudioProviderErrorMsg(String provider, String serviceName) {
```
Do we need this since it's not used?
```java
    public DefaultSecretSettings getSecretSettings() {
        return (DefaultSecretSettings) super.getSecretSettings();
    }
}
```
How about we keep with the visitor pattern and add an `accept()` method here, similar to azure openai: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/azureopenai/AzureOpenAiModel.java#L59

If we add a new task type, hopefully it'd limit the number of changes to the azure studio service class.
```java
) {
    var actionCreator = new AzureAiStudioActionCreator(getSender(), getServiceComponents());

    if (model instanceof AzureAiStudioEmbeddingsModel embeddingsModel) {
```
I mentioned this above, but the reason I chose the visitor pattern for the other services was to let the model dictate the action that is used. That way we don't need the `instanceof` checks here, except one to ensure that it's an `AzureAiStudioModel` with an `accept()` method.
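The visitor approach being discussed can be sketched as below. All of the type and method names here are illustrative stand-ins (the real interfaces live in the inference plugin); the point is that double dispatch through `accept()` replaces a chain of `instanceof` checks — each model routes itself to the right `create` overload.

```java
public class VisitorSketch {
    public interface ExecutableAction {}
    public record EmbeddingsAction(String name) implements ExecutableAction {}
    public record CompletionAction(String name) implements ExecutableAction {}

    // The visitor knows how to build an action for each concrete model type.
    public interface ActionVisitor {
        ExecutableAction create(EmbeddingsModel model);
        ExecutableAction create(CompletionModel model);
    }

    // Each model dictates the action used - no instanceof chain needed.
    public abstract static class Model {
        public abstract ExecutableAction accept(ActionVisitor visitor);
    }

    public static class EmbeddingsModel extends Model {
        public ExecutableAction accept(ActionVisitor visitor) {
            return visitor.create(this); // dispatches to the embeddings overload
        }
    }

    public static class CompletionModel extends Model {
        public ExecutableAction accept(ActionVisitor visitor) {
            return visitor.create(this); // dispatches to the completion overload
        }
    }
}
```

Adding a new task type then means adding one model class and one visitor method, rather than editing branch logic in the service class.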
Sorry @jonathan-buttner ! :) -- new PR is up here: #108472
This PR adds support for Azure AI Studio integration into the Inference API. Currently this supports the `text_embedding` and `completion` task types.

Prerequisites to Model Creation

Model Creation:

Valid `{tasktype}` types are: [`text_embedding`, `completion`]

Required Service Settings:

- `api_key`: The API key can be found on your Azure AI Studio deployment's overview page
- `target`: The target URL can be found on your Azure AI Studio deployment's overview page
- `provider`: Valid provider types are (case insensitive):
  - `openai` - available for embeddings and completion
  - `mistral` - available for completion only
  - `meta` - available for completion only
  - `microsoft_phi` - available for completion only
  - `cohere` - available for embeddings and completion
  - `snowflake` - available for completion only
  - `databricks` - available for completion only
- `endpoint_type`: Valid endpoint types are:
  - `token` - a "pay as you go" endpoint (charged by token)
  - `realtime` - a realtime endpoint VM deployment (charged by the hour)

Embeddings Service Settings

- `dimensions`: (optional) the number of dimensions the resulting output embeddings should have.

Embeddings Task Settings

(this is also overridable in the inference request)

- `user`: (optional) a string that is a unique identifier representing your end-user. This helps Azure AI Studio in the case of abuse or issues for debugging.

Completion Service Settings

(no additional service settings)

Completion Task Settings

(these are all optional and can be overridden in the inference request)

- `temperature`: The sampling temperature to use, between 0 and 2. Higher values mean the model takes more risks. Try 0.9 for more creative applications, and 0 (argmax sampling) for ones with a well-defined answer. Microsoft recommends altering this or `top_p`, but not both.
- `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with `top_p` probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. Microsoft recommends altering this or `temperature`, but not both.
- `do_sample`: whether or not to perform sampling
- `max_new_tokens`: the maximum number of new tokens the chat completion inference should produce in the output

Text Embedding Inference

Chat Completion Inference
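As an illustration of the settings above, creating a text embedding endpoint might look like the following request. The endpoint ID and all values are placeholders, and the `azureaistudio` service name is assumed from this PR's service registration; check the PR's own examples for the exact syntax.

```console
PUT _inference/text_embedding/azure_ai_studio_embeddings
{
  "service": "azureaistudio",
  "service_settings": {
    "api_key": "<api_key>",
    "target": "<target_uri>",
    "provider": "openai",
    "endpoint_type": "token"
  },
  "task_settings": {
    "user": "user-123"
  }
}
```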