add AzureML support (#1171)
## Describe your changes

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Does this PR include example changes? If yes, please remember to
update the [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link

---------

Co-authored-by: Emma <[email protected]>
Co-authored-by: jiapli <[email protected]>
3 people committed May 23, 2024
1 parent c679c10 commit ea46943
Showing 3 changed files with 60 additions and 11 deletions.
50 changes: 39 additions & 11 deletions examples/phi3/README.md
@@ -1,42 +1,65 @@
# Phi3 optimization with Olive
-This folder contains an example of optimizing [the Phi-3-Mini-4K-Instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) model in HF for different hardware targets with Olive.
+This folder contains an example of optimizing the Phi-3-Mini-4K-Instruct model from [Hugging Face](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) or the [Azure Machine Learning Model Catalog](https://ai.azure.com/explore/models/Phi-3-mini-4k-instruct/version/7/registry/azureml?tid=72f988bf-86f1-41af-91ab-2d7cd011db47) for different hardware targets with Olive.


## Prerequisites
-Install the dependencies
-```
-pip install -r requirements.txt
-```
* einops
* Pytorch: >=2.2.0 \
  _The [official website](https://pytorch.org/) offers packages compatible with CUDA 11.8 and 12.1. Please select the appropriate version according to your needs._
* [Package onnxruntime](https://onnxruntime.ai/docs/install/#inference-install-table-for-all-languages): >=1.18.0
-* [Package onnxruntime-genai](https://github.com/microsoft/onnxruntime-genai): >=0.2.0. If you target GPU, please install the onnxruntime and onnxruntime-genai GPU packages.
+* [Package onnxruntime-genai](https://github.com/microsoft/onnxruntime-genai): >=0.2.0.
+
+Install the dependencies. If you target GPU, please install the onnxruntime and onnxruntime-genai GPU packages.
+
+### For optimizing the model from Hugging Face
+If you have not logged in to your Hugging Face account:
+- Install the Hugging Face CLI and log in to your Hugging Face account for model access
+```
+pip install -r requirements.txt
+huggingface-cli login
+```

+### For optimizing the model from Azure Machine Learning Model Catalog
+
+- Install Olive with the Azure Machine Learning dependency
+```
+pip install olive-ai[azureml]
+```
+If you have not logged in to your Azure account:
+- Install the Azure Command-Line Interface (CLI) following [this link](https://learn.microsoft.com/en-us/cli/azure/)
+- Run `az login` to log in to your Azure account so that Olive can access the model.


## Usage
We will use the `phi3.py` script to generate an optimized model for a chosen hardware target by running the following commands.

```
-python phi3.py [--target HARDWARE_TARGET] [--precision DATA_TYPE] [--inference] [--prompt PROMPT] [--max_length LENGTH]
+python phi3.py [--target HARDWARE_TARGET] [--precision DATA_TYPE] [--source SOURCE] [--inference] [--prompt PROMPT] [--max_length LENGTH]
# Examples
python phi3.py --target web
python phi3.py --target mobile
+python phi3.py --target mobile --source AzureML
python phi3.py --target mobile --inference --prompt "Write a story starting with once upon a time" --max_length 200
```

- `--target`: cpu, cuda, mobile, web
-- `--precision`: optional. fp32, fp16, int4. fp32 or int4(default) for cpu target; fp32 or fp16 or int4(default) for gpu target; int4(default) for mobile or web
-- `--inference`: run the optimized model, for non-web models inference.
+- `--precision`: optional, data precision. fp32 or int4 (default) for the cpu target; fp32, fp16, or int4 (default) for the GPU target; int4 (default) for mobile or web.
+- `--source`: optional, model source. HF (Hugging Face, default) or AzureML.
+- `--inference`: optional, run inference/validation on the optimized model (non-web targets only).
- `--prompt`: optional, the prompt text fed into the model. Takes effect only when `--inference` is set.
- `--max_length`: optional, the maximum length of the model output. Takes effect only when `--inference` is set.


This script will:
-1. Generate the Olive configuration file for your need including the chosen HW target, the preferred model precision.
-2. Generate optimized model with Olive based on the configuration file for the chosen HW target
-3. (optional) Inference the optimized model with ONNX Runtime Generation API. Not supported for web target
+- Generate the Olive configuration file for the chosen HW target
+- Generate the optimized model with Olive based on the configuration file for the chosen HW target
+- (optional) Run inference on the optimized model with the ONNX Runtime Generate() API (non-web targets only)


If you have an Olive configuration file, you can also run the olive command for model generation:
@@ -46,3 +69,8 @@ olive run [--config CONFIGURATION_FILE]
# Examples
olive run --config phi3_mobile_int4.json
```

+## More Inference Examples
+- [Android chat app with Phi-3 and ONNX Runtime Mobile](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/mobile/examples/phi-3/android)
+
+- [Web chat app with Phi-3 and ONNX Runtime Web](https://github.com/microsoft/onnxruntime-inference-examples/tree/gs/chat/js/chat)
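For reference, inference on a non-web optimized model goes through the ONNX Runtime Generate() API mentioned above. Below is a minimal, hypothetical sketch of that flow: the model folder path is a placeholder, and the exact API surface may differ across onnxruntime-genai versions (>=0.2.0 is assumed, per the prerequisites).

```python
import onnxruntime_genai as og

# Placeholder path: point this at the output folder produced by phi3.py for your target.
model = og.Model("models/phi3-mobile-int4")

tokenizer = og.Tokenizer(model)
params = og.GeneratorParams(model)
params.set_search_options(max_length=200)  # mirrors the --max_length default above
params.input_ids = tokenizer.encode("Write a story starting with once upon a time")

# Generate the full completion in one call, then decode it back to text.
output_tokens = model.generate(params)
print(tokenizer.decode(output_tokens[0]))
```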
18 changes: 18 additions & 0 deletions examples/phi3/phi3.py
@@ -24,6 +24,14 @@
"web": "JsExecutionProvider",
}

+AML_MODEL_Path = {
+    "model_path": {
+        "type": "azureml_registry_model",
+        "config": {"registry_name": "azureml", "name": "Phi-3-mini-4k-instruct", "version": "7"},
+    },
+    "model_file_format": "PyTorch.MLflow",
+}


def get_args(raw_args):
    parser = argparse.ArgumentParser(description="phi3 optimization")
@@ -62,6 +70,13 @@ def get_args(raw_args):
        default=200,
        help="Max length for generation. Not supported with Web target.",
    )
+    parser.add_argument(
+        "--source",
+        type=str,
+        default="HF",
+        choices=["HF", "AzureML"],
+        help="Choose from HF (default) or AzureML",
+    )

    return parser.parse_args(raw_args)

@@ -105,6 +120,9 @@ def generate_config(args):
    with open(json_file_template) as f:
        template_json = json.load(f)

+    if args.source == "AzureML":
+        template_json["input_model"]["config"] = AML_MODEL_Path

    target = str(args.target)
    device = "GPU" if target in ("cuda", "web") else "CPU"
    execution_providers = [TARGET_TO_EP[target.lower()]]
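To make the effect of the new `--source` flag concrete, here is a self-contained sketch of the patch performed in `generate_config` above. The `AML_MODEL_Path` dict is copied from the diff; the stand-in template is invented for illustration (the real JSON templates in this example carry full pass configurations).

```python
import json

# Copied from the diff above: an AzureML registry reference in MLflow format.
AML_MODEL_Path = {
    "model_path": {
        "type": "azureml_registry_model",
        "config": {"registry_name": "azureml", "name": "Phi-3-mini-4k-instruct", "version": "7"},
    },
    "model_file_format": "PyTorch.MLflow",
}

# Stand-in for the loaded template; the field values here are assumptions.
template_json = {"input_model": {"type": "PyTorchModel", "config": {"model_path": "microsoft/Phi-3-mini-4k-instruct"}}}

source = "AzureML"  # what args.source would hold after parsing
if source == "AzureML":
    # Olive now resolves the model from the "azureml" registry instead of Hugging Face
    # (this is why `pip install olive-ai[azureml]` and `az login` are required).
    template_json["input_model"]["config"] = AML_MODEL_Path

print(json.dumps(template_json["input_model"], indent=2))
```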
3 changes: 3 additions & 0 deletions examples/phi3/requirements.txt
@@ -1,5 +1,8 @@
einops
olive-ai>=0.6.0
+onnx>=1.15.0
+onnxruntime>=1.18.0
+onnxruntime-genai>=0.2.0
onnxscript>=0.1.0.dev20240126
torch>=2.2.0
transformers>=4.36.2
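As a convenience (not part of this commit), here is a quick check that the pinned packages are installed before running the example; the distribution names are taken from requirements.txt above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names exactly as listed in requirements.txt.
for dist in ("einops", "olive-ai", "onnx", "onnxruntime", "onnxruntime-genai", "onnxscript", "torch", "transformers"):
    try:
        print(f"{dist}=={version(dist)}")
    except PackageNotFoundError:
        print(f"{dist} is not installed")
```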
