add AzureML support (#1171)
## Describe your changes

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Does this PR include example changes? If yes, please remember to
update the [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link

---------

Co-authored-by: Emma <[email protected]>
Co-authored-by: jiapli <[email protected]>
3 people committed May 23, 2024
1 parent c679c10 commit ea46943
Showing 3 changed files with 60 additions and 11 deletions.
50 changes: 39 additions & 11 deletions examples/phi3/README.md
@@ -1,42 +1,65 @@
# Phi3 optimization with Olive
-This folder contains an example of optimizing [the Phi-3-Mini-4K-Instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) model in HF for different hardware targets with Olive.
+This folder contains an example of optimizing the Phi-3-Mini-4K-Instruct model from [Hugging Face](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) or the [Azure Machine Learning Model Catalog](https://ai.azure.com/explore/models/Phi-3-mini-4k-instruct/version/7/registry/azureml?tid=72f988bf-86f1-41af-91ab-2d7cd011db47) for different hardware targets with Olive.


## Prerequisites
-Install the dependencies
-```
-pip install -r requirements.txt
-```
* einops
* Pytorch: >=2.2.0 \
  _The [official website](https://pytorch.org/) offers packages compatible with CUDA 11.8 and 12.1. Please select the appropriate version according to your needs._
* [Package onnxruntime](https://onnxruntime.ai/docs/install/#inference-install-table-for-all-languages): >=1.18.0
-* [Package onnxruntime-genai](https://github.com/microsoft/onnxruntime-genai): >=0.2.0. If you target GPU, please install the onnxruntime and onnxruntime-genai GPU packages.
+* [Package onnxruntime-genai](https://github.com/microsoft/onnxruntime-genai): >=0.2.0.
+
+Install the dependencies. If you target GPU, please install the onnxruntime and onnxruntime-genai GPU packages.
+
+### For optimizing the model from Hugging Face
+If you have not logged in to your Hugging Face account:
+- Install the Hugging Face CLI and log in to your Hugging Face account for model access
+```
+pip install -r requirements.txt
+huggingface-cli login
+```

+### For optimizing the model from Azure Machine Learning Model Catalog
+
+- Install Olive with the Azure Machine Learning dependency
+```
+pip install olive-ai[azureml]
+```
+If you have not logged in to your Azure account:
+- Install the Azure Command-Line Interface (CLI) following [this link](https://learn.microsoft.com/en-us/cli/azure/)
+- Run `az login` to log in to your Azure account so that Olive can access the model.


## Usage
We will use the `phi3.py` script to generate an optimized model for a chosen hardware target by running the following commands.

```
-python phi3.py [--target HARDWARE_TARGET] [--precision DATA_TYPE] [--inference] [--prompt PROMPT] [--max_length LENGTH]
+python phi3.py [--target HARDWARE_TARGET] [--precision DATA_TYPE] [--source SOURCE] [--inference] [--prompt PROMPT] [--max_length LENGTH]
# Examples
python phi3.py --target web
python phi3.py --target mobile
+python phi3.py --target mobile --source AzureML
python phi3.py --target mobile --inference --prompt "Write a story starting with once upon a time" --max_length 200
```

- `--target`: cpu, cuda, mobile, web
-- `--precision`: optional. fp32, fp16, int4. fp32 or int4(default) for cpu target; fp32 or fp16 or int4(default) for gpu target; int4(default) for mobile or web
-- `--inference`: run the optimized model, for non-web models inference.
+- `--precision`: optional, data precision. fp32 or int4 (default) for the cpu target; fp32, fp16, or int4 (default) for the GPU target; int4 (default) for mobile or web.
+- `--source`: optional, model source. HF (Hugging Face, default) or AzureML.
+- `--inference`: optional, run inference/validation on the optimized model (non-web targets only).
- `--prompt`: optional, the prompt text fed into the model. Takes effect only when `--inference` is set.
- `--max_length`: optional, the maximum length of the model output. Takes effect only when `--inference` is set.


This script will:
-1. Generate the Olive configuration file for your need including the chosen HW target, the preferred model precision.
-2. Generate optimized model with Olive based on the configuration file for the chosen HW target
-3. (optional) Inference the optimized model with ONNX Runtime Generation API. Not supported for web target
+- Generate the Olive configuration file for the chosen HW target
+- Generate the optimized model with Olive based on the configuration file for the chosen HW target
+- (optional) Run inference on the optimized model with the ONNX Runtime Generate() API (non-web targets only)


If you have an Olive configuration file, you can also run the olive command for model generation:
@@ -46,3 +69,8 @@ olive run [--config CONFIGURATION_FILE]
# Examples
olive run --config phi3_mobile_int4.json
```

+## More Inference Examples
+- [Android chat app with Phi-3 and ONNX Runtime Mobile](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/mobile/examples/phi-3/android)
+
+- [Web chat app with Phi-3 and ONNX Runtime Web](https://github.com/microsoft/onnxruntime-inference-examples/tree/gs/chat/js/chat)
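For reference, inference on a non-web optimized model goes through the ONNX Runtime Generate() API mentioned above. Below is a minimal, hypothetical sketch of that flow: the model folder path is a placeholder, and the exact API surface may differ across onnxruntime-genai versions (>=0.2.0 is assumed, per the prerequisites).

```python
import onnxruntime_genai as og

# Placeholder path: point this at the output folder produced by phi3.py for your target.
model = og.Model("models/phi3-mobile-int4")

tokenizer = og.Tokenizer(model)
params = og.GeneratorParams(model)
params.set_search_options(max_length=200)  # mirrors the --max_length default above
params.input_ids = tokenizer.encode("Write a story starting with once upon a time")

# Generate the full completion in one call, then decode it back to text.
output_tokens = model.generate(params)
print(tokenizer.decode(output_tokens[0]))
```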
18 changes: 18 additions & 0 deletions examples/phi3/phi3.py
@@ -24,6 +24,14 @@
"web": "JsExecutionProvider",
}

+AML_MODEL_Path = {
+    "model_path": {
+        "type": "azureml_registry_model",
+        "config": {"registry_name": "azureml", "name": "Phi-3-mini-4k-instruct", "version": "7"},
+    },
+    "model_file_format": "PyTorch.MLflow",
+}


def get_args(raw_args):
    parser = argparse.ArgumentParser(description="phi3 optimization")
@@ -62,6 +70,13 @@ def get_args(raw_args):
        default=200,
        help="Max length for generation. Not supported with Web target.",
    )
+    parser.add_argument(
+        "--source",
+        type=str,
+        default="HF",
+        choices=["HF", "AzureML"],
+        help="Choose from HF (default) or AzureML",
+    )

    return parser.parse_args(raw_args)

@@ -105,6 +120,9 @@ def generate_config(args):
    with open(json_file_template) as f:
        template_json = json.load(f)

+    if args.source == "AzureML":
+        template_json["input_model"]["config"] = AML_MODEL_Path

    target = str(args.target)
    device = "GPU" if target in ("cuda", "web") else "CPU"
    execution_providers = [TARGET_TO_EP[target.lower()]]
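To make the effect of the new `--source` flag concrete, here is a self-contained sketch of the patch performed in `generate_config` above. The `AML_MODEL_Path` dict is copied from the diff; the stand-in template is invented for illustration (the real JSON templates in this example carry full pass configurations).

```python
import json

# Copied from the diff above: an AzureML registry reference in MLflow format.
AML_MODEL_Path = {
    "model_path": {
        "type": "azureml_registry_model",
        "config": {"registry_name": "azureml", "name": "Phi-3-mini-4k-instruct", "version": "7"},
    },
    "model_file_format": "PyTorch.MLflow",
}

# Stand-in for the loaded template; the field values here are assumptions.
template_json = {"input_model": {"type": "PyTorchModel", "config": {"model_path": "microsoft/Phi-3-mini-4k-instruct"}}}

source = "AzureML"  # what args.source would hold after parsing
if source == "AzureML":
    # Olive now resolves the model from the "azureml" registry instead of Hugging Face
    # (this is why `pip install olive-ai[azureml]` and `az login` are required).
    template_json["input_model"]["config"] = AML_MODEL_Path

print(json.dumps(template_json["input_model"], indent=2))
```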
3 changes: 3 additions & 0 deletions examples/phi3/requirements.txt
@@ -1,5 +1,8 @@
einops
olive-ai>=0.6.0
+onnx>=1.15.0
+onnxruntime>=1.18.0
+onnxruntime-genai>=0.2.0
onnxscript>=0.1.0.dev20240126
torch>=2.2.0
transformers>=4.36.2
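As a convenience (not part of this commit), here is a quick check that the pinned packages are installed before running the example; the distribution names are taken from requirements.txt above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names exactly as listed in requirements.txt.
for dist in ("einops", "olive-ai", "onnx", "onnxruntime", "onnxruntime-genai", "onnxscript", "torch", "transformers"):
    try:
        print(f"{dist}=={version(dist)}")
    except PackageNotFoundError:
        print(f"{dist} is not installed")
```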
