Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"az ml workspace create" fails to create Application Insights and Container Registry #28980

Closed
Rubikalubi opened this issue May 16, 2024 · 15 comments
Assignees
Labels
Auto-Assign Auto assign by bot customer-reported Issues that are reported by GitHub users external to the Azure organization. Machine Learning az ml question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Attention This issue is responsible by Azure service team.

Comments

@Rubikalubi
Copy link

Rubikalubi commented May 16, 2024

Describe the bug

When trying to create a new ml workspace, our deployment pipeline on devops fails because the workspace create command only creates the Storage Account and Keyvault. Then it tries to create the workspace, which obviously fails, since no Container Registry and Application Insights exists. We always update the CLI and ml extension to the latest version in our pipeline before executing any commands. I ran the same Code with ml extension 2.24.0 two months ago and it worked without an issue.
Also running on my own machine with ml extension 2.22.0 the behaviour is as expected.

Related command

az ml workspace create

Errors

The deployment request ml*** was accepted. ARM deployment URI for reference:
URL REMOVED
Creating Storage Account: (ml***** ) ... Done (21s)
Creating Key Vault: (ml***** ) Done (18s)
ERROR: Code: ValidationError
Message: Missing dependent resources in workspace json
Target: workspace
Exception Details: (Invalid) Missing dependent resources in workspace json
Code: Invalid
Message: Missing dependent resources in workspace json
Target: workspace

##[error]Script failed with exit code: 1
/usr/bin/az account clear

This is the deployment on Azure Portal.

image

Issue script & Debug output

az ml workspace create --resource-group "${{ parameters.resourceGroup }}" --file $(Build.SourcesDirectory)/workspace/workspace.json

Contents of workspace.json

{
"$schema": "https://azuremlschemas.azureedge.net/latest/workspace.schema.json",
"name": "ml***",
"display_name": "ml***",
"description": "created workspace ml*** in resource groupe Res_Grp_*** on 2024-05-16T09:44:05",
"tags": {
"createdOn": "2024-05-16T09:44:05",
"createdBy": ""
},
"location": "westeurope",
"resource_group": "Res_Grp_
",
"hbi_workspace": false,
}

I did not run the script with --debug because i would have to delete the ressources manually afterwards. If desired, please let me know.

Expected behavior

The command create all required ressources for a ml workspace (storage account, keyvault, container registry, appinsights) and then creates the workspace.

Environment Summary

/usr/bin/az --version
azure-cli 2.60.0

core 2.60.0
telemetry 1.1.0

Extensions:
azure-devops 1.0.0
ml 2.26.0

Dependencies:
msal 1.28.0
azure-mgmt-resource 23.1.0b2

Additional context

No response

@Rubikalubi Rubikalubi added the bug This issue requires a change to an existing behavior in the product in order to be resolved. label May 16, 2024
@microsoft-github-policy-service microsoft-github-policy-service bot added customer-reported Issues that are reported by GitHub users external to the Azure organization. Auto-Assign Auto assign by bot ARM az resource/group/lock/tag/deployment/policy/managementapp/account management-group labels May 16, 2024
@yonzhan
Copy link
Collaborator

yonzhan commented May 16, 2024

Thank you for opening this issue, we will look into it.

@microsoft-github-policy-service microsoft-github-policy-service bot added Azure CLI Team The command of the issue is owned by Azure CLI team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Attention This issue is responsible by Azure service team. Machine Learning az ml labels May 16, 2024
Copy link
Contributor

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @azureml-github.

@yonzhan yonzhan removed bug This issue requires a change to an existing behavior in the product in order to be resolved. Azure CLI Team The command of the issue is owned by Azure CLI team labels May 16, 2024
@xec-abailey
Copy link

xec-abailey commented May 21, 2024

There is something fishy here guys, I am experiencing this issue as well, just as @Rubikalubi is. Previously it would create any resource automatically if it isn't specified.

However, what we ended up trying to do is modify our creation script to first create an application insights resource (we already have a container registry, storage account, and key vault so it's not included in the creation script below) and then associate it using the following script:

acr_arm_id=$(az acr show --name ***** --query id -o tsv) 
insights_arm_id=$(az monitor app-insights component show --app ***** -g ***** --query "id" -o tsv)
az ml workspace create --resource-group **** --name ****** --storage-account ****  --key-vault ****** --container-registry "$acr_arm_id" -a "$insights_arm_id"

Considering the docs here I would expect this to work, however we are greeted with this

Code: ValidationError
Message: AppInsights ID is not in right format
Target: properties
Exception Details:      (Invalid) AppInsights ID is not in right format
        Code: Invalid
        Message: AppInsights ID is not in right format
        Target: properties

Looking at the AppInsights ID that is returned from the command directly we see it is compliant, or at least appears to be so given the above docs:

/subscriptions/*subscription*/resourceGroups/*resource-group*/providers/microsoft.insights/components/*name*

The name in this case is compliant with this doc so that shouldn't be an issue.

Version info

azure-cli                         2.61.0

core                              2.61.0
telemetry                          1.1.0

Extensions:
application-insights               1.2.1
k8s-extension                      1.6.1
ml                                2.26.0

Dependencies:
msal                              1.28.0
azure-mgmt-resource               23.1.1

@PierceLovesee
Copy link

+1 to @xec-abailey 's write up; I am experiencing a very similar issue.

@kimzed
Copy link

kimzed commented May 22, 2024

I am facing the same issue. I tried to create a managed online feature store from the SDK using the tutorial. I tried a lot of different things and checked the template and different deployments. The issue seems to have happened recently because a colleague could run the script without problem but now we do not manage to create it

@PierceLovesee
Copy link

PierceLovesee commented May 22, 2024

This breaking change / bug was definitely introduced in the version release of the Azure ML Extension when upgraded from 2.25 to 2.26. Downgrading and locking to ml v2.25 resolved the undesirable behavior in our application.

@janmolemans
Copy link

We have the same issue. I noticed that the ARM template resulting from the az cli command has the following issue, namely that containerregistry variable is used for applicationinsights:
'applicationInsights': '[if(not(equals(parameters('applicationInsightsOption'), 'none')), variables('containerRegistry'), json('null'))]',
'containerRegistry': '[if(not(equals(parameters('containerRegistryOption'), 'none')), variables('containerRegistry'), json('null'))]',

@yonzhan yonzhan removed the ARM az resource/group/lock/tag/deployment/policy/managementapp/account management-group label May 22, 2024
@achauhan-scc
Copy link
Member

Thanks for reporting the issue and providing the details. I am raising a PR to mitigate the issue.

@mbizo1
Copy link

mbizo1 commented May 23, 2024

I a, experiencing the same issue as of today,

@meet47
Copy link

meet47 commented May 23, 2024

Hoping to get it resolve soon.

@ShawnLiu119
Copy link

facing the same iossue htere

@endre-kosa
Copy link

This breaking change / bug was definitely introduced in the version release of the Azure ML Extension when upgraded from 2.25 to 2.26. Downgrading and locking to ml v2.25 resolved the undesirable behavior in our application.

When I try this solution I get the following messages:
'Default enabled including preview versions for extension installation now. Disabled in future release. Use '--allow-preview true' to enable it specifically if needed. Use '--allow-preview false' to install stable version only.' or
'Extension 'ml' 2.26.0 is already installed.'

@madiepev
Copy link

Thank you for reporting this issue. It seems indeed related to the Cloud Shell and not to the .setup.sh. We're investigating the issue to explore any fixes/workaround and will update when we have something. For now, manual creation of resources seems the only fix as the ML extension can't be updated/downgraded in Cloud Shell.

@achauhan-scc
Copy link
Member

ml extension 2.26.1 is released with the fix.

@Underdoge
Copy link

Hello @achauhan-scc,

I'm getting the following error when running "az extension update --name ml" in Cloud Shell:

Default enabled including preview versions for extension installation now. Disabled in future release. Use '--allow-preview true' to enable it specifically if needed. Use '--allow-preview false' to install stable version only. 
Cannot update system extension ml, please wait until Cloud Shell updates it in the next release.

And when trying "az extension update --name ml --allow-preview true" I get:

Cannot update system extension ml, please wait until Cloud Shell updates it in the next release.

I'm stuck on 2.26.0:

az --version
azure-cli                         2.61.0

core                              2.61.0
telemetry                          1.1.0

Extensions:
ai-examples                        0.2.5
ml                                2.26.0
ssh                                2.0.3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Auto-Assign Auto assign by bot customer-reported Issues that are reported by GitHub users external to the Azure organization. Machine Learning az ml question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Attention This issue is responsible by Azure service team.
Projects
None yet
Development

No branches or pull requests