Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make subscriptionID mutable #4612

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

dlipovetsky
Copy link
Contributor

@dlipovetsky dlipovetsky commented Mar 1, 2024

What type of PR is this?
/kind bug

What this PR does / why we need it:
Previously, the field was immutable. It was, and remains, an optional
field. The manager bootstrap credential was used when the field was
empty.

The credential is no longer supported, so the field is now effectively
required, because CAPZ fails to reconcile existing resources when the
field is empty.

Making the field mutable makes it possible to update existing resources
to the value previously stored in the manager bootstrap credential.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #4557

Special notes for your reviewer:

  • cherry-pick candidate

Because the subscription ID field became effectively required in v1.11.0, I think we should consider backporting this PR to release-1.11 and release-1.12.

TODOs:

  • squashed commits
  • includes documentation
  • adds updates unit tests

Release note:

The subscriptionID field is now mutable. Previously, resources could have an empty subscription ID. In v1.11.0, the subscription ID became an effectively required field. By making the field mutable, existing resources can be made compatible with v1.11.0 and newer. 

Previously, the field was immutable. It was, and remains, an optional
field. The manager bootstrap credential was used when the field was
empty.

The credential is no longer supported, so the field is now effectively
required, because CAPZ fails to reconcile existing resources when the
field is empty.

Making the field mutable makes it possible to update existing resources
to the value previously stored in the manager bootstrap credential.
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 1, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign mboersma for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link

codecov bot commented Mar 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 62.72%. Comparing base (49b97b0) to head (7ddaaa9).
Report is 43 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4612      +/-   ##
==========================================
+ Coverage   62.47%   62.72%   +0.24%     
==========================================
  Files         192      192              
  Lines       15453    15643     +190     
==========================================
+ Hits         9655     9812     +157     
- Misses       5131     5147      +16     
- Partials      667      684      +17     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -63,13 +63,6 @@ func (c *AzureCluster) ValidateUpdate(oldRaw runtime.Object) (admission.Warnings
allErrs = append(allErrs, err)
}

if err := webhookutils.ValidateImmutable(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than allow mutability, should we simply allow changes from "" to "<non-empty string>"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One other option might be to bring back the global config only for sub ID.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question.

That allows the field to be updated at most one time.

What happens if the value is wrong, e.g. by mistake?

Can the user fix the mistake? For example, can they pause the Cluster, remove the AzureCluster resource (removing all finalizers by hand!), fix the subscriptionID, and re-create it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One other option might be to bring back the global config only for sub ID.

I think we're discussing whether making this field mutable introduces risk to users.

Let's please remember that the global config is (and always has been) mutable.

So, making the field write-many does not introduce more risk for users.

Moreover, making the field write-once allows users to migrate away from the write-many global config, actually reducing risk.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What other information can I provide to help us decide between write-once and write-many?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the goals are:

  1. Allow users who previously relied on the manager credential bootstrap for the subscription ID to migrate to CAPZ that no longer uses this credential.
  2. Prevent users from inadvertently changing the subscription ID.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a solution that meets both (1) and (2) perfectly.

  • Write-once is a problem for (1) if the user makes a mistake, but helps with (2).
  • Write-many works for (1), but does not help with (2).

How about something unusual, like "conditional" mutability. For example, if the user wants to change subscriptionID, they have to first set an annotation on AzureCluster. When the annotation is present, subscriptionID is mutable. After the user sets subscriptionID, they remove the annotation. At that point, subscriptionID is again immutable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above idea is in 46bfbaf.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we have consensus, I'll of course update the docs in this PR.

@@ -88,12 +88,6 @@ func (mcpw *azureManagedControlPlaneTemplateWebhook) ValidateUpdate(ctx context.
if !ok {
return nil, apierrors.NewBadRequest("expected an AzureManagedControlPlaneTemplate")
}
if err := webhookutils.ValidateImmutable(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nojnhuh what would the practical side-effect be of actually changing from one sub id to another (see my comment above about potentially scoping the change to just "from empty string" changes)?

The sub id is not part of the actual ManagedCluster CRD, so would CAPZ simply create a new AKS cluster in the new sub, assuming that the existing Azure creds are valid w/ equivalent privileges across both subscriptions?

Ref: https://azure.github.io/azure-service-operator/reference/containerservice/v1api20231102preview/#containerservice.azure.com/v1api20231102preview.ManagedCluster_Spec

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I suspect would happen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, from my experience, CAPZ will fail to reconcile an AzureManagedControlPlane with an empty subscriptionID (although the field is optional in the API!).

capz-controller-manager-667d8ff489-drt8x E0208 00:25:43.520687       1 controller.go:329]  "msg"="Reconciler error" "error"="error creating AzureManagedControlPlane default/dlaksdkp27: failed to reconcile AzureManagedControlPlane service managedcluster: failed to create resource dlaksdkp27/dlaksdkp27 (service: managedcluster): Code=\"LinkedAuthorizationFailed\" Message=\"The client has permission to perform action 'Microsoft.Network/virtualNetworks/subnets/join/action' on scope '/subscriptions/REDACTED/resourceGroups/dlaksdkp27/providers/Microsoft.ContainerService/managedClusters/dlaksdkp27', however the linked subscription 'resourceGroups' was not found. \"" "AzureManagedControlPlane"={"name":"dlaksdkp27","namespace":"default"} "controller"="azuremanagedcontrolplane" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureManagedControlPlane" "name"="dlaksdkp27" "namespace"="default" "reconcileID"="REDACTED"

I thought it would make sense to make subscriptionID mutable everywhere, but if we're concerned about behavior for AzureManagedControlPlane specifically, I think the field could remain immutable, because an empty (or invalid) value never worked to begin with.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since AzureManagedControlPlane can not be successfully reconciled when subscriptionID is empty, I don't see a need to make it mutable in that resource. I'll update the PR to make it immutable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in e8a948c

@mboersma
Copy link
Contributor

mboersma commented Mar 5, 2024

/test pull-cluster-api-provider-azure-apidiff

If object has the correct annotation, subscriptionID is mutable.
Otherwise, it is immutable.

This allows a user to update the subscriptionID (by adding the
annotation) to migrate to CAPZ that does not use a manager bootstrap
credential, and makes it possible to prevent inadvertent updates to
subscriptionID (by removing the annotation).
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 12, 2024
@dlipovetsky
Copy link
Contributor Author

codecov failure cause:

error - 2024-03-13 15:52:42,507 -- Commit creating failed: {"detail":"Tokenless has reached GitHub rate limit. Please upload using a token: https://docs.codecov.com/docs/adding-the-codecov-token. Expected available in 967 seconds."}

Comment on lines +223 to +233
name: "azurecluster subscription ID is immutable without annotation",
oldCluster: func() *AzureCluster {
ac := createValidCluster()
ac.Spec.SubscriptionID = "212ec1q8"
return ac
}(),
cluster: func() *AzureCluster {
ac := createValidCluster()
ac.Spec.SubscriptionID = "212ec1q9"
return ac
}(),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed this test did not create a valid cluster, which is bad, because that means it can return a false negative. I changed it to create a valid cluster.

Some other tests in this file have the same issue. I can fix these in a separate PR.

@nojnhuh
Copy link
Contributor

nojnhuh commented Mar 14, 2024

/assign

@nojnhuh
Copy link
Contributor

nojnhuh commented Apr 26, 2024

I opened #4784 to fix the regression this addresses but don't mean for that to supersede this entirely. I'm still open to making changes to the subscriptionID field.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
Status: Wait-On-Author
5 participants