
RELEASE NOTE: ensure MutatingWebhookConfiguration is deleted on upgrade #1927

Open
huxcrux opened this issue Mar 5, 2024 · 7 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@huxcrux (Contributor) commented Mar 5, 2024

/kind bug

What steps did you take and what happened:
It seems like #1920 broke the defaulting webhook for openstackcluster in v1beta1.

Error during apply:

[2024-03-05T18:41:23+0100]: {'kind': 'Status', 'apiVersion': 'v1', 'metadata': {}, 'status': 'Failure', 'message': 'Internal error occurred: failed calling webhook "default.openstackcluster.infrastructure.cluster.x-k8s.io": failed to call webhook: the server could not find the requested resource', 'reason': 'InternalError', 'details': {'causes': [{'message': 'failed calling webhook "default.openstackcluster.infrastructure.cluster.x-k8s.io": failed to call webhook: the server could not find the requested resource'}]}, 'code': 500}

I suspect some changes are needed in webhook generation, since the webhook appears to have been removed completely; see the following diff: e6bb34f#diff-369b61dd1f2f1f60722927fda70e5a51e130cea68336e6d536086b522ffef0c6

Environment:

  • Cluster API Provider OpenStack version (Or git rev-parse HEAD if manually built): main (current)
  • Cluster-API version:
  • OpenStack version:
  • Minikube/KIND version:
  • Kubernetes version (use kubectl version):
  • OS (e.g. from /etc/os-release):
@k8s-ci-robot added the kind/bug label Mar 5, 2024
@mdbooth (Contributor) commented Mar 5, 2024

How are you triggering this?

@mdbooth (Contributor) commented Mar 5, 2024

I deliberately removed the defaulting webhooks in #1920 because none of them had an implementation.

Ah... I'll bet I know what happened. Note that #1920 removes the MutatingWebhookConfiguration, which references the defaulting webhook, from the manifests. However, if you're upgrading and you just apply the new manifests, there's nothing to remove the old configuration.

I'm not sure how to handle that without an operator, tbh. Definitely needs a release note!

To start with, please can you confirm that deleting CAPO's MutatingWebhookConfiguration fixes the problem? If we can confirm that, we can think about how to automate it, or at least ensure folks know about it.
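
Something like the following should do it (the resource name here is assumed to be the default from the CAPO release manifests, so adjust it if your deployment uses a different one):

# list the configurations, then delete CAPO's stale defaulting webhook configuration
$ kubectl get mutatingwebhookconfigurations
$ kubectl delete mutatingwebhookconfiguration capo-mutating-webhook-configuration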

@huxcrux (Contributor, Author) commented Mar 6, 2024

> I deliberately removed the defaulting webhooks in #1920 because none of them had an implementation.
>
> Ah... I'll bet I know what happened. Note that #1920 removes the MutatingWebhookConfiguration, which references the defaulting webhook, from the manifests. However, if you're upgrading and you just apply the new manifests, there's nothing to remove the old configuration.
>
> I'm not sure how to handle that without an operator, tbh. Definitely needs a release note!
>
> To start with, please can you confirm that deleting CAPO's MutatingWebhookConfiguration fixes the problem? If we can confirm that, we can think about how to automate it, or at least ensure folks know about it.

If I remove the mutating webhook it solves the problem. Also, installing a new management cluster directly from main "fixes" the problem since the webhook is never created, meaning this is only a problem when upgrading CAPO to main at the moment.

@lentzi90 (Contributor) commented Mar 6, 2024

It seems that clusterctl upgrade is smart about this and that is why the upgrade test was successful. That means it will "only" be an issue when using kubectl or other tools instead of clusterctl.

$ k get mutatingwebhookconfigurations.admissionregistration.k8s.io 
NAME                                                        WEBHOOKS   AGE
capi-kubeadm-bootstrap-mutating-webhook-configuration       2          14s
capi-kubeadm-control-plane-mutating-webhook-configuration   2          14s
capi-mutating-webhook-configuration                         9          14s
capo-mutating-webhook-configuration                         3          13s
cert-manager-webhook                                        1          36s
$ clusterctl upgrade apply --contract=v1beta1 --config=clusterctl.yaml
Checking cert-manager version...
Cert-manager is already up to date
Performing upgrade...
Scaling down Provider="infrastructure-openstack" Version="v0.9.0" Namespace="capo-system"
Deleting Provider="infrastructure-openstack" Version="v0.9.0" Namespace="capo-system"
Installing Provider="infrastructure-openstack" Version="v0.9.99" TargetNamespace="capo-system"

sigs.k8s.io/cluster-api
$ k get mutatingwebhookconfigurations.admissionregistration.k8s.io 
NAME                                                        WEBHOOKS   AGE
capi-kubeadm-bootstrap-mutating-webhook-configuration       2          73s
capi-kubeadm-control-plane-mutating-webhook-configuration   2          73s
capi-mutating-webhook-configuration                         9          73s
cert-manager-webhook                                        1          95s

@huxcrux (Contributor, Author) commented Mar 6, 2024

> It seems that clusterctl upgrade is smart about this and that is why the upgrade test was successful. That means it will "only" be an issue when using kubectl or other tools instead of clusterctl.
>
> $ k get mutatingwebhookconfigurations.admissionregistration.k8s.io
> NAME                                                        WEBHOOKS   AGE
> capi-kubeadm-bootstrap-mutating-webhook-configuration       2          14s
> capi-kubeadm-control-plane-mutating-webhook-configuration   2          14s
> capi-mutating-webhook-configuration                         9          14s
> capo-mutating-webhook-configuration                         3          13s
> cert-manager-webhook                                        1          36s
> $ clusterctl upgrade apply --contract=v1beta1 --config=clusterctl.yaml
> Checking cert-manager version...
> Cert-manager is already up to date
> Performing upgrade...
> Scaling down Provider="infrastructure-openstack" Version="v0.9.0" Namespace="capo-system"
> Deleting Provider="infrastructure-openstack" Version="v0.9.0" Namespace="capo-system"
> Installing Provider="infrastructure-openstack" Version="v0.9.99" TargetNamespace="capo-system"
>
> sigs.k8s.io/cluster-api
> $ k get mutatingwebhookconfigurations.admissionregistration.k8s.io
> NAME                                                        WEBHOOKS   AGE
> capi-kubeadm-bootstrap-mutating-webhook-configuration       2          73s
> capi-kubeadm-control-plane-mutating-webhook-configuration   2          73s
> capi-mutating-webhook-configuration                         9          73s
> cert-manager-webhook                                        1          95s

This is indeed a problem when just "applying" the new manifests. clusterctl upgrade gets around it by removing the old resources and then adding the new ones. We should probably add something to the release notes, since this is breaking for anyone not using clusterctl to upgrade CAPO.

For reference, we have discussed this on Slack, where more details can be found: https://kubernetes.slack.com/archives/C8TSNPY4T/p1709672361982489
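
For anyone upgrading with plain kubectl rather than clusterctl, the note would presumably boil down to something like this (the manifest file name is just an example; the webhook configuration name is the default from the release manifests):

$ kubectl apply -f infrastructure-components.yaml
# apply does not prune the old defaulting webhook configuration, so delete it manually
$ kubectl delete mutatingwebhookconfiguration capo-mutating-webhook-configuration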

@mdbooth (Contributor) commented Mar 6, 2024

So this is 'not a bug', but it's a serious 'not a bug', so let's keep this open until we've ensured there is a release note about it.

@jichenjc I'm again thinking it would be really nice to have a workflow from GitHub which results in a Release Note. I think CPO has this. Any idea how it works?

@jichenjc (Contributor) commented Mar 7, 2024

@mdbooth changed the title from "defaulting webhook for openstackcluster broken after #1920" to "RELEASE NOTE: ensure MutatingWebhookConfiguration is deleted on upgrade" on Mar 19, 2024
@mdbooth modified the milestones: v0.8.0, v0.10.0 on Mar 19, 2024