Support for clusters with OwnerReferencesPermissionEnforcement admission plugin enabled #773

Open
erikgb opened this issue Jun 15, 2023 · 12 comments
Labels
bug (Something isn't working), good first issue (Good for newcomers), help wanted (Extra attention is needed)

Comments

@erikgb

erikgb commented Jun 15, 2023

Bug description

We are performing a Capsule POC on OpenShift, and it turns out that Capsule is missing the RBAC configuration needed to support clusters with the OwnerReferencesPermissionEnforcement admission plugin enabled. OpenShift enables this plugin by default.

How to reproduce

Pre-requisite: A Kubernetes cluster with OwnerReferencesPermissionEnforcement admission plugin enabled.

Steps to reproduce the behavior:

  1. Establish a new tenant
  2. As a tenant owner, attempt to create a new namespace
  3. The create namespace request is rejected with the following error message: namespaces "erikbo-egb-test" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on

Expected behavior

The tenant owner is allowed to create a new namespace without any errors.

I am pretty sure the error is caused by the Capsule namespace admission webhook: it adds an owner reference to the namespace, pointing to the tenant resource with blockOwnerDeletion: true, which the capsule-namespace-provisioner role is not allowed to do. I was able to work around the issue by adding a new cluster role with the missing permission and binding it to the capsuleUserGroups.

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: capsule-namespace-provisioner-fix
rules:
  - verbs:
      - update
    apiGroups:
      - capsule.clastix.io
    resources:
      - tenants/finalizers
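
And the corresponding binding, roughly like this (the subject below assumes the default Capsule user group capsule.clastix.io; replace it with whatever is configured in capsuleUserGroups on your cluster):

kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: capsule-namespace-provisioner-fix
subjects:
  # assumed default Capsule user group; use your configured capsuleUserGroups value
  - kind: Group
    name: capsule.clastix.io
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: capsule-namespace-provisioner-fix
  apiGroup: rbac.authorization.k8s.io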

Logs

If applicable, please provide logs of capsule.

I don't think the logs are relevant in this case.

Additional context

The issue kubernetes-sigs/cluster-api#4880 led me to the actual problem and temporary workaround.

  • Capsule version: latest + @prometherion Flux impersonate hot-fix
  • Helm Chart version: N/A
  • Kubernetes version: Openshift 4.12.13 (K8s v1.25.8+27e744f)
@erikgb erikgb added blocked-needs-validation Issue need triage and validation bug Something isn't working labels Jun 15, 2023
@oliverbaehler oliverbaehler self-assigned this Jun 15, 2023
@oliverbaehler oliverbaehler added this to the v0.3.3 milestone Jun 15, 2023
@oliverbaehler
Collaborator

Hi @erikgb, thanks for the issue. I will create a dev version so you can continue with your POC (I should have time later today).

May I ask which version you are referring to with "latest + @prometherion Flux impersonate hot-fix"?

Are we talking about this: #768?

@erikgb
Author

erikgb commented Jun 15, 2023

Hi @oliverbaehler, thanks for responding. I can understand why you ask about the version. 😆 I just realized that it is not really relevant to this issue, so let me explain: we are installing Capsule and Capsule Proxy with the latest version of the Helm charts, but because of issues passing through capsule-proxy from a Flux GitOps reconciler, @prometherion built a custom latest version of capsule-proxy after the community meeting on Tuesday. See https://kubernetes.slack.com/archives/C03GETTJQRL/p1686665245918379?thread_ts=1686311373.987009&cid=C03GETTJQRL for the exact details.

But we run the latest release of capsule: v0.3.2

@oliverbaehler
Collaborator

@erikgb Would you be willing to test it with our fix? It's a bit tricky for me because I don't have an OpenShift cluster lying around.. :D

git clone git@github.com:oliverbaehler/capsule.git
cd capsule && git checkout issue/773

make docker-build
...

docker tag docker.io/clastix/capsule:v0.3.1 your.registry.com/capsule:dev
docker push your.registry.com/capsule:dev

Then update the Capsule controller to use that image and remove your temporary ClusterRole. Let me know if that works or if you need further assistance.

@erikgb
Author

erikgb commented Jun 16, 2023

It's unfortunately a bit of an extra job to build and push an image to our internal registry. Are you able to push the image to your GHCR, or is it maybe already available from there?

But the changes in your PR look sane to me; I'm 99.999999% sure it will fix the issue. :-D I am more worried about the overall RBAC: it does not seem correct to grant a standard user access to update finalizers on a cluster-scoped resource. But I'll leave that up to you to decide. I would probably not set the owner relationship on admission, but instead have the Capsule controller do it once the namespace is created. Maybe @maxgio92 has an opinion on this?

@maxgio92
Collaborator

maxgio92 commented Jun 16, 2023

I think the permission shouldn't be granted: the tenant owner shouldn't be allowed to operate on its Tenant resource, as that should be the cluster/platform admin's responsibility, from both a design and a security (prevention) point of view.
TBH I still don't have a strong opinion on how to solve this.

@erikgb
Author

erikgb commented Jun 16, 2023

I tend to agree with @maxgio92 that adding RBAC allowing end-users to set finalizers on Tenant resources is wrong. But this is a complete blocker for using Capsule on OpenShift or any other Kubernetes cluster with this admission plugin enabled. I think we have to look for another way of fixing this issue.

Why must Capsule set the Tenant as the Namespace's controller? Would it be sufficient to set it as an owner? If we want to clean up (delete) namespaces owned by a tenant when the tenant is deleted, we would have to add a finalizer to tenants. Finalizers are a bit painful, but this could also add more flexibility in whether namespaces should be orphaned when a tenant is deleted. Related: #563

@prometherion
Member

prometherion commented Jun 17, 2023

It does not appear correct to grant a standard user access to update finalizers on a cluster-scoped resource.

Couldn't we solve that with the resourceNames key? It grants the required permissions only to those named resources.

That ClusterRole could even be created dynamically by Capsule behind a specific CLI flag toggle, and automatically used by the tenant owners: once a Tenant is created, a manager controller creates and ensures the desired cluster role.
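
A rough sketch of what such a generated role could look like, scoped to a single Tenant (the tenant name oil and the role name are placeholders):

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  # hypothetical name; one role would be generated per Tenant
  name: capsule-tenant-oil-finalizers
rules:
  - apiGroups:
      - capsule.clastix.io
    resources:
      - tenants/finalizers
    resourceNames:
      - oil
    verbs:
      - update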

@maxgio92 WDYT?

@prometherion
Member

Why must Capsule set Tenant as the Namespace controller? Would it be sufficient to set it as an owner?

There's a subtle but essential difference between OwnerReference and ControllerReference.

OwnerReference is used for garbage collection, thus the cascading deletion of resources.

ControllerReference adds, on top of the OwnerReference, event propagation from the owned resources to the owner.
tl;dr: when a Namespace is changed, the Tenant reconciliation is triggered, ensuring the desired state throughout the Namespaces of the Tenant.
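
To make the difference concrete, this is roughly the reference Capsule sets on a Namespace today (values and the API version are placeholders); dropping the last two fields would turn it into a plain owner reference:

metadata:
  ownerReferences:
    - apiVersion: capsule.clastix.io/v1beta2   # or whichever Tenant API version is in use
      kind: Tenant
      name: oil
      uid: <tenant-uid>
      controller: true            # makes the Tenant the controller of the Namespace
      blockOwnerDeletion: true    # the field OwnerReferencesPermissionEnforcement guards via the finalizers permission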

If we want to clean up (delete) namespaces owned by a tenant when the tenant is deleted, we would have to add a finalizer to tenants. Finalizers are a bit painful, but this could also add more flexibility in whether or not namespaces should be orphaned or not when a tenant is deleted. Related #563

I had a similar discussion with a guy at KubeCon EU 22 in Valencia, along the lines of: in a year, cascading deletion will be gone from Capsule. It seems we're only a couple of months late 🙃

Jokes aside, I would discuss this topic in a separate issue to keep the resolution of the main root cause simple, namely overcoming the OwnerReferencesPermissionEnforcement admission controller.

@prometherion
Member

Removing this issue from the v0.3.4 milestone since there's no support offered by the reporter and we (as maintainers) don't have the chance to test this functionally with a real OpenShift cluster.

To whoever is interested in this feature or support, please, reach out here to coordinate and lead the development, as well as the tests.

@prometherion prometherion removed this from the v0.3.4 milestone Jul 26, 2023
@prometherion prometherion added good first issue Good for newcomers help wanted Extra attention is needed labels Jul 27, 2023
@erikgb
Author

erikgb commented Jul 29, 2023

don't have the chance to test this functionally with a real OpenShift cluster.

I think there is a slight misunderstanding here. I am not asking for full Openshift support, and I understand that open-source projects like Capsule cannot (or will not) test the software on enterprise clusters like Openshift.

This issue is about supporting any Kubernetes cluster with the Kubernetes OwnerReferencesPermissionEnforcement admission controller plugin enabled. This should be possible on any Kubernetes cluster, and I think it could be justified to enable it in this project's CI e2e tests.

Openshift (and maybe other Kubernetes distros) enables this admission controller by default. That is the rationale behind this request.
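
For example, with kind it should be enough to enable the plugin on the API server via a kubeadm patch, something like this untested sketch (NodeRestriction is kept because kubeadm enables it by default):

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    kubeadmConfigPatches:
      - |
        kind: ClusterConfiguration
        apiServer:
          extraArgs:
            enable-admission-plugins: NodeRestriction,OwnerReferencesPermissionEnforcement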

@maxgio92
Collaborator

Hi @erikgb @prometherion, sorry for my late response.
I'd keep this issue to track the implementation of support for the OwnerReferencesPermissionEnforcement admission controller, beyond OpenShift.

Unfortunately, I haven't had much time to dig into it, so I don't have any useful details yet. But if you agree, I'd keep this issue open, as the feature would provide benefit IMO.

@erikgb
Author

erikgb commented Aug 28, 2023

Thanks @maxgio92. We have put our adoption of Capsule on hold because of this. I would also appreciate this not being presented as "an Openshift problem": Capsule currently does not support an optional Kubernetes admission controller. 😉
