✨ proposal: new API for managed Security Groups #1756

Open · wants to merge 1 commit into base: main

Conversation

@EmilienM (Contributor) commented Nov 22, 2023

What this PR does / why we need it:

KEP to enhance Security Group API.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Related #1752

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 22, 2023
netlify bot commented Nov 22, 2023

Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!

Latest commit: 3992c24
Latest deploy log: https://app.netlify.com/sites/kubernetes-sigs-cluster-api-openstack/deploys/65dc8fdba06ca3000843cf27
Deploy Preview: https://deploy-preview-1756--kubernetes-sigs-cluster-api-openstack.netlify.app/print

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 22, 2023
@k8s-ci-robot (Contributor)
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: EmilienM
Once this PR has been reviewed and has the lgtm label, please assign mdbooth for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

  • Approvers can indicate their approval by writing /approve in a comment
  • Approvers can cancel approval by writing /approve cancel in a comment

@EmilienM EmilienM changed the title proposals: new API for managed Security Groups ✨ proposal: new API for managed Security Groups Nov 22, 2023
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 22, 2023
@EmilienM (Contributor Author)

/cc mdbooth

@EmilienM (Contributor Author)

/cc jichenjc

docs/proposals/20231122-security-groups.md (outdated)
namespace: default
spec:
securityGroupsSpec:
controlPlaneSecurityGroupRules:
A Contributor commented:

Previously we talked about whether we should create a CRD that is reusable across clusters,
e.g. you create multiple clusters, and if you modify one JSON or YAML file it can be used in all of them?

I know it's complicated, so I think we can defer adding such an extension to the next stage?

@EmilienM (Contributor Author) replied:

For now, let's try without a new CRD and see how it goes. I'll keep this comment open as it's still relevant to the latest iteration of the KEP.

@lentzi90 (Contributor) left a comment:

I like the overall design of this

docs/proposals/20231122-security-groups.md (outdated)
2. Avoid any breaking changes for existing users; they'll still be able to use the legacy rules
3. Successfully be able to create and manage security groups for both the Control Plane and the Worker Nodes
4. Provide a migration path (documentation or code) for users who are currently using the legacy rules provided by `OpenStackCluster.spec.managedSecurityGroups`
5. Deprecate `OpenStackCluster.spec.apiServerLoadBalancer.additionalPorts`, as we'll have an API to create Control Plane Security Group rules (with more flexibility than we have now). Conversion will happen between this parameter and the one for Control Plane Security Group rules in order to maintain backward compatibility
A Contributor commented:

We can't do this. This parameter has more uses than just security group rules, and removing it would break existing use cases. See #1749 (comment)

However, I would like to separate responsibility for these rules into:

  • A separate security group
  • Managed by a separate Octavia control plane endpoint provider

i.e. create a new controller for managing an Octavia API loadbalancer. That controller is responsible for creating a security group suitable for its needs and attaching that security group to control plane machines.

In this case, it would simply be the responsibility of the cluster controller to remove the legacy rules from the default security group.

@EmilienM (Contributor Author)

@mkjpryor I'd like your feedback on this one when time permits. Thanks!

@dulek (Contributor) left a comment:

There is an important design decision to be made here: how do we deal with users modifying SGs managed by CAPO through the OpenStack API?

My idea would be to adopt the K8s model here. The controller should periodically monitor the SG rules and SG associations of the SGs it's managing, and reconcile them if a user made any changes. I think this is the only sane way to do it; otherwise we allow users to modify SGs on their own, and we never know whether an attached SG should be kept or removed during e.g. an upgrade.

Comment on lines 136 to 133
# Allow to provide rules for the CNI that will be applied to all nodes
cniSecurityGroupRules:
- direction: ingress
ethertype: IPv4
portRangeMin: 1234
portRangeMax: 1234
protocol: tcp
A Contributor commented:

Why do we call this one CNI when it's basically gathering the rules applied to each of the nodes? I'd abstain from designing this around a CNI. From a technical standpoint, I think it's possible to use different CNIs on different nodes. It's totally not supported at the moment, but it might be one day.

@EmilienM (Contributor Author) replied:

I can name it allNodesSecurityGroupRules, I don't mind.
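With that rename, the snippet quoted above would simply read (same rules, new key, comment adjusted accordingly):

# Rules that will be applied to all nodes
allNodesSecurityGroupRules:
- direction: ingress
  ethertype: IPv4
  portRangeMin: 1234
  portRangeMax: 1234
  protocol: tcp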

@mkjpryor (Contributor) commented Nov 28, 2023

@EmilienM

Do you think that this should be consistent with whatever we do for server groups? Personally, I favour new CRDs for both, i.e. OpenStackSecurityGroup and OpenStackServerGroup CRDs.

I worry about us cluttering up the spec of the OpenStackCluster with too much complicated stuff like this. IMHO we should start to establish a pattern of having additional CRDs for resources that don't have a direct correspondence with a cluster/machine.

For example, a port is something that I don't mind having in OpenStackMachine.spec, as it is clearly related to the machine.
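For illustration only, such a CRD might look roughly like this (a hypothetical sketch; the kind, apiVersion, and field names are assumptions, not something defined in this proposal):

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1  # hypothetical
kind: OpenStackSecurityGroup
metadata:
  name: calico-rules
  namespace: default
spec:
  # Rules reconciled into a single Neutron security group,
  # which clusters or machines could then reference by name.
  rules:
  - direction: ingress
    ethertype: IPv4
    portRangeMin: 179
    portRangeMax: 179
    protocol: tcp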

@EmilienM (Contributor Author)

> @EmilienM
>
> Do you think that this should be consistent with whatever we do for server groups? Personally, I favour new CRDs for both, i.e. OpenStackSecurityGroup and OpenStackServerGroup CRDs.
>
> I worry about us cluttering up the spec of the OpenStackCluster with too much complicated stuff like this. IMHO we should start to establish a pattern of having additional CRDs for resources that don't have a direct correspondence with a cluster/machine.
>
> For example, a port is something that I don't mind having in OpenStackMachine.spec, as it is clearly related to the machine.

To me there is a relation between Security Groups and Machines, as we want to update a Machine if a security group is added to the control plane machines.
For example: if we move the Calico rules into their own Security Group, we'll need a migration path where we remove the rules from the control plane security group, create a new group with the Calico rules, and add that group to the machines. This requires the controller to update the machine with the new SG.
Do we want to do that in the OpenStackMachine (or Cluster) controller, or via a new controller dedicated to security groups that can also update a Machine?
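Whichever controller owns it, attaching an existing group to a machine would look roughly like this (a sketch; the securityGroups field on OpenStackMachine.spec and the group name are assumptions for illustration):

kind: OpenStackMachine
spec:
  securityGroups:
  # During the migration described above, the controller would add
  # the newly created Calico group here.
  - name: calico-rules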

@EmilienM (Contributor Author)

> There is an important design decision to be made here: how do we deal with users modifying SGs managed by CAPO through the OpenStack API?
>
> My idea would be to adopt the K8s model here. The controller should periodically monitor the SG rules and SG associations of the SGs it's managing, and reconcile them if a user made any changes. I think this is the only sane way to do it; otherwise we allow users to modify SGs on their own, and we never know whether an attached SG should be kept or removed during e.g. an upgrade.

I think we all agree; the question remains whether we add a new CRD (+ controller) dedicated to the Security Groups lifecycle, or continue to use OpenStackCluster.

@mkjpryor (Contributor) commented Nov 29, 2023

It just feels odd to me that all the security groups are defined in the OpenStackCluster object and the OpenStackMachine object says "I want to use security group X from the cluster object", vs having a new OpenStackSecurityGroup CRD 🤷‍♂️ Same for server groups when we do that thing.

I do agree that it probably makes the conversion more complicated.

Happy to be overruled though.

@jichenjc (Contributor) commented Dec 1, 2023

> It just feels odd to me that all the security groups are defined in the OpenStackCluster object and the OpenStackMachine object says "I want to use security group X from the cluster object", vs having a new OpenStackSecurityGroup CRD 🤷‍♂️ Same for server groups when we do that thing.

That's exactly what I asked above, and I think we can check this comment from @EmilienM:

> For now, let's try without a new CRD and see how it goes. I'll keep this comment open as it's still relevant to the latest iteration of the KEP.

So maybe we can check https://github.com/kubernetes-sigs/cluster-api-provider-openstack/pull/1765/files first.

@mdbooth (Contributor) commented Jan 15, 2024

As discussed at the last office hours, I think this is the biggest single API stabilisation task we have for v1beta1. Specifically, we need to remove the hard-coded Calico rules, but allow:

  • CNI rules to be used in managed security groups
  • Safely upgrading clusters which are currently using the hard-coded Calico rules

I would very much like to get to v1beta1 asap, so to make that simpler I'd like to propose reducing the scope of the change here as much as possible.

managedSecurityGroups:
  # Enable the management of security groups with the default rules (kubelet, etcd, etc.)
  # The default stays false
  enabled: true
  # Allow to extend the default security group rules for the Bastion,
  # the Control Plane and the Worker security groups
  additionalBastionSecurityGroupRules:
    - direction: ingress
      ethertype: IPv4
      portRangeMin: 1234
      portRangeMax: 1234
      protocol: tcp
  additionalControlPlaneSecurityGroupRules:
    - direction: ingress
      ethertype: IPv4
      portRangeMin: 1234
      portRangeMax: 1234
      protocol: tcp
  additionalWorkerSecurityGroupRules:
    - direction: ingress
      ethertype: IPv4
      portRangeMin: 1234
      portRangeMax: 1234
      protocol: tcp
  # Allow to provide rules that will be applied to all nodes
  allNodesSecurityGroupRules:
    - direction: ingress
      ethertype: IPv4
      portRangeMin: 1234
      portRangeMax: 1234
      protocol: tcp

I would remove additionalSecurityGroups for now for 2 reasons. Firstly, in this proposal they are nested under managedSecurityGroups, but they're not managed. I think it would be confusing to write:

managedSecurityGroups:
  enabled: false
  additionalSecurityGroups:
  - ...

Does the user expect these to be created when managed security groups are disabled? additionalSecurityGroups are unmanaged, so I'd have them separate, as sketched below.
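Something like this, for example (a sketch of a possible layout, not a settled API; the group name is an assumed example):

managedSecurityGroups:
  enabled: true
# Unmanaged, pre-existing groups referenced by the user,
# living outside managedSecurityGroups entirely.
additionalSecurityGroups:
- name: my-existing-group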

Secondly, adding a new additionalSecurityGroups field would be a backwards-compatible change, so we can do it in v1beta1. Let's punt on anything we don't strictly need right now.

We discussed having a useLegacyCalicoRules field, but it occurs to me we don't need it: we can just add the Calico rules directly to allNodesSecurityGroupRules during upgrade. This means that the legacy cruft only lives in the legacy API code. We can document what the rules are, and others can add to the documentation the required rules for other CNIs.

In the first instance, we can also remove additional controller complexity by not changing the set of security groups attached to nodes. This means we don't have an upgrade problem, and don't need to be able to dynamically attach and detach security groups. Instead, at least in the first instance, we can simply replicate rules in allNodesSecurityGroupRules to control-plane and workers.

Lastly, we make extensive use of RemoteGroupID in our current rules. We need to allow that to be expressed in the new API, but it obviously can't take the UUID of a managed security group. I suggest that in the first instance we add a remoteManagedGroups field to security group rules which references one of bastion, control-plane, or worker. As a fully worked example, I would expect an upgraded configuration containing upgraded Calico rules to be:

managedSecurityGroups:
  enabled: true
  allNodesSecurityGroupRules:
  - description: BGP (calico)
    direction: ingress
    ethertype: IPv4
    portRangeMin: 179
    portRangeMax: 179
    protocol: tcp
    remoteManagedGroups:
    - control-plane
    - worker
  - description: IP-in-IP (calico)
    direction: ingress
    ethertype: IPv4
    protocol: 4
    remoteManagedGroups:
    - control-plane
    - worker

These 2 rules would become 4 rules, as we would create a duplicate for each of the remoteManagedGroups. The resulting 4 rules would be added to both control-plane and worker.
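To illustrate the expansion, the BGP rule above would conceptually become the following two Neutron rules (a sketch; remoteGroupID stands in for Neutron's remote_group_id, which the controller would resolve at reconcile time):

- description: BGP (calico)
  direction: ingress
  ethertype: IPv4
  portRangeMin: 179
  portRangeMax: 179
  protocol: tcp
  remoteGroupID: <UUID of the managed control-plane group>
- description: BGP (calico)
  direction: ingress
  ethertype: IPv4
  portRangeMin: 179
  portRangeMax: 179
  protocol: tcp
  remoteGroupID: <UUID of the managed worker group>

The IP-in-IP rule would expand the same way.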

I suggest that this change alone could also be the minimum change, and we could add additional<X>SecurityGroupRules later as a backwards-compatible change.

@lentzi90 (Contributor)

Thank you for that well-thought-through comment! I completely agree!

@EmilienM (Contributor Author)

Cool, I'm working on a change here: #1826, to address what Matt asked for.

@dulek (Contributor) commented Jan 19, 2024

I'll just note one thing here: in complicated environments, Neutron has performance issues with remote_group_id, as it generates a lot of flows. It's advised to use CIDRs directly if possible. This probably isn't a problem at CAPO's scale, but I'm mentioning it here for the record.
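For comparison, a CIDR-based version of the BGP rule above might look like this (a sketch; the remoteIPPrefix field name mirrors Neutron's remote_ip_prefix and is an assumption here, as is the example CIDR):

- description: BGP (calico)
  direction: ingress
  ethertype: IPv4
  portRangeMin: 179
  portRangeMax: 179
  protocol: tcp
  # The nodes' subnet CIDR, used instead of a remote
  # security group reference to avoid the flow explosion.
  remoteIPPrefix: 10.0.0.0/24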

@EmilienM (Contributor Author)

Note: this one is not stale. I'll come back to it at some point this year, now that we have a new API for Security Groups.

@EmilienM (Contributor Author)

This is still highly WIP; I need to update it based on our recent work. Please don't review it yet.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 26, 2024
@EmilienM (Contributor Author)

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 27, 2024
Labels: cncf-cla: yes · do-not-merge/hold · size/L

8 participants