✨ Add OpenStackServerGroup CRD and Controller #1912

Open: wants to merge 1 commit into base: main

@dalees dalees commented Feb 28, 2024

What this PR does / why we need it:

Implements a new CRD, OpenStackServerGroup, in v1alpha8 to allow managed Server Groups with standard policies, and adds a ServerGroupRef field to OpenStackMachine that references the new CRD and is used during VM creation.
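
As a rough illustration of the shape described above, here is a minimal sketch. It is not the PR's actual API: `OpenStackServerGroupSpec`, its `Policy` field, `standardPolicies`, and `validatePolicy` are hypothetical names chosen for this example, and the policy list assumes the four policies Nova defines for server groups. Only `ServerGroupRef` with its `name` field is taken from the diff under review.

```go
package main

import "fmt"

// OpenStackServerGroupSpec is a hypothetical, simplified spec. The real CRD
// in the PR may use different field names.
type OpenStackServerGroupSpec struct {
	// Policy is the Nova server group policy to apply.
	Policy string `json:"policy"`
}

// ServerGroupRef matches the reference type quoted in the review threads.
type ServerGroupRef struct {
	Name string `json:"name"`
}

// OpenStackMachineSpec shows only the field relevant to this sketch.
type OpenStackMachineSpec struct {
	ServerGroupRef *ServerGroupRef `json:"serverGroupRef,omitempty"`
}

// standardPolicies is an assumed list of the policies Nova defines.
var standardPolicies = map[string]bool{
	"affinity":           true,
	"anti-affinity":      true,
	"soft-affinity":      true,
	"soft-anti-affinity": true,
}

// validatePolicy rejects any policy outside the assumed standard set.
func validatePolicy(spec OpenStackServerGroupSpec) error {
	if !standardPolicies[spec.Policy] {
		return fmt.Errorf("unsupported server group policy %q", spec.Policy)
	}
	return nil
}

func main() {
	group := OpenStackServerGroupSpec{Policy: "soft-anti-affinity"}
	machine := OpenStackMachineSpec{ServerGroupRef: &ServerGroupRef{Name: "workers"}}
	fmt.Println(validatePolicy(group), machine.ServerGroupRef.Name)
}
```

In this sketch the machine only carries a name, so the server group's lifecycle stays with its own object and controller.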

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1256

Special notes for your reviewer:

This implements comment #1256 (comment)

There are a few TODOs remaining in code comments, and the feature still needs documentation. This first version is meant to confirm general agreement on the approach before I continue working on it.

  1. Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests
  • Rebased onto v1beta1 commit (removes v1alpha8)

/hold

Implements new CRD for OpenstackServerGroup in v1alpha8 to allow managed
Server Groups with standard policies, and adds ServerGroupRef to OpenstackMachine
that references the new CRD and uses it for VM creation.

Closes: kubernetes-sigs#1256
@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 28, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: dalees
Once this PR has been reviewed and has the lgtm label, please assign vincepri for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Feb 28, 2024
@k8s-ci-robot
Contributor

Hi @dalees. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Feb 28, 2024

netlify bot commented Feb 28, 2024

Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!

Name Link
🔨 Latest commit 65a96b7
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-cluster-api-openstack/deploys/65de8f64b61e5700089670de
😎 Deploy Preview https://deploy-preview-1912--kubernetes-sigs-cluster-api-openstack.netlify.app

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 28, 2024
@k8s-ci-robot
Contributor

PR needs rebase.

Contributor

@dulek dulek left a comment

This looks pretty good, some remarks inline.

Comment on lines +31 to +33
// The name of the cloud to use from the clouds secret
// +optional
CloudName string `json:"cloudName"`
Contributor

This seems a bit weird, we should probably have a reference to an OpenStackCluster instead?

Author

@dalees dalees Feb 28, 2024

Thank you for the feedback! Yeah, this allows the resource to be reconciled on its own, as it's self-contained.

However, that isn't in any of the use cases, and being tied to an existing OpenStackCluster doesn't seem like a limitation, even if the OpenStackServerGroup were only used for workers. It would also remove duplication of these credentials.

I'll make this change once the CRD approach is agreed.

Contributor

Hm, okay, that's a fair point. The use case of keeping all the workers from different clusters in a single ServerGroup makes sense.

Comment on lines +243 to +247
type ServerGroupRef struct {
	// Name of the OpenStackServerGroup resource to be used.
	// Must be in the same namespace as the resource(s) being provisioned.
	Name string `json:"name"`
}
Contributor

@dulek dulek Feb 28, 2024

I think LocalObjectReference should be used as a type.

Author

Would it be an issue that it specifies omitempty?

Contributor

Ugh, it probably would. Okay, the design of this field is good, we can change internals if we want to later.
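
To illustrate the `omitempty` concern in this thread, here is a minimal sketch. `localObjectReference` is a local stand-in written for this example so the snippet has no Kubernetes dependency; it copies the JSON tag that `corev1.LocalObjectReference` puts on its `Name` field. `mustMarshal` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ServerGroupRef matches the type proposed in the diff: name is always serialized.
type ServerGroupRef struct {
	Name string `json:"name"`
}

// localObjectReference is a stand-in that copies the JSON tag used by
// corev1.LocalObjectReference for its Name field.
type localObjectReference struct {
	Name string `json:"name,omitempty"`
}

// mustMarshal returns the JSON encoding of v as a string, panicking on error.
func mustMarshal(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(mustMarshal(ServerGroupRef{}))       // {"name":""}
	fmt.Println(mustMarshal(localObjectReference{})) // {}
}
```

With `omitempty`, an unset name disappears from the serialized object entirely, so a required reference can no longer be told apart from an empty one by looking at the JSON alone.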

err = compute.ResolveReferencedMachineResources(scope, &openStackMachine.Spec, &openStackMachine.Status.ReferencedResources)
if err != nil {
	return reconcile.Result{}, err
}

// Resolve referenced resources CAPO resources, using the K8s client
err = resolveReferencedClientResources(ctx, r.Client, openStackMachine)
Contributor

It feels like it's still a Machine resource. Couldn't we put that into ResolveReferencedMachineResources directly? Even if we need to change the arguments of the function.

Author

Hmm, I did start by doing this; I changed to this separation as what they're fetching from is distinct (OpenStack resource vs Kubernetes resource) and the client objects used are different. The OpenStack compute package just doesn't feel like the right place to be looking up K8s resources. It also makes test cases clearer to mock each function.

However, I agree the naming isn't clear. I wonder if renaming ResolveReferencedMachineResources to ResolveReferencedOpenStackResources may help to this end.

I'm open to changing this, but wanted to provide my reasoning first.

Contributor

I see that, sure. Let's see what other reviewers will say here, especially @mdbooth as ResolveReferencedMachineResources() is an idea of his.

func (r *OpenStackServerGroupReconciler) Reconcile(ctx context.Context, req ctrl.Request) (result ctrl.Result, reterr error) {
	log := ctrl.LoggerFrom(ctx)

	// Fetch the OpenStackMachine instance.
Contributor

This log seems wrong.

Author

Agreed, I'll fix this in the next iteration.

Comment on lines +142 to +143
// Get the servergroup by name, even if our K8s resource already has the ID field set.
// TODO(dalees): If this returns a 404 do we try to delete with existing UUID? Do we just assume success?
Contributor

I think we should look up by ID and only then fall back to looking up by name. IDs are safe in case of duplicate names.

Author

Ok, happy to change this - I can see it will lead to fewer problems if a duplicate-named resource were created after this managed one.
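
The ID-first lookup suggested in this thread could look roughly like the sketch below. Everything here is an assumption made for illustration: `serverGroupGetter`, its `GetByID` and `GetByName` methods, `ErrNotFound`, and `fakeCloud` are hypothetical and are not CAPO's compute service API.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is a hypothetical sentinel for "no such server group".
var ErrNotFound = errors.New("server group not found")

// ServerGroup is a simplified stand-in for the OpenStack server group object.
type ServerGroup struct {
	ID   string
	Name string
}

// serverGroupGetter is a hypothetical interface; the method names are
// assumptions and do not claim to match CAPO's compute service.
type serverGroupGetter interface {
	GetByID(id string) (*ServerGroup, error)
	GetByName(name string) (*ServerGroup, error)
}

// findServerGroup prefers the recorded ID and falls back to the name only
// when no ID is recorded or the ID no longer resolves.
func findServerGroup(c serverGroupGetter, id, name string) (*ServerGroup, error) {
	if id != "" {
		sg, err := c.GetByID(id)
		if err == nil {
			return sg, nil
		}
		if !errors.Is(err, ErrNotFound) {
			return nil, err
		}
	}
	return c.GetByName(name)
}

// fakeCloud is an in-memory implementation used only for this example.
type fakeCloud struct{ groups []ServerGroup }

func (f fakeCloud) GetByID(id string) (*ServerGroup, error) {
	for i := range f.groups {
		if f.groups[i].ID == id {
			return &f.groups[i], nil
		}
	}
	return nil, ErrNotFound
}

// GetByName fails when the name is ambiguous, which is the case IDs avoid.
func (f fakeCloud) GetByName(name string) (*ServerGroup, error) {
	var match *ServerGroup
	for i := range f.groups {
		if f.groups[i].Name == name {
			if match != nil {
				return nil, fmt.Errorf("multiple server groups named %q", name)
			}
			match = &f.groups[i]
		}
	}
	if match == nil {
		return nil, ErrNotFound
	}
	return match, nil
}

func main() {
	cloud := fakeCloud{groups: []ServerGroup{{ID: "a1", Name: "workers"}, {ID: "b2", Name: "workers"}}}

	sg, err := findServerGroup(cloud, "a1", "workers")
	fmt.Println(sg.ID, err) // a1 <nil>

	_, err = findServerGroup(cloud, "", "workers")
	fmt.Println(err) // multiple server groups named "workers"
}
```

Whether a stale ID should fall back to the name or be treated as an error is a design choice this sketch does not settle; it follows the fallback order proposed in the review.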


serverGroupName := openStackServerGroup.Name

serverGroup, err := computeService.GetServerGroupByName(serverGroupName, false)
Contributor

Again, we should probably look up by ID first in case we have duplicate names.

@@ -22,8 +22,8 @@ import (
)

 // ResolveReferencedMachineResources is responsible for populating ReferencedMachineResources with IDs of
-// the resources referenced in the OpenStackMachineSpec by querying the OpenStack APIs. It'll return error
-// if resources cannot be found or their filters are ambiguous.
+// the resources referenced in the OpenStackMachineSpec by querying the OpenStack APIs and K8s resources.
Author

This comment change will be removed; this package should probably not look up K8s resources.

@jichenjc
Contributor

jichenjc commented Mar 4, 2024

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 4, 2024
@mdbooth
Contributor

mdbooth commented Mar 8, 2024

@pierreprinetti We agreed to this in principle this week. Pinging you because it's similar to something ORC would do.

@chess-knight

Hi, at @SovereignCloudStack we are very interested in this feature. What is the progress here @dalees?

@dalees
Author

dalees commented May 23, 2024

> Hi, at @SovereignCloudStack we are very interested in this feature. What is the progress here @dalees?

Hello - pleased to hear of the interest! I'm keen to get this in, and I'm scheduled to revisit this in the next few weeks to get it back into a reviewable state.

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
Status: Inbox
Development

Successfully merging this pull request may close these issues.

Use a server group to ensure anti-affinity for control plane nodes
6 participants