
New Feature: Multi Cluster TargetGroupBindings #3478

Open · Alex-Waring wants to merge 6 commits into main from AWar_cluster_aware_tg_reconciliation

Conversation

@Alex-Waring commented Nov 12, 2023

Issue

#2173

Description

This PR attempts to implement the feature requested in the issue linked.

Problem: When running TargetGroupBinding resources against target groups that are also managed elsewhere (either by another cluster or by manual attachment), the controller will remove any targets it did not add.

Reason: In defaultResourceManager.reconcileWithIPTargetType, the targets in a TG are listed, compared with the targets expected from within the cluster, and those that are not expected are removed. This is so that pods or instances that have been removed from the filter are deregistered as well.

Solution: I have created a new function called ListOwnedTargets, gated behind an option on the controller, that only returns targets that the cluster manages. This is done by looking up the instance tags (all EKS instances must be tagged, so we can use that), or by checking that the IP is within the EKS subnets. To do this I have implemented a new EKS client and info resolver to return that information from the cluster.
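
As a rough sketch of that ownership check (illustrative only, not the exact code in this PR; the helper name, tag key, and parameter types here are assumptions):

package targetfilter

import (
	"net"
	"strings"
)

// ownsTarget sketches the ownership check described above. id is the target's
// Id from the ELBv2 target description: an instance ID for instance-type
// target groups, an IP address for ip-type ones. instanceTags maps instance ID
// to its EC2 tags (looked up via the new EKS/EC2 clients), and clusterCIDRs
// are the CIDRs of the cluster's subnets.
func ownsTarget(id string, instanceTags map[string]map[string]string, clusterCIDRs []*net.IPNet, clusterName string) bool {
	if strings.HasPrefix(id, "i-") {
		// Instance target: rely on the cluster ownership tag on the instance.
		// The exact tag key checked here is an assumption for illustration.
		_, owned := instanceTags[id]["kubernetes.io/cluster/"+clusterName]
		return owned
	}
	// IP target: owned if the address falls inside one of the cluster's subnets.
	ip := net.ParseIP(id)
	if ip == nil {
		return false
	}
	for _, cidr := range clusterCIDRs {
		if cidr.Contains(ip) {
			return true
		}
	}
	return false
}

ListOwnedTargets is then essentially the existing ListTargets result filtered through a check like this before the register/deregister diff is computed.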

I have also implemented a new instance cache, as instances are long-lived.
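
Purely as an illustration of that cache (everything named here is hypothetical; the PR may well reuse the controller's existing cache utilities), a TTL cache keyed by instance ID could look like:

package targetfilter

import (
	"sync"
	"time"
)

// instanceCache caches EC2 instance tag lookups by instance ID so that
// reconciles don't have to re-describe long-lived instances every time.
type instanceCache struct {
	mu  sync.Mutex
	ttl time.Duration
	m   map[string]cachedInstance
}

type cachedInstance struct {
	tags    map[string]string
	addedAt time.Time
}

func (c *instanceCache) Get(instanceID string) (map[string]string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	entry, ok := c.m[instanceID]
	if !ok || time.Since(entry.addedAt) > c.ttl {
		delete(c.m, instanceID)
		return nil, false
	}
	return entry.tags, true
}

func (c *instanceCache) Set(instanceID string, tags map[string]string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[instanceID] = cachedInstance{tags: tags, addedAt: time.Now()}
}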

I have not added docs yet, as I would like input on whether this is a valid solution before I write them.

The suggested solution in the linked issue was to use a ConfigMap to store the targets. ConfigMaps are limited to 1 MiB and are not designed for large volumes of data, which means that solution does not scale and does not work for larger clusters.

Checklist

  • Added tests that cover your change (if possible)
  • Added/modified documentation as required (such as the README.md, or the docs directory)
  • Manually tested
  • Made sure the title of the PR is a good description that can go into the release notes

BONUS POINTS checklist: complete for good vibes and maybe prizes?! 🤯

  • Backfilled missing tests for code in same general area 🎉
  • Refactored something and made the world a better place 🌟

@k8s-ci-robot added the do-not-merge/work-in-progress label Nov 12, 2023

linux-foundation-easycla bot commented Nov 12, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot (Contributor)

Welcome @Alex-Waring!

It looks like this is your first PR to kubernetes-sigs/aws-load-balancer-controller 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/aws-load-balancer-controller has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot added the cncf-cla: no and needs-ok-to-test labels Nov 12, 2023
@k8s-ci-robot (Contributor)

Hi @Alex-Waring. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the size/XXL label Nov 12, 2023
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Alex-Waring
Once this PR has been reviewed and has the lgtm label, please assign m00nf1sh for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the cncf-cla: yes label and removed the cncf-cla: no label Nov 12, 2023
@Alex-Waring force-pushed the AWar_cluster_aware_tg_reconciliation branch from 17c8fc2 to a8b96e5 on November 13, 2023 10:09
@Alex-Waring marked this pull request as ready for review November 13, 2023 10:09
@k8s-ci-robot removed the do-not-merge/work-in-progress label Nov 13, 2023
@Alex-Waring (Author)

This change looks huge, but most of it is this package. We don't use most of it, but I thought it better to keep it standard and in line with the CI.

@@ -27,13 +27,15 @@ const (
flagBackendSecurityGroup = "backend-security-group"
flagEnableEndpointSlices = "enable-endpoint-slices"
flagDisableRestrictedSGRules = "disable-restricted-sg-rules"
flagEnableClusterAwareBindings = "enable-cluster-aware-bindings"
@Alex-Waring (Author)

The feature is enabled by a flag on the controller, but it could also be on the binding resource itself.
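
For readers unfamiliar with the flag plumbing: the new flag is registered alongside the existing ones in the hunk above, roughly as in the sketch below (the config struct, field name, default value, and usage string are assumptions, not the PR's exact code):

package config

import "github.com/spf13/pflag"

const flagEnableClusterAwareBindings = "enable-cluster-aware-bindings"

// featureConfig stands in for the controller's configuration struct here.
type featureConfig struct {
	EnableClusterAwareBindings bool
}

// BindFlags mirrors the pattern used by the surrounding flags in the hunk above.
func (cfg *featureConfig) BindFlags(fs *pflag.FlagSet) {
	fs.BoolVar(&cfg.EnableClusterAwareBindings, flagEnableClusterAwareBindings, false,
		"If enabled, only deregister targets that this cluster's controller registered")
}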

m.targetsCache.Set(tgARN, targetsCacheItem, m.targetsCacheTTL)

var parsedTargets []TargetInfo
clusterCIDRs, err := eksInfoResolver.ListCIDRs(ctx)
@Alex-Waring (Author)

This is resolved every time, as it is possible to change the subnets of an EKS cluster, and if the subnets are changed we don't want to have to wait for the cache to clear before new pods can be bound there.
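
For reference, that per-reconcile resolution amounts to "describe the cluster for its subnet IDs, then describe those subnets for their CIDR blocks". A minimal sketch of the idea, using aws-sdk-go-v2 directly for illustration (the PR wraps this in its own EKS client and info resolver, so the names and signatures here are not the PR's):

package targetfilter

import (
	"context"
	"net"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/ec2"
	"github.com/aws/aws-sdk-go-v2/service/eks"
)

// listClusterCIDRs resolves the cluster's subnet CIDRs on every call, so a
// subnet change is picked up on the next reconcile rather than after a cache TTL.
func listClusterCIDRs(ctx context.Context, eksClient *eks.Client, ec2Client *ec2.Client, clusterName string) ([]*net.IPNet, error) {
	cluster, err := eksClient.DescribeCluster(ctx, &eks.DescribeClusterInput{Name: aws.String(clusterName)})
	if err != nil {
		return nil, err
	}
	subnets, err := ec2Client.DescribeSubnets(ctx, &ec2.DescribeSubnetsInput{
		SubnetIds: cluster.Cluster.ResourcesVpcConfig.SubnetIds,
	})
	if err != nil {
		return nil, err
	}
	var cidrs []*net.IPNet
	for _, subnet := range subnets.Subnets {
		_, cidr, err := net.ParseCIDR(aws.ToString(subnet.CidrBlock))
		if err != nil {
			return nil, err
		}
		cidrs = append(cidrs, cidr)
	}
	return cidrs, nil
}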

@Alex-Waring (Author)

@oliviassss @kishorj Would you be able to take a look, please?

@Robsta86

@oliviassss @kishorj It would be highly appreciated if you could please review the work of @Alex-Waring. This has been pending for a month now...

@Alex-Waring (Author)

@M00nF1sh can you take a look please?

@Alex-Waring (Author)

@oliviassss @johngmyers @kishorj Please?

@M00nF1sh (Collaborator) commented Jan 15, 2024

@Alex-Waring
Thanks for making contributions and sorry for the late reply.

This PR is similar to a previous PR: #1865
I think we need a reliable way to identify targets added by a controller instead of filtering based on instance tag/IP.
e.g. an implementation like this cannot handle IP targets in two clusters within the same VPC.

Quoting similar comments from the PR:

I think it's hacky in the way it sub-divides a TargetGroup and how it treats IP/Instance targets.

If we sub-divide a targetGroup, ideally it should work such that a TGB only manages the targets it added to the TargetGroup. However, in this case we identify the targets added by a TGB using an IP filter and an instance-tags filter, which are limited to this specific use case. (A sample case this cannot satisfy is sharing the same targetGroup across multiple services/TGBs within a single cluster.)
Ideally, it should work with some tags/attributes associated with each target (like the Cloud Map API does: https://docs.aws.amazon.com/cloud-map/latest/api/API_RegisterInstance.html), but the ELB team doesn't have plans to support this.

Another approach is, instead of sub-dividing a TargetGroup, to use weighted routing, so each set of targets gets its own TargetGroup. However, only ALB supports this currently, and NLB support for this is planned for 2022.

So personally I'm opposed to adding this as fields in the TGB API, as it's not generic and adds unnecessary burden for generic users.
But I'm open to adding it as a flag to the controller binary for this specific use case/setup.

@Alex-Waring (Author)

Hi @M00nF1sh, thank you for the response. I think that this leaves two options for this feature request.

The first (although I may have misunderstood you) is to merge something similar to this PR. It's feature-flagged on the controller binary, so it can remain a fairly secret feature, with all the caveats attached. Having said that, it does work for multiple clusters in the same VPC, just not the same subnet, which I think is a much less likely pattern.

The other, and longer but better, approach is to properly track which targets the controller is looking after. As targets have no form of labelling on them, and nothing is on the roadmap, this would mean implementing some form of DB. I guess the two obvious options are ConfigMaps and DynamoDB. I am very, very hesitant to use ConfigMaps to store application data, given the security and latency issues this creates, so DynamoDB seems to make sense to me.

Having said that, it's a bit of a larger rewrite that I would split into multiple PRs.

Thoughts?

@M00nF1sh (Collaborator) commented Jan 17, 2024

@Alex-Waring
Thanks for the detailed response. I'm open to such features and I think it's a valuable use case.
I have brought this PR to our PM team to seek their feedback on whether we should take a short-term solution and hide it behind some feature flags :D

BTW, would you share more details on why you chose to sub-divide a targetGroup between multiple controllers instead of relying on the "weighted load balancing" feature of ALB?

@Alex-Waring (Author) commented Jan 18, 2024

@M00nF1sh
Thank you for that, happy to hide things behind feature flags when I know about them!

Sure, the simple answer is NLBs. I noticed that your original answer included

NLB support for this is planned for 2022

In reference to weighted target groups, but sadly this hasn't happened yet and I can't see any plan for it happening in the future, so for any use case where NLBs are required (custom protocols, low latency, overlay protocols, mTLS when I wrote this PR but that's now supported by ALBs, EPS) weighted routing isn't possible.

@M00nF1sh (Collaborator) commented Jan 23, 2024

@Alex-Waring
Thanks a lot for the information; I want to give some updates about this so far.
Per our internal discussions, we don't want to merge this interim solution given its limitations.
The ELB team will share the roadmap for weighted targetGroups for NLB. In the meantime, they are starting to evaluate the case for supporting tagging on targets, for use cases that sub-divide a targetGroup.

@hanseltime

@Alex-Waring

I have a need for this functionality as well (particularly for things like doing blue-green deploys of cluster upgrades). Were you working on one of the two solutions that you had proposed?

I'm asking to understand if it's worth taking a stab at this or if I'm just late to the party.

@Alex-Waring (Author)

Hi @hanseltime, it's really a question for @M00nF1sh and team. This PR provides a good starting point for option one: it provides a dirty fix hidden behind a feature flag. Option two requires work by the NLB team, and I have no insight into that.

@Robsta86

Just like @hanseltime, we hope to see something like this implemented for blue-green cluster upgrades, as we see a bunch of potential risks in upgrading in place.

I hope implementing this behind a feature flag is something that can be reconsidered as a short-term solution, as I am afraid it will take a long time before the NLB team comes up with something to support said use case.

@hanseltime

@Alex-Waring @M00nF1sh - I've tried to make sure I am understanding where this PR sits.

From what I've read:

@M00nF1sh - it looks like you want a deterministic way to understand exactly which target group corresponds to the load balancer controller. As it stands right now, the EKS cluster CIDRs issue creates several race conditions where the controller might think some target group is or is not its own if its CIDRs change. And it also doesn't support the case where 2 clusters are running on the same subnets.

@Alex-Waring - Please correct me if I misunderstood the functionality of this PR.

Here is a solution that I would not mind working on, if we agree it solves the assumed problems above:

  1. We make use of AWS tagging (just like many other functions in EKS already do)
  2. We put this behind an opt-in flag like @Alex-Waring has already done
  3. The general flow is: when creating target groups, add some sort of "aws-load-balancer":"cluster-name-or-specified" tag to the target groups that are managed by the load balancer controller
  4. When a controller is trying to modify a target group, it only returns and modifies groups with that tag.
  5. In the event of a custom TargetGroupBinding entry (from an outside target group), the load balancer controller can either have a "permittedToOwn" field on the TargetGroupBinding that allows it to tag the target group on first interaction, or it can throw an error if the specified tag is missing.

This is definitely a high-level pitch, but if that solves the issues, I am more than willing to work on submitting it in the near future. @M00nF1sh

@Alex-Waring (Author)

Hi @hanseltime

I don't think this quite follows. The aim is to bind a target group to multiple clusters so that traffic to one NLB can be sent to multiple clusters. This means that to manage them in the way you suggest, we need to be able to tag not the target groups but the targets themselves. This isn't something that is supported at the moment, and that's the blocker, as I really don't like the idea of adding some sort of datastore to this.

@hanseltime

@Alex-Waring - thank you for pointing that out. I was doing this on my lunch break and didn't double-check my notes. Apologies for not reading closely enough 🙃

On re-reading, I guess I was trying to find a solution to something that you had already thoroughly covered. @M00nF1sh - does this mean that, if I want to use this functionality, someone needs to publish a brand new controller image and just link people to that repo while keeping it up to date?

From the standpoint of scope of concern, a load balancer controller that wipes out other target group targets when it doesn't own the target group feels like a bug worth fixing (even if behind a feature flag).

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 16, 2024
@k8s-ci-robot (Contributor)

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jun 14, 2024
@Robsta86

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Jun 14, 2024