Enhance Migration from 0.12 -> 0.13 #2159

Open
michaelasp opened this issue Nov 10, 2023 · 7 comments

@michaelasp
Contributor

michaelasp commented Nov 10, 2023

Is your feature request related to a problem?

Our team ran into issues migrating metallb from 0.12 to 0.13. In our case, it wasn't feasible to run the configmaptocr tool because this would require manual work for every cluster that was using metallb 0.12. This led to us being stuck at metallb 0.12 for far longer than we wanted to be. When we finally upgraded to 0.13, it took a decent bit of work to migrate all of our config to CRs. We had to create an automatic wrapper around the configmaptocr tool and have it run prior to starting the controllers.

Describe the solution you'd like

Enable an additional flag on the metallb controller, --run-migration, that runs a migration process before the controller starts. This is very similar to what the configmaptocr process does, but it actually works on the cluster that metallb runs on. Also modify configmaptocrs slightly to ensure that, no matter what, we always get valid, unique CRs upon migration.
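
A minimal sketch of what the flag-gated hook could look like in the controller's entrypoint. The --run-migration flag is the one proposed above; runMigration is a hypothetical helper, not actual MetalLB code:

```go
package main

import (
	"context"
	"flag"
	"log"

	"k8s.io/client-go/rest"
)

// --run-migration is the flag proposed above; everything else is a sketch.
var runMigrationFlag = flag.Bool("run-migration", false,
	"convert the legacy MetalLB ConfigMap into CRs before starting the controller")

// runMigration is a hypothetical helper: pull the legacy ConfigMap, generate
// the equivalent CRs (what configmaptocrs does), and apply them with the same
// in-cluster client the controller already has.
func runMigration(ctx context.Context, cfg *rest.Config) error {
	// ... conversion + apply logic would live here ...
	return nil
}

func main() {
	flag.Parse()

	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatalf("building in-cluster config: %v", err)
	}

	if *runMigrationFlag {
		if err := runMigration(context.Background(), cfg); err != nil {
			log.Fatalf("migration failed: %v", err)
		}
	}

	// ... start the controller as usual ...
}
```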

Additional context

I already have an implementation of this internally, but it would probably need some polish to upstream. I want to gauge interest in this feature and see if it's something that would be wanted.

I've read and agree with the following

  • I've checked all open and closed issues and my request is not there.
  • I've checked all open and closed pull requests and my request is not there.
@oribon
Member

oribon commented Nov 12, 2023

hey, can you elaborate a bit more on what you eventually did to migrate (the "decent bit of work" and "automatic wrapper" parts)?
I wonder if you could have used a Job to migrate a live cluster - e.g. one with an init container that generates the resources (with the tool) into a volume, and a main container that applies them, taking inspiration from https://github.com/metallb/metallb/tree/main/configmaptocrs#running-directly-against-a-cluster.

also, not sure I understood the "Modify configmaptocrs slightly to ensure that..." part - was there something missing in the tool when you used it?

not sure I like the idea of wiring the tool and the controller together with a flag; if there is a "native" k8s way to achieve it and we document it properly, it might be nicer, but ofc I'm open to hearing.

@fedepaol
Member

Agree with @oribon on understanding what details we missed when writing the tool. IIRC from our discussion, naming was one of the issues.

I am open to having the migration integrated as part of MetalLB, but what I'd do is flip the flag and always enable it (maybe only if it finds the configmap and all the CRs are empty)? So users would have to opt out instead of opting in.
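
For illustration, a minimal sketch of what that "configmap present and CRs empty" check could look like, assuming client-go's typed and dynamic clients; the `config` ConfigMap name and the namespace argument reflect the historical defaults, and `shouldMigrate` is a made-up helper:

```go
package migration

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/kubernetes"
)

// shouldMigrate returns true only when the legacy "config" ConfigMap exists and
// no IPAddressPool CRs are present yet, i.e. users opt out by removing the
// ConfigMap or by creating CRs themselves.
func shouldMigrate(ctx context.Context, cs kubernetes.Interface, dyn dynamic.Interface, ns string) (bool, error) {
	if _, err := cs.CoreV1().ConfigMaps(ns).Get(ctx, "config", metav1.GetOptions{}); err != nil {
		if apierrors.IsNotFound(err) {
			return false, nil // nothing to migrate
		}
		return false, err
	}

	pools := schema.GroupVersionResource{Group: "metallb.io", Version: "v1beta1", Resource: "ipaddresspools"}
	list, err := dyn.Resource(pools).Namespace(ns).List(ctx, metav1.ListOptions{Limit: 1})
	if err != nil {
		return false, err
	}
	return len(list.Items) == 0, nil
}
```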

@michaelasp
Contributor Author

Sure, I can also upload the code, in its not-so-pretty form, that we used for the migration process. The couple of issues we had were:

  1. ConfigMapToCR uses a stdin -> stdout architecture, which is reasonable but which we couldn't use since we wanted to use the golang k8s client directly. Since we already had access to the client, it made more sense for us to use that rather than build a wrapper that would have to pull the configmap anyway and then apply the yaml.

  2. Our upgrade process made it so that running another Job for migration was tedious; it would run every time a cluster was either created or upgraded (we idempotently apply yamls to our clusters for the most part). This job might also run after the controller starts, which (might?) raise some race conditions we wanted to avoid.

  3. We ran into some possible naming issues, e.g. (pool1 vs Pool1) -> (pool1, pool1) or (1234) -> (), with the current naming convention: there were times when either different objects would end up being named the same thing, or we just wouldn't have a name to use. We avoided this by hashing the unique name of the object and appending that to the end of the object name, as well as prepending a name to each object in case the object didn't have a valid name at all (see the sketch after this list).
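
To illustrate point 3, a minimal sketch of the hash-suffix approach; the helper name and the `pool` fallback prefix are made up, not our actual internal code:

```go
package migration

import (
	"crypto/sha256"
	"fmt"
	"strings"
)

// uniqueCRName turns an arbitrary pool name from the ConfigMap into a valid,
// unique Kubernetes object name: lowercase and strip invalid characters, fall
// back to a generic prefix when nothing usable remains, and append a short
// hash of the original name so names that only differed by case no longer
// collide.
func uniqueCRName(original string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(original) {
		if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') || r == '-' {
			b.WriteRune(r)
		}
	}
	base := strings.Trim(b.String(), "-")
	if base == "" {
		base = "pool" // prepend a name when the original has no valid characters at all
	}
	sum := sha256.Sum256([]byte(original))
	return fmt.Sprintf("%s-%x", base, sum[:4]) // e.g. "pool1-a1b2c3d4"
}
```

With this, "pool1" and "Pool1" both sanitize to "pool1" but get different hash suffixes, so they no longer collide.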

The automatic wrapper basically just pulled the configmap, then generated the new objects and applied them to the cluster via the same k8s client the rest of metallb uses.

@michaelasp
Contributor Author

Created a draft PR to showcase the changes, btw: #2172

@oribon
Member

oribon commented Nov 30, 2023

Hi, sorry this took a while.

This approach is mostly about making a process that is supposed to be "one-time" (converting the config) become a continuous one, and as a side effect each time the controller restarts it would rewrite the CRs.
This gives the configMap a somewhat weird status, where it is again a point of configuration (which we deliberately tried to move away from) and where the resources it creates are uneditable, in the sense that any change to them could be overridden.

  1. ConfigMapToCR uses a stdin -> stdout architecture, which is reasonable but which we couldn't use since we wanted to use the golang k8s client directly. Since we already had access to the client, it made more sense for us to use that rather than build a wrapper that would have to pull the configmap anyway and then apply the yaml.
  2. Our upgrade process made it so that running another Job for migration was tedious; it would run every time a cluster was either created or upgraded (we idempotently apply yamls to our clusters for the most part). This job might also run after the controller starts, which (might?) raise some race conditions we wanted to avoid.

In general, given the conversion work is a one-shot activity, external to the controller's purpose, would adjusting the configmaptocrs tool to be able to apply the resources directly / expose the resources it produces help?
This could eventually be embedded as a sidecar of the controller (assuming you are using helm, we could do something like #2152) or as a k8s job.
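
A rough sketch of the "apply the resources directly" variant, assuming the tool keeps producing unstructured objects and server-side-applies them through the dynamic client; the `applyGenerated` helper and the field manager name are made up:

```go
package migration

import (
	"context"
	"encoding/json"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
)

// applyGenerated server-side-applies the objects produced by the conversion,
// so running it as a one-shot Job (or sidecar) is idempotent and safe to repeat.
func applyGenerated(ctx context.Context, dyn dynamic.Interface, mapper meta.RESTMapper, objs []*unstructured.Unstructured) error {
	force := true
	for _, obj := range objs {
		gvk := obj.GroupVersionKind()
		mapping, err := mapper.RESTMapping(gvk.GroupKind(), gvk.Version)
		if err != nil {
			return err
		}
		data, err := json.Marshal(obj)
		if err != nil {
			return err
		}
		_, err = dyn.Resource(mapping.Resource).
			Namespace(obj.GetNamespace()).
			Patch(ctx, obj.GetName(), types.ApplyPatchType, data,
				metav1.PatchOptions{FieldManager: "configmaptocrs", Force: &force})
		if err != nil {
			return err
		}
	}
	return nil
}
```

Server-side apply with a fixed field manager keeps re-runs harmless, which also fits the point below about the controller eventually converging on whatever resources exist.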

The race condition is probably a non-issue since the controller does not read the configMap and would eventually converge with the given resources.

@michaelasp
Contributor Author

Sorry also for the late response @oribon 😆

Yeah, a sidecar container or a k8s job would probably be the ideal way to do this. In our case we have full control over when the flag gets enabled and keep it around for one version only, so that method sufficed. We could probably do something like #2152 for the conversion. Does it make sense to have the configmaptocr runner use a kubernetes client rather than stdin/stdout? In that case we could run this job as a one-shot and then be sure that we have migrated. I can look into doing this and having configmaptocr be a separate container that can be run on the cluster.

@oribon
Member

oribon commented Dec 13, 2023

Seems good to me, thanks!
