clusterctl move can blow up workload clusters #977

Open
magicite opened this issue Oct 24, 2022 · 0 comments

@magicite

This might be related to #931.

I tried doing a `clusterctl move` today as part of a demonstration to others, intending to work through the issues in #931 manually, but this time my workload cluster machines got wiped instead. Here's what I think happened:

  1. Started the move, which hit the issues described in "clusterctl move misses objects, data in objects, and results in reboot of workload cluster" (#931).
  2. In the bootstrap cluster, the [Metal]Machine resources disappeared but the Server resources remained.
  3. One of the bootstrap cluster's controllers saw that the Servers were accepted but dirty and unallocated (because the [Metal]Machines were gone), so it started cleaning and rebooting them.
  4. Because I hadn't yet updated my DHCP server to point to the new TFTP server that would soon be running in the workload cluster, my workload machines PXE-booted from the bootstrap cluster.
  5. The bootstrap cluster directed the PXE-booting workload machines to wipe their disks.

It seems that during a move operation, the bootstrap cluster's controllers should perhaps be paused, or some other safety mechanism should be enabled.
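A minimal Go sketch of the kind of safety check suggested above. It assumes a simplified, hypothetical `Server` type (not the project's actual API) and a pause annotation on the Server. The key shown, `cluster.x-k8s.io/paused`, is the one Cluster API uses to pause reconciliation; whether the Server controller would honour it is an assumption. `shouldWipe` restates step 3: an accepted, dirty, unallocated Server gets cleaned, unless it is paused.

```go
package main

import "fmt"

// pausedAnnotation: Cluster API uses this key to pause reconciliation.
// Assumption: the Server controller would check it too.
const pausedAnnotation = "cluster.x-k8s.io/paused"

// Server is a hypothetical, pared-down model of a bare-metal Server resource.
type Server struct {
	Name        string
	Accepted    bool
	Dirty       bool
	Allocated   bool // true while a [Metal]Machine references this Server
	Annotations map[string]string
}

// isPaused reports whether the Server carries the pause annotation.
// Reading from a nil map is safe in Go and yields ok == false.
func isPaused(s Server) bool {
	_, ok := s.Annotations[pausedAnnotation]
	return ok
}

// shouldWipe mirrors step 3 of the report (accepted, dirty, unallocated),
// with the proposed pause check acting as the safety net.
func shouldWipe(s Server) bool {
	if isPaused(s) {
		return false
	}
	return s.Accepted && s.Dirty && !s.Allocated
}

func main() {
	// State after the move dropped the [Metal]Machines but left the Servers.
	orphaned := Server{Name: "worker-1", Accepted: true, Dirty: true}
	fmt.Println(shouldWipe(orphaned)) // true: the failure mode in the report

	// Pausing the Server before the move suppresses the wipe.
	orphaned.Annotations = map[string]string{pausedAnnotation: "true"}
	fmt.Println(shouldWipe(orphaned)) // false
}
```

Under this sketch, whatever drives the move would annotate the Servers before deleting anything in the bootstrap cluster and only remove the annotation on the target side once the Servers are allocated again.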
