build CAPO supported images #1502

Open
tormath1 opened this issue Mar 10, 2023 · 26 comments · May be fixed by #1746
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@tormath1
Contributor

tormath1 commented Mar 10, 2023

/kind feature

Describe the solution you'd like:

During the CAPO office hours, @mdbooth suggested building CAPO supported images in the CI. These images could be consumed directly by the tests (and by the users?).

These images could be built for multiple Kubernetes versions and multiple OS versions (ubuntu-2004, flatcar, etc.)

Anything else you would like to add:

On the implementation side, we could use GitHub Actions and the image-builder Docker image. Example with Flatcar:

sudo docker run --privileged --net host -v "${PWD}/docker-output:/output" -v /dev:/dev --rm \
    -e PACKER_FLAGS="--var 'kubernetes_semver=v1.26.2' --var 'kubernetes_series=v1.26'" \
    -e OEM_ID=openstack \
    tormath1/cluster-node-image-builder-amd64 \
    build-qemu-flatcar

(Note: it needs kubernetes-sigs/image-builder#1092)

For Flatcar: flatcar/Flatcar#928 (kubernetes-sigs/image-builder#924) can be interesting

The produced images could then be stored in a GCS bucket.
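
To illustrate the "multiple Kubernetes versions and multiple OS versions" idea, here is a dry-run sketch that only prints one command per combination. The image name, the Flatcar target and `OEM_ID` come from the example above; the version list, the `build-qemu-ubuntu-2004` target name and the helper names `series_of` and `build_cmd` are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one image-builder invocation per (Kubernetes version, target).
# Nothing is executed; the version list and the Ubuntu target name are assumptions.

IMAGE="tormath1/cluster-node-image-builder-amd64"

# series_of v1.26.2 -> v1.26 (drops the patch component)
series_of() {
  printf '%s\n' "${1%.*}"
}

# build_cmd <kubernetes_semver> <target>: print the docker command without running it
build_cmd() {
  local semver="$1" target="$2" series extra
  series="$(series_of "$semver")"
  # OEM_ID is only passed for Flatcar targets, as in the example above
  case "$target" in
    *flatcar*) extra="-e OEM_ID=openstack " ;;
    *) extra="" ;;
  esac
  printf '%s\n' "docker run --privileged --net host -v \"\${PWD}/docker-output:/output\" -v /dev:/dev --rm -e PACKER_FLAGS=\"--var 'kubernetes_semver=${semver}' --var 'kubernetes_series=${series}'\" ${extra}${IMAGE} ${target}"
}

for semver in v1.26.2 v1.25.7; do
  for target in build-qemu-flatcar build-qemu-ubuntu-2004; do
    build_cmd "$semver" "$target"
  done
done
```

Printing the commands first makes it easy to review the matrix before wiring it into any CI system.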

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Mar 10, 2023
@tormath1
Contributor Author

Initial experiments: tormath1@d981cfb, but they are limited because GitHub Actions does not support KVM (except on hosted runners).

@mdbooth
Contributor

mdbooth commented Mar 10, 2023

@tormath1 We would probably use cloudbuild rather than GH actions, and I think that might support KVM.

@tormath1
Contributor Author

tormath1 commented Mar 10, 2023

No more luck with cloudbuild - using N1_HIGHCPU_8 which should support nested virtualization.

EDIT: Trying now with no acceleration, just to see how much time it would take.
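
A minimal sketch of that fallback, assuming the Packer template exposes a user variable named `accelerator` (the Packer QEMU builder option of that name accepts `kvm` and `none`). `pick_accelerator` is a hypothetical helper, not something from image-builder.

```shell
#!/usr/bin/env bash
# pick_accelerator [device]: print "kvm" when the KVM device is usable, else "none".
pick_accelerator() {
  local dev="${1:-/dev/kvm}"
  if [ -r "$dev" ] && [ -w "$dev" ]; then
    echo "kvm"
  else
    echo "none"
  fi
}

# Assumption: the template reads a user variable called "accelerator".
PACKER_FLAGS="--var 'accelerator=$(pick_accelerator)'"
echo "PACKER_FLAGS=${PACKER_FLAGS}"
```

With this, the same job definition can run on machines with or without nested virtualization, only slower in the second case.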

@tormath1
Contributor Author

Here's the result with no acceleration: https://github.com/tormath1/cluster-api-provider-openstack/actions/runs/4408341205/usage - generated artifacts are currently uploaded to GitHub, but we could send them to a GCS bucket.

@mdbooth
Contributor

mdbooth commented Mar 14, 2023

That's awesome!

I think we're going to need this in cloudbuild for 2 reasons:

  • We need the artifacts in a GCS bucket for CI or we'll exceed our ingress quota.
  • The only automated way (caveat: that I'm aware of) to use the project's gcloud credentials is from the prow secure cluster.

That said, I don't currently know when I or anybody else will find time to implement this, and you seem to have something that looks close to useful for at least some people already. I wonder if we should merge a GH actions solution now and aim to replace it with a cloudbuild solution in prow when we get to it.

Thoughts?
/cc @lentzi90 @tobiasgiese @jichenjc

@tormath1
Contributor Author

Moving it to cloudbuild is not an issue, as it mainly relies on the Makefile and Docker to run. It's just less community friendly, as one can easily browse the logs on GitHub Actions (or maybe we can have the logs from cloudbuild? I don't know how it works :D)

@mdbooth
Contributor

mdbooth commented Mar 14, 2023

> Moving it to cloudbuild is not an issue, as it mainly relies on the Makefile and Docker to run. It's just less community friendly, as one can easily browse the logs on GitHub Actions (or maybe we can have the logs from cloudbuild? I don't know how it works :D)

Yeah, we have the logs: they're in the prow job that runs the cloudbuild, same as regular tests. Admittedly they're not as easy to find, though. This feels like an opportunity for improved docs 🤔

@mdbooth
Contributor

mdbooth commented Mar 14, 2023

For reference, here are the logs of one of our nightly image builds: https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/cluster-api-provider-openstack-push-images-nightly/1635189053307490304

Incidentally, that job seems pointless. We should probably kill it.

@mdbooth
Contributor

mdbooth commented Mar 14, 2023

Example of automatic uploading of artifacts from a cloudbuild job:

upload-staging-artifacts: ## Upload release artifacts to the staging bucket
	gsutil cp $(RELEASE_DIR)/* gs://$(STAGING_BUCKET)/components/$(RELEASE_ALIAS_TAG)/

@tormath1
Contributor Author

tormath1 commented Mar 15, 2023

Ported the GitHub Action to cloudbuild. It works fine with a single job, but not when running the four builds (flatcar x2 + ubuntu x2) in parallel. It might be some network issue with Packer trying to SSH... (logs)

@mdbooth
Contributor

mdbooth commented Mar 15, 2023

Could it just be timeouts? It's probably fine to run them sequentially as separate cloudbuild steps, I guess.

@tormath1
Contributor Author

I don't think it's a timeout, it happens very early in the builds - we can run them sequentially if 6h of build is not an issue :)
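
A rough sketch of the sequential variant, stopping at the first failure so a broken build is not buried under later output. `run_sequential` and `fake_build` are hypothetical names; a real runner would invoke image-builder instead of echoing, and the Ubuntu target name is an assumption.

```shell
#!/usr/bin/env bash
# run_sequential <runner> <target>...: run <runner> once per target, in order,
# and stop at the first target whose build fails.
run_sequential() {
  local runner="$1" target
  shift
  for target in "$@"; do
    echo "building ${target}"
    if ! "$runner" "$target"; then
      echo "failed: ${target}" >&2
      return 1
    fi
  done
  echo "all builds done"
}

# Placeholder runner for illustration only: it prints instead of building.
fake_build() {
  echo "would run: $1"
}

run_sequential fake_build build-qemu-flatcar build-qemu-ubuntu-2004
```

The trade-off is the one mentioned above: total wall-clock time is the sum of all builds rather than the longest one.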

@tormath1
Contributor Author

status update: it "works" with cloudbuild using the image-builder Docker image and parallel builds without KVM acceleration, but it's not super reliable with commands typed over VNC.

proposals:

@mnaser
Contributor

mnaser commented Mar 21, 2023

Could a step below this be to build out the Makefiles needed to make this happen, so users can easily build things out, and then this could use that toolset?

@mdbooth
Contributor

mdbooth commented Mar 21, 2023

> Could a step below this be to build out the Makefiles needed to make this happen, so users can easily build things out, and then this could use that toolset?

@tormath1 Do you think it's worth committing the Makefile in something like its current state?

@tormath1
Contributor Author

I think it's always worth pushing things, especially the Makefile, as it is already in good shape. We are only missing the image-builder PR needed to use an official image.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 19, 2023
@mdbooth
Contributor

mdbooth commented Jun 19, 2023

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 19, 2023
@lentzi90
Contributor

@tormath1 what is the status of this? Do you have something for the Makefile that could be useful? I would like to take a shot at getting the cloudbuild operational 😄
From what I understand, the image-builder PR has been merged now and the cloudbuild works sequentially. Parallel builds are flaky though. If this is correct, then I vote for pushing what we have and setting up the cloudbuild to run sequentially overnight 🙂

@tormath1
Contributor Author

@lentzi90 I have not touched this topic since the last comments. There is currently this commit on my fork: tormath1@b6f3326. It builds images in GitHub Actions using @mdbooth's Makefile.
And this is a variant using cloud-build: https://github.com/tormath1/cluster-api-provider-openstack/blob/46806bfbec9181f69017b5512f3ea062d64e7cbc/osimages/cloudbuild-osimages.yaml

So I think we could at least enable the sequential builds. Only the upload to a GCS bucket is missing, but I think it's trivial since we are in the cloudbuild context (I guess we should have permissions to write to the CAPO bucket).
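
For the missing upload step, something along the lines of the `gsutil cp` Makefile target quoted earlier in the thread could work. The sketch below only prints the commands; the bucket name, the `images/<version>/` layout and the helper names `gcs_dest` and `upload_images` are assumptions, not an agreed convention.

```shell
#!/usr/bin/env bash
# gcs_dest <bucket> <kubernetes_semver>: destination prefix for one Kubernetes version.
gcs_dest() {
  printf 'gs://%s/images/%s/\n' "$1" "$2"
}

# upload_images <dir> <bucket> <kubernetes_semver>: print one gsutil command
# per file found in <dir>; nothing is uploaded.
upload_images() {
  local dir="$1" bucket="$2" semver="$3" f
  for f in "$dir"/*; do
    [ -e "$f" ] || continue   # no match: the glob is left unexpanded, skip it
    echo "gsutil cp $f $(gcs_dest "$bucket" "$semver")"
  done
}

# Hypothetical bucket name; docker-output is the directory used in the first comment.
upload_images ./docker-output example-capo-images v1.26.2
```

Dropping the `echo` turns the dry run into the real upload once the bucket and permissions are confirmed.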

@mnaser
Contributor

mnaser commented Oct 17, 2023

FTR, we are doing this on our side at the moment in the Cluster API driver for Magnum. The images are generated entirely with GitHub Actions runners inside our cloud, which have nested virtualization. The images are then also tested with Sonobuoy + CAPI to make sure they are functional.

https://github.com/vexxhost/magnum-cluster-api/blob/main/.github/workflows/test.yml#L143-L191

@lentzi90 lentzi90 linked a pull request Nov 10, 2023 that will close this issue
3 tasks
@tormath1
Contributor Author

@lentzi90 @mdbooth it looks like GitHub now supports nested virtualization on all runners (actions/runner-images#7191 (comment)) - we could give building images with GitHub Actions another try.

@mdbooth
Contributor

mdbooth commented Jan 24, 2024

Yep. My motivations for preferring Prow over GH actions are mostly operational, but having nothing remains even more painful. I think we should resurrect the work you already did to get this working a while back.

@k8s-triage-robot

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 23, 2024
@k8s-triage-robot

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 23, 2024
@lentzi90
Contributor

/remove-lifecycle rotten
As an added complication, the exception request for using packer in image-builder (despite its BUSL v1.1 license) was rejected by CNCF: cncf/foundation#625 (comment).
Image-builder has pinned the latest acceptable version for now, but who knows what will happen with the project going forward?

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label May 24, 2024
Projects
Status: Planned

6 participants