ROX-15523: Copy expiration to cloud resources (GKE) #895

Draft: wants to merge 39 commits into master

davdhacs (Contributor) commented Jul 12, 2023

Copy the Infra cluster lifespan (expiration) metadata from the workflow to the GKE cluster. Applying an expiration label to cloud resources lets the janitor find expired, unneeded resources without interacting with Infra.

The Infra lifespan is recorded as a duration string in an Argo workflow annotation (a field within the Workflow custom resource).
Applying the label to cloud resources is flavor-specific, so I think it belongs in the flavor-specific workflow or image rather than in the Infra code. (GKE is added first.)

Infra expects a workflow to suspend once cluster creation is complete and to destroy the cluster when the workflow is resumed. No hooks or retries execute while a workflow is suspended. This change therefore adds a loop to the GKE workflow that checks whether the lifespan has changed. It also adds code that stops the workflow, instead of resuming it, when the workflow is flagged as not requiring a resume (that is, when the cluster destroy runs in a workflow onExit hook).

Questions:
[ ] Why aren't all cluster destroys set as onExit hooks? A destroy can run onExit for both suspended and non-suspended workflows, and some older Infra workflows appear to have used onExit.

roxbot (Contributor) commented Jul 12, 2023

A single node development cluster (infra-pr-895) was allocated in production infra for this PR.

CI will attempt to deploy us.gcr.io/stackrox-infra/infra-server:0.7.8-47-gf4fdcf80c2 to it.

🔌 You can connect to this cluster with:

gcloud container clusters get-credentials infra-pr-895 --zone us-central1-a --project srox-temp-dev-test

🛠️ And pull infractl from the deployed dev infra-server with:

nohup kubectl -n infra port-forward svc/infra-server-service 8443:8443 &
make pull-infractl-from-dev-server

🚲 You can then use the dev infra instance e.g.:

bin/infractl -k -e localhost:8443 whoami

⚠️ Any clusters that you start using your dev infra instance should have a lifespan shorter than that of the development cluster. Otherwise they will not be destroyed, because the dev infra instance ceases to exist when the development cluster is deleted. ⚠️

Further Development

☕ If you make changes, you can commit and push and CI will take care of updating the development cluster.

🚀 If you only modify configuration (chart/infra-server/configuration) or templates (chart/infra-server/{static,templates}), you can get a faster update with:

make install-local

Logs

Logs for the development infra depending on your @stackrox.com authuser:

Or:

kubectl -n infra logs -l app=infra-server --tail=1 -f

davdhacs changed the title from "dev img and lifespan in tmplt" to "Copy expiration to cloud resources (GKE)" Jul 17, 2023
davdhacs changed the title from "Copy expiration to cloud resources (GKE)" to "ROX-15523: Copy expiration to cloud resources (GKE)" Jul 19, 2023