Error: failed to get instance metadata #637

Open
cpanato opened this issue Dec 3, 2023 · 8 comments
Labels
needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@cpanato
Member

cpanato commented Dec 3, 2023

In the e2e Conformance CI Artifacts tests for CAPG, we are seeing flaky failures when bootstrapping the workload clusters that use the GCP CCM.
See Testgrid: https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-provider-gcp#capg-conformance-main-ci-artifacts

It looks like the tests pass in some GCP projects but fail in others. In the CCM logs we can see the following errors:

2023-11-16T13:52:57.701624403Z stderr F I1116 13:52:57.701427       1 node_controller.go:431] Initializing node capg-conf-lhoc9s-md-0-szjmb with cloud provider
2023-11-16T13:52:57.77723051Z stderr F I1116 13:52:57.776860       1 gen.go:17904] GCEInstances.Get(context.Background.WithDeadline(2023-11-16 14:52:57.702153712 +0000 UTC m=+3643.514258378 [59m59.925284681s]), Key{"capg-conf-lhoc9s-md-0-szjmb", zone: "us-east4-c"}) = <nil>, googleapi: Error 404: The resource 'projects/k8s-infra-e2e-boskos-088/zones/us-east4-c/instances/capg-conf-lhoc9s-md-0-szjmb' was not found, notFound
2023-11-16T13:52:57.777357462Z stderr F E1116 13:52:57.777125       1 node_controller.go:240] error syncing 'capg-conf-lhoc9s-md-0-szjmb': failed to get instance metadata for node capg-conf-lhoc9s-md-0-szjmb: failed to get instance ID from cloud provider: instance not found, requeuing
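
The failing lookup can be reproduced outside the controller. Below is a minimal Go sketch (not the controller's actual code path) that issues the same kind of call, compute `Instances.Get`, with the project/zone/instance values from the log line above, and detects the 404 that the node controller surfaces as "instance not found". Using Application Default Credentials here is an assumption.

```go
// Minimal sketch reproducing the lookup that fails above. Not the
// controller's real code path; assumes Application Default Credentials
// and reuses the project/zone/instance name from the log line.
package main

import (
	"context"
	"errors"
	"fmt"
	"log"

	compute "google.golang.org/api/compute/v1"
	"google.golang.org/api/googleapi"
)

func main() {
	ctx := context.Background()

	svc, err := compute.NewService(ctx)
	if err != nil {
		log.Fatalf("creating compute client: %v", err)
	}

	// Values copied from the failing log line above.
	_, err = svc.Instances.Get(
		"k8s-infra-e2e-boskos-088", "us-east4-c", "capg-conf-lhoc9s-md-0-szjmb",
	).Context(ctx).Do()

	var gerr *googleapi.Error
	if errors.As(err, &gerr) && gerr.Code == 404 {
		// This is the condition the node controller reports as
		// "failed to get instance ID from cloud provider: instance not found".
		fmt.Println("instance not found:", gerr.Message)
		return
	}
	if err != nil {
		log.Fatalf("unexpected error: %v", err)
	}
	fmt.Println("instance exists")
}
```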

It seems like some permission is missing in the project, but I am not 100% sure.

full logs:
logs.log

Tracking Issue: kubernetes/kubernetes#120481

cc @aojea

@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If the repository maintainers determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@aojea
Member

aojea commented Dec 5, 2023

Looks like this is related to kubernetes/kubernetes#120615; we need to revendor to pick up that change.
/cc @sdmodi
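
A sketch of the revendor, assuming the fix landed in the k8s.io/cloud-provider staging module; the exact module version that carries kubernetes/kubernetes#120615 would need to be confirmed:

```sh
# Hypothetical revendor steps; the version below is an assumption and
# would need to be one that actually includes the fix.
go get k8s.io/cloud-provider@v0.29.0
go mod tidy
go mod vendor
```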

@aojea
Member

aojea commented Dec 5, 2023

Error 404: The resource 'projects/k8s-infra-e2e-boskos-088/zones/us-east4-c/instances/capg-conf-lhoc9s-md-0-szjmb' was not found, notFound

It is a notFound error, no? It should not be related to permissions.
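
To make the distinction concrete: a missing IAM permission would come back from the compute API as HTTP 403, while the log above shows HTTP 404 with reason notFound. A small sketch, assuming the error arrives as a *googleapi.Error as in the logs:

```go
package main

import (
	"errors"
	"fmt"

	"google.golang.org/api/googleapi"
)

// classify separates a missing-permission failure (403) from a
// missing-resource failure (404), which is the distinction above.
func classify(err error) string {
	var gerr *googleapi.Error
	if !errors.As(err, &gerr) {
		return "not a googleapi error"
	}
	switch gerr.Code {
	case 403:
		return "permission denied: caller lacks e.g. compute.instances.get"
	case 404:
		return "not found: no such instance under that project/zone"
	default:
		return fmt.Sprintf("other API error: HTTP %d", gerr.Code)
	}
}

func main() {
	// The error in the logs above carried HTTP 404 with reason notFound.
	err := &googleapi.Error{Code: 404, Message: "instance was not found"}
	fmt.Println(classify(err))
}
```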

@cpanato
Member Author

cpanato commented Dec 6, 2023

So we need some documentation about which permissions need to be set in GCP,

and then we need to make sure we have or set those permissions during the tests in prow/boskos. One way to check what a project currently grants is sketched below.
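
If it does turn out to be permissions, this lists the role bindings in the test project; which role the CCM actually needs (e.g. roles/compute.viewer or broader, for compute.instances.get) is an assumption to verify:

```sh
# List role bindings in the boskos project from the failing run, to see
# whether the CCM's service account has compute read access.
gcloud projects get-iam-policy k8s-infra-e2e-boskos-088 \
  --flatten="bindings[].members" \
  --format="table(bindings.role, bindings.members)"
```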

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 5, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 4, 2024
@cpanato
Member Author

cpanato commented Apr 5, 2024

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Apr 5, 2024
@BenTheElder
Member

@cpanato can you elaborate a little bit on what needs documenting? I am starting to plan out with @shannonxtreme which technical details we need to dig up so we can write good docs 😅 xref #686
