Error: failed to get instance metadata #637

Open
cpanato opened this issue Dec 3, 2023 · 8 comments
Labels
needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@cpanato
Member

cpanato commented Dec 3, 2023

In the e2e Conformance CI Artifacts tests for CAPG, we are seeing flaky failures when bootstrapping the workload clusters that use the GCP CCM.
See Testgrid: https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-provider-gcp#capg-conformance-main-ci-artifacts

It looks like the tests pass in some GCP projects but fail in others. In the CCM logs we can see the following errors:

2023-11-16T13:52:57.701624403Z stderr F I1116 13:52:57.701427       1 node_controller.go:431] Initializing node capg-conf-lhoc9s-md-0-szjmb with cloud provider
2023-11-16T13:52:57.77723051Z stderr F I1116 13:52:57.776860       1 gen.go:17904] GCEInstances.Get(context.Background.WithDeadline(2023-11-16 14:52:57.702153712 +0000 UTC m=+3643.514258378 [59m59.925284681s]), Key{"capg-conf-lhoc9s-md-0-szjmb", zone: "us-east4-c"}) = <nil>, googleapi: Error 404: The resource 'projects/k8s-infra-e2e-boskos-088/zones/us-east4-c/instances/capg-conf-lhoc9s-md-0-szjmb' was not found, notFound
2023-11-16T13:52:57.777357462Z stderr F E1116 13:52:57.777125       1 node_controller.go:240] error syncing 'capg-conf-lhoc9s-md-0-szjmb': failed to get instance metadata for node capg-conf-lhoc9s-md-0-szjmb: failed to get instance ID from cloud provider: instance not found, requeuing
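
The failing lookup can be reproduced outside the controller. Below is a minimal Go sketch (not the controller's actual code path) that issues the same kind of call, compute `Instances.Get`, with the project/zone/instance values from the log line above, and detects the 404 that the node controller surfaces as "instance not found". Using Application Default Credentials here is an assumption.

```go
// Minimal sketch reproducing the lookup that fails above. Not the
// controller's real code path; assumes Application Default Credentials
// and reuses the project/zone/instance name from the log line.
package main

import (
	"context"
	"errors"
	"fmt"
	"log"

	compute "google.golang.org/api/compute/v1"
	"google.golang.org/api/googleapi"
)

func main() {
	ctx := context.Background()

	svc, err := compute.NewService(ctx)
	if err != nil {
		log.Fatalf("creating compute client: %v", err)
	}

	// Values copied from the failing log line above.
	_, err = svc.Instances.Get(
		"k8s-infra-e2e-boskos-088", "us-east4-c", "capg-conf-lhoc9s-md-0-szjmb",
	).Context(ctx).Do()

	var gerr *googleapi.Error
	if errors.As(err, &gerr) && gerr.Code == 404 {
		// This is the condition the node controller reports as
		// "failed to get instance ID from cloud provider: instance not found".
		fmt.Println("instance not found:", gerr.Message)
		return
	}
	if err != nil {
		log.Fatalf("unexpected error: %v", err)
	}
	fmt.Println("instance exists")
}
```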

It seems like some permission is missing in the project, but I am not 100% sure.

full logs:
logs.log

Tracking Issue: kubernetes/kubernetes#120481

cc @aojea

@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If the repository maintainers determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@aojea
Member

aojea commented Dec 5, 2023

Looks like this is related to kubernetes/kubernetes#120615; we need to revendor to pick up that change.
/cc @sdmodi
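
A sketch of the revendor, assuming the fix landed in the k8s.io/cloud-provider staging module; the exact module version that carries kubernetes/kubernetes#120615 would need to be confirmed:

```sh
# Hypothetical revendor steps; the version below is an assumption and
# would need to be one that actually includes the fix.
go get k8s.io/cloud-provider@v0.29.0
go mod tidy
go mod vendor
```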

@aojea
Member

aojea commented Dec 5, 2023

Error 404: The resource 'projects/k8s-infra-e2e-boskos-088/zones/us-east4-c/instances/capg-conf-lhoc9s-md-0-szjmb' was not found, notFound

It is a notFound error, no? It should not be related to permissions.
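
To make the distinction concrete: a missing IAM permission would come back from the compute API as HTTP 403, while the log above shows HTTP 404 with reason notFound. A small sketch, assuming the error arrives as a *googleapi.Error as in the logs:

```go
package main

import (
	"errors"
	"fmt"

	"google.golang.org/api/googleapi"
)

// classify separates a missing-permission failure (403) from a
// missing-resource failure (404), which is the distinction above.
func classify(err error) string {
	var gerr *googleapi.Error
	if !errors.As(err, &gerr) {
		return "not a googleapi error"
	}
	switch gerr.Code {
	case 403:
		return "permission denied: caller lacks e.g. compute.instances.get"
	case 404:
		return "not found: no such instance under that project/zone"
	default:
		return fmt.Sprintf("other API error: HTTP %d", gerr.Code)
	}
}

func main() {
	// The error in the logs above carried HTTP 404 with reason notFound.
	err := &googleapi.Error{Code: 404, Message: "instance was not found"}
	fmt.Println(classify(err))
}
```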

@cpanato
Member Author

cpanato commented Dec 6, 2023

So we need some documentation about which permissions need to be set in GCP,

and then we need to make sure we have or set those permissions during the tests in prow/boskos. One way to check what a project currently grants is sketched below.
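
If it does turn out to be permissions, this lists the role bindings in the test project; which role the CCM actually needs (e.g. roles/compute.viewer or broader, for compute.instances.get) is an assumption to verify:

```sh
# List role bindings in the boskos project from the failing run, to see
# whether the CCM's service account has compute read access.
gcloud projects get-iam-policy k8s-infra-e2e-boskos-088 \
  --flatten="bindings[].members" \
  --format="table(bindings.role, bindings.members)"
```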

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 5, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 4, 2024
@cpanato
Member Author

cpanato commented Apr 5, 2024

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Apr 5, 2024
@BenTheElder
Member

@cpanato can you elaborate a little bit on what needs documenting? I am starting to plan out with @shannonxtreme which technical details we need to dig up so we can write good docs 😅 xref #686
