Don't reset the cluster before collecting the capz controller log if cluster creation times out #4534

Open
lzhecheng opened this issue Feb 5, 2024 · 0 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/backlog Higher priority than priority/awaiting-more-evidence.

Comments

@lzhecheng
Contributor

/kind bug

[Before submitting an issue, have you checked the Troubleshooting Guide?]

What steps did you take and what happened:
[A clear and concise description of what the bug is.]
Note: this issue is about log collection, not the cluster failure shown below.
A cluster creation failed recently and its CAPZ controller log was lost:
https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/cloud-provider-azure-master-ipv6-capz-1-26/1754248546623688704

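The log was lost because the script reset the kind mgmt cluster "capz" after the workload cluster creation failed, and only then tried to collect logs. By that point the mgmt cluster and its kubeconfig were gone, as the end of the output below shows. I think the logic should be improved.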

# Get kubeconfig and store it locally.
/home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/hack/tools/bin/kubectl-v1.25.6 get secrets capz-4yv7h7-kubeconfig -o json | jq -r .data.value | base64 --decode > ./kubeconfig
timeout --foreground 600 bash -c "while ! /home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/hack/tools/bin/kubectl-v1.25.6 --kubeconfig=./kubeconfig get nodes | grep control-plane; do sleep 1; done"
Unable to connect to the server: dial tcp 52.177.48.112:6443: i/o timeout
Unable to connect to the server: dial tcp 52.177.48.112:6443: i/o timeout
Unable to connect to the server: dial tcp 52.177.48.112:6443: i/o timeout
make[1]: *** [Makefile:346: create-workload-cluster] Error 124
make[1]: Leaving directory '/home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure'
make: *** [Makefile:373: create-cluster] Error 2
================ MAKE CLEAN ===============
make clean-bin
make[1]: Entering directory '/home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure'
rm -rf bin
rm -rf hack/tools/bin
make[1]: Leaving directory '/home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure'
make clean-temporary
make[1]: Entering directory '/home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure'
rm -f minikube.kubeconfig
rm -f kubeconfig
rm -f *.kubeconfig
make[1]: Leaving directory '/home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure'
================ KIND RESET ===============
GOBIN=/home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/hack/tools/bin ./scripts/go_install.sh sigs.k8s.io/kind kind v0.20.0
/home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/hack/tools/bin/kind-v0.20.0 delete cluster --name=capz || true
Deleting cluster "capz" ...
Deleted nodes: ["capz-control-plane"]
/home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/hack/tools/bin/kind-v0.20.0 delete cluster --name=capz-e2e || true
Deleting cluster "capz-e2e" ...
================ INSTALL TOOLS ===============
Unable to connect to the server: dial tcp 52.177.48.112:6443: i/o timeout
GOBIN=/home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/hack/tools/bin ./scripts/go_install.sh github.com/drone/envsubst/v2/cmd/envsubst envsubst v2.0.0-20210730161058-179042472c46
GOBIN=/home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/hack/tools/bin ./scripts/go_install.sh sigs.k8s.io/kustomize/kustomize/v4 kustomize v4.5.2
mkdir -p /home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/hack/tools/bin
rm -f "/home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/hack/tools/bin/kubectl*"
curl --retry 3 -fsL https://dl.k8s.io/release/v1.25.6/bin/linux/amd64/kubectl -o /home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/hack/tools/bin/kubectl-v1.25.6
ln -sf /home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/hack/tools/bin/kubectl-v1.25.6 /home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/hack/tools/bin/kubectl
chmod +x /home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/hack/tools/bin/kubectl-v1.25.6 /home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/hack/tools/bin/kubectl
mkdir -p /home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/hack/tools/bin
rm -f "/home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/hack/tools/bin/helm*"
curl --retry 3 -fsSL -o /home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/hack/tools/bin/get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 /home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/hack/tools/bin/get_helm.sh
USE_SUDO=false HELM_INSTALL_DIR=/home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/hack/tools/bin DESIRED_VERSION=v3.12.2 BINARY_NAME=helm-v3.12.2 /home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/hack/tools/bin/get_helm.sh
Downloading https://get.helm.sh/helm-v3.12.2-linux-amd64.tar.gz
Verifying checksum... Done.
Preparing to install helm-v3.12.2 into /home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/hack/tools/bin
helm-v3.12.2 installed into /home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/hack/tools/bin/helm-v3.12.2
ln -sf /home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/hack/tools/bin/helm-v3.12.2 /home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/hack/tools/bin/helm
rm -f /home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/hack/tools/bin/get_helm.sh
GOBIN=/home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/hack/tools/bin ./scripts/go_install.sh github.com/onsi/ginkgo/v2/ginkgo ginkgo v2.13.1
Unable to find kubeconfig for kind mgmt cluster capz
Collecting logs for cluster capz-4yv7h7 in namespace default and dumping logs to /logs/artifacts
panic: Failed to get ClientConfig from "/root/.kube/config"
Unexpected error:
    <clientcmd.errConfigurationInvalid | len:1, cap:1>: 
    invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
    [
        <*clientcmd.errEmptyConfig | 0x508a920>{
            message: "no configuration has been provided, try setting KUBERNETES_MASTER environment variable",
        },
    ]
occurred

What did you expect to happen:
The CAPZ controller log is collected after a cluster creation timeout.
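
A rough sketch of the ordering I have in mind, not the actual script: all function names below (`create_workload_cluster`, `collect_capz_controller_logs`, `teardown_mgmt_cluster`, `run_job`) are hypothetical stand-ins, and the stubs only record the order in which they are called. The point is that on a creation failure the log collection runs while the mgmt cluster still exists, and the teardown happens afterwards.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: none of these function names exist in the CAPZ scripts.
# Each stub appends its name to CALLS so the call order can be inspected.

CALLS=""

create_workload_cluster() {
  CALLS="${CALLS}create "
  return 124  # simulate the `timeout` exit code seen in the log above
}

collect_capz_controller_logs() {
  CALLS="${CALLS}collect "
}

teardown_mgmt_cluster() {
  CALLS="${CALLS}teardown "
}

run_job() {
  local rc=0
  create_workload_cluster || rc=$?
  if [ "$rc" -ne 0 ]; then
    # Collect logs first: the mgmt cluster and its kubeconfig are still present.
    collect_capz_controller_logs || true
  fi
  # Only now remove the mgmt cluster.
  teardown_mgmt_cluster
  # Propagate the original failure so the job is still reported as failed.
  return "$rc"
}

rc=0
run_job || rc=$?
echo "order: ${CALLS}exit: ${rc}"
```

With the simulated timeout, the recorded order is create, collect, teardown, and the job still exits with the original error code.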

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • cluster-api-provider-azure version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Feb 5, 2024
@mboersma mboersma added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Mar 28, 2024