
Error monitoring cluster health: no matches for kind "Cluster" #4942

Closed
levkp opened this issue May 15, 2024 · 11 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@levkp

levkp commented May 15, 2024

What happened:

The controller manager's status becomes CrashLoopBackOff after installing Karmada with the remote Helm chart.

What you expected to happen:

All pods in the karmada-system namespace have status Running.

How to reproduce it (as minimally and precisely as possible):

Install Karmada following the remote Helm chart method described in karmada/charts/karmada/README.md.
I tried doing this in my personal environment (EKS cluster running on 5 t3.medium EC2 nodes), and in Killercoda. Both gave the same logs.
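For reference, the remote install method in that README boils down to roughly the following (the chart repo URL is the one given in the README at the time; I may be misremembering the exact flags):

helm repo add karmada-charts https://raw.githubusercontent.com/karmada-io/karmada/master/charts
helm repo update
helm install karmada karmada-charts/karmada -n karmada-system --create-namespace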

Anything else we need to know?:

Here is what I hope is all the relevant output of kubectl logs karmada-controller-manager-77f9f77789-dlkt8 -n karmada-system in my personal Kubernetes environment:

I0515 13:56:08.021271       1 detector.go:217] Reconciling object: apiregistration.k8s.io/v1, kind=APIService, v1.autoscaling
I0515 13:56:08.021337       1 detector.go:353] Attempts to match cluster policy for resource(apiregistration.k8s.io/v1, kind=APIService, v1.autoscaling)
I0515 13:56:08.021352       1 detector.go:360] No clusterpropagationpolicy find.
I0515 13:56:08.021419       1 recorder.go:104] "events: No policy match for resource" type="Warning" object={Kind:APIService Namespace: Name:v1.autoscaling UID:22e4e2cf-394b-4c4d-b29b-f16565665433 APIVersion:apiregistration.k8s.io/v1 ResourceVersion:20 FieldPath:} reason="ApplyPolicyFailed"
E0515 13:56:08.037585       1 cluster_controller.go:189] Error monitoring cluster health: no matches for kind "Cluster" in version "cluster.karmada.io/v1alpha1"

and

E0515 13:56:13.137639       1 unified_auth_controller.go:277] Failed to list existing clusters, error: no matches for kind "Cluster" in version "cluster.karmada.io/v1alpha1"
I0515 13:56:14.838066       1 controller.go:219] "Starting workers" controller="resourcebinding" controllerGroup="work.karmada.io" controllerKind="ResourceBinding" worker count=5
[controller-runtime] log.SetLogger(...) was never called, logs will not be displayed:
goroutine 415 [running]:
runtime/debug.Stack()
	/opt/hostedtoolcache/go/1.20.6/x64/src/runtime/debug/stack.go:24 +0x65
sigs.k8s.io/controller-runtime/pkg/log.eventuallyFulfillRoot()
	/home/runner/work/karmada/karmada/vendor/sigs.k8s.io/controller-runtime/pkg/log/log.go:59 +0xbd
sigs.k8s.io/controller-runtime/pkg/log.(*delegatingLogSink).Error(0xc0001f6440, {0x29b4e00, 0xc0018e9800}, {0x2632125, 0x3d}, {0xc0017ae040, 0x2, 0x2})
	/home/runner/work/karmada/karmada/vendor/sigs.k8s.io/controller-runtime/pkg/log/deleg.go:139 +0x68
github.com/go-logr/logr.Logger.Error({{0x29dd340?, 0xc0001f6440?}, 0xc000561660?}, {0x29b4e00, 0xc0018e9800}, {0x2632125, 0x3d}, {0xc0017ae040, 0x2, 0x2})
	/home/runner/work/karmada/karmada/vendor/github.com/go-logr/logr/logr.go:299 +0xda
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1.1({0x29d65a0?, 0xc000300b90?})
	/home/runner/work/karmada/karmada/vendor/sigs.k8s.io/controller-runtime/pkg/internal/source/kind.go:63 +0x265
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func1(0xc0010d5e00?, {0x29d65a0?, 0xc000300b90?})
	/home/runner/work/karmada/karmada/vendor/k8s.io/apimachinery/pkg/util/wait/loop.go:62 +0x5d
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext({0x29d65a0, 0xc000300b90}, {0x29d3ad0?, 0xc000378680}, 0x1, 0x0, 0x22058a0?)
	/home/runner/work/karmada/karmada/vendor/k8s.io/apimachinery/pkg/util/wait/loop.go:63 +0x205
k8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel({0x29d65a0, 0xc000300b90}, 0xc0009204d8?, 0x0?, 0xc0006b6f08?)
	/home/runner/work/karmada/karmada/vendor/k8s.io/apimachinery/pkg/util/wait/poll.go:33 +0x5c
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1()
	/home/runner/work/karmada/karmada/vendor/sigs.k8s.io/controller-runtime/pkg/internal/source/kind.go:56 +0xfa
created by sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start
	/home/runner/work/karmada/karmada/vendor/sigs.k8s.io/controller-runtime/pkg/internal/source/kind.go:48 +0x1e5

Environment:

  • Karmada version: 1.9.1
  • kubectl-karmada or karmadactl version (the result of kubectl-karmada version or karmadactl version):
kubectl karmada version: version.Info{GitVersion:"v1.9.0", GitCommit:"a03aa846cfd2c1978b166660d6f592a1c10aeb3d", GitTreeState:"clean", BuildDate:"2024-02-29T08:15:22Z", GoVersion:"go1.20.11", Compiler:"gc", Platform:"linux/amd64"}
  • Others:
@levkp levkp added the kind/bug Categorizes issue or PR as related to a bug. label May 15, 2024
@RainbowMango
Member

Install Karmada following the remote Helm chart method described in karmada/charts/karmada/README.md.
I tried doing this in my personal environment (EKS cluster running on 5 t3.medium EC2 nodes), and in Killercoda. Both gave the same logs.

@chaosi-zju Can you help reproduce this on your side?

@RainbowMango
Member

[controller-runtime] log.SetLogger(...) was never called, logs will not be displayed:

@levkp This log looks like a panic, but it isn't. It has already been fixed on master (see #4855), and since it doesn't affect any functionality, we can ignore it here.

@chaosi-zju
Member

Hi @levkp, sorry for the late reply (((;꒪ꈊ꒪;))).

First, I want to confirm which version of Karmada you installed. You said:

Environment:

Karmada version: 1.9.1

But the latest available version seems to be v1.9.0:

$ helm search repo karmada                                                                                     
NAME                            CHART VERSION   APP VERSION     DESCRIPTION                      
karmada-charts/karmada          v1.9.0          latest          A Helm chart for karmada         
karmada-charts/karmada-operator v1.8.0          v1.1.0          A Helm chart for karmada-operator
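By the way, if you want to pin the chart version explicitly, Helm lets you pass it at install time, for example:

helm install karmada karmada-charts/karmada -n karmada-system --create-namespace --version v1.9.0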

Then I also installed Karmada v1.9.0 following the remote Helm chart method described in karmada/charts/karmada/README.md, but I didn't encounter your problem. My installation succeeded:

$ kubectl get po -A   
NAMESPACE            NAME                                                 READY   STATUS    RESTARTS        AGE
karmada-system       etcd-0                                               1/1     Running   0               4m33s
karmada-system       karmada-aggregated-apiserver-6bf466fdc4-fv86h        1/1     Running   2 (4m27s ago)   4m33s
karmada-system       karmada-apiserver-756b559f84-qf2td                   1/1     Running   0               4m33s
karmada-system       karmada-controller-manager-7b9f6f5f5-v5bwp           1/1     Running   3 (4m16s ago)   4m33s
karmada-system       karmada-kube-controller-manager-7b6d45cbdf-5kk8d     1/1     Running   2 (4m27s ago)   4m33s
karmada-system       karmada-scheduler-64db5cf5d6-bgd85                   1/1     Running   0               4m33s
karmada-system       karmada-webhook-7b6fc7f575-chqjk                     1/1     Running   0               4m33s

Next, regarding your CrashLoopBackOff error: it is worth noting that karmada-controller-manager can only become ready once karmada-apiserver and etcd are up and running. How are those two components doing? If they are failing, what clues do their logs give?
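For example, something like this (pod names are placeholders; use the ones from your cluster):

kubectl -n karmada-system get po                                # find the exact pod names
kubectl -n karmada-system logs deploy/karmada-apiserver         # karmada-apiserver logs
kubectl -n karmada-system logs etcd-0                           # etcd logs
kubectl -n karmada-system describe po <controller-manager-pod>  # events and restart reasons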

Lastly, do you need to use the remote Helm method? Maybe you can try downloading the chart and installing it locally instead. For an efficient installation, you can refer to the steps mentioned in #4963 (comment).

@chaosi-zju
Member

A similar problem is being tracked in #4917.

@chaosi-zju
Member

Hi @levkp, can you try installing Karmada in the karmada-system namespace? (Do not use any other namespace.)

@levkp
Author

levkp commented May 22, 2024

Hi @chaosi-zju!

sorry for the late reply

No problem, and thanks for looking into this.

First, I want to confirm which version of Karmada you installed.

I installed 1.9.0; writing 1.9.1 was just a mistake.

As you recommended, I followed your installation steps in #4963 after cloning the repo:

$ helm install karmada -n karmada-system   --kubeconfig ~/.kube/config   --create-namespace   --dependency-update   --set apiServer.hostNetwork=true   ./charts/karmada
NAME: karmada
LAST DEPLOYED: Wed May 22 13:23:16 2024
NAMESPACE: karmada-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
$ kubectl get secret -n karmada-system karmada-kubeconfig -o jsonpath={.data.kubeconfig} | base64 -d >~/.kube/karmada-apiserver.config
$ KARMADA_APISERVER_ADDR=$(kubectl get ep karmada-apiserver -n karmada-system | tail -n 1 | awk '{print $2}')
$ echo $KARMADA_APISERVER_ADDR
10.0.4.221:5443
$ kubectl get po -n karmada-system
NAME                                               READY   STATUS             RESTARTS        AGE
etcd-0                                             1/1     Running            0               7m18s
karmada-aggregated-apiserver-79f6bdb5b9-nh2g5      1/1     Running            2 (7m8s ago)    7m18s
karmada-apiserver-5bd55dfcff-k7kz9                 1/1     Running            0               7m18s
karmada-controller-manager-6965d94dc4-646sp        0/1     CrashLoopBackOff   4 (82s ago)     7m18s
karmada-kube-controller-manager-5d4795ff87-cxnlr   1/1     Running            2 (7m10s ago)   7m18s
karmada-scheduler-85bcf46665-7n6xw                 1/1     Running            0               7m18s
karmada-webhook-7bbb7ddb98-9xnlq                   1/1     Running            0               7m18s

Here are the logs again for the controller manager. I see three variations of the error I started this issue with:

E0522 11:32:05.026221       1 kind.go:63] "if kind is a CRD, it should be installed before calling Start" err="no matches for kind \"Cluster\" in version \"cluster.karmada.io/v1alpha1\"" logger="controller-runtime.source.EventHandler" kind="Cluster.cluster.karmada.io"
E0522 11:32:05.116317       1 unified_auth_controller.go:285] Failed to list existing clusters, error: no matches for kind "Cluster" in version "cluster.karmada.io/v1alpha1"
E0522 11:33:05.461726       1 cluster_controller.go:206] Error monitoring cluster health: no matches for kind "Cluster" in version "cluster.karmada.io/v1alpha1"

As you suggested, I looked at the logs of karmada-apiserver and etcd:

$ kubectl logs karmada-apiserver-5bd55dfcff-k7kz9 -n karmada-system | grep E0
E0522 11:37:23.496829       1 controller.go:116] loading OpenAPI spec for "v1beta2.custom.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
E0522 11:37:23.497978       1 controller.go:113] loading OpenAPI spec for "v1beta2.custom.metrics.k8s.io" failed with: Error, could not get list of group versions for APIService
E0522 11:37:27.388740       1 available_controller.go:460] v1alpha1.cluster.karmada.io failed with: failing or missing response from https://karmada-aggregated-apiserver.karmada-system.svc.cluster.local:443/apis/cluster.karmada.io/v1alpha1: Get "https://karmada-aggregated-apiserver.karmada-system.svc.cluster.local:443/apis/cluster.karmada.io/v1alpha1": dial tcp 172.20.208.240:443: i/o timeout (Client.Timeout exceeded while awaiting headers)
E0522 11:37:28.391880       1 controller.go:113] loading OpenAPI spec for "v1alpha1.cluster.karmada.io" failed with: Error, could not get list of group versions for APIService
E0522 11:37:28.391967       1 controller.go:116] loading OpenAPI spec for "v1alpha1.cluster.karmada.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
E0522 11:37:31.278693       1 available_controller.go:460] v1beta2.custom.metrics.k8s.io failed with: failing or missing response from https://karmada-metrics-adapter.karmada-system.svc.cluster.local:443/apis/custom.metrics.k8s.io/v1beta2: Get "https://karmada-metrics-adapter.karmada-system.svc.cluster.local:443/apis/custom.metrics.k8s.io/v1beta2": dial tcp: lookup karmada-metrics-adapter.karmada-system.svc.cluster.local on 172.20.0.10:53: no such host
E0522 11:37:31.278914       1 available_controller.go:460] v1beta1.custom.metrics.k8s.io failed with: failing or missing response from https://karmada-metrics-adapter.karmada-system.svc.cluster.local:443/apis/custom.metrics.k8s.io/v1beta1: Get "https://karmada-metrics-adapter.karmada-system.svc.cluster.local:443/apis/custom.metrics.k8s.io/v1beta1": dial tcp: lookup karmada-metrics-adapter.karmada-system.svc.cluster.local on 172.20.0.10:53: no such host
E0522 11:37:31.279162       1 available_controller.go:460] v1beta1.metrics.k8s.io failed with: failing or missing response from https://karmada-metrics-adapter.karmada-system.svc.cluster.local:443/apis/metrics.k8s.io/v1beta1: Get "https://karmada-metrics-adapter.karmada-system.svc.cluster.local:443/apis/metrics.k8s.io/v1beta1": dial tcp: lookup karmada-metrics-adapter.karmada-system.svc.cluster.local on 172.20.0.10:53: no such host
E0522 11:37:32.399915       1 available_controller.go:460] v1alpha1.cluster.karmada.io failed with: failing or missing response from https://karmada-aggregated-apiserver.karmada-system.svc.cluster.local:443/apis/cluster.karmada.io/v1alpha1: Get "https://karmada-aggregated-apiserver.karmada-system.svc.cluster.local:443/apis/cluster.karmada.io/v1alpha1": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E0522 11:37:33.401077       1 controller.go:113] loading OpenAPI spec for "v1alpha1.cluster.karmada.io" failed with: Error, could not get list of group versions for APIService
E0522 11:37:33.401727       1 controller.go:116] loading OpenAPI spec for "v1alpha1.cluster.karmada.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable

I'm not sure where the address 172.20.0.10 is coming from. I have the CNI and CoreDNS plugins installed, so networking should be fine in my cluster. I'll investigate this further (see the checks sketched after the node list below). For reference, here are my nodes:

NAME                                       STATUS   ROLES    AGE    VERSION               INTERNAL-IP   EXTERNAL-IP     OS-IMAGE         KERNEL-VERSION                  CONTAINER-RUNTIME
ip-10-0-3-210.eu-west-1.compute.internal   Ready    <none>   101m   v1.29.3-eks-ae9a62a   10.0.3.210    xxxxx   Amazon Linux 2   5.10.215-203.850.amzn2.x86_64   containerd://1.7.11
ip-10-0-4-204.eu-west-1.compute.internal   Ready    <none>   101m   v1.29.3-eks-ae9a62a   10.0.4.204    xxxxx    Amazon Linux 2   5.10.215-203.850.amzn2.x86_64   containerd://1.7.11
ip-10-0-4-221.eu-west-1.compute.internal   Ready    <none>   101m   v1.29.3-eks-ae9a62a   10.0.4.221    xxxxx   Amazon Linux 2   5.10.215-203.850.amzn2.x86_64   containerd://1.7.11
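Here is roughly how I plan to dig into those APIService failures (the APIService and Service names are taken from the errors above; I believe 172.20.0.10 is the cluster DNS Service IP on EKS, but I still need to confirm that):

kubectl get apiservice v1alpha1.cluster.karmada.io                 # is the aggregated API Available?
kubectl -n karmada-system get svc,ep karmada-aggregated-apiserver  # does its Service have endpoints?
kubectl -n kube-system get svc kube-dns                            # confirm the cluster DNS Service IP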

Logs for etcd:

$ kubectl logs etcd-0  -n karmada-system | grep -E 'warn|error'
{"level":"warn","ts":"2024-05-22T11:24:17.786349Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CLIENT_PORT_2379_TCP=tcp://172.20.97.76:2379"}
{"level":"warn","ts":"2024-05-22T11:24:17.786904Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CLIENT_SERVICE_PORT=2379"}
{"level":"warn","ts":"2024-05-22T11:24:17.786923Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CLIENT_SERVICE_HOST=172.20.97.76"}
{"level":"warn","ts":"2024-05-22T11:24:17.786933Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CLIENT_SERVICE_PORT_ETCD_CLIENT_PORT=2379"}
{"level":"warn","ts":"2024-05-22T11:24:17.786945Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CLIENT_PORT=tcp://172.20.97.76:2379"}
{"level":"warn","ts":"2024-05-22T11:24:17.786954Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CLIENT_PORT_2379_TCP_ADDR=172.20.97.76"}
{"level":"warn","ts":"2024-05-22T11:24:17.786964Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CLIENT_PORT_2379_TCP_PROTO=tcp"}
{"level":"warn","ts":"2024-05-22T11:24:17.787049Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CLIENT_PORT_2379_TCP_PORT=2379"}
{"level":"warn","ts":"2024-05-22T11:24:17.78712Z","caller":"embed/config.go:679","msg":"Running http and grpc server on single port. This is not recommended for production."}
{"level":"warn","ts":"2024-05-22T11:24:17.787406Z","caller":"embed/config.go:679","msg":"Running http and grpc server on single port. This is not recommended for production."}
{"level":"warn","ts":"2024-05-22T11:24:17.78821Z","caller":"fileutil/fileutil.go:53","msg":"check file permission","error":"directory \"/var/lib/etcd\" exist, but the permission is \"drwxr-xr-x\". The recommended permission is \"-rwx------\" to prevent possible unprivileged access to the data"}
{"level":"warn","ts":"2024-05-22T11:24:17.811906Z","caller":"auth/store.go:1241","msg":"simple token is not cryptographically signed"}

Can you try installing Karmada in the karmada-system namespace? (Do not use any other namespace.)

I always let Karmada create and use karmada-system.

@chaosi-zju
Member

Thanks. I will continue to look into the Error monitoring cluster health: no matches for kind "Cluster" in version "cluster.karmada.io/v1alpha1" error message, and I will reply to you as soon as I find something new.

@calvin0327
Contributor

calvin0327 commented May 23, 2024

@chaosi-zju Because the Karmada CRDs are installed after karmada-controller-manager starts in our chart, the controller manager cannot set up informers for those resources. That said, the pod should return to Running after a few restarts.

This issue is not easy to solve. I have a possible solution: run karmada-controller-manager only after the post-install job. What do you think?
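A quick way to observe this race (using the karmada-apiserver kubeconfig extracted earlier in this thread):

kubectl --kubeconfig ~/.kube/karmada-apiserver.config api-resources --api-group=cluster.karmada.io   # empty until the Karmada APIs are registered
kubectl -n karmada-system get po -w   # watch whether karmada-controller-manager settles into Running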

@levkp
Author

levkp commented May 29, 2024

@chaosi-zju
In the end, I successfully joined 3 clusters using the kubectl plugin (kubectl karmada init), so I ended up not using Helm. Unfortunately, I don't have time to investigate the Error monitoring cluster health: no matches for kind "Cluster" in version "cluster.karmada.io/v1alpha1" error any further.
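For completeness, the flow I used instead was roughly this (the member cluster name and kubeconfig paths are placeholders for my setup):

kubectl karmada init --kubeconfig ~/.kube/config
kubectl karmada join member1 --kubeconfig /etc/karmada/karmada-apiserver.config --cluster-kubeconfig ~/.kube/member1.config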

@RainbowMango
Member

Thanks @levkp for spotting this. This issue is also tracked by #4917.
We can close this now.
/close

@karmada-bot
Collaborator

@RainbowMango: Closing this issue.

In response to this:

Thanks @levkp for spotting this. This issue is also tracked by #4917.
We can close this now.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
