Webhook certificate expired when API server starts one year #5520

Open
Smityz opened this issue Jan 12, 2024 · 4 comments

Comments

@Smityz

Smityz commented Jan 12, 2024

Bug Report

What version of Kubernetes are you using?

v1.22

What version of TiDB Operator are you using?

v1.4.4

What did you do?
After running stably for several months, the operator suddenly started reporting errors continuously and could not complete a sync. After we disabled the webhook, the operator returned to normal.
Related error log:

E0112 17:49:03.476708       1 tidb_cluster_controller.go:133] TidbCluster: x sync failed Internal error occurred: failed calling webhook "defaulting.admission.tidb.pingcap.com": failed to call webhook: Post "https://kubernetes.default.svc:443/apis/admission.tidb.pingcap.com/v1alpha1/pingcapresourcemutations?timeout=10s": x509: certificate has expired or is not yet valid: current time 2024-01-12T17:49:03+08:00 is after 2024-01-10T09:50:21Z, requeuing
E0112 17:49:03.859792       1 tidbcluster_control.go:90] failed to update TidbCluster: [x], error: Internal error occurred: failed calling webhook "defaulting.admission.tidb.pingcap.com": failed to call webhook: Post "https://kubernetes.default.svc:443/apis/admission.tidb.pingcap.com/v1alpha1/pingcapresourcemutations?timeout=10s": x509: certificate has expired or is not yet valid: current time 2024-01-12T17:49:03+08:00 is after 2024-01-10T09:50:21Z
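
To see how long the certificate had already been expired when these errors were logged, the two timestamps in the x509 message can be compared directly. The following is a minimal sketch, not part of TiDB Operator; the regular expression and the helper names `parse_ts` and `expired_for` are assumptions made for illustration and only match the message format shown above.

```python
import re
from datetime import datetime, timedelta

# Matches the expiry part of the x509 message as it appears in the log lines above.
X509_EXPIRED = re.compile(
    r"current time (?P<now>\S+) is after (?P<not_after>[^\s,]+)"
)


def parse_ts(value: str) -> datetime:
    # Older Python versions' fromisoformat() rejects a trailing "Z", so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def expired_for(log_line: str):
    """Return how long the certificate had been expired, or None if no match."""
    match = X509_EXPIRED.search(log_line)
    if match is None:
        return None
    return parse_ts(match.group("now")) - parse_ts(match.group("not_after"))


line = (
    "x509: certificate has expired or is not yet valid: current time "
    "2024-01-12T17:49:03+08:00 is after 2024-01-10T09:50:21Z, requeuing"
)
print(expired_for(line))  # 1 day, 23:58:42
```

Both timestamps carry a UTC offset, so the subtraction is timezone-aware: the certificate had been expired for roughly two days before the error above was captured.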

We suspect this is related to the API server's self-signed certificate mechanism, because the certificate expired exactly one year after the API server started. We also found a related bug: openshift/generic-admission-server#33

@csuzhangxc
Member

As openshift/generic-admission-server#33 (comment) says, k8s.io/apiserver has supported reloading the serving certs since k8s 1.18.

TiDB Operator v1.4.4 uses K8s v1.19 (https://github.com/pingcap/tidb-operator/blob/v1.4.4/go.mod#L65), and this version of generic-admission-server also uses K8s v1.19 (https://github.com/openshift/generic-admission-server/blob/da96454c926de350e52f6c7a6ee86af49ee96b00/go.mod), so it should reload the certs.

Did your cert simply expire, or was it renewed after it had expired?

@iPenx

iPenx commented Jan 15, 2024

It is not the tidb-webhook certs that expired, but the CA of "kubernetes.default.svc" in the k8s apiserver.

The call flow of TiDB CRD admission is
k8s apiserver -> apiservice (kubernetes.default.svc) -> tidb webhook pod
i.e.
k8s apiserver -> k8s apiserver (kubernetes.default.svc) -> tidb webhook pod

When a k8s apiserver runs for more than one year without restarting, the CA of kubernetes.default.svc held in the k8s apiserver's memory expires.
As a result, requests from the k8s apiserver to itself start failing after a year.

By default, the CA of kubernetes.default.svc in the k8s apiserver's memory is self-signed with a one-year validity when the k8s apiserver starts.
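
The one-year window described above can be checked against the expiry time from the error log. This is a rough sketch under one loud assumption: the in-memory certificate is valid for exactly 365 days, which is how "one year" is read here and is not confirmed in this thread. `ASSUMED_VALIDITY`, `estimated_start`, and `restart_due` are hypothetical names used only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "one year" means exactly 365 days of validity.
ASSUMED_VALIDITY = timedelta(days=365)


def estimated_start(not_after: datetime) -> datetime:
    """Estimate when the apiserver started from the certificate expiry time."""
    return not_after - ASSUMED_VALIDITY


def restart_due(not_after: datetime, now: datetime,
                margin: timedelta = timedelta(days=30)) -> bool:
    """Return True once less than `margin` remains before the expiry time."""
    return not_after - now < margin


# Expiry time taken from the x509 error in the original report.
not_after = datetime(2024, 1, 10, 9, 50, 21, tzinfo=timezone.utc)
print(estimated_start(not_after).isoformat())  # 2023-01-10T09:50:21+00:00
```

Under that assumption the apiserver in the report would have started around 2023-01-10, and a check like `restart_due` could flag a long-running apiserver for a planned restart before the window closes.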

@csuzhangxc
Member

@Smityz is this caused by what iPenx described? Have you resolved it?

@Smityz
Author

Smityz commented Jan 22, 2024

> @Smityz is this caused as iPenx said? Have you resolved it?

Yes, we are on the same team. We ended up disabling the webhook, but I think this is a common problem and it needs to be solved.
