numaflow-server crashes on start with server.configs.insecure=true #1734

Closed
th0ger opened this issue May 17, 2024 · 10 comments
Labels
bug Something isn't working

Comments


th0ger commented May 17, 2024

Describe the bug
numaflow-server, installed with Helm, cannot start when TLS for the UX server is disabled (server.configs.insecure=true).

To Reproduce

kind create cluster
helm repo add numaflow https://numaproj.io/helm-charts
helm repo update
helm install numaflow numaflow/numaflow --version "0.0.2" -f values.yaml

with

server:
  configs:
    # -- Whether to disable TLS for UX server.
    insecure: true
    # -- Port to listen on for UX server, defaults to 8443 or 8080 if insecure is set.
    # port: 8443

The server.configs.insecure value was changed from the default value.

$ watch kubectl get svc
NAME                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
kubernetes            ClusterIP   10.96.0.1       <none>        443/TCP    7m39s
numaflow-dex-server   ClusterIP   10.96.240.68    <none>        5556/TCP   7m33s
numaflow-server       ClusterIP   10.96.45.218    <none>        8443/TCP   7m33s
numaflow-webhook      ClusterIP   10.96.128.182   <none>        443/TCP    7m33s
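
The port shown for numaflow-server can be compared against the configured setting mechanically. The following is a minimal sketch, not part of the chart: expected_port, service_port, and check_service_port are hypothetical helper names, the port rule is taken from the values.yaml comment above (8443 by default, 8080 when insecure is set), and the Service is assumed to be in the current namespace.

```shell
# Expected UX server port for a given insecure setting, following the
# values.yaml comment: 8443 by default, 8080 when insecure is set.
expected_port() {
  if [ "$1" = "true" ]; then echo 8080; else echo 8443; fi
}

# Port currently exposed by the numaflow-server Service (first port entry).
service_port() {
  kubectl get svc numaflow-server -o jsonpath='{.spec.ports[0].port}'
}

# Compare the two and report a mismatch with a non-zero exit status.
check_service_port() {
  want=$(expected_port "$1")
  got=$(service_port)
  if [ "$got" = "$want" ]; then
    echo "ok: Service port $got matches insecure=$1"
  else
    echo "mismatch: Service port $got, expected $want for insecure=$1"
    return 1
  fi
}
```

Running check_service_port true against the install above would be expected to report a mismatch, since the Service still exposes 8443.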

The numaflow-server pod crashes and restarts roughly every minute:

kubectl get pods
NAME                                   READY   STATUS             RESTARTS      AGE
numaflow-controller-854d57798c-89796   1/1     Running            0             7m22s
numaflow-dex-server-7c98b855db-pwt9v   1/1     Running            0             7m22s
numaflow-server-669d687d8-jn99r        0/1     CrashLoopBackOff   6 (43s ago)   7m22s
numaflow-webhook-586fc66c64-drf7c      1/1     Running            0             7m22s

The pod logs show no errors:

[GIN-debug] GET    /dex/*name                --> github.com/numaproj/numaflow/server/cmd/server.(*server).Start.NewDexReverseProxy.func3 (3 handlers)
[GIN-debug] POST   /dex/*name                --> github.com/numaproj/numaflow/server/cmd/server.(*server).Start.NewDexReverseProxy.func3 (3 handlers)
[GIN-debug] PUT    /dex/*name                --> github.com/numaproj/numaflow/server/cmd/server.(*server).Start.NewDexReverseProxy.func3 (3 handlers)
[GIN-debug] PATCH  /dex/*name                --> github.com/numaproj/numaflow/server/cmd/server.(*server).Start.NewDexReverseProxy.func3 (3 handlers)
[GIN-debug] HEAD   /dex/*name                --> github.com/numaproj/numaflow/server/cmd/server.(*server).Start.NewDexReverseProxy.func3 (3 handlers)
[GIN-debug] OPTIONS /dex/*name                --> github.com/numaproj/numaflow/server/cmd/server.(*server).Start.NewDexReverseProxy.func3 (3 handlers)
[GIN-debug] DELETE /dex/*name                --> github.com/numaproj/numaflow/server/cmd/server.(*server).Start.NewDexReverseProxy.func3 (3 handlers)
[GIN-debug] CONNECT /dex/*name                --> github.com/numaproj/numaflow/server/cmd/server.(*server).Start.NewDexReverseProxy.func3 (3 handlers)
[GIN-debug] TRACE  /dex/*name                --> github.com/numaproj/numaflow/server/cmd/server.(*server).Start.NewDexReverseProxy.func3 (3 handlers)
[GIN-debug] GET    /livez                    --> github.com/numaproj/numaflow/server/routes.Routes.func1 (3 handlers)
[GIN-debug] GET    /auth/v1/login            --> github.com/numaproj/numaflow/server/apis/v1.(*noAuthHandler).Login-fm (3 handlers)
[GIN-debug] POST   /auth/v1/login            --> github.com/numaproj/numaflow/server/apis/v1.(*noAuthHandler).LoginLocalUsers-fm (3 handlers)
[GIN-debug] GET    /auth/v1/logout           --> github.com/numaproj/numaflow/server/apis/v1.(*noAuthHandler).Logout-fm (3 handlers)
[GIN-debug] GET    /auth/v1/callback         --> github.com/numaproj/numaflow/server/apis/v1.(*noAuthHandler).Callback-fm (3 handlers)
[GIN-debug] GET    /api/v1/authinfo          --> github.com/numaproj/numaflow/server/apis/v1.(*handler).AuthInfo-fm (3 handlers)
[GIN-debug] GET    /api/v1/namespaces        --> github.com/numaproj/numaflow/server/apis/v1.(*handler).ListNamespaces-fm (3 handlers)
[GIN-debug] GET    /api/v1/cluster-summary   --> github.com/numaproj/numaflow/server/apis/v1.(*handler).GetClusterSummary-fm (3 handlers)
[GIN-debug] POST   /api/v1/namespaces/:namespace/pipelines --> github.com/numaproj/numaflow/server/apis/v1.(*handler).CreatePipeline-fm (3 handlers)
[GIN-debug] GET    /api/v1/namespaces/:namespace/pipelines --> github.com/numaproj/numaflow/server/apis/v1.(*handler).ListPipelines-fm (3 handlers)
[GIN-debug] GET    /api/v1/namespaces/:namespace/pipelines/:pipeline --> github.com/numaproj/numaflow/server/apis/v1.(*handler).GetPipeline-fm (3 handlers)
[GIN-debug] GET    /api/v1/namespaces/:namespace/pipelines/:pipeline/health --> github.com/numaproj/numaflow/server/apis/v1.(*handler).GetPipelineStatus-fm (3 handlers)
[GIN-debug] PUT    /api/v1/namespaces/:namespace/pipelines/:pipeline --> github.com/numaproj/numaflow/server/apis/v1.(*handler).UpdatePipeline-fm (3 handlers)
[GIN-debug] DELETE /api/v1/namespaces/:namespace/pipelines/:pipeline --> github.com/numaproj/numaflow/server/apis/v1.(*handler).DeletePipeline-fm (3 handlers)
[GIN-debug] PATCH  /api/v1/namespaces/:namespace/pipelines/:pipeline --> github.com/numaproj/numaflow/server/apis/v1.(*handler).PatchPipeline-fm (3 handlers)
[GIN-debug] POST   /api/v1/namespaces/:namespace/isb-services --> github.com/numaproj/numaflow/server/apis/v1.(*handler).CreateInterStepBufferService-fm (3 handlers)
[GIN-debug] GET    /api/v1/namespaces/:namespace/isb-services --> github.com/numaproj/numaflow/server/apis/v1.(*handler).ListInterStepBufferServices-fm (3 handlers)
[GIN-debug] GET    /api/v1/namespaces/:namespace/isb-services/:isb-service --> github.com/numaproj/numaflow/server/apis/v1.(*handler).GetInterStepBufferService-fm (3 handlers)
[GIN-debug] PUT    /api/v1/namespaces/:namespace/isb-services/:isb-service --> github.com/numaproj/numaflow/server/apis/v1.(*handler).UpdateInterStepBufferService-fm (3 handlers)
[GIN-debug] DELETE /api/v1/namespaces/:namespace/isb-services/:isb-service --> github.com/numaproj/numaflow/server/apis/v1.(*handler).DeleteInterStepBufferService-fm (3 handlers)
[GIN-debug] GET    /api/v1/namespaces/:namespace/pipelines/:pipeline/isbs --> github.com/numaproj/numaflow/server/apis/v1.(*handler).ListPipelineBuffers-fm (3 handlers)
[GIN-debug] GET    /api/v1/namespaces/:namespace/pipelines/:pipeline/watermarks --> github.com/numaproj/numaflow/server/apis/v1.(*handler).GetPipelineWatermarks-fm (3 handlers)
[GIN-debug] PUT    /api/v1/namespaces/:namespace/pipelines/:pipeline/vertices/:vertex --> github.com/numaproj/numaflow/server/apis/v1.(*handler).UpdateVertex-fm (3 handlers)
[GIN-debug] GET    /api/v1/namespaces/:namespace/pipelines/:pipeline/vertices/metrics --> github.com/numaproj/numaflow/server/apis/v1.(*handler).GetVerticesMetrics-fm (3 handlers)
[GIN-debug] GET    /api/v1/namespaces/:namespace/pipelines/:pipeline/vertices/:vertex/pods --> github.com/numaproj/numaflow/server/apis/v1.(*handler).ListVertexPods-fm (3 handlers)
[GIN-debug] GET    /api/v1/metrics/namespaces/:namespace/pods --> github.com/numaproj/numaflow/server/apis/v1.(*handler).ListPodsMetrics-fm (3 handlers)
[GIN-debug] GET    /api/v1/namespaces/:namespace/pods/:pod/logs --> github.com/numaproj/numaflow/server/apis/v1.(*handler).PodLogs-fm (3 handlers)
[GIN-debug] GET    /api/v1/namespaces/:namespace/events --> github.com/numaproj/numaflow/server/apis/v1.(*handler).GetNamespaceEvents-fm (3 handlers)
[GIN-debug] GET    /api/v1/sysinfo           --> github.com/numaproj/numaflow/server/routes.Routes.func2 (3 handlers)
{
    "level": "info",
    "ts": "2024-05-17T11:58:48.81624813Z",
    "logger": "numaflow.server",
    "caller": "server/start.go:115",
    "msg": "Starting server (TLS disabled) on :8080",
    "version": "Version: v1.2.1, BuildDate: 2024-05-07T08:25:20Z,
                     GitCommit: 89ea33f1d69785f6f5f17f1d5854ac189003918a, 
                     GitTag: v1.2.1, GitTreeState: clean, 
                     GoVersion: go1.21.9, Compiler: gc, Platform: linux/amd64",
    "disable-auth": true,
    "dex-server-addr": "https://numaflow-dex-server:5556/dex",
    "server-addr": "https://localhost:8443"
}
<line-wrapped for readability>

Expected behavior
The server should start and keep running with server.configs.insecure=true.

Environment (please complete the following information):

  • Kubernetes:
    • kind v0.22.0 go1.20.3 linux/amd64
    • Also on k3s cluster.
  • Numaflow: v1.2.1

Message from the maintainers:

Impacted by this bug? Give it a 👍. We often sort issues this way to know what to prioritize.

For quick help and support, join our slack channel.

th0ger added the bug label May 17, 2024
Author

th0ger commented May 17, 2024

I notice the logs saying

    "msg": "Starting server (TLS disabled) on :8080",
    "disable-auth": true,
    "server-addr": "https://localhost:8443"

The first two lines are as expected, but is the server-addr port supposed to be 8443?
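
One way to follow up on the port question is to look at what the Deployment's liveness probe targets and what the pod events say about restarts. This is only a sketch under assumptions: the helper names are made up for illustration, the resources are assumed to be in the current namespace, and the server is assumed to be the first container in the pod template (the same path the kustomize patch later in this thread uses).

```shell
# Print the httpGet section of the liveness probe on the numaflow-server
# Deployment, so its port and scheme can be compared with the listen address
# from the startup log (":8080", TLS disabled).
show_liveness_probe() {
  kubectl get deployment numaflow-server \
    -o jsonpath='{.spec.template.spec.containers[0].livenessProbe.httpGet}'
}

# Print only the Events section of "kubectl describe pod" for the given pod.
# Probe failures, if any, are normally recorded there rather than in the
# container logs.
recent_pod_events() {
  kubectl describe pod "$1" | sed -n '/^Events:/,$p'
}
```

For example, recent_pod_events numaflow-server-669d687d8-jn99r would show the events for the crashing pod listed above.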

Member

whynowy commented May 17, 2024

@th0ger - thanks for reporting the issue! The helm chart template needs to be fixed. Created an issue - numaproj/helm-charts#10.

Author

th0ger commented May 17, 2024

@whynowy You're welcome. I did indeed wonder if this was a helm or service issue.
But it was not obvious to me to test it with manifests/kustomize.

Member

whynowy commented May 17, 2024

I can help you with a kustomize manifest change if that would get you unblocked.

Member

whynowy commented May 17, 2024

@th0ger

cat kustomization.yaml

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - https://github.com/numaproj/numaflow/config/cluster-install?ref=v1.2.1

patches:
  - patch: |
      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: numaflow-cmd-params-config
      data:
        server.insecure: "true"
  - patch: |
      - op: replace
        path: /spec/template/spec/containers/0/livenessProbe/httpGet/port
        value: 8080
      - op: replace
        path: /spec/template/spec/containers/0/livenessProbe/httpGet/scheme
        value: HTTP
    target:
      kind: Deployment
      name: numaflow-server
  - patch: |
      - op: replace
        path: /spec/ports/0/targetPort
        value: 8080
      - op: replace
        path: /spec/ports/0/port
        value: 8080
    target:
      kind: Service
      name: numaflow-server
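
A possible way to try the kustomization above, sketched with hypothetical helper names and assuming kustomization.yaml sits in the current directory (or the directory passed as the first argument): render it first to confirm the 8080 replacements landed, then apply it.

```shell
# Render the kustomization without applying it and list the lines that
# mention port 8080, with line numbers.
preview_patch() {
  kubectl kustomize "${1:-.}" | grep -n '8080'
}

# Apply the rendered manifests to the cluster.
apply_patch() {
  kubectl apply -k "${1:-.}"
}
```

Previewing before applying keeps the change reviewable; preview_patch exits non-zero if no line mentions 8080.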

Member

whynowy commented May 24, 2024

@th0ger - with the latest fix in the helm charts, the issue should be fixed. Let me know if it works for you when you get a chance. Thanks!

Author

th0ger commented May 24, 2024

You forgot to release the chart 0.0.3, again ;-)

$ helm repo update
$ helm search repo numaflow/numaflow --versions
NAME                    CHART VERSION   APP VERSION     DESCRIPTION
numaflow/numaflow       0.0.2                           A Helm chart for installing Numaflow in Kubernetes
numaflow/numaflow       0.0.1                           A Helm chart for installing Numaflow in Kubernetes

But the fix works great!

$ git clone git@github.com:numaproj/helm-charts.git
$ helm install numaflow-git ./helm-charts/charts/numaflow/ -f values.yaml
$ kubectl get svc | grep numaflow-server
numaflow-server       ClusterIP   10.96.127.131   <none>        8080/TCP   2m50s

Pods no longer crashing.
Port 8080 changed as expected.
I can port-forward and run the ui on http://localhost:8080.
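
The port-forward check can be scripted roughly as follows. This is a sketch, not a tested procedure: check_ui is a hypothetical helper, the two-second wait is an arbitrary guess, and the Service is assumed to be in the current namespace. The /livez route is taken from the route list in the server's startup log.

```shell
check_ui() {
  # Forward local port 8080 to the Service in the background.
  kubectl port-forward svc/numaflow-server 8080:8080 >/dev/null 2>&1 &
  pf_pid=$!
  sleep 2  # crude wait for the tunnel to come up; adjust as needed

  # Plain HTTP request against the liveness route; -f makes curl fail on
  # HTTP error statuses.
  if curl -fsS http://localhost:8080/livez; then status=0; else status=1; fi

  # Stop the port-forward and pass the curl result back to the caller.
  kill "$pf_pid" 2>/dev/null || true
  wait "$pf_pid" 2>/dev/null || true
  return "$status"
}
```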

Member

whynowy commented May 24, 2024

Thanks @th0ger !

I'll close this issue.

@chandankumar4 - could you please release 0.0.3?

Contributor

chandankumar4 commented May 27, 2024

I have automated the release process of numaflow here and released helm chart version 0.0.3. Thanks

Author

th0ger commented May 29, 2024

@chandankumar4 @whynowy chart 0.0.3 works and the initial issue is fixed.

th0ger closed this as completed May 29, 2024