Cannot start Pods from CassandraDataCenter on OpenShift 4.12 #558

Open
tjanssen3 opened this issue Aug 2, 2023 · 0 comments
Labels
bug Something isn't working

What happened?

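The Pods created for the CassandraDatacenter cannot pull their init container image on OCP 4.12 and stay in Init:ImagePullBackOff or Init:ErrImagePull: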

oc get all,cassdc -n cass
NAME                              READY   STATUS                  RESTARTS   AGE
pod/development-dc1-rack1-sts-0   0/2     Init:ImagePullBackOff   0          3m14s
pod/development-dc1-rack2-sts-0   0/2     Init:ImagePullBackOff   0          3m14s
pod/development-dc1-rack3-sts-0   0/2     Init:ErrImagePull       0          3m14s

NAME                                              TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                                        AGE
service/development-dc1-additional-seed-service   ClusterIP   None         <none>        <none>                                         3m14s
service/development-dc1-all-pods-service          ClusterIP   None         <none>        9042/TCP,8080/TCP,9103/TCP,9000/TCP            3m14s
service/development-dc1-service                   ClusterIP   None         <none>        9042/TCP,9142/TCP,8080/TCP,9103/TCP,9000/TCP   3m14s
service/development-seed-service                  ClusterIP   None         <none>        <none>                                         3m14s

NAME                                         READY   AGE
statefulset.apps/development-dc1-rack1-sts   0/1     3m14s
statefulset.apps/development-dc1-rack2-sts   0/1     3m14s
statefulset.apps/development-dc1-rack3-sts   0/1     3m14s

NAME                                                 IMAGE REPOSITORY                                                            TAGS       UPDATED
imagestream.image.openshift.io/cass-config-builder   image-registry.openshift-image-registry.svc:5000/cass/cass-config-builder   1.0-ubi7   14 minutes ago

NAME                                             AGE
cassandradatacenter.cassandra.datastax.com/dc1   3m14s

What did you expect to happen?

Pods should load their respective container images and start to run.

How can we reproduce it (as minimally and precisely as possible)?

Part 1: Install Cassandra Operator to openshift-operators namespace.

  1. Log in to the OpenShift 4.12 console, navigate to Operators, then OperatorHub. Search for Cassandra, select "DataStax Kubernetes Operator for Apache Cassandra", then select Install.
  2. On the Install Operator page, select Installation mode: "All namespaces on the cluster". Click "Install" and wait for the operator to finish installing. When this is done, click "View operator".

Part 2: Create the CassandraDatacenter CR

  1. Create the namespace "cass"
  2. Apply the SecurityContextConstraints (below) with oc apply -f scc.yaml
  3. Link the SecurityContextConstraints to the default ServiceAccount with oc adm policy add-scc-to-user cassandra-scc system:serviceaccount:cass:default
  4. Log in to the registry that hosts the image: docker login registry.connect.redhat.com
  5. Create a pull secret from the docker credentials: oc create secret generic pull-secret -n cass --from-file=.dockerconfigjson=/home/travis/docker/config.json --type=kubernetes.io/dockerconfigjson
  6. Link the pull secret to the default ServiceAccount: oc secrets link default pull-secret -n cass --for=pull
  7. Import the image into the cass namespace: oc import-image datastax/cass-config-builder:1.0-ubi7 --from=registry.connect.redhat.com/datastax/cass-config-builder:1.0-ubi7 --confirm -n cass
  8. Create the CassandraDatacenter CR with a storage class appropriate for your system (below)
  9. Wait for the pods to come online
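
The Part 2 steps can be collected into one script. This is only a sketch under a few assumptions: the CR is saved as cassdc.yaml (a hypothetical file name, not part of the original report), and the `run` helper and the DRY_RUN variable are illustrative. By default the script only prints the commands; set DRY_RUN=0 to execute them against a cluster.

```shell
#!/bin/sh
# Sketch of the Part 2 reproduction steps. Prints each command by default;
# set DRY_RUN=0 to actually run them.
set -eu

NS="${NS:-cass}"                                     # namespace from step 1
DOCKER_CONFIG_JSON="${DOCKER_CONFIG_JSON:-/home/travis/docker/config.json}"
IMAGE="datastax/cass-config-builder:1.0-ubi7"
REGISTRY="registry.connect.redhat.com"
DRY_RUN="${DRY_RUN:-1}"

run() {
  # Print the command (quoting is not preserved in the printed form),
  # then execute it only when DRY_RUN=0.
  printf '+ %s\n' "$*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

reproduce() {
  run oc create namespace "$NS"
  run oc apply -f scc.yaml
  run oc adm policy add-scc-to-user cassandra-scc "system:serviceaccount:${NS}:default"
  run docker login "$REGISTRY"
  run oc create secret generic pull-secret -n "$NS" \
    "--from-file=.dockerconfigjson=${DOCKER_CONFIG_JSON}" \
    --type=kubernetes.io/dockerconfigjson
  run oc secrets link default pull-secret -n "$NS" --for=pull
  run oc import-image "$IMAGE" "--from=${REGISTRY}/${IMAGE}" --confirm -n "$NS"
  run oc apply -f cassdc.yaml -n "$NS"   # cassdc.yaml: hypothetical file holding the CR
  run oc get pods -n "$NS" -w            # step 9: watch the pods
}

reproduce
```

Running it as-is lists the nine commands in order, which makes it easy to compare the exact arguments against another environment before executing anything.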

scc.yaml:

kind: SecurityContextConstraints
apiVersion: security.openshift.io/v1
metadata:
  name: cassandra-scc
allowPrivilegedContainer: false
allowedCapabilities:
  - SYS_RESOURCE
runAsUser:
  type: MustRunAs
  uid: 999
fsGroup:
  type: MustRunAs
  ranges:
  - min: 999
    max: 999
seLinuxContext:
  type: RunAsAny

CassandraDatacenter CR:

apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  creationTimestamp: "2023-08-02T22:05:29Z"
  finalizers:
  - finalizer.cassandra.datastax.com
  generation: 2
  name: dc1
  namespace: cass
  resourceVersion: "328497063"
  uid: 516e6299-88ed-4086-b48f-fc339899a8b2
spec:
  additionalServiceConfig:
    additionalSeedService: {}
    allpodsService: {}
    dcService: {}
    nodePortService: {}
    seedService: {}
  clusterName: development
  config:
    cassandra-yaml:
      authenticator: PasswordAuthenticator
      authorizer: CassandraAuthorizer
      num_tokens: 16
      role_manager: CassandraRoleManager
    jvm-server-options:
      initial_heap_size: 1G
      max_heap_size: 1G
  configBuilderResources: {}
  managementApiAuth:
    insecure: {}
  racks:
  - name: rack1
  - name: rack2
  - name: rack3
  resources:
    requests:
      cpu: "1"
      memory: 2Gi
  serverType: cassandra
  serverVersion: 4.0.3
  size: 3
  storageConfig:
    cassandraDataVolumeClaimSpec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
      storageClassName: ocs-storagecluster-ceph-rbd
  systemLoggerResources: {}
status:
  cassandraOperatorProgress: Updating
  nodeStatuses: {}
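
The CR above appears to be the object as read back from the cluster, so it carries server-populated fields (creationTimestamp, finalizers, generation, resourceVersion, uid, status). For step 8, a trimmed manifest is easier to apply. The sketch below writes one to cassdc.yaml, which is a hypothetical file name. It keeps only values that appear in the CR above and drops the empty `{}` defaults, on the assumption that the operator fills those in again.

```shell
#!/bin/sh
# Write a trimmed CassandraDatacenter manifest (values copied from the CR above).
OUT="${OUT:-cassdc.yaml}"   # hypothetical output file name

cat > "$OUT" <<'EOF'
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1
  namespace: cass
spec:
  clusterName: development
  serverType: cassandra
  serverVersion: 4.0.3
  size: 3
  managementApiAuth:
    insecure: {}
  racks:
  - name: rack1
  - name: rack2
  - name: rack3
  resources:
    requests:
      cpu: "1"
      memory: 2Gi
  config:
    cassandra-yaml:
      authenticator: PasswordAuthenticator
      authorizer: CassandraAuthorizer
      num_tokens: 16
      role_manager: CassandraRoleManager
    jvm-server-options:
      initial_heap_size: 1G
      max_heap_size: 1G
  storageConfig:
    cassandraDataVolumeClaimSpec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
      storageClassName: ocs-storagecluster-ceph-rbd
EOF

echo "wrote $OUT; apply with: oc apply -f $OUT"
```

Change storageClassName to a storage class that exists on your cluster before applying.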

cass-operator version

1.16.0

Kubernetes version

1.25.4

Method of installation

OCP dashboard

Anything else we need to know?

Events from oc describe pod/development-dc1-rack1-sts-0 -n cass:

Events:
  Type     Reason                  Age                From                     Message
  ----     ------                  ----               ----                     -------
  Warning  FailedScheduling        20m                default-scheduler        0/6 nodes are available: 6 pod has unbound immediate PersistentVolumeClaims. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
  Normal   Scheduled               20m                default-scheduler        Successfully assigned cass/development-dc1-rack1-sts-0 to cluster6-kswdj-worker-h5p46
  Normal   SuccessfulAttachVolume  20m                attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-1b00dd75-d60b-47eb-8a95-ccf3ae0d8d81"
  Normal   AddedInterface          20m                multus                   Add eth0 [172.23.27.7/21] from openshift-sdn
  Warning  Failed                  18m (x6 over 20m)  kubelet                  Error: ImagePullBackOff
  Normal   Pulling                 18m (x4 over 20m)  kubelet                  Pulling image "datastax/cass-config-builder:1.0-ubi7"
  Warning  Failed                  18m (x4 over 20m)  kubelet                  Failed to pull image "datastax/cass-config-builder:1.0-ubi7": rpc error: code = Unknown desc = unable to retrieve auth token: invalid username/password: unauthorized: Please login to the Red Hat Registry using your Customer Portal credentials. Further instructions can be found here: https://access.redhat.com/articles/3399531
  Warning  Failed                  18m (x4 over 20m)  kubelet                  Error: ErrImagePull
  Normal   BackOff                 7s (x87 over 20m)  kubelet                  Back-off pulling image "datastax/cass-config-builder:1.0-ubi7"

The events say that the image can't be pulled, yet a pull secret with the appropriate credentials exists and is linked to the default ServiceAccount in the namespace. The image can be imported as an ImageStream, but the Pods cannot pull it. What am I missing?
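
One detail in the events that may be worth checking: the kubelet pulls "datastax/cass-config-builder:1.0-ubi7", a reference with no registry host, so the node resolves it through its configured search registries rather than through the ImageStream or the host used for docker login. The sketch below is not a confirmed diagnosis. `image_registry`, `run`, and `diagnose` are illustrative helper names. `image_registry` applies the usual container-reference convention that the first path component is a registry host only if it contains a dot or colon, or is "localhost". `diagnose` prints (or, with DRY_RUN=0, runs) a few read-only `oc` queries to compare the pod's ServiceAccount, its pull secrets, and the hosts stored in the secret.

```shell
#!/bin/sh
# Helpers for narrowing down the pull failure. Commands are printed by
# default; set DRY_RUN=0 to run them against the cluster.
NS="${NS:-cass}"
DRY_RUN="${DRY_RUN:-1}"

image_registry() {
  # Print the registry host of an image reference, or an empty line if the
  # reference is unqualified (no registry host in front of the repository).
  ref="$1"
  case "$ref" in
    */*) first="${ref%%/*}" ;;
    *)   first="" ;;
  esac
  case "$first" in
    *.*|*:*|localhost) printf '%s\n' "$first" ;;
    *)                 printf '\n' ;;
  esac
}

run() {
  # Printed form does not preserve quoting.
  printf '+ %s\n' "$*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

diagnose() {
  pod="${1:-development-dc1-rack1-sts-0}"
  # Which ServiceAccount does the pod actually run as?
  run oc get pod "$pod" -n "$NS" -o 'jsonpath={.spec.serviceAccountName}'
  # Which pull secrets were injected into the pod spec?
  run oc get pod "$pod" -n "$NS" -o 'jsonpath={.spec.imagePullSecrets[*].name}'
  # Which image references do the init containers use?
  run oc get pod "$pod" -n "$NS" -o 'jsonpath={.spec.initContainers[*].image}'
  # Which registry hosts does the linked secret hold credentials for?
  run oc extract secret/pull-secret -n "$NS" --to=-
}

if [ -z "$(image_registry datastax/cass-config-builder:1.0-ubi7)" ]; then
  echo "image reference in the events is unqualified (no registry host)"
fi
diagnose
```

If the registry the node actually contacts differs from the hosts listed in the secret, the pull would fail with an authorization error even though the import into the ImageStream succeeded.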

@tjanssen3 tjanssen3 added the bug Something isn't working label Aug 2, 2023