CoreDNS causing an issue to InsufficientNumberOfReplicas - The add-on is unhealthy because it doesn't have the desired number of replicas. #3003

eravindar12 · 2024-04-10T04:23:19Z

Description

I'm encountering an issue with CoreDNS related to an insufficient number of replicas. The add-on is currently flagged as unhealthy due to the shortfall in the desired number of replicas. This issue arises while utilizing the EKS module in conjunction with a custom AMI.

I am using the EKS module below:

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name                   = local.name
  cluster_version                = local.cluster_version

  cluster_endpoint_private_access = true
  cluster_endpoint_public_access  = true

  enable_irsa = true

  enable_cluster_creator_admin_permissions = true

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  eks_managed_node_groups = {
    cis_ami = {
      instance_types = ["m5.large"]

      ami_id = data.aws_ami.image.id

      # This will ensure the bootstrap user data is used to join the node
      enable_bootstrap_user_data = true

      iam_role_attach_cni_policy = true

      min_size     = 1
      max_size     = 6
      desired_size = 4
    }
  }

  # EKS Addons
  cluster_addons = {
    coredns    = {
      most_recent = true
    }
    kube-proxy = {
      most_recent = true
    }
    # aws-ebs-csi-driver   = {
    #   service_account_role_arn = module.ebs_csi_driver_irsa.iam_role_arn
    # }
    vpc-cni = {
      before_compute = true
      most_recent    = true
      configuration_values = jsonencode({
        env = {
          ENABLE_PREFIX_DELEGATION = "true"
          WARM_PREFIX_TARGET       = "1"
        }
      })
    }
  }

  tags = local.tags
}

As a result, the pods are crashing. Here is the output:

❯ k get po -n kube-system
NAME                                                         READY   STATUS             RESTARTS         AGE
aws-load-balancer-controller-54f58989fd-hj848                0/1     CrashLoopBackOff   17 (3m49s ago)   70m
aws-load-balancer-controller-54f58989fd-k2qzn                0/1     CrashLoopBackOff   17 (4m1s ago)    70m
aws-node-9bkt8                                               2/2     Running            0                71m
aws-node-psdvq                                               2/2     Running            0                71m
aws-node-qmhxg                                               2/2     Running            0                71m
aws-node-xl99d                                               2/2     Running            0                71m
cluster-autoscaler-aws-cluster-autoscaler-848fbf899c-8nxls   0/1     CrashLoopBackOff   16 (90s ago)     66m
coredns-557586b4b9-hnlg5                                     0/1     Running            0                64m
coredns-6f99ddbc54-pkltm                                     0/1     Running            0                56m
coredns-6f99ddbc54-xw65l                                     0/1     Running            0                56m
ebs-csi-controller-576c8d5c58-4q6vc                          6/6     Running            0                69m
ebs-csi-controller-576c8d5c58-qk6m9                          6/6     Running            0                70m
ebs-csi-node-5fztg                                           1/3     CrashLoopBackOff   40 (3m12s ago)   71m
ebs-csi-node-7tnrt                                           1/3     CrashLoopBackOff   41 (2m47s ago)   71m
ebs-csi-node-bmpqh                                           2/3     CrashLoopBackOff   41 (3m1s ago)    71m
ebs-csi-node-splqt                                           1/3     CrashLoopBackOff   39 (3m48s ago)   71m
kube-proxy-76gkn                                             1/1     Running            0                71m
kube-proxy-gkhcn                                             1/1     Running            0                71m
kube-proxy-hqxw7                                             1/1     Running            0                71m
kube-proxy-kxfds                                             1/1     Running            0                71m

eravindar12 · 2024-04-10T13:21:30Z

@bryantbiggs - FYI, when I try to view the CoreDNS logs, I get this error:

❯ k logs -f coredns-xxxx-hnlg5 -n kube-system
Error from server: Get "https://1x.xx.xx.xx:10250/containerLogs/kube-system/coredns-xxx-hnlg5/coredns?follow=true": dial tcp xx.xx.xx.xx:10250: i/o timeout

eravindar12 · 2024-04-10T15:20:43Z

@bryantbiggs - To resolve the issue, I need to add an iptables entry that allows incoming connections from Kubernetes on port 10250, because I'm using a custom AMI.

My goal is to determine how to configure the Terraform EKS module to override the bootstrap command.

For example:

Reference: https://aws.amazon.com/blogs/containers/building-amazon-linux-2-cis-benchmark-amis-for-amazon-eks/

  overrideBootstrapCommand: |
      #!/bin/bash
      set -ex
      iptables -I INPUT -p tcp -m tcp --dport 10250 -j ACCEPT
      /etc/eks/bootstrap.sh $CLUSTER_NAME

I am using the Terraform EKS module and have customized the node group as follows:

  eks_managed_node_groups = {
    cis_ami = {
      instance_types = ["m5.large"]

      ami_id = data.aws_ami.image.id

      # This will ensure the bootstrap user data is used to join the node
      enable_bootstrap_user_data = true

      iam_role_attach_cni_policy = true

      min_size     = 1
      max_size     = 6
      desired_size = 4
    }
  }

kstevensonnv · 2024-04-12T10:15:40Z

https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/examples/eks_managed_node_group/main.tf#L204

  enable_bootstrap_user_data = true

  pre_bootstrap_user_data = <<-EOT
    export FOO=bar
  EOT
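
Applied to the node group from the question, the suggestion above might look like the sketch below. It assumes an Amazon Linux 2 based AMI, where the module's rendered user data runs `pre_bootstrap_user_data` before it calls `/etc/eks/bootstrap.sh`. It has not been tested against the CIS AMI in this issue, and the iptables rule is the one quoted from the AWS blog post, not something the module adds itself.

```hcl
  eks_managed_node_groups = {
    cis_ami = {
      instance_types = ["m5.large"]

      ami_id = data.aws_ami.image.id

      # Needed with a custom AMI so the module renders the bootstrap user data
      enable_bootstrap_user_data = true

      # Assumption: these commands run before /etc/eks/bootstrap.sh,
      # opening the kubelet port (10250) before the node joins the cluster
      pre_bootstrap_user_data = <<-EOT
        iptables -I INPUT -p tcp -m tcp --dport 10250 -j ACCEPT
      EOT

      iam_role_attach_cni_policy = true

      min_size     = 1
      max_size     = 6
      desired_size = 4
    }
  }
```

Because the module already supplies the cluster name and the `bootstrap.sh` call, only the extra commands need to go in the heredoc. Existing nodes would have to be replaced for the new user data to take effect.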

github-actions · 2024-05-13T00:11:18Z

This issue has been automatically marked as stale because it has been open 30 days
with no activity. Remove stale label or comment or this issue will be closed in 10 days

bryantbiggs added the question label Apr 10, 2024

github-actions bot added the stale label May 13, 2024
