[EBS CSI Driver] It is not compatible with Windows Managed Node Group #390

Closed
1 task done
carlosrodlop opened this issue Apr 19, 2024 · 8 comments

Comments


carlosrodlop commented Apr 19, 2024

Description

Please provide a clear and concise description of the issue you are encountering, and a reproduction of your configuration (see the examples/* directory for references that you can copy+paste and tailor to match your configs if you are unable to copy your exact configuration). The reproduction MUST be executable by running terraform init && terraform apply without any further changes.

If your request is for a new feature, please use the Feature request template.

  • ✋ I have searched the open/closed issues and my issue is not listed.

⚠️ Note

Before you submit an issue, please perform the following first:

  1. Remove the local .terraform directory (! ONLY if state is stored remotely, which hopefully you are following that best practice!): rm -rf .terraform/
  2. Re-initialize the project root to pull down modules: terraform init
  3. Re-attempt your terraform plan or apply and check if the issue still persists

Versions

  • Module version [Required]: 1.15.1

  • Terraform version:

Terraform v1.6.6 on linux_amd64

  • Provider version(s):

hashicorp/aws v5.46.0 (taken from the Terraform logs under Additional context)

Reproduction Code [Required]

Considerations:

Steps to reproduce the behavior:

  • Just follow the deployment steps for terraform blueprints in the provided files

main.tf

data "aws_availability_zones" "available" {}

locals {
  name   = "ebs-winmng"
  region = "us-east-1"

  vpc_name     = "${local.name}-vpc"
  cluster_name = "${local.name}-eks"

  vpc_cidr = "10.0.0.0/16"

  cluster_version = "1.28"

  azs = slice(data.aws_availability_zones.available.names, 0, 2)

  #https://docs.aws.amazon.com/eks/latest/userguide/choosing-instance-type.html
  k8s_instance_types = {
    "graviton3" = ["m7g.xlarge"]
  }

  tags = {
    "tf-blueprint" = local.name
  }

}

################################################################################
# EKS: Add-ons
################################################################################

# EKS Blueprints Add-ons

module "ebs_csi_driver_irsa" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "5.29.0"

  role_name_prefix = "${module.eks.cluster_name}-ebs-csi-driv"

  attach_ebs_csi_policy = true

  oidc_providers = {
    main = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["kube-system:ebs-csi-controller-sa"]
    }
  }

  tags = local.tags
}

module "eks_blueprints_addons" {
  source  = "aws-ia/eks-blueprints-addons/aws"
  version = "1.15.1"

  cluster_name      = module.eks.cluster_name
  cluster_endpoint  = module.eks.cluster_endpoint
  oidc_provider_arn = module.eks.oidc_provider_arn
  cluster_version   = module.eks.cluster_version

  eks_addons = {
    aws-ebs-csi-driver = {
      service_account_role_arn = module.ebs_csi_driver_irsa.iam_role_arn
    }
    coredns    = {}
    vpc-cni    = {}
    kube-proxy = {}
  }

  tags = local.tags
}

################################################################################
# EKS: Infra
################################################################################

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "19.17.1"

  cluster_name                   = local.cluster_name
  cluster_endpoint_public_access = true
  cluster_version                = local.cluster_version

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  node_security_group_additional_rules = {

    egress_self_all = {
      description = "Node to node all ports/protocols"
      protocol    = "-1"
      from_port   = 0
      to_port     = 0
      type        = "egress"
      self        = true
    }

    ingress_self_all = {
      description = "Node to node all ports/protocols"
      protocol    = "-1"
      from_port   = 0
      to_port     = 0
      type        = "ingress"
      self        = true
    }

    egress_ssh_all = {
      description      = "Egress all ssh to internet for github"
      protocol         = "tcp"
      from_port        = 22
      to_port          = 22
      type             = "egress"
      cidr_blocks      = ["0.0.0.0/0"]
      ipv6_cidr_blocks = ["::/0"]
    }

    ingress_cluster_to_node_all_traffic = {
      description                   = "Cluster API to Nodegroup all traffic"
      protocol                      = "-1"
      from_port                     = 0
      to_port                       = 0
      type                          = "ingress"
      source_cluster_security_group = true
    }
  }

  eks_managed_node_groups = {
    mg_linux = {
      node_group_name = "managed-linux"
      instance_types  = local.k8s_instance_types["graviton3"]
      ami_type        = "AL2_ARM_64"
      capacity_type   = "ON_DEMAND"
      disk_size       = 25
      desired_size    = 2
    }
    mg_windows = {
      min_size          = 1
      desired_size      = 1
      max_size          = 5
      platform          = "windows"
      ami_type          = "WINDOWS_CORE_2019_x86_64"
      capacity_type     = "SPOT"
      enable_monitoring = true
      disk_size         = "100"
      use_name_prefix   = true
      cluster_version   = local.cluster_version
      instance_types    = ["m5d.xlarge", "m5ad.xlarge"]
      taints = [
        {
          key    = "os"
          value  = "windows"
          effect = "NO_SCHEDULE"
        }
      ]
    }
  }

  create_cloudwatch_log_group = false

  create_kms_key  = true
  kms_key_aliases = ["eks/${local.name}"]

  tags = local.tags
}

################################################################################
# Supporting Resources
################################################################################


module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.5.2"

  name = local.vpc_name
  cidr = local.vpc_cidr

  azs             = local.azs
  public_subnets  = [for k, v in local.azs : cidrsubnet(local.vpc_cidr, 4, k)]
  private_subnets = [for k, v in local.azs : cidrsubnet(local.vpc_cidr, 8, k + 48)]

  enable_nat_gateway = true
  single_nat_gateway = true

  #https://docs.aws.amazon.com/eks/latest/userguide/network_reqs.html
  #https://docs.aws.amazon.com/eks/latest/userguide/network-load-balancing.html
  public_subnet_tags = {
    "kubernetes.io/role/elb" = 1
  }

  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = 1
  }

  tags = local.tags

}

provider.tf

terraform {
  required_version = ">= 1.0.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 3.72"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = ">= 2.10"
    }
    helm = {
      source  = "hashicorp/helm"
      version = ">= 2.5.1"
    }
  }

}

provider "aws" {
  region = local.region
}

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
  }
}

provider "helm" {
  kubernetes {
    host                   = module.eks.cluster_endpoint
    cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      command     = "aws"
      # This requires the awscli to be installed locally where Terraform is executed
      args = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
    }
  }
}

Expected behaviour

The aws-ebs-csi-driver EKS add-on reaches the ACTIVE state and terraform apply completes.

Actual behaviour

The aws-ebs-csi-driver EKS add-on stays in the CREATING state until the 20-minute timeout expires, and terraform apply fails.

Terminal Output Screenshot(s)

module.eks.module.eks_managed_node_group["mg_windows"].aws_eks_node_group.this[0]: Still creating... [9m30s elapsed]
module.eks.module.eks_managed_node_group["mg_windows"].aws_eks_node_group.this[0]: Still creating... [9m40s elapsed]
module.eks.module.eks_managed_node_group["mg_windows"].aws_eks_node_group.this[0]: Creation complete after 9m45s [id=ebs-winmng-eks:mg_windows-20240419143200940700000016]

Apply complete! Resources: 41 added, 0 changed, 0 destroyed.

...

Terraform detected the following changes made outside of Terraform since the last "terraform apply" which may have affected this plan:

  # module.eks.module.eks_managed_node_group["mg_linux"].aws_eks_node_group.this[0] has changed
  ~ resource "aws_eks_node_group" "this" {
        id                     = "ebs-winmng-eks:mg_linux-20240419143200938700000014"
      + labels                 = {}
        tags                   = {
            "Name"         = "mg_linux"
            "tf-blueprint" = "ebs-winmng"
        }
        # (15 unchanged attributes hidden)

        # (4 unchanged blocks hidden)
    }

  # module.eks.module.eks_managed_node_group["mg_windows"].aws_eks_node_group.this[0] has changed
  ~ resource "aws_eks_node_group" "this" {
        id                     = "ebs-winmng-eks:mg_windows-20240419143200940700000016"
      + labels                 = {}
        tags                   = {
            "Name"         = "mg_windows"
            "tf-blueprint" = "ebs-winmng"
        }
        # (15 unchanged attributes hidden)

        # (5 unchanged blocks hidden)
    }


Unless you have made equivalent changes to your configuration, or ignored the relevant attributes using ignore_changes, the following plan may include actions to undo or respond to these
changes.

──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # module.ebs_csi_driver_irsa.aws_iam_policy.ebs_csi[0] will be created
  + resource "aws_iam_policy" "ebs_csi" {
      + arn              = (known after apply)
      + attachment_count = (known after apply)
      + description      = "Provides permissions to manage EBS volumes via the container storage interface driver"
      + id               = (known after apply)
      + name             = (known after apply)
      + name_prefix      = "AmazonEKS_EBS_CSI_Policy-"
      + path             = "/"
      + policy           = jsonencode(
            {
              + Statement = [
                  + {
                      + Action   = [
                          + "ec2:ModifyVolume",
                          + "ec2:DetachVolume",
                          + "ec2:DescribeVolumesModifications",
                          + "ec2:DescribeVolumes",
                          + "ec2:DescribeTags",
                          + "ec2:DescribeSnapshots",
                          + "ec2:DescribeInstances",
                          + "ec2:DescribeAvailabilityZones",
                          + "ec2:CreateSnapshot",
                          + "ec2:AttachVolume",
                        ]
                      + Effect   = "Allow"
                      + Resource = "*"
                    },
                  + {
                      + Action    = "ec2:CreateTags"
                      + Condition = {
                          + StringEquals = {
                              + "ec2:CreateAction" = [
                                  + "CreateVolume",
                                  + "CreateSnapshot",
                                ]
                            }
                        }
                      + Effect    = "Allow"
                      + Resource  = [
                          + "arn:aws:ec2:*:*:volume/*",
                          + "arn:aws:ec2:*:*:snapshot/*",
                        ]
                    },
                  + {
                      + Action   = "ec2:DeleteTags"
                      + Effect   = "Allow"
                      + Resource = [
                          + "arn:aws:ec2:*:*:volume/*",
                          + "arn:aws:ec2:*:*:snapshot/*",
                        ]
                    },
                  + {
                      + Action    = "ec2:CreateVolume"
                      + Condition = {
                          + StringLike = {
                              + "aws:RequestTag/ebs.csi.aws.com/cluster" = "true"
                            }
                        }
                      + Effect    = "Allow"
                      + Resource  = "*"
                    },
                  + {
                      + Action    = "ec2:CreateVolume"
                      + Condition = {
                          + StringLike = {
                              + "aws:RequestTag/CSIVolumeName" = "*"
                            }
                        }
                      + Effect    = "Allow"
                      + Resource  = "*"
                    },
                  + {
                      + Action    = "ec2:CreateVolume"
                      + Condition = {
                          + StringLike = {
                              + "aws:RequestTag/kubernetes.io/cluster/*" = "owned"
                            }
                        }
                      + Effect    = "Allow"
                      + Resource  = "*"
                    },
                  + {
                      + Action    = "ec2:DeleteVolume"
                      + Condition = {
                          + StringLike = {
                              + "ec2:ResourceTag/ebs.csi.aws.com/cluster" = "true"
                            }
                        }
                      + Effect    = "Allow"
                      + Resource  = "*"
                    },
                  + {
                      + Action    = "ec2:DeleteVolume"
                      + Condition = {
                          + StringLike = {
                              + "ec2:ResourceTag/CSIVolumeName" = "*"
                            }
                        }
                      + Effect    = "Allow"
                      + Resource  = "*"
                    },
                  + {
                      + Action    = "ec2:DeleteVolume"
                      + Condition = {
                          + StringLike = {
                              + "ec2:ResourceTag/kubernetes.io/cluster/*" = "owned"
                            }
                        }
                      + Effect    = "Allow"
                      + Resource  = "*"
                    },
                  + {
                      + Action    = "ec2:DeleteVolume"
                      + Condition = {
                          + StringLike = {
                              + "ec2:ResourceTag/kubernetes.io/created-for/pvc/name" = "*"
                            }
                        }
                      + Effect    = "Allow"
                      + Resource  = "*"
                    },
                  + {
                      + Action    = "ec2:DeleteSnapshot"
                      + Condition = {
                          + StringLike = {
                              + "ec2:ResourceTag/CSIVolumeSnapshotName" = "*"
                            }
                        }
                      + Effect    = "Allow"
                      + Resource  = "*"
                    },
                  + {
                      + Action    = "ec2:DeleteSnapshot"
                      + Condition = {
                          + StringLike = {
                              + "ec2:ResourceTag/ebs.csi.aws.com/cluster" = "true"
                            }
                        }
                      + Effect    = "Allow"
                      + Resource  = "*"
                    },
                ]
              + Version   = "2012-10-17"
            }
        )
      + policy_id        = (known after apply)
      + tags             = {
          + "tf-blueprint" = "ebs-winmng"
        }
      + tags_all         = {
          + "tf-blueprint" = "ebs-winmng"
        }
    }

  # module.ebs_csi_driver_irsa.aws_iam_role.this[0] will be created
  + resource "aws_iam_role" "this" {
      + arn                   = (known after apply)
      + assume_role_policy    = jsonencode(
            {
              + Statement = [
                  + {
                      + Action    = "sts:AssumeRoleWithWebIdentity"
                      + Condition = {
                          + StringEquals = {
                              + "oidc.eks.us-east-1.amazonaws.com/id/54034D8D87A2E92EFA859752FD5BEC67:aud" = "sts.amazonaws.com"
                              + "oidc.eks.us-east-1.amazonaws.com/id/54034D8D87A2E92EFA859752FD5BEC67:sub" = "system:serviceaccount:kube-system:ebs-csi-controller-sa"
                            }
                        }
                      + Effect    = "Allow"
                      + Principal = {
                          + Federated = "arn:aws:iam::324005994172:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/54034D8D87A2E92EFA859752FD5BEC67"
                        }
                    },
                ]
              + Version   = "2012-10-17"
            }
        )
      + create_date           = (known after apply)
      + force_detach_policies = true
      + id                    = (known after apply)
      + managed_policy_arns   = (known after apply)
      + max_session_duration  = 3600
      + name                  = (known after apply)
      + name_prefix           = "ebs-winmng-eks-ebs-csi-driv"
      + path                  = "/"
      + tags                  = {
          + "tf-blueprint" = "ebs-winmng"
        }
      + tags_all              = {
          + "tf-blueprint" = "ebs-winmng"
        }
      + unique_id             = (known after apply)
    }

  # module.ebs_csi_driver_irsa.aws_iam_role_policy_attachment.ebs_csi[0] will be created
  + resource "aws_iam_role_policy_attachment" "ebs_csi" {
      + id         = (known after apply)
      + policy_arn = (known after apply)
      + role       = (known after apply)
    }

  # module.eks_blueprints_addons.aws_eks_addon.this["aws-ebs-csi-driver"] will be created
  + resource "aws_eks_addon" "this" {
      + addon_name                  = "aws-ebs-csi-driver"
      + addon_version               = "v1.29.1-eksbuild.1"
      + arn                         = (known after apply)
      + cluster_name                = "ebs-winmng-eks"
      + configuration_values        = (known after apply)
      + created_at                  = (known after apply)
      + id                          = (known after apply)
      + modified_at                 = (known after apply)
      + preserve                    = true
      + resolve_conflicts_on_create = "OVERWRITE"
      + resolve_conflicts_on_update = "OVERWRITE"
      + service_account_role_arn    = (known after apply)
      + tags                        = {
          + "tf-blueprint" = "ebs-winmng"
        }
      + tags_all                    = {
          + "tf-blueprint" = "ebs-winmng"
        }

      + timeouts {}
    }

  # module.eks_blueprints_addons.aws_eks_addon.this["coredns"] will be created
  + resource "aws_eks_addon" "this" {
      + addon_name                  = "coredns"
      + addon_version               = "v1.10.1-eksbuild.7"
      + arn                         = (known after apply)
      + cluster_name                = "ebs-winmng-eks"
      + configuration_values        = (known after apply)
      + created_at                  = (known after apply)
      + id                          = (known after apply)
      + modified_at                 = (known after apply)
      + preserve                    = true
      + resolve_conflicts_on_create = "OVERWRITE"
      + resolve_conflicts_on_update = "OVERWRITE"
      + tags                        = {
          + "tf-blueprint" = "ebs-winmng"
        }
      + tags_all                    = {
          + "tf-blueprint" = "ebs-winmng"
        }

      + timeouts {}
    }

  # module.eks_blueprints_addons.aws_eks_addon.this["kube-proxy"] will be created
  + resource "aws_eks_addon" "this" {
      + addon_name                  = "kube-proxy"
      + addon_version               = "v1.28.8-eksbuild.2"
      + arn                         = (known after apply)
      + cluster_name                = "ebs-winmng-eks"
      + configuration_values        = (known after apply)
      + created_at                  = (known after apply)
      + id                          = (known after apply)
      + modified_at                 = (known after apply)
      + preserve                    = true
      + resolve_conflicts_on_create = "OVERWRITE"
      + resolve_conflicts_on_update = "OVERWRITE"
      + tags                        = {
          + "tf-blueprint" = "ebs-winmng"
        }
      + tags_all                    = {
          + "tf-blueprint" = "ebs-winmng"
        }

      + timeouts {}
    }

  # module.eks_blueprints_addons.aws_eks_addon.this["vpc-cni"] will be created
  + resource "aws_eks_addon" "this" {
      + addon_name                  = "vpc-cni"
      + addon_version               = "v1.18.0-eksbuild.1"
      + arn                         = (known after apply)
      + cluster_name                = "ebs-winmng-eks"
      + configuration_values        = (known after apply)
      + created_at                  = (known after apply)
      + id                          = (known after apply)
      + modified_at                 = (known after apply)
      + preserve                    = true
      + resolve_conflicts_on_create = "OVERWRITE"
      + resolve_conflicts_on_update = "OVERWRITE"
      + tags                        = {
          + "tf-blueprint" = "ebs-winmng"
        }
      + tags_all                    = {
          + "tf-blueprint" = "ebs-winmng"
        }

      + timeouts {}
    }

  # module.eks_blueprints_addons.time_sleep.this will be created
  + resource "time_sleep" "this" {
      + create_duration = "30s"
      + id              = (known after apply)
      + triggers        = {
          + "cluster_endpoint"  = "https://54034D8D87A2E92EFA859752FD5BEC67.yl4.us-east-1.eks.amazonaws.com"
          + "cluster_name"      = "ebs-winmng-eks"
          + "custom"            = ""
          + "oidc_provider_arn" = "arn:aws:iam::324005994172:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/54034D8D87A2E92EFA859752FD5BEC67"
        }
    }

Plan: 8 to add, 0 to change, 0 to destroy.
module.ebs_csi_driver_irsa.aws_iam_policy.ebs_csi[0]: Creating...
module.eks_blueprints_addons.time_sleep.this: Creating...
module.ebs_csi_driver_irsa.aws_iam_role.this[0]: Creating...
module.ebs_csi_driver_irsa.aws_iam_policy.ebs_csi[0]: Creation complete after 1s [id=arn:aws:iam::324005994172:policy/AmazonEKS_EBS_CSI_Policy-20240419144250170500000001]
module.ebs_csi_driver_irsa.aws_iam_role.this[0]: Creation complete after 1s [id=ebs-winmng-eks-ebs-csi-driv20240419144250220200000002]
module.ebs_csi_driver_irsa.aws_iam_role_policy_attachment.ebs_csi[0]: Creating...
module.ebs_csi_driver_irsa.aws_iam_role_policy_attachment.ebs_csi[0]: Creation complete after 0s [id=ebs-winmng-eks-ebs-csi-driv20240419144250220200000002-20240419144251773200000003]
module.eks_blueprints_addons.time_sleep.this: Still creating... [10s elapsed]
module.eks_blueprints_addons.time_sleep.this: Still creating... [20s elapsed]
module.eks_blueprints_addons.time_sleep.this: Still creating... [30s elapsed]
module.eks_blueprints_addons.time_sleep.this: Creation complete after 30s [id=2024-04-19T14:43:20Z]
module.eks_blueprints_addons.aws_eks_addon.this["kube-proxy"]: Creating...
module.eks_blueprints_addons.aws_eks_addon.this["aws-ebs-csi-driver"]: Creating...
module.eks_blueprints_addons.aws_eks_addon.this["vpc-cni"]: Creating...
module.eks_blueprints_addons.aws_eks_addon.this["coredns"]: Creating...
module.eks_blueprints_addons.aws_eks_addon.this["coredns"]: Creation complete after 9s [id=ebs-winmng-eks:coredns]
module.eks_blueprints_addons.aws_eks_addon.this["kube-proxy"]: Still creating... [10s elapsed]
module.eks_blueprints_addons.aws_eks_addon.this["vpc-cni"]: Still creating... [10s elapsed]
module.eks_blueprints_addons.aws_eks_addon.this["aws-ebs-csi-driver"]: Still creating... [10s elapsed]
module.eks_blueprints_addons.aws_eks_addon.this["vpc-cni"]: Still creating... [20s elapsed]
module.eks_blueprints_addons.aws_eks_addon.this["kube-proxy"]: Still creating... [20s elapsed]
module.eks_blueprints_addons.aws_eks_addon.this["aws-ebs-csi-driver"]: Still creating... [20s elapsed]
module.eks_blueprints_addons.aws_eks_addon.this["aws-ebs-csi-driver"]: Still creating... [30s elapsed]
module.eks_blueprints_addons.aws_eks_addon.this["vpc-cni"]: Still creating... [30s elapsed]
module.eks_blueprints_addons.aws_eks_addon.this["kube-proxy"]: Still creating... [30s elapsed]
module.eks_blueprints_addons.aws_eks_addon.this["kube-proxy"]: Creation complete after 36s [id=ebs-winmng-eks:kube-proxy]
module.eks_blueprints_addons.aws_eks_addon.this["vpc-cni"]: Creation complete after 36s [id=ebs-winmng-eks:vpc-cni]
module.eks_blueprints_addons.aws_eks_addon.this["aws-ebs-csi-driver"]: Still creating... [40s elapsed]
module.eks_blueprints_addons.aws_eks_addon.this["aws-ebs-csi-driver"]: Still creating... [50s elapsed]
...
module.eks_blueprints_addons.aws_eks_addon.this["aws-ebs-csi-driver"]: Still creating... [19m51s elapsed]
module.eks_blueprints_addons.aws_eks_addon.this["aws-ebs-csi-driver"]: Still creating... [20m1s elapsed]
...
│ Error: waiting for EKS Add-On (ebs-winmng-eks:aws-ebs-csi-driver) create: timeout while waiting for state to become 'ACTIVE' (last state: 'CREATING', timeout: 20m0s)
│
│   with module.eks_blueprints_addons.aws_eks_addon.this["aws-ebs-csi-driver"],
│   on .terraform/modules/eks_blueprints_addons/main.tf line 2178, in resource "aws_eks_addon" "this":
│ 2178: resource "aws_eks_addon" "this" {
│
╵

Additional context

  • When the eks_managed_node_groups > mg_windows section is removed, the EBS CSI driver is deployed correctly.
  • I also tried the WINDOWS_CORE_2022_x86_64 AMI type, but it did not work either.
  • Terraform logs:
$ cat terraform.log | grep ERROR | grep -i ebs
2024-04-19T13:08:41.191Z [ERROR] provider.terraform-provider-aws_v5.46.0_x5: Response contains error diagnostic: @module=sdk.proto diagnostic_detail="" tf_provider_addr=registry.terraform.io/hashicorp/aws tf_req_id=ac52bf2a-f067-43af-0cc2-6470b8a7aeba @caller=github.com/hashicorp/[email protected]/tfprotov5/internal/diag/diagnostics.go:58 diagnostic_severity=ERROR diagnostic_summary="waiting for EKS Add-On (ebs-winmng-eks:aws-ebs-csi-driver) create: operation error EKS: DescribeAddon, https response error StatusCode: 403, RequestID: 911cb32e-aa9a-4a9a-97b8-84c0ee3f54a2, api error InvalidSignatureException: Signature expired: 20240419T130840Z is now earlier than 20240419T131027Z (20240419T131527Z - 5 min.)" tf_proto_version=5.4 tf_rpc=ApplyResourceChange tf_resource_type=aws_eks_addon timestamp=2024-04-19T13:08:41.190Z
2024-04-19T13:08:41.272Z [ERROR] vertex "module.eks_blueprints_addons.aws_eks_addon.this[\"aws-ebs-csi-driver\"]" error: waiting for EKS Add-On (ebs-winmng-eks:aws-ebs-csi-driver) create: operation error EKS: DescribeAddon, https response error StatusCode: 403, RequestID: 911cb32e-aa9a-4a9a-97b8-84c0ee3f54a2, api error InvalidSignatureException: Signature expired: 20240419T130840Z is now earlier than 20240419T131027Z (20240419T131527Z - 5 min.)
2024-04-19T13:29:16.090Z [ERROR] provider.terraform-provider-aws_v5.46.0_x5: Response contains error diagnostic: tf_resource_type=aws_eks_addon tf_provider_addr=registry.terraform.io/hashicorp/aws tf_req_id=30b48a7e-2336-5dcb-bf9a-823b22f8edcc diagnostic_severity=ERROR tf_rpc=ApplyResourceChange @caller=github.com/hashicorp/[email protected]/tfprotov5/internal/diag/diagnostics.go:58 diagnostic_detail="" diagnostic_summary="waiting for EKS Add-On (ebs-winmng-eks:aws-ebs-csi-driver) create: operation error EKS: DescribeAddon, https response error StatusCode: 0, RequestID: , request send failed, Get \"https://eks.us-east-1.amazonaws.com/clusters/ebs-winmng-eks/addons/aws-ebs-csi-driver\": dial tcp: lookup eks.us-east-1.amazonaws.com on 192.168.65.7:53: no such host" tf_proto_version=5.4 @module=sdk.proto timestamp=2024-04-19T13:29:16.090Z
2024-04-19T13:29:16.155Z [ERROR] vertex "module.eks_blueprints_addons.aws_eks_addon.this[\"aws-ebs-csi-driver\"]" error: waiting for EKS Add-On (ebs-winmng-eks:aws-ebs-csi-driver) create: operation error EKS: DescribeAddon, https response error StatusCode: 0, RequestID: , request send failed, Get "https://eks.us-east-1.amazonaws.com/clusters/ebs-winmng-eks/addons/aws-ebs-csi-driver": dial tcp: lookup eks.us-east-1.amazonaws.com on 192.168.65.7:53: no such host
2024-04-19T13:35:53.312Z [ERROR] provider.terraform-provider-aws_v5.46.0_x5: Response contains error diagnostic: @module=sdk.proto diagnostic_summary="waiting for EKS Add-On (ebs-winmng-eks:aws-ebs-csi-driver) create: context canceled" tf_provider_addr=registry.terraform.io/hashicorp/aws tf_req_id=b38029c7-0c06-bae0-2f92-6a5f6a79189c tf_rpc=ApplyResourceChange diagnostic_detail="" diagnostic_severity=ERROR tf_proto_version=5.4 tf_resource_type=aws_eks_addon @caller=github.com/hashicorp/[email protected]/tfprotov5/internal/diag/diagnostics.go:58 timestamp=2024-04-19T13:35:53.311Z
2024-04-19T13:35:53.390Z [ERROR] vertex "module.eks_blueprints_addons.aws_eks_addon.this[\"aws-ebs-csi-driver\"]" error: waiting for EKS Add-On (ebs-winmng-eks:aws-ebs-csi-driver) create: context canceled
2024-04-19T13:35:53.391Z [ERROR] vertex "module.eks_blueprints_addons.aws_eks_addon.this[\"aws-ebs-csi-driver\"]" error: execution halted
2024-04-19T13:35:53.391Z [ERROR] vertex "module.eks_blueprints_addons.aws_eks_addon.this[\"aws-ebs-csi-driver\"]" error: execution halted
2024-04-19T15:03:21.067Z [ERROR] provider.terraform-provider-aws_v5.46.0_x5: Response contains error diagnostic: @module=sdk.proto diagnostic_summary="waiting for EKS Add-On (ebs-winmng-eks:aws-ebs-csi-driver) create: timeout while waiting for state to become 'ACTIVE' (last state: 'CREATING', timeout: 20m0s)" diagnostic_detail="" @caller=github.com/hashicorp/[email protected]/tfprotov5/internal/diag/diagnostics.go:58 diagnostic_severity=ERROR tf_provider_addr=registry.terraform.io/hashicorp/aws tf_req_id=98dcfc9c-853e-c14c-e8e7-0cfa2626007b tf_resource_type=aws_eks_addon tf_proto_version=5.4 tf_rpc=ApplyResourceChange timestamp=2024-04-19T15:03:21.065Z
2024-04-19T15:03:21.157Z [ERROR] vertex "module.eks_blueprints_addons.aws_eks_addon.this[\"aws-ebs-csi-driver\"]" error: waiting for EKS Add-On (ebs-winmng-eks:aws-ebs-csi-driver) create: timeout while waiting for state to become 'ACTIVE' (last state: 'CREATING', timeout: 20m0s)
2024-04-19T15:21:15.005Z [ERROR] provider.terraform-provider-aws_v5.46.0_x5: Response contains error diagnostic: @module=sdk.proto tf_provider_addr=registry.terraform.io/hashicorp/aws tf_rpc=ApplyResourceChange tf_resource_type=aws_eks_addon diagnostic_detail="" diagnostic_summary="waiting for EKS Add-On (ebs-winmng-eks:aws-ebs-csi-driver) create: context canceled" tf_proto_version=5.4 diagnostic_severity=ERROR tf_req_id=bdd6e006-8e8a-6107-f182-3675cd7bd1f4 @caller=github.com/hashicorp/[email protected]/tfprotov5/internal/diag/diagnostics.go:58 timestamp=2024-04-19T15:21:15.004Z
2024-04-19T15:21:15.087Z [ERROR] vertex "module.eks_blueprints_addons.aws_eks_addon.this[\"aws-ebs-csi-driver\"]" error: waiting for EKS Add-On (ebs-winmng-eks:aws-ebs-csi-driver) create: context canceled
2024-04-19T15:21:15.088Z [ERROR] vertex "module.eks_blueprints_addons.aws_eks_addon.this[\"aws-ebs-csi-driver\"]" error: execution halted
@carlosrodlop carlosrodlop changed the title [EBS CSI Driver] It is not compatible with Windows Managed Group [EBS CSI Driver] It is not compatible with Windows Managed Node Group Apr 20, 2024

This issue has been automatically marked as stale because it has been open 30 days
with no activity. Remove stale label or comment or this issue will be closed in 10 days

@github-actions github-actions bot added the stale label May 24, 2024
@carlosrodlop
Author

carlosrodlop commented May 28, 2024

Has anyone looked into the provided example? This issue has still not been answered, nor triaged for removal.

@bryantbiggs
Contributor

And what do the logs from the EBS CSI driver pod show you?

@github-actions github-actions bot removed the stale label May 29, 2024
@carlosrodlop
Author

I will do so shortly, in the next couple of days. Thanks for looking into this @bryantbiggs

@carlosrodlop
Author

carlosrodlop commented Jun 6, 2024

@bryantbiggs thanks for your patience :)

Regarding the EBS CSI driver logs: there are none for ebs-csi-node-windows because its pod is stuck in a pending state (ContainerCreating). The following Kubernetes event is connected to this issue:

(combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "a0fa2ff62abf7f2fb4b5b2ab7d9db59e11ba913c7b89ed0458cc650abb74701c": plugin type="vpc-bridge" name="vpc" failed (add): failed to parse Kubernetes args: failed to get pod IP address ebs-csi-node-windows-fzp2l: error executing k8s connector: error executing connector binary: exit status 1 with execution error: pod ebs-csi-node-windows-fzp2l does not have label vpc.amazonaws.com/PrivateIPv4Address

From the above description we can say that the issue appears to be related to the Amazon VPC CNI plugin failing to obtain the private IPv4 address for the Windows pod running the EBS CSI driver.

Questions

1.- I spoke to @wellsiau-aws about this issue and he pointed me to this list of prerequisites: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/tree/master/examples/kubernetes/windows. Looking at the 4 points, I have doubts about points 2 and 3 (quoted below). Do I need to add them in the provided main.tf somehow?

  • csi-proxy v1.0.0+ installed on the Windows node.
  • Driver v1.6.0+ from ECR: public.ecr.aws/ebs-csi-driver/aws-ebs-csi-driver:{driver version}. It can be built and pushed to another image registry with the command TAG=$MY_TAG REGISTRY=$MY_REGISTRY make all-push where MY_TAG refers to the image tag to push and MY_REGISTRY to the destination image registry like "XXXXXXXXXXXX.dkr.ecr.us-west-2.amazonaws.com"

2.- Looking at https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi.html, I am wondering whether https://docs.aws.amazon.com/eks/latest/userguide/csi-iam-role.html is set up correctly by the current configuration, or whether there is something else we need to add for Windows managed nodes.

module "eks_blueprints_addons" {
  source = "aws-ia/eks-blueprints-addons/aws"
  version = "1.15.1"

...
  eks_addons = {
    aws-ebs-csi-driver = {
      service_account_role_arn = module.ebs_csi_driver_irsa.iam_role_arn
    }
...
  }

...
}

3.- I tried to look at the Terraform code here https://github.com/aws-ia/terraform-aws-eks-blueprints-addons/blob/main/main.tf to understand what is happening behind the scenes, but there is no reference to the EBS CSI driver. Where in the code should we look for troubleshooting?

4.- Has anyone tried to run the *.tf files I provided? The issue is easy to reproduce locally I believe.

Resources status

Finally, I'm attaching a snapshot of all the resources created and their status:

kubectl get all -A 
NAMESPACE     NAME                                     READY   STATUS              RESTARTS   AGE
kube-system   pod/aws-node-mvxg7                       2/2     Running             0          64m
kube-system   pod/aws-node-t74hb                       2/2     Running             0          64m
kube-system   pod/coredns-6777b4b9b9-jh6cf             1/1     Running             0          65m
kube-system   pod/coredns-6777b4b9b9-kvbmn             1/1     Running             0          65m
kube-system   pod/ebs-csi-controller-66cb49498-n92w2   6/6     Running             0          65m
kube-system   pod/ebs-csi-controller-66cb49498-ph2rd   6/6     Running             0          65m
kube-system   pod/ebs-csi-node-rhmfq                   3/3     Running             0          64m
kube-system   pod/ebs-csi-node-windows-fzp2l           0/3     ContainerCreating   0          57m
kube-system   pod/ebs-csi-node-xhjzv                   3/3     Running             0          64m
kube-system   pod/kube-proxy-69njv                     1/1     Running             0          64m
kube-system   pod/kube-proxy-ghpmf                     1/1     Running             0          64m

NAMESPACE     NAME                 TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                  AGE
default       service/kubernetes   ClusterIP   172.20.0.1    <none>        443/TCP                  70m
kube-system   service/kube-dns     ClusterIP   172.20.0.10   <none>        53/UDP,53/TCP,9153/TCP   68m

NAMESPACE     NAME                                  DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR              AGE
kube-system   daemonset.apps/aws-node               2         2         2       2            2           <none>                     68m
kube-system   daemonset.apps/ebs-csi-node           2         2         2       2            2           kubernetes.io/os=linux     65m
kube-system   daemonset.apps/ebs-csi-node-windows   1         1         0       1            0           kubernetes.io/os=windows   65m
kube-system   daemonset.apps/kube-proxy             2         2         2       2            2           <none>                     68m

NAMESPACE     NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
kube-system   deployment.apps/coredns              2/2     2            2           68m
kube-system   deployment.apps/ebs-csi-controller   2/2     2            2           65m

NAMESPACE     NAME                                           DESIRED   CURRENT   READY   AGE
kube-system   replicaset.apps/coredns-6777b4b9b9             2         2         2       65m
kube-system   replicaset.apps/coredns-86969bccb4             0         0         0       68m
kube-system   replicaset.apps/ebs-csi-controller-66cb49498   2         2         2       m

@bryantbiggs
Contributor

I would revisit your configuration; there are a number of misconfigurations. For example:

  • You are setting a taint on the windows nodes - is there a toleration that matches on the EBS CSI driver?
  • I don't see where you have set node.enableWindows = true per the docs

@carlosrodlop
Author

carlosrodlop commented Jun 6, 2024

Thanks @bryantbiggs for your reply

> You are setting a taint on the windows nodes - is there a toleration that matches on the EBS CSI driver?

Nope! Where can I find the accepted inputs for eks_addons > ebs_driver? Ideally, I'd like to pass the tolerations as values in a YAML file.

There is no reference to them in either https://registry.terraform.io/modules/aws-ia/eks-blueprints-addon/aws/latest?tab=inputs or https://aws-ia.github.io/terraform-aws-eks-blueprints-addons/main/

OK, I guess I can do something like this https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/3e9e5a13e7afee42d4b64874ba5adf73f329ff30/patterns/karpenter/main.tf#L117

Then add tolerations like https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/charts/aws-ebs-csi-driver/values.yaml#L276-L281

Can you confirm my suggestion, please?

> I don't see where you have set node.enableWindows = true per the docs

Which docs, please?

Gotcha, I need to enable this section https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/charts/aws-ebs-csi-driver/values.yaml#L384 using the same approach I explained above.
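
A minimal sketch of what passing tolerations through the add-on's configuration values might look like. The taint key, value, and effect below are hypothetical placeholders (the actual taint on the Windows node group is not shown in this comment), and the local name is made up for illustration. It also assumes the managed add-on's configuration schema accepts `node.tolerations` in the same shape as the Helm chart's values.yaml, which has not been verified here.

```hcl
locals {
  # Hypothetical: replace key/value/effect with the taint actually set
  # on the Windows node group.
  ebs_csi_configuration_values = jsonencode({
    node = {
      tolerations = [
        {
          key      = "os"         # assumed taint key
          operator = "Equal"
          value    = "windows"    # assumed taint value
          effect   = "NoSchedule" # assumed taint effect
        }
      ]
    }
  })
}
```

The result would then be passed as `configuration_values = local.ebs_csi_configuration_values` inside the `aws-ebs-csi-driver` entry of `eks_addons`. Whether `node.enableWindows` is also accepted by the managed add-on is not confirmed in this thread; `aws eks describe-addon-configuration --addon-name aws-ebs-csi-driver --addon-version <version>` prints the schema the add-on accepts.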

@carlosrodlop
Author

I'm closing this issue. It was solved by using a node selector, so that the EBS CSI driver node pods run only on the node pools where I want to use the driver:

module "eks_blueprints_addons" {
  source = "aws-ia/eks-blueprints-addons/aws"
  #vEKSBpAddonsTFMod#
  version = "1.15.1"
 ...
  eks_addons = {
    aws-ebs-csi-driver = {
      service_account_role_arn = module.ebs_csi_driver_irsa.iam_role_arn
      configuration_values = jsonencode(
        {
          node = {
            nodeSelector = {
              ebs_driver = "enabled"
            }
          }
        }
      )
    }
...
  }
}
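
For the node selector above to match anything, the target node groups need the `ebs_driver = "enabled"` label. A sketch of one way to set it, assuming the cluster is created with the terraform-aws-modules/eks module and its `eks_managed_node_groups` input; the group name, instance type, and sizes are hypothetical placeholders, not taken from the reproduction code.

```hcl
locals {
  # Hypothetical node group definition; only the labels entry matters here.
  eks_managed_node_groups = {
    linux_ebs = {
      instance_types = ["m5.large"] # placeholder
      min_size       = 1
      max_size       = 2
      desired_size   = 1

      # Kubernetes node label matched by node.nodeSelector in the add-on config.
      labels = {
        ebs_driver = "enabled"
      }
    }
  }
}
```

This map would be passed as `eks_managed_node_groups = local.eks_managed_node_groups` on the EKS module. If the selector applies to every node DaemonSet of the driver, which is how the Helm chart appears to treat `node.nodeSelector`, node groups without the label (here, the Windows one) would get no driver node pods.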
