
Creating an access entry fails if it already exists #2968

Closed
deshruch opened this issue Mar 11, 2024 · 22 comments

@deshruch

deshruch commented Mar 11, 2024

Description

I am trying to create a new access entry. I am migrating from 19.20 -> 20.5.0, moving away from aws-auth ConfigMap entries and over to access entries. Creating an access entry fails if it already exists, and I have to manually delete the existing entry so the module can attempt to create it again.
See Actual behavior below for the full error message.
Also, for user-defined roles such as the 'cluster_management_role' shown in the Terraform code, the module sometimes fails to attach the policy. This results in a failed deployment for us, since we are using this role for EKSTokenAuth.

  • [x] ✋ I have searched the open/closed issues and my issue is not listed.


Versions

  • Module version [Required]: 20.5.0

  • Terraform version:
    1.5.7

  • Provider version(s):
    5.38.0

Reproduction Code [Required]

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "20.5.0"

  cluster_name                         = var.eks_cluster_name
  cluster_version                      = var.eks_version
  cluster_endpoint_public_access       = true
  cluster_endpoint_private_access      = true
  cluster_endpoint_public_access_cidrs = var.public_access_cidrs
  enable_irsa                          = true
  iam_role_arn                         = aws_iam_role.eks_cluster_role.arn
  authentication_mode                  = "API_AND_CONFIG_MAP"
  vpc_id                               = local.vpc_id
  control_plane_subnet_ids             = local.eks_cluster_private_subnets
  subnet_ids                           = local.eks_worker_private_subnets

  cluster_security_group_tags = {
    "kubernetes.io/cluster/${var.eks_cluster_name}" = null
  }

  cluster_addons = {
    vpc-cni = {
      resolve_conflicts_on_update = "OVERWRITE"
      resolve_conflicts_on_create = "OVERWRITE"
      before_compute              = true
      service_account_role_arn    = module.vpc_cni_irsa.iam_role_arn
      addon_version               = local.eks_managed_add_on_versions.vpc_cni
      configuration_values = jsonencode({
        env = {
          # AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG = "true"
          # ENI_CONFIG_LABEL_DEF               = "topology.kubernetes.io/zone"

          # Reference docs https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html
          ENABLE_PREFIX_DELEGATION = "true"
          WARM_PREFIX_TARGET       = "1"
        }
      })
    }
    coredns = {
      resolve_conflicts_on_update = "OVERWRITE"
      resolve_conflicts_on_create = "OVERWRITE"
      preserve                    = true #this is the default value
      addon_version               = local.eks_managed_add_on_versions.coredns

      timeouts = {
        create = "25m"
        delete = "10m"
      }
    }
    kube-proxy = {
      addon_version               = local.eks_managed_add_on_versions.kube_proxy
      resolve_conflicts_on_update = "OVERWRITE"
      resolve_conflicts_on_create = "OVERWRITE"
    }
    aws-ebs-csi-driver = {
      addon_version               = local.eks_managed_add_on_versions.aws_ebs_csi_driver
      resolve_conflicts_on_update = "OVERWRITE"
      resolve_conflicts_on_create = "OVERWRITE"
      service_account_role_arn    = aws_iam_role.ebs_csi_role.arn

    }
  }

  enable_cluster_creator_admin_permissions = true
  access_entries = {
    cluster_manager = {
      kubernetes_groups = [] #did not allow to add to system:masters, associating admin access policy
      principal_arn     = aws_iam_role.cluster_management_role.arn
      policy_associations = {
        cluster_manager = {
          policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
          access_scope = {
            namespaces = []
            type = "cluster"
          }

        }

      }
    }

    mwaa = {
      kubernetes_groups = []
      principal_arn     = aws_iam_role.mwaa_execution_role.arn
      username          = "mwaa-service"
    }


  }


  node_security_group_additional_rules = {
    nodes_istiod_port = {
      description                   = "Cluster API to Node group for istiod webhook"
      protocol                      = "tcp"
      from_port                     = 15017
      to_port                       = 15017
      type                          = "ingress"
      source_cluster_security_group = true
    }
    node_to_node_communication = {
      description = "Allow full access for cross-node communication"
      protocol    = "tcp"
      from_port   = 0
      to_port     = 65535
      type        = "ingress"
      self        = true
    }
  }

  node_security_group_tags = {
    # NOTE - if creating multiple security groups with this module, only tag the
    # security group that Karpenter should utilize with the following tag
    # (i.e. - at most, only one security group should have this tag in your account)
    "karpenter.sh/discovery" = var.eks_cluster_name
  }

  eks_managed_node_group_defaults = {
    # We are using the IRSA created below for permissions
    # However, we have to provision a new cluster with the policy attached FIRST
    # before we can disable. Without this initial policy,
    # the VPC CNI fails to assign IPs and nodes cannot join the new cluster
    iam_role_attach_cni_policy = true
  }

  eks_managed_node_groups = {

    default = {
      name = "${var.eks_cluster_name}-default"

      subnet_ids = local.eks_worker_private_subnets

      min_size     = 2
      max_size     = 3
      desired_size = 2

      force_update_version = true
      instance_types       = ["m5a.xlarge"]

      # Not required nor used - avoid tagging two security groups with same tag as well
      create_security_group = false

      update_config = {
        max_unavailable_percentage = 50 # or set `max_unavailable`
      }

      description = "${var.eks_cluster_name} - EKS managed node group launch template"

      ebs_optimized           = true
      disable_api_termination = false
      enable_monitoring       = true

      block_device_mappings = {
        xvda = {
          device_name = "/dev/xvda"
          ebs = {
            volume_size           = 75
            volume_type           = "gp3"
            iops                  = 3000
            throughput            = 150
            encrypted             = true
            delete_on_termination = true
          }
        }
      }

      metadata_options = {
        http_endpoint               = "enabled"
        http_tokens                 = "required"
        http_put_response_hop_limit = 2
        instance_metadata_tags      = "disabled"
      }

      create_iam_role = false
      iam_role_arn    = aws_iam_role.eks_node_group_role.arn
      # iam_role_name            = "${var.eks_cluster_name}-default-managed-node-group"
      # iam_role_use_name_prefix = false
      # iam_role_description     = "EKS managed node group role"
      # iam_role_additional_policies = {
      #   AmazonEC2ContainerRegistryReadOnly = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
      #   additional                         = aws_iam_policy.node_additional.arn
      # }

      tags = {
        EksClusterName = var.eks_cluster_name
      }
    }
  }

  tags = {
    # Explicit `nonsensitive()` call needed here as these tags are used in a foreach loop during deployment and foreach don't allow sensitive value
    nonsensitive(data.aws_ssm_parameter.appregistry_application_tag_key.value) = nonsensitive(data.aws_ssm_parameter.appregistry_application_tag_value.value)
    VPC_Name                                                                   = var.vpc_name
    Terraform                                                                  = "true"
  }
}

Steps to reproduce the behavior:
terraform init
terraform apply

Expected behavior

The access entry should be created properly even if it already exists.
The policy should be attached correctly.

Actual behavior

The behaviour is very intermittent and unpredictable. It sometimes creates the access entry but does not attach the policy.
We see error messages such as:

╷
│ Error: creating EKS Access Entry (second:arn:aws:iam::473699735501:role/aws-reserved/sso.amazonaws.com/AWSReservedSSO_AdministratorAccess_2dfe39b46fb1ea3a): operation error EKS: CreateAccessEntry, https response error StatusCode: 409, RequestID: 06e2b43a-e5a6-46f6-a05f-ed8b0887aa75, ResourceInUseException: The specified access entry resource is already in use on this cluster.
│
│   with module.eks.aws_eks_access_entry.this["cluster_creator"],
│   on .terraform/modules/eks/main.tf line 185, in resource "aws_eks_access_entry" "this":
│  185: resource "aws_eks_access_entry" "this" {
│
╵
╷
│ Error: creating EKS Access Entry (second:arn:aws:iam::473699735501:role/second-us-east-1-eks-node-group-role): operation error EKS: CreateAccessEntry, https response error StatusCode: 409, RequestID: 7f43c24f-361e-46cc-84e9-fe642dc622e0, ResourceInUseException: The specified access entry resource is already in use on this cluster.
│
│   with module.karpenter.aws_eks_access_entry.node[0],
│   on .terraform/modules/karpenter/modules/karpenter/main.tf line 589, in resource "aws_eks_access_entry" "node":
│  589: resource "aws_eks_access_entry" "node" {
│
╵
make: *** [Makefile:142: deploy-eks-cluster] Error 1

Actual behaviour when the cluster_management_role custom access entry fails to attach the policy:
Plan: 18 to add, 2 to change, 13 to destroy.

╷
│ Error: query: failed to query with labels: secrets is forbidden: User "arn:aws:sts::473699735501:assumed-role/eks-second-us-east-1-cluster-management-role/EKSGetTokenAuth" cannot list resource "secrets" in API group "" in the namespace "karpenter"
│
│   with helm_release.karpenter,
│   on eks-add-ons.tf line 101, in resource "helm_release" "karpenter":
│  101: resource "helm_release" "karpenter" {
│
╵


@deshruch
Author

@cweiblen Are you able to reproduce this issue as well?

@bryantbiggs
Member

If you are migrating a cluster to cluster access entries, you can't use enable_cluster_creator_admin_permissions = true because EKS automatically maps that identity into an access entry. You can either remove this setting, or keep it enabled but import the entry that EKS created into the resource used by the module (to control it via Terraform).
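
For illustration, a minimal sketch of that import using a Terraform 1.5+ import block; the resource address matches the error output earlier in this issue, while the cluster name and creator-role ARN are placeholders for your own values:

import {
  # Adopt the access entry EKS auto-created for the cluster creator,
  # so the module-managed resource does not try to create a duplicate.
  to = module.eks.aws_eks_access_entry.this["cluster_creator"]
  id = "my-cluster:arn:aws:iam::111111111111:role/my-cluster-creator-role" # cluster_name:principal_arn
}

The equivalent terraform import CLI commands are shown later in this thread.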

@deshruch
Author

deshruch commented Mar 11, 2024

@bryantbiggs For the other issue, where the policy is never attached:

cluster_manager = {
      kubernetes_groups = [] #did not allow to add to system:masters, associating admin access policy
      principal_arn     = aws_iam_role.cluster_management_role.arn
      policy_associations = {
        cluster_manager = {
          policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
          access_scope = {
            namespaces = []
            type = "cluster"
          }

        }

      }
    }

I see this in the plan:


# module.eks.aws_eks_access_entry.this["cluster_manager"] will be created
  + resource "aws_eks_access_entry" "this" {
      + access_entry_arn  = (known after apply)
      + cluster_name      = "osdu5"
      + created_at        = (known after apply)
      + id                = (known after apply)
      + kubernetes_groups = (known after apply)
      + modified_at       = (known after apply)
      + principal_arn     = "arn:aws:iam::808560345837:role/eks-osdu5-us-east-1-cluster-management-role"
      + tags              = {
          + "Terraform"            = "true"
          + "VPC_Name"             = "osdu5"
        }
      + tags_all          = {
          + "Terraform"            = "true"
          + "VPC_Name"             = "osdu5"
        }
      + type              = "STANDARD"
      + user_name         = (known after apply)
    }


# module.eks.aws_eks_access_policy_association.this["cluster_manager_cluster_manager"] will be created
  + resource "aws_eks_access_policy_association" "this" {
      + associated_at = (known after apply)
      + cluster_name  = "osdu5"
      + id            = (known after apply)
      + modified_at   = (known after apply)
      + policy_arn    = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
      + principal_arn = "arn:aws:iam::808560345837:role/eks-osdu5-us-east-1-cluster-management-role"

      + access_scope {
          + type = "cluster"
        }
    }

Is this related to #2958?
If yes, what change do I need to make in my Terraform code?

@deshruch deshruch reopened this Mar 11, 2024
@bryantbiggs
Member

I don't follow, what is the issue?

@deshruch
Author

In the reproduction code, see my access entry with principal_arn = aws_iam_role.cluster_management_role.arn.
After Terraform is applied, the access entry is created, but it does not have the AmazonEKSClusterAdminPolicy attached to it.

@deshruch
Author

See the second entry here:

[screenshot of the cluster's access entries in the EKS console]

@bryantbiggs
Member

  1. What does the API say? aws eks list-associated-access-policies --cluster-name <value> --principal-arn <value>
  2. Is your Terraform plan "clean" (i.e., if you run terraform plan, is it free of any diff/pending changes)?

@cweiblen

cweiblen commented Mar 12, 2024

Migrating an existing cluster from 19.20 -> 20.2, I was not able to get it working using the access_entries input; I would get the errors described above. As a workaround, I used the aws_eks_access_entry resource from the AWS provider directly.
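
A rough sketch of that workaround, assuming the same role and policy used in the reproduction code above (the resource names here are illustrative):

resource "aws_eks_access_entry" "cluster_manager" {
  # Standalone access entry, created outside the EKS module
  cluster_name  = module.eks.cluster_name
  principal_arn = aws_iam_role.cluster_management_role.arn
  type          = "STANDARD"
}

resource "aws_eks_access_policy_association" "cluster_manager_admin" {
  # Attach the admin access policy to the same principal with cluster scope
  cluster_name  = module.eks.cluster_name
  principal_arn = aws_iam_role.cluster_management_role.arn
  policy_arn    = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"

  access_scope {
    type = "cluster"
  }
}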

@bryantbiggs
Member

I would be curious to see what you are doing differently. If an access entry already exists, it already exists - there isn't anything unique about the module's implementation that would allow you to get around that.

@deshruch
Author

deshruch commented Mar 12, 2024

@bryantbiggs There are two issues that we see when using access_entries:
1/ If an access entry exists, it complains with the error 'Resource is already in use' and fails. It's a fatal error and not just a warning.
2/ If it does create the entry, it does not attach the policy.

I plan to attempt the same thing @cweiblen mentioned: move it out of the EKS module and add a separate access entry resource.

@bryantbiggs
Member

1/ If an access entry exists - it complains with the error 'Resource is already in use' and fails. It's a fatal error and not just a warning

We do not control this - this is the EKS API. It's stating that you can't have more than one entry for the same principal. This would be similar to trying to create two clusters with the same name in the same region - the API does not allow that; it has nothing to do with this module.

2/ If it does create the entry, it does not attach the policy.

Do you have a reproduction? I'd love to see what's different about a standalone resource versus what's defined here. Here is what we have in our example, which works as intended:

access_entries = {
  # One access entry with a policy associated
  ex-single = {
    kubernetes_groups = []
    principal_arn     = aws_iam_role.this["single"].arn
    policy_associations = {
      single = {
        policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSViewPolicy"
        access_scope = {
          namespaces = ["default"]
          type       = "namespace"
        }
      }
    }
  }
  # Example of adding multiple policies to a single access entry
  ex-multiple = {
    kubernetes_groups = []
    principal_arn     = aws_iam_role.this["multiple"].arn
    policy_associations = {
      ex-one = {
        policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSEditPolicy"
        access_scope = {
          namespaces = ["default"]
          type       = "namespace"
        }
      }
      ex-two = {
        policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSViewPolicy"
        access_scope = {
          type = "cluster"
        }
      }
    }
  }
}

@deshruch
Author

deshruch commented Mar 12, 2024

@bryantbiggs In my code I have an access entry of type 'cluster', as shown below.

In your example, ex-two has an access_scope of type cluster but no namespaces, only a policy_arn.
Could that be the problem with my code?

access_entries = {
    cluster_manager = {
      kubernetes_groups = [] #did not allow to add to system:masters, associating admin access policy
      principal_arn     = aws_iam_role.cluster_management_role.arn
      policy_associations = {
        cluster_manager = {
          policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
          access_scope = {
            namespaces = []
            type = "cluster"
          }

        }

      }
    }

    mwaa = {
      kubernetes_groups = []
      principal_arn     = aws_iam_role.mwaa_execution_role.arn
      username          = "mwaa-service"
    }
}

Can you post an example of ex-single with type cluster and a policy association/policy_arn?
Perhaps my syntax is wrong?

@deshruch
Author

Regarding:

 1/ If an access entry exists - it complains with the error 'Resource is already in use' and fails. It's a fatal error and not just a warning

We do not control this - this is the EKS API. Its stating that you can't have more than one entry for the same principal. This would be similar to trying to create two clusters both named the same, in the same region - the API does not allow that, nothing to do with this module

This is a problem when we are doing an upgrade. The first run works fine; the second run, perhaps for an upgrade in another part of the code, attempts to create the entry again. It should simply be ignored if it already exists. But, as you say, it's the EKS API, and we would need to log an issue there.

@bryantbiggs
Member

The first time we run it works fine, the second time you run it may be for an upgrade in another part of the code - it attempts to create it again

From the details you have provided, it's very hard to understand what you are doing and why you are encountering issues. I would suggest re-reading the upgrade guide. In short, there are two areas where access entries will already exist and YOU should not re-add them in code. Both of these scenarios occur when you have a cluster that was created with the aws-auth ConfigMap and you are migrating to access entries:

  1. The identity that was used to create the cluster is automatically mapped into an access entry when access entries are enabled on a cluster. Under the aws-auth ConfigMap-only method, you would not see this identity in the ConfigMap. If you are using the same role that was used to create the cluster via aws-auth and you are migrating to access entries, you should not set enable_cluster_creator_admin_permissions = true (see the sketch after this list), because Terraform will try to create an access entry that EKS has already created and it will fail. If you wish to control this in code, you will either need to manually delete the entry via the EKS API and then create it with Terraform, or do a Terraform import to control it through code. We cannot do anything about this in the module, since the module did not create it in the first place.
  2. EKS will automatically create access entries for the roles used by EKS managed node group(s) and EKS Fargate profiles - users should NOT do anything with these access entries when migrating; leave them for EKS to manage. Again, if you try to re-add these entries through code/Terraform, it will fail and state that an entry already exists.
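
A minimal sketch of the migration-time settings implied by the two points above (all other module inputs are omitted and the values are placeholders):

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "20.5.0"

  cluster_name        = "my-cluster"
  authentication_mode = "API_AND_CONFIG_MAP" # during the migration window

  # Point 1: the cluster-creator identity already has an auto-created access entry;
  # either leave this false or import the existing entry before enabling it.
  enable_cluster_creator_admin_permissions = false

  # Point 2: do not re-add entries for EKS managed node group or Fargate profile roles;
  # EKS creates and manages those itself. Only additional, user-defined principals go here.
  access_entries = {}
}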

@bryantbiggs
Member

And for the sake of completeness, here is the requested example of a single entry with cluster scope, as the module is currently written - it works without issue:

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.8"

  cluster_name                   = local.name
  cluster_version                = local.cluster_version
  cluster_endpoint_public_access = true

  enable_cluster_creator_admin_permissions = true

  vpc_id                   = module.vpc.vpc_id
  subnet_ids               = module.vpc.private_subnets
  control_plane_subnet_ids = module.vpc.intra_subnets

  eks_managed_node_group_defaults = {
    ami_type       = "AL2_x86_64"
    instance_types = ["m6i.large", "m5.large", "m5n.large", "m5zn.large"]
  }

  eks_managed_node_groups = {
    # Default node group - as provided by AWS EKS
    default_node_group = {
      # By default, the module creates a launch template to ensure tags are propagated to instances, etc.,
      # so we need to disable it to use the default template provided by the AWS EKS managed node group service
      use_custom_launch_template = false
    }
  }

  access_entries = {
    # One access entry with a policy associated
    ex-single = {
      principal_arn     = aws_iam_role.this["single"].arn
      policy_associations = {
        ex = {
          policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSViewPolicy"
          access_scope = {
            type = "cluster"
          }
        }
      }
    }
  }

  tags = local.tags
}

Describe the access entry:

aws eks describe-access-entry \
  --cluster-name ex-eks-managed-node-group \
  --principal-arn "arn:aws:iam::000000000000:role/ex-single" \
  --region eu-west-1
{
    "accessEntry": {
        "clusterName": "ex-eks-managed-node-group",
        "principalArn": "arn:aws:iam::000000000000:role/ex-single",
        "kubernetesGroups": [],
        "accessEntryArn": "arn:aws:eks:eu-west-1:000000000000:access-entry/ex-eks-managed-node-group/role/000000000000/ex-single/40c71997-3891-aa1c-0997-e0352c7ca25a",
        "createdAt": "2024-03-12T11:01:05.685000-04:00",
        "modifiedAt": "2024-03-12T11:01:05.685000-04:00",
        "tags": {
            "GithubRepo": "terraform-aws-eks",
            "GithubOrg": "terraform-aws-modules",
            "Example": "ex-eks-managed-node-group"
        },
        "username": "arn:aws:sts::000000000000:assumed-role/ex-single/{{SessionName}}",
        "type": "STANDARD"
    }
}

List the policies associated with this principal:

aws eks list-associated-access-policies \
  --cluster-name ex-eks-managed-node-group \
  --principal-arn "arn:aws:iam::000000000000:role/ex-single" \
  --region eu-west-1
{
    "associatedAccessPolicies": [
        {
            "policyArn": "arn:aws:eks::aws:cluster-access-policy/AmazonEKSViewPolicy",
            "accessScope": {
                "type": "cluster",
                "namespaces": []
            },
            "associatedAt": "2024-03-12T11:01:07.063000-04:00",
            "modifiedAt": "2024-03-12T11:01:07.063000-04:00"
        }
    ],
    "clusterName": "ex-eks-managed-node-group",
    "principalArn": "arn:aws:iam::000000000000:role/ex-single"
}

@deshruch
Author

Somehow the policy does not get attached in my case, and in @cweiblen's case as well.
I am not sure whether it is the policy that we are using. I have shared my code, plan, and a screenshot above.

@bryantbiggs
Member

I have shared my code, plan and a screenshot above

You have shared some code, yes, but it's all variables and values that are unknown to anyone but yourself. For now, I am putting a pin in this thread because I am not seeing any issues with the module as it stands. If there is additional information that will highlight the issue, we can definitely take another look.

@bilalahmad99

We faced the same issues that @deshruch mentioned:

1/ If an access entry exists, it complains with the error 'Resource is already in use' and fails. It's a fatal error and not just a warning.
2/ If it does create the entry, it does not attach the policy.

And we have enable_cluster_creator_admin_permissions set to false.

The exact error was:
creating EKS Access Entry (): operation error EKS: CreateAccessEntry, https response error StatusCode: 409, RequestID: xxx, ResourceInUseException: The specified access entry resource is already in use on this cluster

We had to manually intervene and either delete that entry or attach the policy.

@deshruch
Author

We had to do the same thing that @cweiblen did to get around this: create the access entries using standalone resources. Note that this was the case for a custom IAM role that we were migrating from the ConfigMap to an EKS access entry.

However, if this is for the node group role, the EKS module handles it automatically. We were also using the 'karpenter' module, in which you need to explicitly set create_access_entry = false (default is true) so that the Karpenter module does not try to recreate the entry and throw the 'The specified access entry resource is already in use on this cluster' error - see the sketch below.
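
A minimal sketch of that Karpenter sub-module setting (the sub-module path and version pin shown here are assumptions; other required inputs are omitted):

module "karpenter" {
  source  = "terraform-aws-modules/eks/aws//modules/karpenter"
  version = "20.5.0"

  cluster_name = module.eks.cluster_name

  # The node role's access entry already exists (managed by EKS / the EKS module),
  # so stop this sub-module from creating a duplicate.
  create_access_entry = false
}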

For a user-defined/custom IAM role, we had to add the access entry and policy association using standalone resources.

@mconigliaro

In case anyone needs to import the existing access entry:

$ terraform import 'module.cluster_name.module.eks.aws_eks_access_entry.this["cluster_creator"]' cluster_name:principal_arn
$ terraform import 'module.cluster_name.module.eks.aws_eks_access_policy_association.this["cluster_creator_admin"]' cluster_name#principal_arn#policy_arn


github-actions bot commented May 3, 2024

This issue has been automatically marked as stale because it has been open 30 days
with no activity. Remove stale label or comment or this issue will be closed in 10 days

@github-actions github-actions bot added the stale label May 3, 2024

This issue was automatically closed because of stale in 10 days

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) May 13, 2024