
Cluster Security Group Change Issue #3014

Open · 1 task done
sidewinder12s opened this issue Apr 17, 2024 · 7 comments

@sidewinder12s commented Apr 17, 2024

Description

It appears I've hit this bug: #2469

But specifically, I am already past the v18/v19 module upgrades. The only change we've made is that our VPC has added a CIDR block, which seems to break the module with:

Plan:

  # module.cluster.module.eks.aws_security_group_rule.cluster["vpc"] must be replaced
+/- resource "aws_security_group_rule" "cluster" {
      ~ cidr_blocks              = [ # forces replacement
            # (2 unchanged elements hidden)
            "10.60.7.0/24",
          + "10.40.0.0/16",
        ]
      ~ id                       = "sgrule-422031136" -> (known after apply)
      - prefix_list_ids          = [] -> null
      + security_group_rule_id   = (known after apply)
      + source_security_group_id = (known after apply)
        # (7 unchanged attributes hidden)
    }

Apply:

module.cluster.module.eks.aws_security_group_rule.cluster["vpc"]: Creating...
╷
│ Error: [WARN] A duplicate Security Group rule was found on (sg-07c6df08897a15c50). This may be
│ a side effect of a now-fixed Terraform issue causing two security groups with
│ identical attributes but different source_security_group_ids to overwrite each
│ other in the state. See https://github.com/hashicorp/terraform/pull/2376 for more
│ information and instructions for recovery. Error: InvalidPermission.Duplicate: the specified rule "peer: 10.35.192.0/18, TCP, from port: 443, to port: 443, ALLOW" already exists
│ 	status code: 400, request id: 299a0a30-3cd1-47b9-9192-26f9f0cfda35
│ 
│   with module.cluster.module.eks.aws_security_group_rule.cluster["vpc"],
│   on .terraform/modules/cluster.eks/main.tf line 295, in resource "aws_security_group_rule" "cluster":
│  295: resource "aws_security_group_rule" "cluster" {
│ 

So I am somewhat concerned this is not an issue specific to the module upgrades but a bug/limitation in the module.

  • ✋ I have searched the open/closed issues and my issue is not listed.

Versions

  • Module version [Required]: 20.1.1

  • Terraform version: v1.3.6

  • Provider version(s): v5.35.0

Reproduction Code

Steps to reproduce the behavior:

Modify the incoming VPC to add another CIDR block:

  cluster_security_group_additional_rules = merge(
    {
      # Default Private Control Plane Access
      vpc = {
        description = "Allow VPC Network Access"
        protocol    = "tcp"
        from_port   = 443
        to_port     = 443
        cidr_blocks = data.aws_vpc.vpc.cidr_block_associations[*].cidr_block
        type        = "ingress"
      },
      private_network_access = {
        description = "Allow Private Network Access"
        protocol    = "tcp"
        from_port   = 443
        to_port     = 443
        cidr_blocks = ["10.0.0.0/8"]
        type        = "ingress"
      }
  }, var.cluster_security_group_additional_rules)

Is the issue potentially that we're using a computed value as the input to this variable in the module?

A minimal repro would probably be to set cluster_security_group_additional_rules and then modify it, or provide a computed value that changes and forces modification; a sketch of this is below.
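
Outside the module, that repro might look like the following sketch (hypothetical variable name and placeholder security group ID, not the module's code). The cidr_blocks argument of aws_security_group_rule forces replacement when it changes, which matches the plan output above:

variable "vpc_cidr_blocks" {
  description = "CIDRs associated with the VPC; grows when a CIDR block is added"
  type        = list(string)
  default     = ["10.60.7.0/24"]
}

resource "aws_security_group_rule" "vpc" {
  description       = "Allow VPC Network Access"
  type              = "ingress"
  protocol          = "tcp"
  from_port         = 443
  to_port           = 443
  cidr_blocks       = var.vpc_cidr_blocks
  security_group_id = "sg-0123456789abcdef0" # placeholder

  # Appending e.g. "10.40.0.0/16" to var.vpc_cidr_blocks changes cidr_blocks,
  # which cannot be updated in place on this resource, so the whole rule is
  # replaced.
}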

Expected behavior

Terraform updates the security group rules

Actual behavior

Terraform errors out on an existing cluster

Additional context

@isaccavalcante

I've been seeing this error a lot. All the other issues I've seen about this were closed without a solution. There's the workaround of changing the IP ranges to force a change, but that's not practical. Another hacky way I found is to create a variable and append it to the key of the security group rules. For instance, in your case it would be something like:

variable "security_group_update_trigger" {
  description = "Trigger updates without getting duplicate security group rule error"
  type        = bool
  default     = false
}
  cluster_security_group_additional_rules = merge(
    {
      # Default Private Control Plane Access
      "vpc_${var.security_group_update_trigger}" = {
        description = "Allow VPC Network Access"
        protocol    = "tcp"
        from_port   = 443
        to_port     = 443
        cidr_blocks = data.aws_vpc.vpc.cidr_block_associations[*].cidr_block
        type        = "ingress"
      },
      "private_network_access_${var.security_group_update_trigger}" = {
        description = "Allow Private Network Access"
        protocol    = "tcp"
        from_port   = 443
        to_port     = 443
        cidr_blocks = ["10.0.0.0/8"]
        type        = "ingress"
      }
  }, var.cluster_security_group_additional_rules)

But if you have a huge list of CIDRs, as in my case, this is bad: you lose easy tracking of changes when adding or removing a single CIDR, because the plan will show all of the rules being added and removed.

I hope we can have a proper fix for this issue.

@bryantbiggs (Member)

I don't see how this relates to the module; it looks like you could easily reproduce this with plain security group rule resources.

@sidewinder12s (Author)

I guess I opened this because it was not clear to me why the aws_security_group_rule was triggering a replace as opposed to just being modified in place. It seemed like a side effect of how it has been implemented in the module.

@bryantbiggs (Member)

It's being replaced due to the changes Terraform has identified here:

~ cidr_blocks = [ # forces replacement
      # (2 unchanged elements hidden)
      "10.60.7.0/24",
    + "10.40.0.0/16",
  ]

And I suspect the duplicate rule error is being thrown because both 10.60.7.0/24 and 10.40.0.0/16 are already captured within the rule:

private_network_access = {
    description = "Allow Private Network Access"
    protocol    = "tcp"
    from_port   = 443
    to_port     = 443
    cidr_blocks = ["10.0.0.0/8"]
    type        = "ingress"
  }

@sidewinder12s (Author)

I can confirm the second point isn't the issue: we already had multiple rules in place that overlap with 10.0.0.0/8, and those should have failed if this were an EC2 SG limitation.

I was also able to manually add 10.40.0.0/16 in the EC2 console without error. It's just Terraform failing because it attempts to create_before_destroy the aws_security_group_rule, and the rule it tries to create already exists on the SG.

To get around this, since I already had that overlapping 10.0.0.0/8 rule defined in Terraform, I manually deleted all of the other, more specific rules Terraform was trying to append, removed them from state, and then re-applied Terraform successfully.

I suspected that something about how the variables are passed into the aws_security_group_rule resource is what triggers the create_before_destroy behavior: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/main.tf#L296-L299
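
As a rough illustration of the pattern being described (a sketch under that assumption, not the module's exact code), a create_before_destroy lifecycle on a rule whose cidr_blocks force replacement means the replacement is created while the old rule, which still contains most of the same CIDRs, is still attached:

resource "aws_security_group_rule" "cluster" {
  type              = "ingress"
  protocol          = "tcp"
  from_port         = 443
  to_port           = 443
  cidr_blocks       = var.vpc_cidr_blocks            # hypothetical input
  security_group_id = var.cluster_security_group_id  # hypothetical input

  lifecycle {
    # Replacement order: create the new rule first, then destroy the old one.
    # The new rule shares CIDR entries with the not-yet-destroyed old rule,
    # so EC2 rejects the create step with InvalidPermission.Duplicate.
    create_before_destroy = true
  }
}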

But I wasn't sure how to investigate further. I also wonder whether these even newer resources might avoid this, especially since within the EC2 console each CIDR block we're adding already has its own unique security group rule ID (see the sketch below): https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/vpc_security_group_ingress_rule
https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/vpc_security_group_egress_rule
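
For comparison, a sketch of how the newer per-rule resource could be used (hypothetical inputs; not something the module does today): each CIDR becomes its own rule object, so adding a CIDR only creates one new rule instead of replacing a shared one:

resource "aws_vpc_security_group_ingress_rule" "vpc" {
  for_each = toset(var.vpc_cidr_blocks) # hypothetical input

  security_group_id = var.cluster_security_group_id # hypothetical input
  description       = "Allow VPC Network Access"
  ip_protocol       = "tcp"
  from_port         = 443
  to_port           = 443
  cidr_ipv4         = each.value

  # Adding "10.40.0.0/16" to the list creates one new rule; the existing rules
  # and their security group rule IDs are left untouched.
}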

@bryantbiggs (Member)

The newer individual rule resources (which cannot take a list of CIDRs, etc.) are recommended, but that comes at the cost of disruption to users. We have started to use them where possible, but we have to identify the right time to introduce them since the migration will be disruptive.


This issue has been automatically marked as stale because it has been open for 30 days with no activity. Remove the stale label or comment, or this issue will be closed in 10 days.

@github-actions github-actions bot added the stale label May 23, 2024