Intermittent "unexpected EOF" causes terraform to lose track of resources just created #2335

awood-prologis · 2024-03-22T19:24:07Z

Datadog Terraform Provider Version

v3.38.0

Terraform Version

v1.5.7

What resources or data sources are affected?

datadog_monitor resource

Terraform Configuration Files

terraform {
  required_providers {
    datadog = {
      source  = "DataDog/datadog"
    }
  }
}

variable "combined_backend_services" {
  type = map(map(string))
  default = {
    a = { id = "28757c79-0258-4df8-bbf3-c3c46dfaadb6" }
    b = { id = "01e40243-67a2-4913-8390-53d363ad944d" }
    c = { id = "5b328afa-a801-4611-acd1-8c1645bb6de1" }
    d = { id = "945417cd-9775-48c4-903f-5d147254ff1e" }
    e = { id = "e4f44c80-a3b5-4465-bb89-a893e057b111" }
  }
}

variable "Name" {
  type = string
  default = "EntAuth"
}

variable "Environment" {
  type = string
  default = "Dev"
}

variable "datadog_api_key" {
  type = string
}

variable "datadog_app_key" {
  type = string
}

locals {
  datadog_monitor_prefix = "DELETE ME - Scratch Work"
  datadog_notify_users_plus_opsgenie = "badvalue"
  default_datadog_notify_users = "@[email protected]"
}

provider "datadog" {
  api_key = var.datadog_api_key
  app_key = var.datadog_app_key
}

resource "datadog_monitor" "health_check_monitor" {
  # One monitor per Route 53 health check ID.
  for_each = toset([for k, v in var.combined_backend_services : v.id])

  name    = "${local.datadog_monitor_prefix} | Health Check Failing"
  type    = "metric alert"
  message = "One or more health checks in the primary region are failing, which will trigger an automated DR failover. ${var.Environment == "Prod" ? local.datadog_notify_users_plus_opsgenie : local.default_datadog_notify_users}"
  query   = "min(last_5m):min:aws.route53.health_check_status{healthcheckid:${each.value}} < 1"

  monitor_thresholds {
    critical = 1
  }
}

resource "datadog_monitor" "health_check_failing" {
  name = "${local.datadog_monitor_prefix} | Automated DR failover"
  type = "event-v2 alert"
  message = "One or more health checks in the primary region are failing, which will trigger an automated DR failover. ${var.Environment == "Prod" ? local.datadog_notify_users_plus_opsgenie : local.default_datadog_notify_users}"
  query = "events(\"\\\"ALARM: \\\\\"${lower("${var.Name}-${var.Environment}")}-primary-unhealthy\\\\\"\\\"\").rollup(\"count\").last(\"5m\") > 0"
  monitor_thresholds {
    critical = 0
  }
}

Relevant debug or panic output

https://gist.github.com/awood-prologis/239b577fcf01e90f2b3719c982450396

Expected Behavior

I expected all resources created by Terraform when applying the given configuration to then be tracked by Terraform.
After running a destroy and then manually deleting, in the Datadog UI, any resources that Terraform created but did not track, I expect the next apply of the same configuration, with the same inputs, against an identical clean environment (i.e. no extant objects that conflict with resources defined in the configuration) to produce the same results as the first run.

Actual Behavior

Terraform attempts to apply the provided configuration. During the run, it lists all resources to be created as expected; however, some of the resources report an error during creation because of "unexpected EOF". The number of resources that fail, and which ones, appears to vary from run to run, but in my testing there has always been at least one failure and fewer than six.
After the run finishes, the Datadog monitor UI shows that all the planned resources reported by Terraform were in fact created and appear correct. However, subsequently running terraform show indicates that Terraform does not know about the resources it reported as failures, presumably because it did not write them into the state, believing the creation operation to have failed.
Running a plan after this point shows that Terraform wants to create the "failed" resources again. An apply then fails, because the new resources have a namespace conflict in the DD API with the "forgotten" resources that already exist.
A subsequent terraform destroy succeeds, but Terraform cleans up only the resources it knows about. The rest must be cleaned up manually via the DD API.
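
One possible way to recover without deleting the orphaned monitors by hand is to adopt them into state with an import block, which Terraform supports from v1.5 onward. The following is only a minimal sketch and was not part of the testing described above. It assumes the numeric ID of the orphaned monitor has been looked up in the Datadog UI; the ID shown is a placeholder, and the instance key is the first health check ID from the configuration above, used purely as an example.

```hcl
# Hypothetical recovery sketch: adopt a monitor that exists in Datadog
# but is missing from Terraform state.
import {
  # Instance key is the for_each value (the Route 53 health check ID).
  to = datadog_monitor.health_check_monitor["28757c79-0258-4df8-bbf3-c3c46dfaadb6"]

  # Placeholder monitor ID; replace with the real ID from the Datadog UI.
  id = "123456789"
}
```

With a block like this in the root module, the next terraform plan should show the monitor as an import rather than a create. One block is needed per orphaned monitor, and the blocks can be removed once the apply has succeeded.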

Steps to Reproduce

1. terraform apply
2. terraform show
3. Examine the Datadog monitor management UI to identify the newly created resources in Datadog. It should include all resources that Terraform reported as successfully created, as well as all resources that were reported as failed due to "unexpected EOF".
4. terraform apply again
   This will fail, because the resources that Terraform believes failed do in fact exist and cannot be created again.

Important Factoids

No response

References

No response

awood-prologis added the bug label Mar 22, 2024
