Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

P1 total data loss bug in google_storage_bucket lifecycle rule for days_since_noncurrent_time #17990

Comments

@philip-harvey
Copy link

philip-harvey commented Apr 30, 2024

Community Note

  • Please vote on this issue by adding a 馃憤 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to a user, that user is claiming responsibility for the issue.
  • Customers working with a Google Technical Account Manager or Customer Engineer can ask them to reach out internally to expedite investigation and resolution of this issue.

Terraform Version

Terraform v1.8.2
google v5.26.0

Affected Resource(s)

google_storage_bucket

Terraform Configuration

resource "google_storage_bucket" "somename-test" {
  name                        = "somename-test"
  project                     = projectid
  location                    = "US-CENTRAL1"
  uniform_bucket_level_access = true
  versioning {
    enabled = true
  }
  lifecycle_rule {
    action {
      type = "Delete"
    }
    condition {
      days_since_noncurrent_time = 0
    }
  }
}

Debug Output

No response

Expected Behavior

This should create a lifecyle rule to delete noncurrent objects after 0 days

Actual Behavior

Creates a lifecycle rule that deletes ALL object in the bucket after 0 days, causing total data loss in the bucket

Steps to reproduce

  1. terraform apply

Important Factoids

This is a P1 issue causing total data loss

References

No response

b/339089840

@github-actions github-actions bot added forward/review In review; remove label to forward service/storage labels Apr 30, 2024
@ggtisc ggtisc self-assigned this May 6, 2024
@ggtisc
Copy link
Collaborator

ggtisc commented May 6, 2024

Hi @philip-harvey!

According to your configuration and the description on the official documentation of the property lifecycle.rule[].condition.daysSinceNoncurrentTime if you set this value to 0 then the normal is that all resources in your bucket will be deleted after 0 days. Because all the resources are going to be recognized as noncurrent immediately while the value of the condition is equal to 0. Then there is no bug, this is a normal harassment.

@philip-harvey
Copy link
Author

Hi @philip-harvey!

According to your configuration and the description on the official documentation of the property lifecycle.rule[].condition.daysSinceNoncurrentTime if you set this value to 0 then the normal is that all resources in your bucket will be deleted after 0 days. Because all the resources are going to be recognized as noncurrent immediately while the value of the condition is equal to 0. Then there is no bug, this is a normal harassment.

I'm not following the logic there, if you set this via the console it correctly sets the rule to delete objects 0 days after they become noncurrent, which is correct and expected, but if you set this via Terraform it makes a totally different rule, that just deletes ALL objects in the bucket, both current and noncurrent objects. The documentation doesn't call out this behavior anywhere that I can find, and it's inconsistent with how this works in the GUI, and causes total data loss.

@ggtisc
Copy link
Collaborator

ggtisc commented May 6, 2024

This could be a documentation update of the description of this field (lifecycle.rule[].condition.daysSinceNoncurrentTime) in the API.

@ggtisc ggtisc added documentation and removed bug forward/review In review; remove label to forward labels May 6, 2024
@philip-harvey
Copy link
Author

This could be a documentation update of the description of this field (lifecycle.rule[].condition.daysSinceNoncurrentTime) in the API.

I'm still not following this, the documentation says ". This condition is satisfied when an object has been noncurrent for more than the specified number of days.". The bug is that Terraform creates a lifecycle rule that does not do this, it creates a lifecycle rule that deletes ALL objects, not just noncurrent ones. As I said previously, this works correctly outside of Terraform and it's a very severe bug that causes total data loss.

@philip-harvey
Copy link
Author

This is not a documentation issue, this is a bug, a very major one.

@kautikdk
Copy link

kautikdk commented May 7, 2024

I think, This issue is coming because of the provider bug that is already solved: #14044 .
You can try setting no_age = true in the lifecycle config block to avoid creating unexpected age rule.
This is the discussion link of the solution approach:GoogleCloudPlatform/magic-modules#9512. The problem was with terraform's way of handling zero value of the data types. That's why the solution is implemented in this way.

@philip-harvey
Copy link
Author

#14044 looks like a similar bug, but this bug is not yet fixed in the provider.

@philip-harvey
Copy link
Author

Actually it looks like #14044 is also not fixed

@kautikdk
Copy link

kautikdk commented May 7, 2024

It is! You can try putting no_age = true in the lifecycle rule condition block, It won't add unexpected condition of age.

@rileykarson
Copy link
Collaborator

rileykarson commented May 7, 2024

Because all the resources are going to be recognized as noncurrent immediately while the value of the condition is equal to 0.

This shouldn't be the case- objects are determined to be noncurrent if they are replaced by a newer version, per https://cloud.google.com/storage/docs/object-versioning#intro. Live objects should never be replaced by this rule (at least, reading this as a caller of the API).

I suspect this is a side effect of #14044 w/o no_age set. Since Terraform things age has a value of zero, we're sending that, and there are two values in the rule. That still shouldn't delete current objects as all the rules won't apply- current objects are not noncurrent, and it can't be zero days since they became noncurrent.

Our effective rule as a result is:

  lifecycle_rule {
    action {
      type = "Delete"
    }
    condition {
      age = 0
      days_since_noncurrent_time = 0
    }
  }

@philip-harvey do you have the REST response from a Get call on the bucket, specifically the lifecycle block? That's available in the provider's debug logs or you can use the API explorer sidebar at https://cloud.google.com/storage/docs/json_api/v1/buckets/get (or the dozen other ways to call the APIs). Terraform state will also probably be correct, but its inability to handle nil values can make it more confusing than just REST in these zero-versus-nil cases.

@philip-harvey
Copy link
Author

philip-harvey commented May 7, 2024

{
  "kind": "storage#bucket",
  "selfLink": "https://www.googleapis.com/storage/v1/b/redacted",
  "id": "redacted",
  "name": "redacted",
  "projectNumber": "redacted",
  "metageneration": "6",
  "location": "US-CENTRAL1",
  "storageClass": "STANDARD",
  "etag": "CAY=",
  "timeCreated": "2024-04-30T15:41:26.753Z",
  "updated": "2024-04-30T15:55:46.810Z",
  "versioning": {
    "enabled": true
  },
  "lifecycle": {
    "rule": [
      {
        "action": {
          "type": "Delete"
        },
        "condition": {
          "age": 0
        }
      }
    ]
  },
  "softDeletePolicy": {
    "retentionDurationSeconds": "604800",
    "effectiveTime": "2024-04-30T15:41:26.753Z"
  },
  "iamConfiguration": {
    "bucketPolicyOnly": {
      "enabled": true,
      "lockedTime": "2024-07-29T15:41:26.753Z"
    },
    "uniformBucketLevelAccess": {
      "enabled": true,
      "lockedTime": "2024-07-29T15:41:26.753Z"
    },
    "publicAccessPrevention": "inherited"
  },
  "locationType": "region"
}

@rileykarson
Copy link
Collaborator

rileykarson commented May 7, 2024

Thanks! That's much worse, we're not sending days_since_noncurrent_time at all when it has a value of zero. Combined with age which gets sent when it has a value of zero, it leads to this result where all objects get deleted.

(fyi: I edited your comment to add a code fence, to make it easier to read)

@rileykarson rileykarson assigned NickElliot and unassigned ggtisc May 7, 2024
@philip-harvey
Copy link
Author

Thanks! That's much worse, we're not sending days_since_noncurrent_time at all when it has a value of zero. Combined with age which gets sent when it has a value of zero, it leads to this result where all objects get deleted.

(fyi: I edited your comment to add a code fence, to make it easier to read)

I wondered if you send both age = 0 and days_since_noncurrent_time = 0 if the API just ignores the days_since_noncurrent_time = 0 part since it's redundant, but I haven't verified this.

@rileykarson
Copy link
Collaborator

That works fine:

  "lifecycle": {
    "rule": [
      {
        "action": {
          "type": "Delete"
        },
        "condition": {
          "age": 0,
          "daysSinceNoncurrentTime": 0
        }
      }
    ]
  },

@rileykarson
Copy link
Collaborator

@philip-harvey we're shipping a docs change immediately while we figure out the longer-term fix here (@NickElliot and @kautikdk are discussing offline). Mind giving GoogleCloudPlatform/magic-modules#10626 a once-over to see if you think it would have helped in your case / make any suggestions if not?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment