
bigquery table require_partition_filter.patch #1733

Open · wants to merge 2 commits into master

Conversation

@600lyy (Member) commented May 8, 2024

require_partition_filter.patch

Make require_partition_filter a top-level field.
This will replace the same field in time_partitioning.

Fixes 1654

  • Run make ready-pr to ensure this PR is ready for review.
  • Perform necessary E2E testing for changed resources.

@yuwenma (Collaborator) commented May 17, 2024

/lgtm

Thanks for the contribution, @600lyy. The PR looks good to me.
I'll defer to @maqiuyujoyce for kcc-tf requirements.

@maqiuyujoyce (Collaborator) left a comment

Hi @600lyy, thank you for supporting the new field! It looks like a slightly tricky deprecation use case. I verified that the two requirePartitionFilter fields can both be configured at the same time with different values, and the POST API call captures the values properly - is that the expected behavior? If not, additional logic might be needed to do the check.

@600lyy (Member, Author) commented May 22, 2024

Hi @600lyy, thank you for supporting the new field! It looks like a slightly tricky deprecation use case. I verified that the two requirePartitionFilter fields can both be configured at the same time with different values, and the POST API call captures the values properly - is that the expected behavior? If not, additional logic might be needed to do the check.

Thanks @maqiuyujoyce for your comment.
Where did you set the two requirePartitionFilter fields?

@maqiuyujoyce (Collaborator) commented:

This is the test yaml I used (you'll want to replace ${uniqueId} and ${projectId}):

apiVersion: bigquery.cnrm.cloud.google.com/v1beta1
kind: BigQueryTable
metadata:
  name: bigquerytablesample${uniqueId}
  annotations:
    cnrm.cloud.google.com/project-id: ${projectId}
spec:
  friendlyName: bigquerytable-sample
  datasetRef:
    name: bigquerydatasetsample${uniqueId}
  requirePartitionFilter: false
  timePartitioning:
    type: DAY
    requirePartitionFilter: true

@600lyy (Member, Author) commented May 24, 2024

Those two fields with the same name are treated as conflicting fields in Terraform:

~/workspace/tmp/terraform-test ❯ terraform apply                                                                              10776 11:36:14
╷
│ Error: Conflicting configuration arguments
│
│   with google_bigquery_table.default,
│   on main.tf line 17, in resource "google_bigquery_table" "default":
│   17:   require_partition_filter = false
│
│ "require_partition_filter": conflicts with time_partitioning.0.require_partition_filter

But KCC currently seems to allow conflicting fields to exist in the same object, so I'm not sure where to add that additional check?
Should I move timePartitioning.requirePartitionFilter to ignoredFields, as it is due to be deprecated?

@maqiuyujoyce (Collaborator) commented:

But KCC currently seems to allow conflicting fields to exist in the same object, so I'm not sure where to add that additional check?

The conflicting fields are at the TF level, not the KCC level. KCC will respect the CRD instead of the TF schema.
It can probably be done as an additional feature in KCC (i.e. supporting the generation of the conflicting schema in CRD based on TF schema), or we may implement the check in the local TF lib. Let's discuss offline for further details.
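For reference, a sketch of how such a conflict is typically expressed at the TF schema level, using the terraform-plugin-sdk `ConflictsWith` attribute (this mechanism is what produces the "Conflicting configuration arguments" error pasted earlier in this thread; the exact field paths here are assumptions based on the google_bigquery_table resource, not a confirmed patch):

```go
// Sketch only: declaring the top-level field as conflicting with the
// nested one in the provider's resource schema. Requires the
// terraform-plugin-sdk/v2 helper/schema package.
"require_partition_filter": {
	Type:     schema.TypeBool,
	Optional: true,
	// Reject configs that also set the deprecated nested field.
	ConflictsWith: []string{"time_partitioning.0.require_partition_filter"},
},
```

With `ConflictsWith`, the validation happens at plan time in Terraform itself, so the expand function never has to guess whether a zero value was user-supplied.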

Should I move timePartitioning.requirePartitionFilter to ignoredFields, as it is due to be deprecated?

Unfortunately, no. BigQueryTable is a v1beta1 CRD that we need to ensure backwards compatibility of. Removing a field (ignoring the field will remove it from the CRD) is a breaking change that we don't allow within the same version.

Make require_partition_filter a top level field.
This will replace the same field in the time_partitioning
@600lyy force-pushed the bigquery-requirepartitionfilter branch from eece362 to 39708c9 on May 31, 2024 07:42

New changes are detected. LGTM label has been removed.

@google-oss-prow google-oss-prow bot removed the lgtm label May 31, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from maqiuyujoyce. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@600lyy (Member, Author) commented May 31, 2024

Hi @maqiuyujoyce,
I was trying to put a check in the local TF provider:

func expandTimePartitioning(configured interface{}, use_old_rpf bool) (*bigquery.TimePartitioning, error) {
	raw := configured.([]interface{})[0].(map[string]interface{})
	tp := &bigquery.TimePartitioning{Type: raw["type"].(string)}

	if v, ok := raw["field"]; ok {
		tp.Field = v.(string)
	}

	if v, ok := raw["expiration_ms"]; ok {
		// fmt.Printf("expiration in time partition is %d", v)
		tp.ExpirationMs = int64(v.(int))
	}

	if v, ok := raw["require_partition_filter"]; ok {
		if use_old_rpf {
			tp.RequirePartitionFilter = v.(bool)
		} else {
			return nil, fmt.Errorf("partition filter in time partition is not allowed when the top level filter is used")
		}
	}

	return tp, nil
}

During testing, I noted that raw["require_partition_filter"] always gets a default value (false) when creating a new table, even if the field is omitted in the KCC YAML. Do you know whether it is a KCC behaviour to add zero values to fields in a list object? (e.g. expiration_ms also got assigned a value of 0 when not specified).

With this limitation, it is very hard to resolve the conflict since we don't know if the false value is set by users or by KCC.

@maqiuyujoyce (Collaborator) commented:

Do you know whether it is a KCC behaviour to add zero values to fields in a list object?

I believe not.

I verified by printing out the config (desired state) and diff with the following YAML:

apiVersion: bigquery.cnrm.cloud.google.com/v1beta1
kind: BigQueryTable
metadata:
  name: bigquerytablesample${uniqueId}
  annotations:
    cnrm.cloud.google.com/project-id: ${projectId}
spec:
  friendlyName: bigquerytable-sample
  datasetRef:
    name: bigquerydatasetsample${uniqueId}
 # requirePartitionFilter: false
  timePartitioning:
    type: DAY
    requirePartitionFilter: true

And neither of them has the commented-out, top-level requirePartitionFilter field.

Config:

&{ComputedKeys:[] Raw:map[dataset_id:bigquerydatasetsamplezpcrpgtojmaqyrykhxea friendly_name:bigquerytable-sample 
labels:map[cnrm-test:true managed-by-cnrm:true] project:project-id 
table_id:bigquerytablesamplezpcrpgtojmaqyrykhxea time_partitioning:[map[require_partition_filter:true type:DAY]]] 
Config:map[dataset_id:bigquerydatasetsamplezpcrpgtojmaqyrykhxea friendly_name:bigquerytable-sample 
labels:map[cnrm-test:true managed-by-cnrm:true] project:project-id 
table_id:bigquerytablesamplezpcrpgtojmaqyrykhxea time_partitioning:[map[require_partition_filter:true type:DAY]]]}

Diff:

&{mu:{state:0 sema:0} Attributes:map[creation_time:0xc00921cec0 dataset_id:0xc00921d400 
deletion_protection:0xc00921d140 etag:0xc00921d380 expiration_time:0xc00921d300 friendly_name:0xc00921d340 
labels.%:0xc00921cf00 labels.cnrm-test:0xc00921cf40 labels.managed-by-cnrm:0xc00921cf80 
last_modified_time:0xc00921cfc0 location:0xc00921d100 num_bytes:0xc00921d3c0 
num_long_term_bytes:0xc00921d180 num_rows:0xc00921d240 project:0xc00921d2c0 self_link:0xc00921d1c0 
table_id:0xc00921d280 time_partitioning.#:0xc00921d000 time_partitioning.0.expiration_ms:0xc00921d040 
time_partitioning.0.require_partition_filter:0xc00921d0c0 time_partitioning.0.type:0xc00921d080 type:0xc00921d200] 
Destroy:false DestroyDeposed:false DestroyTainted:false RawConfig:{ty:{typeImpl:<nil>} v:<nil>} RawState:{ty:{typeImpl:
<nil>} v:<nil>} RawPlan:{ty:{typeImpl:<nil>} v:<nil>} Meta:map[]}

Same thing happens when I commented out timePartitioning.requirePartitionFilter. So I believe the defaulted empty values are from TF itself.

During testing, I noted that raw["require_partition_filter"] always gets a default value (false) when creating a new table even if the field is omitted in the KCC YAML.

Just to double check, have you verified that the ok variable is true?
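To illustrate why checking ok alone cannot settle this: Go's comma-ok idiom on a map only reports whether the key exists, so a Terraform-defaulted false is indistinguishable from a user-supplied false once it is in the raw map. A minimal, self-contained sketch (the map contents here just mimic the raw argument of expandTimePartitioning, they are not taken from the actual provider):

```go
package main

import "fmt"

func main() {
	// Simulate the raw map passed to expandTimePartitioning when
	// Terraform has defaulted the field rather than the user setting it.
	raw := map[string]interface{}{
		"type":                     "DAY",
		"require_partition_filter": false, // zero value injected by TF
	}

	// comma-ok reports key presence, not whether the value was
	// explicitly configured: ok is true even for a defaulted false.
	v, ok := raw["require_partition_filter"]
	fmt.Println(ok, v) // true false

	// Only an absent key makes ok report false.
	delete(raw, "require_partition_filter")
	_, ok = raw["require_partition_filter"]
	fmt.Println(ok) // false
}
```

So if TF populates the key with a zero value before the expander runs, the ok check in the patch above will always pass, which matches the behaviour @600lyy observed.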

Successfully merging this pull request may close these issues: KCC support for BigLake tables