Implement BoundPredicateVisitor trait for ManifestFilterVisitor #367

s-akhtar-baig · 2024-05-09T19:41:55Z

GitHub issue: #350

Description: ManifestEvaluator was implemented in #322, but some of its functions were left unimplemented. This PR implements the remaining functions and adds most of the unit tests from the Python implementation.

Testing: Added new unit tests.

sdd

Looks great, thanks so much for the contribution! Just a few small issues that are straightforward to resolve. 🙌🏼

crates/iceberg/src/expr/visitors/manifest_evaluator.rs

s-akhtar-baig · 2024-05-10T20:42:52Z

@sdd, thank you for reviewing the changes and providing references! I have modified my code based on your suggestions. Please take a look and let me know if I miss anything.

crates/iceberg/src/expr/visitors/manifest_evaluator.rs

sdd

We're almost there! Just a couple of small stylistic changes required and then I'm happy.

Thanks again! 😁

sdd · 2024-05-13T06:50:03Z

crates/iceberg/src/expr/visitors/manifest_evaluator.rs

+ return ROWS_MIGHT_MATCH;
+ }
+
+ if let Some(Literal::Primitive(lower_bound)) = &field.lower_bound {


This is much cleaner! Thanks :-)

marvinlanhenke

Thanks for this PR @s-akhtar-baig, which looks really good. I left some minor comments. However, please check the comment about 'comparison' and the and implementation and verify it's correct.

crates/iceberg/src/expr/visitors/manifest_evaluator.rs

marvinlanhenke · 2024-05-13T06:34:45Z

crates/iceberg/src/expr/visitors/manifest_evaluator.rs

+ return ROWS_CANNOT_MATCH;
+ }
+
+ if self.are_all_null(field, &reference.field().field_type) {


Isn't this check redundant? If the partition contains no NaN values, we don't need to check if all values are null. If all values are null it cannot contain any NaN values, but we already know that.

Please refer to @sdd's comment #367 (comment).

@marvinlanhenke if field.contains_nan is None rather than Some(false) then this implies that the metrics don't indicate if the fields contain NaN or not. So the subsequent check for all nulls is still valid?
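To make the None versus Some(false) distinction concrete, here is a minimal sketch. The `FieldSummary` struct, the simplified `are_all_null` helper and the `is_nan_might_match` function are hypothetical stand-ins for illustration, not the code in this PR (the real helper also takes the field type).

```rust
const ROWS_MIGHT_MATCH: bool = true;
const ROWS_CANNOT_MATCH: bool = false;

/// Hypothetical, simplified stand-in for a partition field summary.
struct FieldSummary {
    contains_null: bool,
    /// `None` means the metrics do not say whether NaN values are present.
    contains_nan: Option<bool>,
    lower_bound: Option<f64>,
}

/// Simplified assumption: a field is all-null when it contains nulls
/// and no lower bound was recorded.
fn are_all_null(field: &FieldSummary) -> bool {
    field.contains_null && field.lower_bound.is_none()
}

/// Sketch of an `is_nan` style check.
fn is_nan_might_match(field: &FieldSummary) -> bool {
    // Only an explicit `Some(false)` proves that there are no NaN values.
    if field.contains_nan == Some(false) {
        return ROWS_CANNOT_MATCH;
    }
    // With `None` the NaN metric is unknown, so the all-null check
    // can still rule the manifest out.
    if are_all_null(field) {
        return ROWS_CANNOT_MATCH;
    }
    ROWS_MIGHT_MATCH
}
```

Under these assumptions, the second check only does useful work when `contains_nan` is `None` or `Some(true)`.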

crates/iceberg/src/expr/visitors/manifest_evaluator.rs

marvinlanhenke · 2024-05-13T07:44:02Z

crates/iceberg/src/expr/visitors/manifest_evaluator.rs

+
+ let prefix_len = prefix.chars().count();
+
+ if let Some(lower_bound) = &field.lower_bound {


Suggestion: extract this into a helper fn, since it's used in multiple places?

marvinlanhenke · 2024-05-13T07:50:53Z

crates/iceberg/src/expr/visitors/manifest_evaluator.rs

+ return ROWS_MIGHT_MATCH;
+ }
+
+ let truncated_upper_bound =


I think we can avoid the extra String allocation by using char_indices() to get the prefix_len and then using a slice for the comparison:

let truncated_upper_bound = &upper_bound[..prefix_len]; // haven't tested it though

Please refer to @sdd's comment #367 (comment).

I think @marvinlanhenke's suggestion is indeed a better one than mine here
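A minimal sketch of the allocation-free truncation discussed in this thread, assuming string bounds. The names `truncate_chars` and `starts_with_might_match` are hypothetical and not taken from the PR. The slice index has to be a byte offset, which is why `char_indices()` is used instead of a plain character count.

```rust
/// Returns the first `n` characters of `s` as a borrowed slice (no allocation).
/// `char_indices()` yields byte offsets, so the slice always ends on a
/// character boundary, even for multi-byte UTF-8 input.
fn truncate_chars(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        Some((byte_idx, _)) => &s[..byte_idx],
        // `s` has `n` or fewer characters: nothing to cut off.
        None => s,
    }
}

/// Sketch of a starts-with pruning check against string bounds.
/// Returns `false` when no value in the range can start with `prefix`.
fn starts_with_might_match(lower: &str, upper: &str, prefix: &str) -> bool {
    let prefix_len = prefix.chars().count();
    // Every value is at least `lower`; if its truncated form already
    // sorts after the prefix, no value can start with the prefix.
    if truncate_chars(lower, prefix_len) > prefix {
        return false;
    }
    // Symmetric check against the upper bound.
    if truncate_chars(upper, prefix_len) < prefix {
        return false;
    }
    true
}
```

The helper also covers the earlier suggestion to extract the truncation into one function, since both bound checks call it.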

s-akhtar-baig

@marvinlanhenke @sdd, thank you for reviewing. I have modified the code accordingly and added comments for clarification. Let me know what you think.

s-akhtar-baig · 2024-05-15T21:30:56Z

crates/iceberg/src/expr/visitors/manifest_evaluator.rs

+ return ROWS_MIGHT_MATCH;
+ }
+
+ let truncated_upper_bound =


Please refer to @sdd's comment #367 (comment).

s-akhtar-baig · 2024-05-15T21:32:58Z

crates/iceberg/src/expr/visitors/manifest_evaluator.rs

+ return ROWS_CANNOT_MATCH;
+ }
+
+ if self.are_all_null(field, &reference.field().field_type) {


Please refer to @sdd's comment #367 (comment).

crates/iceberg/src/expr/visitors/manifest_evaluator.rs

liurenjie1024

Thanks @s-akhtar-baig for this great PR, it looks great! I left some questions about the confusing parts. Also, I think one important thing is that we should not rely on the Ord of PrimitiveLiteral.

crates/iceberg/src/expr/visitors/manifest_evaluator.rs

liurenjie1024 · 2024-05-23T14:15:22Z

crates/iceberg/src/expr/visitors/manifest_evaluator.rs

+ let field = self.field_summary_for_reference(reference);
+
+ if field.lower_bound.is_none() || field.upper_bound.is_none() {
+ return ROWS_CANNOT_MATCH;


Why is it ROWS_CANNOT_MATCH? I think if either is missing, we can't exclude it?

@liurenjie1024, I followed Python implementation https://github.com/apache/iceberg-python/blob/20f6afdf5f000ea5b167e804012f2000aa5b8573/pyiceberg/expressions/visitors.py#L639.

Please let me know if this is incorrect and if there is a different spec that I needed to follow.
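As a rough illustration of the behaviour being questioned, here is a sketch of an equality check over a simplified summary with integer bounds. The struct and function names are hypothetical. The interpretation in the comment (a missing bound means the field holds only nulls, so a non-null literal cannot match) is my reading of the linked Python code and is an assumption, not something confirmed in this thread.

```rust
const ROWS_MIGHT_MATCH: bool = true;
const ROWS_CANNOT_MATCH: bool = false;

/// Hypothetical, simplified field summary with integer bounds.
struct FieldSummary {
    lower_bound: Option<i64>,
    upper_bound: Option<i64>,
}

/// Sketch of an `eq` check that mirrors the collapsed condition in the diff.
fn eq_might_match(field: &FieldSummary, literal: i64) -> bool {
    let (Some(lower), Some(upper)) = (field.lower_bound, field.upper_bound) else {
        // Assumption: no recorded bound means all values are null,
        // and a non-null literal cannot equal null.
        return ROWS_CANNOT_MATCH;
    };
    // The literal lies outside the recorded range, so no row can equal it.
    if literal < lower || literal > upper {
        return ROWS_CANNOT_MATCH;
    }
    ROWS_MIGHT_MATCH
}
```

If a missing bound could instead mean "statistics unavailable", the first branch would have to return ROWS_MIGHT_MATCH, which is the concern raised above.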

liurenjie1024 · 2024-05-23T14:17:57Z

crates/iceberg/src/expr/visitors/manifest_evaluator.rs

 _predicate: &BoundPredicate,
 ) -> crate::Result<bool> {
- todo!()
+ // because the bounds are not necessarily a min or max value, this cannot be answered using


I'm a little confused here: why are the lower/upper bounds not necessarily min/max values?

Same comment here, followed https://github.com/apache/iceberg-python/blob/20f6afdf5f000ea5b167e804012f2000aa5b8573/pyiceberg/expressions/visitors.py#L658.
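A small sketch of what the code comment in the diff implies for a not-equal check, using hypothetical names and integer bounds. It assumes, as that comment states, that the recorded bounds may be looser than the true minimum and maximum.

```rust
const ROWS_MIGHT_MATCH: bool = true;

/// Sketch of a `not_eq` check over a bounded range.
/// Even if `lower == upper == literal`, the bounds are only known to
/// enclose the values. They need not be values that actually occur,
/// so they cannot prove that every row equals the literal.
fn not_eq_might_match(_lower: Option<i64>, _upper: Option<i64>, _literal: i64) -> bool {
    // The bounds carry no usable information for `!=`, so never prune.
    ROWS_MIGHT_MATCH
}
```

The arguments are deliberately unused: the point of the sketch is that the answer does not depend on them.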

liurenjie1024 · 2024-05-23T14:19:17Z

crates/iceberg/src/expr/visitors/manifest_evaluator.rs

+ let field = self.field_summary_for_reference(reference);
+
+ if field.lower_bound.is_none() || field.upper_bound.is_none() {
+ return ROWS_CANNOT_MATCH;


As above, why can't we match it if either is None?

Same comment here. Followed https://github.com/apache/iceberg-python/blob/20f6afdf5f000ea5b167e804012f2000aa5b8573/pyiceberg/expressions/visitors.py#L731. Collapsed if statements on L722 and L731.

s-akhtar-baig added 3 commits May 9, 2024 12:19

Implement all functions of BoundPredicateVisitor for ManifestFilterVi…

aee9f52

…sitor

Merge branch 'main' into impl_boundpredvisitor_traits

7876203

Fix code comments

0371fd4

s-akhtar-baig mentioned this pull request May 9, 2024

Implement all functions of BoundPredicateVisitor for ManifestFilterVisitor #350

Open

s-akhtar-baig added 2 commits May 9, 2024 16:09

Refactor code and fixpredicate for is_some_and

965b6b2

Refactor code

766a4c4

sdd suggested changes May 10, 2024

Handle review comments

0044893

s-akhtar-baig requested a review from sdd May 10, 2024 20:42

sdd reviewed May 13, 2024

sdd reviewed May 13, 2024

sdd reviewed May 13, 2024

sdd suggested changes May 13, 2024

sdd reviewed May 13, 2024

marvinlanhenke suggested changes May 13, 2024

Handle review comments

552f59b

s-akhtar-baig commented May 15, 2024

liurenjie1024 reviewed May 23, 2024

liurenjie1024 mentioned this pull request May 23, 2024

bug: PrimitiveLiteral and Literal should not be Ord. #378

Open

Fokko self-requested a review May 27, 2024 07:22


