Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refine ut framework including Part 1 and Part 2 #10861

Merged

Conversation

binmahone
Copy link
Collaborator

@binmahone binmahone commented May 22, 2024

This PR is a superset of #10851, it closes #10850 and #10859

Compared to #10851 . it has one more additional commit to distinguish two modes for expression based suites: vectorized parameter mode and scalar parameter mode.

For RAPIDS expressions that only support scalar parameters (litOnlyTypes) , we have to use scalar parameter mode. For others we prefer to use vectorized parameter mode.

In case I can't get approval from reviewer because of the additional commit before 24.06 code freeze, we can merge #10851 , otherwise merge this PR.

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
@binmahone binmahone requested a review from jlowe May 22, 2024 07:00
@binmahone binmahone added the test Only impacts tests label May 22, 2024
@binmahone
Copy link
Collaborator Author

build

@binmahone binmahone requested a review from revans2 May 22, 2024 07:03
@binmahone binmahone changed the title refine ut framework including Part 2 refine ut framework including Part 1 and Part 2 May 22, 2024
import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData, MapData, TypeUtils}
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.rapids.utils.RapidsQueryTestUtil.isNaNOrInf
import org.apache.spark.sql.types._
import org.apache.spark.unsafe.types.UTF8String


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: extra line?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -121,6 +121,12 @@ object RapidsSQLTestsBaseTrait {
"org.apache.spark.sql.rapids.ExecutionPlanCaptureCallback")
.set("spark.sql.warehouse.dir", warehouse)
.set("spark.sql.cache.serializer", "com.nvidia.spark.ParquetCachedBatchSerializer")
.set("spark.sql.session.timeZone", "UTC")
.set("spark.rapids.sql.explain", "ALL")
// uncomment below config to run `strict mode`, where fallback to CPU is treated as fail
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: do we want a parameterized test suit here for both strict mode and non-strict mode associated with different RapidsTestSettings?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot more test cases will fail in strict mode as shown in our report, and we'll need even more resource to take care of the failed test cases in strict mode.

Now that we're still struggling with non-strict mode, the result of strict mode is more of a reference than a guide to our daily priority. So it's not mandatory to do so at present? we can add that anytime when we have enough resource to deal with failed test cases in strict mode .

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also we should consider how to parallel execute these two modes to avoid CI taking too much time

@binmahone binmahone force-pushed the 240521_refine_ut_framework_p2_2 branch from 1072302 to c612243 Compare May 22, 2024 08:41
@binmahone
Copy link
Collaborator Author

build

@binmahone binmahone force-pushed the 240521_refine_ut_framework_p2_2 branch from c612243 to c0ed014 Compare May 22, 2024 08:43
@binmahone
Copy link
Collaborator Author

build

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
@binmahone binmahone force-pushed the 240521_refine_ut_framework_p2_2 branch from c0ed014 to ada5334 Compare May 22, 2024 09:11
@binmahone
Copy link
Collaborator Author

build

@jlowe
Copy link
Member

jlowe commented May 22, 2024

Having two PRs pending that do very similar things is confusing, especially when one of them already has a lot of active comments. Why not post a separate PR that just has the last commit? Yes, there's a small amount of conflicts to resolve when one goes in before the other, but fairly straightforward to resolve. As it is now, this could get approved and merged without addressing all of the comments in #10851.

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
@binmahone
Copy link
Collaborator Author

Having two PRs pending that do very similar things is confusing, especially when one of them already has a lot of active comments. Why not post a separate PR that just has the last commit? Yes, there's a small amount of conflicts to resolve when one goes in before the other, but fairly straightforward to resolve. As it is now, this could get approved and merged without addressing all of the comments in #10851.

Hi Jason, I also considered your approach, but that would result in a PR against binmahone/spark-rapids instread of NVIDIA/spark-rapids ? But I have to admit these two PRs are indeed consufing.. I had to do this because of the deadline of code freeze.

"As it is now, this could get approved and merged without addressing all of the comments in #10851. " -- Now that all of the comments are addressed in this PR (#10861), including your latest comment #10851 (comment), let's go with this PR. If you don't have any other comments, please approve it ASAP so that I can merge it in?

@binmahone
Copy link
Collaborator Author

build

1 similar comment
@binmahone
Copy link
Collaborator Author

build

@jlowe
Copy link
Member

jlowe commented May 23, 2024

Please do not force-push changes to a PR. It makes it very difficult for reviewers to track what has changed since they last looked at it. Instead of rebasing, push a merge commit instead.

that would result in a PR against binmahone/spark-rapids instread of NVIDIA/spark-rapids ?

No, there's two potential ways to do this.

  • It looked like there wasn't a lot of dependency between those two changes, and they could potentially be separate PRs. Yes, one will get a merge conflict when the other is merged, but that can be resolved and the PRs reviewed independently.
  • If the dependency coupling is too tight to realistically do a separate PR, you could post a draft PR stating it depends on the other PR. That allows the PR to undergo early CI feedback and review. When the other PR goes in, upmerge to the latest changes on the base branch.

This is a bit of a mess now, because there are review comments on #10851 that are only addressed here, and none of the review comments are here for posterity.

Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks OK, but should not be merged until either #10851 is merged first or @revans2 and @gerashegalov who had review comments in #10851 are happy with this version of it.

@binmahone
Copy link
Collaborator Author

This looks OK, but should not be merged until either #10851 is merged first or @revans2 and @gerashegalov who had review comments in #10851 are happy with this version of it.

The criteria is met (#10851 (comment)). I'm merging this PR now

@binmahone binmahone merged commit c5da29d into NVIDIA:branch-24.06 May 24, 2024
44 checks passed
@binmahone
Copy link
Collaborator Author

This PR also closes #10859

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
test Only impacts tests
Projects
None yet
3 participants