Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve convertToGpu diagnostics and unbloat GpuOverrides #10838

Open
gerashegalov opened this issue May 18, 2024 · 0 comments
Open

Improve convertToGpu diagnostics and unbloat GpuOverrides #10838

gerashegalov opened this issue May 18, 2024 · 0 comments
Labels
ease of use Makes the product simpler to use or configure tech debt

Comments

@gerashegalov
Copy link
Collaborator

gerashegalov commented May 18, 2024

As of branch-24.06, GpuOverrides.scala is a file of ~4800 Lines of Code (and growing) consisting of numerous replacement rules.

In the center of the rule is an instance of the Function type

doWrap: (INPUT, RapidsConf, Option[RapidsMeta[_, _, _]], DataFromReplacementRule)
          => SparkPlanMeta[INPUT]

which is almost uniformly provided inline compiling down to an anonymous class. Hence, whenever there is an exception in convertToGpu akin to #881

we cannot determine which replacement rule is throwing the exception based on the stack trace alone

at com.nvidia.spark.rapids.GpuOverrides$$anon$95.$anonfun$convertToGpu$8(GpuOverrides.scala:1221)

without having to look at the source code from which the class was compiled

expr[SortOrder](
"Sort order",
(a, conf, p, r) => new BaseExprMeta[SortOrder](a, conf, p, r) {
// One of the few expressions that are not replaced with a GPU version
override def convertToGpu(): Expression =
a.withNewChildren(childExprs.map(_.convertToGpu()))
}),

Note that the original stack trace of #881 does not mentionSortOrder

This issue proposes a convention to have the function instance be nested in a companion object reflecting in its name the Exec, Expression being replaced. This way we can determine the expression, exec causing the issue without matching the line numbers of the stack trace to the right git hash of GpuOverides.

When this function is implemented in a companion object where the corresponding GPU implementation lives, GpuOverrides will become smaller and the rules more compact and clear.

See #10839 for a potential solution.

@gerashegalov gerashegalov added the ? - Needs Triage Need team to review and classify label May 18, 2024
@gerashegalov gerashegalov added the ease of use Makes the product simpler to use or configure label May 18, 2024
@mattahrens mattahrens changed the title Improve converToGpu diagnostics and unbloat GpuOverrides Improve convertToGpu diagnostics and unbloat GpuOverrides May 21, 2024
@mattahrens mattahrens added tech debt and removed ? - Needs Triage Need team to review and classify labels May 21, 2024
gerashegalov added a commit that referenced this issue May 23, 2024
)

Demo PR contributing to #10838 

It showcases a coding convention to follow using SortOrder and FilterExec replacements as an example

```scala
scala>  spark.range(100).where($"id" <= 10).collect()

java.lang.RuntimeException: convertToGpu failed
  at scala.sys.package$.error(package.scala:30)
  at com.nvidia.spark.rapids.GpuFilterExecMeta.convertToGpu(basicPhysicalOperators.scala:790)
  at com.nvidia.spark.rapids.GpuFilterExecMeta.convertToGpu(basicPhysicalOperators.scala:783)
  at com.nvidia.spark.rapids.SparkPlanMeta.convertIfNeeded(RapidsMeta.scala:838)
  at com.nvidia.spark.rapids.GpuOverrides$.com$nvidia$spark$rapids$GpuOverrides$$doConvertPlan(GpuOverrides.scala:4383)
  at com.nvidia.spark.rapids.GpuOverrides.applyOverrides(GpuOverrides.scala:4728)
  at com.nvidia.spark.rapids.GpuOverrides.$anonfun$applyWithContext$3(GpuOverrides.scala:4588)
  at com.nvidia.spark.rapids.GpuOverrides$.logDuration(GpuOverrides.scala:455)
  at com.nvidia.spark.rapids.GpuOverrides.$anonfun$applyWithContext$1(GpuOverrides.scala:4585)
  at com.nvidia.spark.rapids.GpuOverrideUtil$.$anonfun$tryOverride$1(GpuOverrides.scala:4551)
  at com.nvidia.spark.rapids.GpuOverrides.applyWithContext(GpuOverrides.scala:4605)
  at com.nvidia.spark.rapids.GpuOverrides.apply(GpuOverrides.scala:4578)
  at com.nvidia.spark.rapids.GpuOverrides.apply(GpuOverrides.scala:4574)
  at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.$anonfun$apply$1(Columnar.scala:532)
```

Signed-off-by: Gera Shegalov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ease of use Makes the product simpler to use or configure tech debt
Projects
None yet
Development

No branches or pull requests

2 participants