-
Notifications
You must be signed in to change notification settings - Fork 221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve convertToGpu diagnostics and unbloat GpuOverrides #10838
Labels
Comments
gerashegalov
added a commit
that referenced
this issue
May 23, 2024
) Demo PR contributing to #10838 It showcases a coding convention to follow using SortOrder and FilterExec replacements as an example ```scala scala> spark.range(100).where($"id" <= 10).collect() java.lang.RuntimeException: convertToGpu failed at scala.sys.package$.error(package.scala:30) at com.nvidia.spark.rapids.GpuFilterExecMeta.convertToGpu(basicPhysicalOperators.scala:790) at com.nvidia.spark.rapids.GpuFilterExecMeta.convertToGpu(basicPhysicalOperators.scala:783) at com.nvidia.spark.rapids.SparkPlanMeta.convertIfNeeded(RapidsMeta.scala:838) at com.nvidia.spark.rapids.GpuOverrides$.com$nvidia$spark$rapids$GpuOverrides$$doConvertPlan(GpuOverrides.scala:4383) at com.nvidia.spark.rapids.GpuOverrides.applyOverrides(GpuOverrides.scala:4728) at com.nvidia.spark.rapids.GpuOverrides.$anonfun$applyWithContext$3(GpuOverrides.scala:4588) at com.nvidia.spark.rapids.GpuOverrides$.logDuration(GpuOverrides.scala:455) at com.nvidia.spark.rapids.GpuOverrides.$anonfun$applyWithContext$1(GpuOverrides.scala:4585) at com.nvidia.spark.rapids.GpuOverrideUtil$.$anonfun$tryOverride$1(GpuOverrides.scala:4551) at com.nvidia.spark.rapids.GpuOverrides.applyWithContext(GpuOverrides.scala:4605) at com.nvidia.spark.rapids.GpuOverrides.apply(GpuOverrides.scala:4578) at com.nvidia.spark.rapids.GpuOverrides.apply(GpuOverrides.scala:4574) at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.$anonfun$apply$1(Columnar.scala:532) ``` Signed-off-by: Gera Shegalov <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
As of branch-24.06, GpuOverrides.scala is a file of ~4800 Lines of Code (and growing) consisting of numerous replacement rules.
In the center of the rule is an instance of the Function type
which is almost uniformly provided inline compiling down to an anonymous class. Hence, whenever there is an exception in convertToGpu akin to #881
we cannot determine which replacement rule is throwing the exception based on the stack trace alone
without having to look at the source code from which the class was compiled
spark-rapids/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala
Lines 1216 to 1222 in d2abd35
Note that the original stack trace of #881 does not mention
SortOrder
This issue proposes a convention to have the function instance be nested in a companion object reflecting in its name the Exec, Expression being replaced. This way we can determine the expression, exec causing the issue without matching the line numbers of the stack trace to the right git hash of GpuOverides.
When this function is implemented in a companion object where the corresponding GPU implementation lives, GpuOverrides will become smaller and the rules more compact and clear.
See #10839 for a potential solution.
The text was updated successfully, but these errors were encountered: