[FEA] Log the shuffle manager class when shuffle manager is misconfigured #10172

Closed
gerashegalov opened this issue Jan 9, 2024 · 2 comments · Fixed by #10871
Assignees
Labels
ease of use Makes the product simpler to use or configure good first issue Good for newcomers

Comments

@gerashegalov
Collaborator

gerashegalov commented Jan 9, 2024

Is your feature request related to a problem? Please describe.

Instead of just reporting "Cannot initialize ..." as initShuffleManager currently does:

    def initShuffleManager(): Unit = {
      SparkEnv.get.shuffleManager match {
        case rapidsShuffleManager: RapidsShuffleManagerLike =>
          rapidsShuffleManager.initialize
        case _ =>
          throw new IllegalStateException(s"Cannot initialize the RAPIDS Shuffle Manager")
      }
    }

  1. The code should log the class of the "wrong" shuffle manager instance.
  2. In case it is a classloader problem, it should also log the classloader that loaded the class from 1.
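
The two requests above could look roughly like the following. This is a minimal sketch, not the change that was eventually merged: the `ShuffleManager` and `RapidsShuffleManagerLike` traits here are hypothetical stand-ins so the snippet compiles without Spark or the plugin, and `ShuffleManagerDiagnostics`, `describe`, and `mismatchMessage` are names invented for illustration.

```scala
// Hypothetical stand-ins so the sketch compiles without Spark or the RAPIDS plugin.
trait ShuffleManager
trait RapidsShuffleManagerLike extends ShuffleManager { def initialize(): Unit }

object ShuffleManagerDiagnostics {
  // Describe a class together with the classloader that defined it.
  // getClassLoader returns null for classes defined by the bootstrap loader.
  def describe(cls: Class[_]): String = {
    val loader = Option(cls.getClassLoader).map(_.toString).getOrElse("bootstrap")
    s"${cls.getName} (loaded by $loader)"
  }

  // Build an error message naming the actual class, the expected type,
  // and the classloader of each.
  def mismatchMessage(actual: AnyRef, expected: Class[_]): String =
    s"Cannot initialize the RAPIDS Shuffle Manager: found ${describe(actual.getClass)}, " +
      s"expected an instance of ${describe(expected)}"

  // Same shape as the original method, but the manager is passed in
  // (assumption made here to avoid depending on SparkEnv).
  def initShuffleManager(shuffleManager: ShuffleManager): Unit = shuffleManager match {
    case rapids: RapidsShuffleManagerLike => rapids.initialize()
    case other =>
      throw new IllegalStateException(
        mismatchMessage(other, classOf[RapidsShuffleManagerLike]))
  }
}
```

Printing the classloader of both the actual class and the expected trait makes a loader mismatch visible directly in the exception message, without needing a debugger on the executor.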
@gerashegalov gerashegalov added feature request New feature or request ? - Needs Triage Need team to review and classify labels Jan 9, 2024
@mattahrens mattahrens added good first issue Good for newcomers and removed ? - Needs Triage Need team to review and classify labels Jan 16, 2024
@sameerz sameerz added ease of use Makes the product simpler to use or configure and removed feature request New feature or request labels Jan 16, 2024
@zpuller zpuller self-assigned this May 23, 2024
@zpuller
Collaborator

zpuller commented May 23, 2024

I am planning to work on this.

@gerashegalov
Collaborator Author

For context regarding the classloader: this is hard to reproduce, but we have seen cases that trigger

24/01/09 19:57:21 ERROR RapidsExecutorPlugin: Exception in the executor plugin, shutting down!
java.lang.IllegalStateException: Cannot initialize the RAPIDS Shuffle Manager
	at org.apache.spark.sql.rapids.GpuShuffleEnv$.initShuffleManager(GpuShuffleEnv.scala:111)
	at com.nvidia.spark.rapids.RapidsExecutorPlugin.init(Plugin.scala:397)
	at org.apache.spark.internal.plugin.ExecutorPluginContainer.$anonfun$executorPlugins$1(PluginContainer.scala:125)
	at scala.collection.StrictOptimizedIterableOps.flatMap(StrictOptimizedIterableOps.scala:118)
	at scala.collection.StrictOptimizedIterableOps.flatMap$(StrictOptimizedIterableOps.scala:105)
	at scala.collection.immutable.ArraySeq.flatMap(ArraySeq.scala:35)
	at org.apache.spark.internal.plugin.ExecutorPluginContainer.<init>(PluginContainer.scala:113)
	at org.apache.spark.internal.plugin.PluginContainer$.apply(PluginContainer.scala:211)

although the ShuffleManager class was correctly specified. In those cases the shuffle manager is not an instance of RapidsShuffleManagerLike. Thus, presumably, classOf[RapidsShuffleManagerLike] is different from the RapidsShuffleManagerLike reachable via org.apache.spark.SparkEnv.get.shuffleManager.getClass.getSuperclass.getInterfaces.
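
One way to check that hypothesis from code is to walk the type hierarchy of the actual instance and look for a type that has the expected name but is a different `Class` object, which on the JVM means it was defined by a different classloader. The sketch below is only an illustration under that assumption; `ClassLoaderMismatch`, `hierarchy`, and `sameNameDifferentLoader` are hypothetical names and are not part of Spark or the plugin.

```scala
object ClassLoaderMismatch {
  // Collect cls plus every superclass and interface reachable from it.
  def hierarchy(cls: Class[_]): List[Class[_]] = {
    // getSuperclass returns null for Object and for interfaces; Option drops it.
    val parents: List[Class[_]] =
      Option[Class[_]](cls.getSuperclass).toList ++ cls.getInterfaces.toList
    (cls :: parents.flatMap(hierarchy)).distinct
  }

  // Types in the hierarchy of `actual` that share the fully qualified name of
  // `expected` but are not the same Class object. A non-empty result means the
  // expected type is visible through two different classloaders.
  def sameNameDifferentLoader(actual: Class[_], expected: Class[_]): List[Class[_]] =
    hierarchy(actual).filter(c => c.getName == expected.getName && c != expected)
}
```

In the failing scenario one would call `sameNameDifferentLoader(shuffleManager.getClass, classOf[RapidsShuffleManagerLike])` and log `getClassLoader` for each returned class next to `classOf[RapidsShuffleManagerLike].getClassLoader`.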

@zpuller zpuller closed this as completed May 29, 2024