pyspark XGBoostSageMakerEstimator fails on .fit() #142
torsjonas changed the title from "pyspark XGBoostSageMakerEstimator.py does not match scala XGBoostSageMakerEstimator.scala" to "pyspark XGBoostSageMakerEstimator fails on .fit()" on Nov 24, 2021
I am also facing the same issue.
Please fill out the form below.
System Information
Describe the problem
Since version 1.4.2, the pyspark XGBoostSageMakerEstimator wrapper class no longer matches the corresponding scala class, producing an error in the pyspark JVM communication (during serialization of the python class) when calling the pyspark fit function. Specifically, it looks like the property lamba was changed to lambda_weights without a corresponding change in the scala class. https://github.com/aws/sagemaker-spark/pull/135/files#diff-ac899a7e58823fff725d351c8459435bb2f09a9687097cd47d3ec34741eb4156R179
It looks like the 1.4.2 release also bumps the Spark version from 2.2.0 to 2.4.0.
I can see a couple of workarounds. One is downgrading EMR to 5.10.1, which is the latest version that has Spark 2.2.0, but I do not want to do this because EMR 5.10.1 does not have support for Jupyter notebooks (only EMR 5.18.0 and later has support for Jupyter), and I don't want to run Zeppelin notebooks. Another workaround is to sidestep pyspark completely and use the scala spark sagemaker integration instead of the pyspark variant.
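To make the suspected mismatch concrete, here is a minimal sketch of the kind of name comparison that would surface it. It does not call sagemaker_pyspark or Spark. The helper find_param_mismatches and the two name lists are hypothetical and only illustrate the lamba / lambda_weights discrepancy described above. They were not read from either class.

```python
from typing import Dict, Iterable, List


def find_param_mismatches(
    python_params: Iterable[str], jvm_params: Iterable[str]
) -> Dict[str, List[str]]:
    """Return parameter names that exist on only one side of the wrapper."""
    py_names = set(python_params)
    jvm_names = set(jvm_params)
    return {
        "python_only": sorted(py_names - jvm_names),
        "jvm_only": sorted(jvm_names - py_names),
    }


# Illustrative name lists only (assumption): not taken from either library.
PYTHON_SIDE = ["eta", "gamma", "lambda_weights"]
JVM_SIDE = ["eta", "gamma", "lamba"]

mismatches = find_param_mismatches(PYTHON_SIDE, JVM_SIDE)
print(mismatches)
```

With these example lists, lambda_weights is reported as present only on the python side and lamba only on the JVM side. That is the situation in which a name-based handoff between the two classes would be expected to break.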
Minimal repo / logs
Start an EMR 5.23.0 cluster with a cluster bootstrap action to pip install sagemaker_pyspark. Attach an EMR Notebook (JupyterLab pyspark kernel) and execute notebook code that calls fit on an XGBoostSageMakerEstimator. This fails with an error.
Probably, the pyspark communication with Java fails because the pyspark XGBoostSageMakerEstimator class has changed a property previously named lamba to lambda_weights in a recent change, but the scala class was not changed accordingly.