
aws-samples/enhanced-pyspark-processor

Background

Currently, local mode does not work for the PySparkProcessor because YARN is not configured correctly for local setups. To enable local development, we created an enhanced version of the PySparkProcessor. It overrides the underlying functionality of the SageMaker SDK and runs Spark in local mode instead of on YARN, while preserving the interface of the original PySparkProcessor. Note that this project is intended only as a stop-gap solution until local mode is natively supported in the SageMaker SDK.
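The core difference between the two execution paths is which Spark master the job is submitted to. The sketch below is a hypothetical illustration, not this project's implementation: the helper name build_spark_submit_args and the script name preprocess.py are made up for the example. It only contrasts the standard spark-submit master values "local[*]" (run in one process on all local cores, no YARN ResourceManager needed) and "yarn" (submit to a YARN cluster).

```python
# Hypothetical illustration only: NOT the enhanced-pyspark-processor code.
# It contrasts submitting a Spark job to YARN with running it in local mode.
from typing import List, Optional


def build_spark_submit_args(
    app_path: str,
    local: bool = True,
    app_args: Optional[List[str]] = None,
) -> List[str]:
    """Return a spark-submit command line for the given application.

    local=True uses the "local[*]" master, so Spark runs in a single
    process on all available cores and no YARN cluster is required.
    local=False uses the "yarn" master, which expects a configured
    YARN ResourceManager.
    """
    master = "local[*]" if local else "yarn"
    cmd = ["spark-submit", "--master", master, app_path]
    # Arguments after the application path are passed to the application.
    cmd += app_args or []
    return cmd


if __name__ == "__main__":
    print(" ".join(build_spark_submit_args("preprocess.py", local=True)))
    print(" ".join(build_spark_submit_args("preprocess.py", local=False)))
```

Running it prints the two command lines side by side, which makes it easy to see that local mode changes only the master, not the application being submitted.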

Getting Started

To install:

pip install git+https://github.com/aws-samples/enhanced-pyspark-processor

Please refer to the notebook example for usage patterns.

Compatibility

The following versions have been tested for compatibility.

SageMaker SDK                       Spark    Compatible?
sagemaker >= 2.22.0, <= 2.61.0      2.4      ✔️
sagemaker >= 2.22.0, <= 2.61.0      3.0      ✔️
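If you want a script or notebook to warn when it runs outside the tested combinations, a small check like the following could be used. This is a hypothetical helper, not part of the package: the function names are made up, it encodes only the ranges listed in the table above, and it handles plain numeric versions such as "2.50.0" (pre-release or post-release suffixes would raise ValueError). Versions outside these ranges may still work; they have simply not been tested.

```python
# Hypothetical helper: checks a (sagemaker, Spark) version pair against the
# combinations listed as tested in the compatibility table.
from typing import Tuple

TESTED_SAGEMAKER_MIN = (2, 22, 0)
TESTED_SAGEMAKER_MAX = (2, 61, 0)
TESTED_SPARK_VERSIONS = {"2.4", "3.0"}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a plain numeric version such as '2.40.1' into (2, 40, 1)."""
    parts = tuple(int(p) for p in version.split("."))
    # Pad to three components so '2.50' compares like '2.50.0'.
    return parts + (0,) * (3 - len(parts))


def is_tested_combination(sagemaker_version: str, spark_version: str) -> bool:
    """Return True only if both versions fall inside the tested ranges."""
    sm = parse_version(sagemaker_version)
    in_range = TESTED_SAGEMAKER_MIN <= sm <= TESTED_SAGEMAKER_MAX
    return in_range and spark_version in TESTED_SPARK_VERSIONS


if __name__ == "__main__":
    print(is_tested_combination("2.50.0", "3.0"))  # inside both ranges
    print(is_tested_combination("2.62.0", "3.0"))  # sagemaker too new
```

At runtime the installed SDK version is available as sagemaker.__version__, which could be passed as the first argument.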

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.
