aws-samples/sagemaker-end-to-end-workshop

Amazon SageMaker End to End Workshop

This project was designed to provide an end to end experience on Amazon SageMaker.

It has been adapted from an AWS blog post.

Losing customers is costly for any business. Identifying unhappy customers early on gives you a chance to offer them incentives to stay. In this workshop we'll use machine learning (ML) for automated identification of unhappy customers, also known as customer churn prediction.

In this workshop we will use gradient boosted trees (XGBoost) to predict mobile customer departure.

The Data

Mobile operators have historical records that tell them which customers ended up churning and which continued using the service. We can use this historical information to train an ML model that can predict customer churn. After training the model, we can pass the profile information of an arbitrary customer (the same profile information that we used to train the model) to the model to have the model predict whether this customer will churn.
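The train-then-predict flow described above can be sketched in a few lines of plain Python. This is a toy illustration only: it uses a hand-rolled gradient boosting loop over one-split trees (stumps) rather than XGBoost itself, which adds second-order updates, regularization, and deeper trees. The feature names (customer service calls, day minutes) and all data values are made up for the example and are not taken from the workshop dataset.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(rows, residuals):
    """Return (feature, threshold, left_value, right_value) for the single
    split that best fits the residuals in the least-squares sense."""
    best = None
    for j in range(len(rows[0])):
        for threshold in sorted({r[j] for r in rows}):
            left = [e for r, e in zip(rows, residuals) if r[j] <= threshold]
            right = [e for r, e in zip(rows, residuals) if r[j] > threshold]
            if not left or not right:
                continue
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((e - lmean) ** 2 for e in left)
                   + sum((e - rmean) ** 2 for e in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, lmean, rmean)
    return best[1:] if best else None

def train(rows, labels, rounds=20, learning_rate=0.5):
    """Gradient boosting for log-loss: each stump fits y - p, the negative gradient."""
    pos = min(max(sum(labels) / len(labels), 1e-6), 1 - 1e-6)
    base = math.log(pos / (1 - pos))  # start from the log-odds of churn
    scores = [base] * len(rows)
    stumps = []
    for _ in range(rounds):
        residuals = [y - sigmoid(s) for y, s in zip(labels, scores)]
        stump = fit_stump(rows, residuals)
        if stump is None:
            break
        stumps.append(stump)
        j, t, lv, rv = stump
        scores = [s + learning_rate * (lv if r[j] <= t else rv)
                  for r, s in zip(rows, scores)]
    return base, learning_rate, stumps

def predict_proba(model, row):
    """Score a customer profile with the same fields used for training."""
    base, lr, stumps = model
    score = base + sum(lr * (lv if row[j] <= t else rv) for j, t, lv, rv in stumps)
    return sigmoid(score)

# Hypothetical historical records: (customer_service_calls, day_minutes) -> churned?
rows = [(1, 120.0), (0, 180.0), (2, 150.0), (5, 200.0),
        (6, 310.0), (4, 290.0), (1, 160.0), (7, 250.0)]
labels = [0, 0, 0, 1, 1, 1, 0, 1]

model = train(rows, labels)
```

Calling `predict_proba(model, (6, 300.0))` then returns the estimated churn probability for a new customer described by the same two fields. In the workshop, the corresponding work is done by XGBoost running as a SageMaker training job.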

The dataset we use is publicly available and was mentioned in Discovering Knowledge in Data by Daniel T. Larose. It is attributed by the author to the University of California Irvine Repository of Machine Learning Datasets. The Data sets folder that came with this notebook contains the churn dataset.

The dataset can be downloaded here.

Resources (Workshop Structure)

To put our model in production we will use several SageMaker features. The workshop is structured as follows:

  1. Introduction: Initial setup of the Amazon SageMaker Studio environment;
  2. DataPrep: Load the churn dataset, transform it with Amazon SageMaker Data Wrangler, and export it to S3;
  3. Modeling: Create an XGBoost model using Amazon SageMaker Training Jobs, keep track of each training job with Amazon SageMaker Experiments, and debug the model with Amazon SageMaker Debugger;
  4. Evaluation: Check model accuracy with Amazon SageMaker Processing and explainability with Amazon SageMaker Clarify;
  5. Deployment: Host the model for real-time inference with SageMaker model hosting and run batch inference with Batch Transform;
  6. Monitoring: Monitor the model for concept drift with SageMaker Model Monitor;
  7. Pipelines: Create an Amazon SageMaker Pipelines pipeline to run the entire process.

Getting Started

Although we recommend that you follow and run the labs in order, this workshop was built so that you can skip labs or do only those that interest you the most (e.g. you can run just the last lab, or just labs 4 and 5, or labs 1 and 4, etc.). Running the labs in order helps you understand the natural flow of an ML project and may make more sense.

This is possible because SageMaker is designed so that its components (e.g. training jobs, hosting, processing) are independent of each other, and customers are free to use only those that best fit their use case.

The 0-Introduction lab is the only lab that is strictly required. It sets up some basic things such as creating S3 buckets, installing packages, etc.
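As a rough idea of what the S3 part of that setup can involve, here is a minimal sketch using the boto3 `create_bucket` call. The helper names, bucket name, and region are hypothetical, and the lab itself may create its buckets differently. The function takes the client as a parameter, so nothing is created until you pass in a real `boto3.client("s3")`.

```python
def bucket_request(bucket_name, region):
    """Build the keyword arguments for S3 CreateBucket (hypothetical helper)."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the S3 default region and must not be sent as a
    # LocationConstraint; every other region has to be named explicitly.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs

def create_workshop_bucket(s3_client, bucket_name, region):
    """Create the bucket through the given boto3 S3 client."""
    return s3_client.create_bucket(**bucket_request(bucket_name, region))

# Usage against a real account (bucket name and region are placeholders):
#   import boto3
#   s3 = boto3.client("s3", region_name="eu-west-1")
#   create_workshop_bucket(s3, "my-churn-workshop-bucket", "eu-west-1")
```

Bucket names are globally unique, so the placeholder name above would need to be replaced with one of your own.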


Run any module independently

Remember that the 0-Introduction lab is mandatory, no matter which module you run. The remaining labs can be executed independently; just follow the setup instructions in each lab.

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.

Questions / Issues?

Please raise an issue on this repo.