This repository has been archived by the owner on Apr 17, 2019. It is now read-only.

This repository contains some sample notebooks illustrating the use of DataRobot and SageMaker

datarobot/datarobot-sagemaker-examples

What is DataRobot

The DataRobot automated machine learning platform helps data scientists and business analysts discover the best predictive models for every situation, and then deploy them so they can consistently make smarter and faster business decisions that impact their company's bottom line.

Why use DataRobot with SageMaker

DataRobot brings auto-modeling to SageMaker users, allowing them to quickly find and use the best machine learning model for their problem. Within minutes, DataRobot can iterate on thousands of combinations of models, data preparation steps, and parameters, work that would take days or weeks to do manually.

Before you start: Pre-requisites

To use DataRobot with SageMaker, you'll need a DataRobot account. If your company has already deployed DataRobot, please request an account from your administrator. Otherwise, please contact us here: https://www.datarobot.com/contact-us/

Getting your DataRobot API Token

  1. While logged in to the DataRobot interface, click the profile icon in the top-right corner of the screen.

    profile icon

  2. Select Profile from the drop-down menu:

    profile link

  3. Your API Token appears in the top section of your profile. Copy it so you can insert it into your notebooks.

    profile page
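
Rather than pasting the token directly into a notebook cell, one option is to read it from the environment. The following is a minimal sketch, not the approach taken by the example notebooks. The variable names DATAROBOT_API_TOKEN and DATAROBOT_ENDPOINT and the default endpoint URL are assumptions here; on-premise installations will have their own host. The connect helper assumes the datarobot package is installed and imports it lazily so that load_credentials can be used without it.

```python
import os

# Assumed default endpoint; replace with your own installation's URL if it differs.
DEFAULT_ENDPOINT = "https://app.datarobot.com/api/v2"


def load_credentials(env=None):
    """Return the API token and endpoint from an environment-style mapping."""
    env = os.environ if env is None else env
    token = env.get("DATAROBOT_API_TOKEN", "").strip()
    if not token:
        raise ValueError(
            "DATAROBOT_API_TOKEN is not set; copy the token from your profile page"
        )
    # Strip a trailing slash so the endpoint has one canonical form.
    endpoint = env.get("DATAROBOT_ENDPOINT", DEFAULT_ENDPOINT).rstrip("/")
    return {"token": token, "endpoint": endpoint}


def connect(credentials):
    """Configure the DataRobot client (requires the datarobot package)."""
    import datarobot as dr  # imported lazily; not needed by load_credentials

    return dr.Client(token=credentials["token"], endpoint=credentials["endpoint"])
```

In a notebook, this would typically be used as connect(load_credentials()) in the first cell, which keeps the token out of the saved notebook file.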


Overview of Modeling Example

Statistics on whether a flight was delayed, and for how long, are available from government databases for all the major carriers. It would be useful to be able to predict, before scheduling a flight, whether it is likely to be delayed. In the example notebooks below, we use DataRobot to model whether a flight will be delayed, based on information such as the scheduled departure time and whether it rained on the day of the flight.
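
To make the modeling target concrete, here is a small sketch of how a binary "delayed" label could be derived from a delay duration before the data is uploaded. The column names, the 15-minute threshold, and the two sample rows are all made up for illustration; the actual dataset used by the notebooks may be structured differently.

```python
import csv
import io

# Assumption: a flight counts as delayed if it departs 15 or more minutes late.
DELAY_THRESHOLD_MINUTES = 15


def add_delay_target(rows, delay_column="departure_delay_minutes",
                     target_column="was_delayed"):
    """Yield a copy of each row with a 0/1 target derived from the delay in minutes."""
    for row in rows:
        labeled = dict(row)
        delay = float(row[delay_column])
        labeled[target_column] = int(delay >= DELAY_THRESHOLD_MINUTES)
        yield labeled


# Illustrative rows only (hypothetical columns and values).
sample = io.StringIO(
    "scheduled_departure_hour,rained,departure_delay_minutes\n"
    "8,0,3\n"
    "17,1,42\n"
)
labeled_rows = list(add_delay_target(csv.DictReader(sample)))
```

With a label like this, the problem becomes a binary classification task, which is the kind of target the overview above describes.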

List of notebooks

Before beginning with these notebooks, make sure you have read through the Installing Dependencies section of this document.

  1. Basic Introduction walks the user through the basics of using DataRobot from a SageMaker notebook instance. It covers data preparation, uploading the dataset to DataRobot, kicking off auto-modeling, and finally getting predictions from the top-ranking model.

  2. Diving Deeper into Modeling shows the user how to explore the models created by the auto-modeling process in more detail. For example, we will see how the models are performing against the training data.

  3. Exploring Reasons for Prediction Results examines enhanced functionality in DataRobot that provides more insight into prediction results. For certain project types, DataRobot can produce explanations of its prediction output on a per-row basis.

Installing Dependencies

These notebooks require extra dependencies to be installed on the Notebook Instance. Although it is possible to install packages directly inside the running Jupyter instance, this is not ideal because all modifications are lost every time the instance is restarted. To better support customization of the Notebook Instance environment, Amazon provides Lifecycle Configurations, which are shell scripts that can be configured to run each time a notebook instance is started. To learn more, see Amazon's documentation or their blog post on the subject.

Create Notebook Instance

Below we provide some simple steps for creating a new notebook instance with a lifecycle configuration that prepares the instance to work with DataRobot. Unfortunately, SageMaker does not allow you to attach a lifecycle configuration to an existing notebook instance, so we will launch a new one:

  1. Click the Create notebook instance button

    create notebook first

  2. Fill in the appropriate fields until you reach the Lifecycle configuration drop-down, then select Create a new lifecycle configuration. This opens a new modal panel.

    create configuration drop down

  3. Give the configuration a descriptive name (e.g. DataRobot-Standard) and then click on the Start notebook tab under the Scripts section:

    start script

  4. Paste the script below into the script editor:

#!/bin/bash

set -e
export PATH=/home/ec2-user/anaconda3/bin/:$PATH

# Install DataRobot client package for Python 2
conda install -n python2 -c conda-forge datarobot -y -q

# Install DataRobot client package for Python 3
conda install -n python3 -c conda-forge datarobot -y -q

  5. Click the Create configuration button at the bottom of the page to save the new Lifecycle Configuration.

    create configuration

  6. Finish filling in the rest of the options for your instance and click the Create notebook instance button at the bottom of the page to begin launching your new instance.

    create notebook final


In the future, if you need to launch new notebook instances, you can reuse the Lifecycle Configuration created in the steps above rather than creating a new one. Note that lifecycle configuration scripts cannot run for longer than 5 minutes. If a script runs for longer than 5 minutes, it fails and the notebook instance is not created or started.
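
Once the instance is running, a quick way to confirm that the lifecycle configuration did its job is to check from a notebook cell whether the client package can be imported. This is a minimal sketch that assumes a Python 3 kernel (importlib.util is not available in Python 2); the package_available helper is a hypothetical name introduced here.

```python
import importlib.util


def package_available(name):
    """Return True if a top-level package can be found by the current kernel."""
    return importlib.util.find_spec(name) is not None


if not package_available("datarobot"):
    # The install step may have failed or timed out during instance start-up.
    print("datarobot is not installed in this kernel; "
          "check the lifecycle configuration and its logs")
```

If the message is printed, the install step in the lifecycle script likely did not complete for the kernel you selected.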
