Using Azure Databricks (Spark) for ML, this is the //build 2019 repository with homework examples, code and notebooks

Annielytix/Advanced-Databricks-for-ML-Build-2019

build2019-Advanced Azure Databricks for ML

This repository, presented at //build 2019, covers using Azure Databricks (Spark) for ML and includes additional homework examples, code, and notebooks.

Welcome

Welcome to the //build 2019 Advanced Databricks Challenge. We will focus on hands-on activities that develop proficiency in advanced Databricks concepts such as exploring data with Spark, building supervised and unsupervised learning models, evaluating models, and using advanced libraries like MMLSpark. These challenges assume introductory to intermediate knowledge of Azure Databricks; if that does not describe you, please work through the Introduction to Databricks challenges first.

Goals

Most of the difficulties customers run into in this area come from stitching multiple services together. As such, where possible, we have tried to place key concepts in the context of a broader example.

At the end of this workshop, you should be able to:

  • Use Azure Databricks to build ML models, including:

    • Supervised learning (classification)
    • Unsupervised learning (clustering / recommendation)
  • Evaluate those models using Azure Databricks

  • Understand libraries: an introduction to MMLSpark and when to use it

  • Get an introduction to deep learning
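To make the evaluation goal concrete, here is a minimal sketch of the metrics most often used to judge a binary classifier. It is plain Python with no Spark dependency and is not taken from the workshop notebooks; the function name `binary_classification_metrics` and the sample labels are assumptions chosen for illustration. On Azure Databricks you would more likely compute these with Spark ML's built-in evaluators.

```python
from collections import Counter


def binary_classification_metrics(labels, predictions, positive=1):
    """Return accuracy, precision, recall and F1 for a binary classifier.

    Hypothetical helper for illustration; not part of the workshop code.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if not labels:
        raise ValueError("at least one example is required")

    # Tally each (actual is positive, predicted is positive) outcome.
    counts = Counter(
        (actual == positive, predicted == positive)
        for actual, predicted in zip(labels, predictions)
    )
    tp = counts[(True, True)]    # true positives
    fp = counts[(False, True)]   # false positives
    fn = counts[(True, False)]   # false negatives
    tn = counts[(False, False)]  # true negatives

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / len(labels)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels and predictions, purely for demonstration.
sample_labels = [1, 1, 1, 0, 0, 0, 1, 0]
sample_predictions = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = binary_classification_metrics(sample_labels, sample_predictions)
```

Reporting precision and recall alongside accuracy matters when the classes are imbalanced, because accuracy alone can look good for a model that rarely predicts the positive class.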

Background Knowledge

This workshop is meant for a Data Scientist on Azure who actively scripts in a common data science language like Python. Since this is only a short workshop, there are certain things you will need to read or set up after you arrive.

Firstly, you should have some previous exposure to Python. We will be using it for everything we build in the workshop, so you should be familiar with how to use it to create ML models. Additionally, this is not a class that teaches you how to choose the correct algorithm for a business scenario. We assume you have some familiarity with these concepts ahead of time.

Secondly, you should have some experience with Azure Databricks and its core concepts, including workspaces, libraries, and so on. If not, please check out the Intro to Azure Databricks workshop first.

Thirdly, you should have experience with the portal and be able to create resources (and spend money) on Azure. We will not be providing Azure passes for this workshop.

For fun, I have included an EU soccer example (.DBC) as well as a Retail Fashion example and, by popular demand, a Pandas UDF Benchmark notebook to help you get started with Pandas User Defined Functions. Please let me know if you have any questions.

Challenges

Business Case I - Azure Databricks

  1. Start by following the steps in the [README] to provision your Azure environment and fork both the [labs] below and the notebooks used in the challenges.
  2. Challenge 0 - Administration. Please note: you do not need to run through Administration if you are an attendee of //build (see the note below for when to use this Databricks archive).
  3. Challenge 1 - Exploring Data with Spark.
  4. Challenge 2 - Building Supervised Learning Models.
  5. Challenge 3 - Evaluating Supervised Learning Models.
  6. Challenge 4 - Recommenders and Clustering.
  7. Challenge 5 - Using the MMLSpark Library.

Note: The Challenge 0 - Administration archive is provided to help you facilitate this workshop in your own offices after the event.
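As a warm-up for the clustering challenge, the following is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not the workshop's notebook code and does not use Spark; the function name `kmeans`, the choice to seed centroids with the first k points, and the sample points are all assumptions made to keep the example small and deterministic.

```python
import math


def kmeans(points, k, iterations=10):
    """Cluster points with Lloyd's algorithm.

    Hypothetical helper for illustration. Returns (centroids, assignments),
    where assignments[i] is the cluster index of points[i].
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # Assumption: seed with the first k points so the result is reproducible.
    centroids = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)

    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centroids, assignments


# Two made-up, well-separated groups of 2-D points.
sample_points = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
centroids, assignments = kmeans(sample_points, k=2)
```

Real workloads usually pick initial centroids at random or with a smarter seeding scheme, and run on distributed data; this sketch only shows the two alternating steps.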

Discussion Forum

  • SWAG given for most active participants
  • Q&A and Feedback
