Skip to content

Making ODBC connection from Databricks (Azure Databricks) to Azure SQL Database with Azure AD User Access Token.

Notifications You must be signed in to change notification settings

easonlai/Databricks_ODBC_Connection_to_Azure_SQL_DB_with_Azure_AD_User_Access_Token

Repository files navigation

Databricks ODBC Connection to Azure SQL Database with Azure AD User Access Token

This code repository is showing how to make ODBC connection from Azure Databricks to Azure SQL Database with Azure AD user access token. Databricks doesn’t supported for Azure AD user passthrough authentication to Azure SQL Database and only supported for Azure Data Lake Storage (ADLS). Due to this limitation, we’re only able to use Azure AD Service Principle (or primitive SQL ID) to make authentication with Azure SQL Database. And we’re also required to setup (Databricks Secret Scope maybe also using Azure Key Vault-backed scope in advanced scenario) with corresponding access control to save the Azure AD Service Principle secrets. This make lots of administration and operation overhead on Databricks Secret Scope. And Azure AD Service Principle based authentication is also hard to trace/audit database access as Secret Scope can be shared.

To make our life easier, I’m trying to make interactive authentication with Azure AD from Databricks notebook and obtain the access token. And then use pyodbc to make token-based authentication with Azure SQL Database.

  1. Analyst use browser to interact Azure Databricks with Notebook.
  2. Initiate interactive authentication (with device code) from Notebook. Open up additional tab from browser (https://microsoft.com/devicelogin) to perform interactive authentication with Azure AD.
  3. After login successful, Notebook user session will get the user access token from Azure AD.
  4. Leverage collected access token to perform token-based authentication with Azure SQL Database via pyodbc library and MS-SQL driver, load SQL query into Panda dataframe, then covert Panda dataframe into Spark dataframe.

diagram1

All data in this code repository is coming from classic dataset of diabetes classification.

Enjoy!