Real-time Azure Synapse Analytics with Azure Machine Learning
What is Azure Synapse Analytics?
Azure Synapse brings together the best of the SQL technologies used in enterprise data warehousing, Spark technologies used for big data, Pipelines for data integration and ETL/ELT, and deep integration with other Azure services such as Power BI, Azure Cosmos DB, and Azure Machine Learning.
With this service, data scientists can query both relational and non-relational data using the familiar SQL language, either with serverless on-demand queries for data exploration and ad hoc analysis, or with provisioned resources for the most demanding data warehousing needs. It is also possible to connect Azure Synapse Analytics to more user-friendly ETL and BI data management solutions.
What is Azure Machine Learning in Azure Synapse Analytics?
You may be familiar with what a typical data science process looks like. It’s a well-known process that most machine learning projects follow.
The process contains the following steps:
- Data collection
- Data analysis
- Data preparation
- Choosing a model
- Training (train models on Spark pools with MLlib)
- Hyperparameter tuning
- Evaluation
- Prediction
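In Azure Synapse these steps run with MLlib on a Spark pool; as a self-contained local illustration of the same workflow, here is a sketch that uses scikit-learn as a stand-in (the dataset, model, and parameter grid are my choices, not part of this article):

```python
# Illustrative only: Synapse Spark pools use MLlib, but the workflow is the
# same. scikit-learn stands in here so the example runs anywhere.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# 1-2. Data collection and analysis: load a small toy dataset
X, y = load_iris(return_X_y=True)

# 3. Data preparation: hold out a test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 4-6. Choose a model and tune hyperparameters with cross-validated search
search = GridSearchCV(
    LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}, cv=3
)
search.fit(X_train, y_train)

# 7. Evaluation on held-out data
accuracy = accuracy_score(y_test, search.predict(X_test))

# 8. Prediction on a new observation
prediction = search.predict([[5.1, 3.5, 1.4, 0.2]])
```

The same collect → prepare → train → tune → evaluate → predict loop applies whether the estimator is a scikit-learn model locally or an MLlib pipeline on a Spark pool.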
Steps to get started with Azure Synapse Analytics:
Step 1: Create and set up a Synapse workspace
Step 2: Analyse using a dedicated SQL pool
Step 3: Analyse using Apache Spark
Step 4: Analyse using a serverless SQL pool
Step 5: Analyse data in a storage account
Step 6: Orchestrate with pipelines
Step 7: Visualize data with Power BI
Before we get started, log in to the Azure portal.
Let's dive deep into the steps.
Step 1: Create and set up a Synapse workspace:
- Open the Azure portal, and at the top search for Synapse.
- In the search results, under Services, select Azure Synapse Analytics.
- Select Add to create a workspace.
- In the Basics tab, under Project Details, enter your preferred Subscription, Resource group, Region, and then choose a workspace name.
- Under Select Data Lake Storage Gen 2, click the From subscription button.
- Under Account name, click Create New and name the new storage account.
- Under File system name, click Create New and name it users. This will create a storage container called users.
- The workspace will use this storage account as the “primary” storage account, to store Spark tables and Spark application logs.
- Check the “Assign myself the Storage Blob Data Contributor role on the Data Lake Storage Gen2 account” box.
- Select Review + create > Create. Your workspace is ready in a few minutes.
Step 2: Analyse using a dedicated SQL pool:
- In Synapse Studio, on the left-side pane, select Manage > SQL pools.
- Select New.
- For SQL pool name, enter SQLPOOL1.
- For Performance level, choose DW100C.
- Select Review + create > Create. Your dedicated SQL pool will be ready in a few minutes. Your dedicated SQL pool is associated with a dedicated SQL pool database that’s also called SQLPOOL1.
A dedicated SQL pool consumes billable resources as long as it’s active. You can pause the pool later to reduce costs.
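Pausing (and resuming) can also be scripted; below is a hedged sketch assuming the azure-identity and azure-mgmt-synapse Python packages, where the begin_pause call and the angle-bracket names are my assumptions rather than anything from this walkthrough:

```python
# Hedged sketch: pause the dedicated SQL pool so it stops accruing compute
# charges. Assumes azure-identity and azure-mgmt-synapse are installed and
# you are signed in to Azure; angle-bracket values are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.synapse import SynapseManagementClient

client = SynapseManagementClient(DefaultAzureCredential(), "<subscription-id>")
client.sql_pools.begin_pause(
    resource_group_name="<resource-group>",
    workspace_name="<synapse-workspace>",
    sql_pool_name="SQLPOOL1",
).wait()  # compute billing stops; the pool's data remains in storage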
Step 3: Analyse using Apache Spark
- In Synapse Studio, on the left-side pane, select Manage > Apache Spark pools.
- Select New.
- For Apache Spark pool name, enter Spark1.
- For Node size, select Small.
- For Number of nodes, set both the minimum and the maximum to 3.
- Select Review + create > Create. Your Apache Spark pool will be ready in a few seconds.
The Spark pool tells Azure Synapse how many Spark resources to use. You only pay for the resources that you use. When you stop actively using the pool, the resources automatically time out and are recycled.
Step 4: Analyse using a serverless SQL pool
Every workspace comes with a pre-built serverless SQL pool called Built-in, which you can query right away from Synapse Studio.
Step 5: Analyse data in a storage account
Step 6: Orchestrate with pipelines
How to link an Azure Machine Learning workspace to a Synapse workspace?
Create serverless Apache Spark and dedicated SQL pools
We will use these pools to store the data we want to enhance with our ML model.
Upload and run the Spark Data Notebook to create Spark Database and SQL Test Database
Once we have our serverless Spark and SQL pools up and running, we can ingest our data and set up our Spark and SQL tables for training and testing, respectively.
Once the notebook is uploaded, change the sql_pool_name value to match the name of your SQL pool, select the desired Spark pool, and then click Run all.
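For orientation, the notebook's data-preparation cells amount to roughly the following sketch (it runs inside a Synapse notebook, where spark is predefined; the storage path and table names are placeholders of mine, not the notebook's actual values):

```python
# Rough sketch of the data-prep notebook (not runnable outside Synapse).
sql_pool_name = "SQLPOOL1"  # change to match your SQL pool

# Ingest raw data from the workspace's primary Data Lake Storage account
df = spark.read.load(
    "abfss://users@<storage-account>.dfs.core.windows.net/data/train.csv",
    format="csv", header=True,
)

# Spark database and table used for training on the Spark pool
spark.sql("CREATE DATABASE IF NOT EXISTS sparkdb")
df.write.mode("overwrite").saveAsTable("sparkdb.training_data")
```

The notebook then loads a held-out split into the dedicated SQL pool to serve as the test table for scoring.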
Follow the steps below to train your model.
Enhance SQL Table with Trained AutoML Model
Once we have the best model we can now evaluate it on our test SQL table using our SQL Pool.
First, we need to select the table we want to enhance with the model we just trained.
Then we select our new AutoML model, map our input table columns to what the model expects, and choose or create a table in which to store the model.
This will generate a T-SQL script (built around the PREDICT function) that evaluates our model against the test data and outputs the predictions.
Step 7: Visualize data with Power BI
Create a Power BI workspace
- Sign in to powerbi.microsoft.com.
- Click on Workspaces, then select Create a workspace. Create a new Power BI workspace named MachineLearningmodel or similar, since this name must be unique.
Link your Azure Synapse workspace to your new Power BI workspace
- In Synapse Studio, go to Manage > Linked Services.
- Select New > Connect to Power BI.
- Set Name to MachineLearningmodel.
- Set Workspace name to the Power BI workspace you created above, similar to MachineLearningmodel. Select Create.
Create a Power BI dataset that uses data in your Azure Synapse workspace
- In Synapse Studio, go to Develop > Power BI.
- Go to MachineLearningmodel > Power BI datasets and select New Power BI dataset. Click Start. Select the SQLPOOL1 data source, then click Continue.
- Click Download to download the MachineLearningmodel.pbids file. Click Continue.
- Open the downloaded .pbids file. Power BI Desktop opens and automatically connects to SQLPOOL1 in your Azure Synapse workspace.
- In the Publish to Power BI window, under Select a destination, choose your MachineLearningmodel, and then click Select.
- Wait for publishing to finish.
Happy tinkering!
References:
https://docs.microsoft.com/en-us/azure/synapse-analytics/
Footnote
Congratulations! You made it to the end! I do hope you enjoyed the ride into Software Engineering for Data Science and Azure!
Connect me here on LinkedIn: Shashanka M