This is the first in a series of blog posts about Microsoft’s key Azure cloud data analytics platform: Azure Synapse Analytics. I was first introduced to this product when it was known as Azure SQL Data Warehouse, which was effectively a stand-alone data warehouse. This has since evolved into Azure Synapse Analytics, which unifies big data analytics and enterprise data warehousing.
The Synapse architecture is displayed in the diagram below, which shows at a high level how data flows from the sources on the left-hand side, gets stored and transformed within Synapse, and finally produces an output for end-user consumption.

Synapse Pools and Pipelines
Synapse currently provides three different compute environments, or ‘Pools’, for distinct workloads:
- SQL Pool (Dedicated and Serverless Resource Models)
- Spark Pool (Big Data Analytics)
- Data Explorer Pool (Log and Telemetry Analysis)
ETL/ELT pipelines are created by developers to pull source data into these pools ready for transformation. Products like Power BI, Machine Learning or Azure Purview then consume the transformed data for end-users. Synapse is underpinned by Azure Data Lake Storage Gen2, which holds files such as CSV, JSON and Parquet (among other file types) that are processed by ETL pipelines, Data Scientists and Data Engineers.
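To make the extract–transform–load flow above concrete, here is a minimal pure-Python sketch. This is not Synapse code: the file contents, field names and functions are all invented for illustration, with CSV standing in for a raw landing zone and JSON standing in for a curated zone.

```python
import csv
import io
import json

# Hypothetical raw extract: in Synapse this data would land in a
# Data Lake Storage Gen2 container via an ingestion pipeline.
raw_csv = """order_id,customer,amount
1001,Contoso,250.00
1002,Fabrikam,99.50
"""

def extract(csv_text):
    """Read raw CSV rows into dictionaries (the 'E')."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def transform(rows):
    """Apply a simple typed transformation (the 'T')."""
    return [
        {"order_id": int(r["order_id"]),
         "customer": r["customer"],
         "amount": float(r["amount"])}
        for r in rows
    ]

def load(rows):
    """Serialise the result, standing in for a curated lake zone (the 'L')."""
    return json.dumps(rows)

curated = load(transform(extract(raw_csv)))
print(curated)
```

In a real Synapse pipeline each of these steps would be a pipeline activity (a Copy activity, a Data Flow or a Spark notebook) rather than a Python function, but the shape of the flow is the same.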
Synapse Studio
Once a Synapse Workspace gets created in the Azure Portal (or via PowerShell), it’s possible to access the above features in Synapse Studio, which is a unified portal that allows Administrators, Data Engineers, Developers, Data Scientists and Report Analysts to access the relevant Pools and easily read or manipulate data.
The main benefit of Synapse Studio is that Azure components are consolidated in one place rather than dotted around your environment. This makes resources easier to find and reduces security configuration complexity. The diagram below shows the ‘hubs’ used in Synapse Studio.

Home: The Home Page displays resources like recently used Power BI Reports or Synapse Pipelines that can be quickly accessed. It has direct links to explore, ingest or visualise data with a handy knowledge centre to access sample scripts and datasets.
Data: My favourite hub! It holds SQL and Lake databases as well as Data Lake Storage Containers and Integration Datasets used to reference Synapse Pipeline data.
Develop: This hub contains SQL Scripts, Notebooks, Data Flows and Spark Job definitions. If you have connected your Power BI Workspace with Synapse, this is where to find your Power BI Reports.
Integrate: Data Engineers frequently use this hub to create and configure ETL pipelines for data ingestion and transformation purposes.
Monitor: Broken pipeline? This hub displays pipeline run results and allows digging deeper into job failure logs (or successful runs of course!).
Manage: Administrators grant access controls and credentials for the Synapse Workspace in this hub. It also allows management of the Pools, Linked Services and Integration Runtimes.
Synapse Benefits
So, the burning question for clients usually is, “How can implementing Azure Synapse Analytics benefit my business?” Well, without getting too technical (that’s for later posts!) here are some answers:
- It’s quick. Very quick! The Dedicated SQL Pool uses a Massively Parallel Processing (MPP) architecture that distributes data processing operations across multiple compute nodes in parallel. I’ve witnessed huge ETL loads complete ten times faster in Synapse than with traditional on-premises SSIS ETL.
- Native Power BI Integration: Synapse connects to Power BI Workspaces, so you can view, modify and build reports directly from the Synapse Workspace. With reports in one place, it’s even easier to share them with stakeholders across the business.
- Source Control and Purview Linkage: It’s possible to link Synapse to Azure DevOps for source control purposes and Azure Purview for data governance and asset management. This means if anything breaks, then source control comes to the rescue for rollback purposes, and any alterations to your assets and environment get tracked over time by Purview.
- Security: Synapse is by far the most secure Data Warehousing platform I’ve encountered. Security is easily managed via the Manage hub by assigning RBAC roles to individual users and groups, or by creating credentials containing user-assigned or system-assigned managed identities and service principals. To ensure network isolation, a managed workspace virtual network can be implemented without creating separate VNets and network security rules (this really simplifies networking requirements!).
- Integration Pipelines: Synapse Integration Pipelines are built on the same underlying framework as Azure Data Factory, so they will be very familiar to Data Engineers. I’ve pulled data using source connectors like on-premises SQL Server, Oracle, Dynamics CRM, HTTP and REST APIs, and file formats like CSV, JSON and Parquet. You can create Data Flows through the UI in Synapse or, if you’re old school like me, call SQL stored procedures in the database to run the ETL instead!
- Machine Learning: Azure Synapse is integrated with Azure Machine Learning, so for all you budding Data Scientists out there, it’s possible to create Machine Learning models using the Spark Pool to train your models for accurate data predictions and forecasting.
- High Availability: For all the DBAs worrying about losing data, Synapse comes with High-Availability and Disaster Recovery capabilities to ensure data is never lost through human error or unplanned events like natural disasters or hardware failure. This is achieved using geo-backups and automatic or manually created restore points.
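The MPP idea behind the Dedicated SQL Pool’s speed can be sketched in a few lines of Python. This is purely a conceptual analogy, not how Synapse is implemented: data is split into distributions, each “node” aggregates its own slice in parallel, and a final gather step combines the partial results. The thread pool and the numbers are invented just to show the scatter/gather pattern.

```python
from concurrent.futures import ThreadPoolExecutor

data = list(range(1, 101))  # pretend this is a large fact table

def partition(rows, n):
    """Hash-distribute rows across n hypothetical 'compute nodes'."""
    buckets = [[] for _ in range(n)]
    for row in rows:
        buckets[row % n].append(row)
    return buckets

def local_aggregate(bucket):
    """Each node computes a partial SUM over its own distribution."""
    return sum(bucket)

# Scatter: each worker aggregates one distribution in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(local_aggregate, partition(data, 4)))

# Gather: combine the partial aggregates into the final answer.
total = sum(partials)
print(total)  # same result as a serial sum, computed piecewise
```

The speed-up in a real MPP engine comes from each distribution living on separate compute, so a query over billions of rows becomes many smaller queries running at once.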
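A pattern you meet constantly when building the Integration Pipelines mentioned above is the incremental (delta) load: rather than copying a whole source table on every run, the pipeline pulls only rows changed since a stored watermark. The sketch below shows that logic in plain Python; the rows, field names and dates are hypothetical, and in a real pipeline the watermark would be persisted in a control table rather than a variable.

```python
from datetime import datetime

# Hypothetical source rows with a last-modified timestamp, standing in
# for a table reached through one of the source connectors above.
source_rows = [
    {"id": 1, "modified": datetime(2023, 1, 1)},
    {"id": 2, "modified": datetime(2023, 2, 15)},
    {"id": 3, "modified": datetime(2023, 3, 10)},
]

def incremental_extract(rows, watermark):
    """Pull only rows changed since the last successful run, then
    advance the watermark for the next run."""
    delta = [r for r in rows if r["modified"] > watermark]
    new_watermark = max((r["modified"] for r in delta), default=watermark)
    return delta, new_watermark

watermark = datetime(2023, 2, 1)  # stored from the previous run
delta, watermark = incremental_extract(source_rows, watermark)
print(len(delta))  # only the rows newer than the watermark are copied
```

Keeping the watermark update in the same transaction (or pipeline run) as the copy is what makes the pattern safe to re-run after a failure.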
In the next blog post I’ll discuss the SQL, Spark and Data Explorer Pools in more detail, but until then I hope this introduction gives you a feel for Synapse and how it can benefit your business.
Would you like to know more?
risual are building Azure Synapse Analytics data platforms for multiple clients right now! Contact us via the link below if your organisation wants to take advantage of a modern cloud technology approach to solving data problems.