Increasing Data Governance capability with Azure Purview

The Problem

The challenge for any organisation today is how to understand and manage the data it holds.  Two specific challenges are data discovery and data compliance.

It is increasingly important for organisations to be able to discover what data they hold.  They need to be able to answer questions such as:

  • What data do I have?
  • What data items does the dataset contain?
  • Where did the data originate?
  • Who owns or has prepared the data?
  • Is it a trusted source?

These questions come from users in the organisation wanting to understand what data is available to them, or from the organisation itself, which is trying to understand the data it holds for compliance reasons.  A Chief Information Officer (CIO) will want answers to questions such as:

  • What is my exposure to risk?
  • What datasets contain Personally Identifiable Information?
  • Is my usage compliant?
  • How is licensed data used in my organisation?
  • How do I control access and use?
  • What is the impact of new regulation?

Data-mature organisations may have a good understanding of the structured data held in their line-of-business databases, but what about the analysis done with this data, unstructured or semi-structured data, or copies of data preserved in spreadsheets?  These form part of your organisation's data footprint too.

What Purview Offers

Azure Purview is Microsoft's new Software as a Service (SaaS) cloud-hosted data governance tool, replacing the previous Azure Data Catalog product.  Azure Purview enables unified data governance in the cloud and is a strategic data governance tool for Microsoft.  It allows you to manage and govern operational, transactional and analytical data in your organisation more effectively.  It is a cloud-native, automated, purpose-built service that addresses discovery and compliance needs.  It is a fully managed, serverless service that aims to replace the manual, ad-hoc and homegrown solutions that organisations may have in place.

The aim of Purview is to establish the foundation for effective data governance in your organisation, automating the discovery of data in on-premises, multi-cloud (AWS, Azure, GCP, etc.) and SaaS sources.  Once data has been discovered, Purview classifies it at scale to identify sensitivity, compliance, industry, business and company-specific value.  Its data lineage functionality helps organisations know where their data came from and what was derived from it.  Together, these capabilities help organisations maximise the business value of their data by:

  • Connecting business and technical data analysts, data scientists, and data engineers to a trusted data catalog.
  • Enabling users to quickly find data and view its lineage and sensitivity.
  • Delivering a curated and consistent glossary of business terms and definitions.
  • Giving insight into the data estate.
  • Showing at a glance how data is being created and used across the data estate.
  • Allowing a visual assessment of the state of data assets, scans, business glossary and sensitive data.

This helps organisations move upwards in their quest to go from data to wisdom, as visualised in the Data, Information, Knowledge, Wisdom (DIKW) pyramid.

How does Purview work?

Purview is an Azure-hosted SaaS offering.  Its interface is Purview Studio, which is accessed through a modern web browser.

There are three main components to the tool.

  • Data Map
  • Data Catalog
  • Data Insights

The Data Map provides automated metadata scanning and lineage identification across hybrid data stores.  Data stores are registered in the Data Map and automated scans are scheduled to catalog the information they hold.  Once the data has been cataloged, Purview applies classifiers (over 100 are built in, and custom classifiers can be defined) to determine the type of data that exists in each data asset.  It also uses Microsoft Information Protection sensitivity labels.  The Data Map supports the Apache Atlas API, allowing the metadata held within Purview to be imported, exported and queried.
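To make the idea of classification more concrete, the sketch below shows one simple way a pattern-based classifier could decide what a column contains: sample the values, test each against a regular expression, and apply a label only when enough of the sample matches.  This is an illustration only and is not Purview's implementation; the rule names, the patterns and the min_match_ratio threshold are all assumptions made for the example.

```python
import re
from typing import Iterable, List

# Hypothetical rules for illustration; these are NOT Purview's built-in classifiers.
RULES = {
    "Email Address": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "UK Postcode": re.compile(r"[A-Z]{1,2}[0-9][0-9A-Z]? ?[0-9][A-Z]{2}"),
}


def classify_column(values: Iterable[str], min_match_ratio: float = 0.6) -> List[str]:
    """Return the labels whose pattern matches at least min_match_ratio of the sample."""
    # Ignore empty cells so that sparse columns are judged on real values only.
    sample = [v.strip() for v in values if v and v.strip()]
    if not sample:
        return []

    labels = []
    for label, pattern in RULES.items():
        # fullmatch requires the whole cell to fit the pattern, not just a fragment.
        matched = sum(1 for v in sample if pattern.fullmatch(v))
        if matched / len(sample) >= min_match_ratio:
            labels.append(label)
    return labels


if __name__ == "__main__":
    print(classify_column(["a@example.com", "b@example.org", "n/a"]))
    print(classify_column(["SW1A 1AA", "M1 1AE"]))
```

The threshold is the interesting design choice: real columns are rarely clean, so requiring every value to match would miss obvious cases, while a very low threshold would produce false positives.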

The metadata collected from the scanning builds the Data Catalog.  This allows users to perform semantic search and browse entries, aiding the discovery of data held by the organisation.  The Data Catalog also includes a Business Glossary, allowing the organisation to store business definitions of the data items it holds.  This makes the semantic search very powerful, as it adds organisational context to data items.  The Data Catalog can also store data lineage, including sources, owners, transformations and life cycle, and integrates with Azure Synapse workspaces.
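As a rough illustration of why a business glossary makes search more useful, the toy sketch below expands a search term using glossary synonyms and also matches assets that have been tagged with a glossary term.  It is a simplified model rather than how Purview's search works; the Asset class, the GLOSSARY contents and the function names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Asset:
    """A hypothetical catalog entry: a name, a description and glossary tags."""
    name: str
    description: str = ""
    glossary_terms: Set[str] = field(default_factory=set)


# Hypothetical glossary: business term -> synonyms used around the organisation.
GLOSSARY: Dict[str, Set[str]] = {
    "customer": {"client", "account holder"},
}


def expand_query(query: str) -> Set[str]:
    """Return the query plus any glossary term and synonyms it is related to."""
    q = query.lower().strip()
    terms = {q}
    for term, synonyms in GLOSSARY.items():
        if q == term or q in synonyms:
            terms |= {term} | synonyms
    return terms


def search(assets: List[Asset], query: str) -> List[str]:
    """Return names of assets that match the expanded query by text or by tag."""
    terms = expand_query(query)
    hits = []
    for asset in assets:
        text = f"{asset.name} {asset.description}".lower()
        tagged = {t.lower() for t in asset.glossary_terms}
        if tagged & terms or any(t in text for t in terms):
            hits.append(asset.name)
    return hits


if __name__ == "__main__":
    catalog = [
        Asset("crm_clients", "CRM export"),
        Asset("sales_2020", "Orders", {"Customer"}),
        Asset("hr_payroll", "Salaries"),
    ]
    print(search(catalog, "customer"))
```

In this model a search for "customer" finds the table named after "clients" and the table tagged with the glossary term, even though neither contains the word "customer" in its name or description.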

Data Insights gives real-time analysis via asset and scan reports, glossary reports, and classification and labeling reports.  It also allows the user to perform asset-level drill-down by sensitivity.

My thoughts

Azure Purview is currently in public preview, and as such is in a state of rapid change.  However, what is clear from the preview is the power of the technology to bring together in one place the metadata about the data an organisation holds, perform classification to understand what that data is, develop a business glossary to explain data items in plain English, and give insights into the metadata it has collected, all of which is automated.  Microsoft has an ever-expanding list of connectors, adding new sources that can be scanned, and of classifiers, which identify what the data assets contain.  Never have CIOs or data users in an organisation had this kind of control.

Would you like to know more?

Would you like to know more about how your organisation can take advantage of a modern cloud technology approach to solving your data problems? Contact us using the link below.

About the author