Sharing cloud-hosted data with Azure Data Share

What is Azure Data Share?

Azure Data Share is a new cloud service from Microsoft. Hosted on the Azure public cloud, the service aims to reinvent the sharing of cloud-hosted data. It does this by simplifying the sharing process while giving better control and governance, enabling organisations to share their data more effectively.
Personally, when I am sharing data, there are several requirements that a solution must deliver:
  1. I want to know who I shared the data with
  2. I want to know exactly what has been shared
  3. I want to spend the minimum amount of time doing it
  4. I want to avoid building infrastructure which adds cost
  5. I want it to only go to the intended recipient
  6. I want the data to be transferred in a secure manner
  7. I want to communicate with the recipient any limitations on data use
  8. I want to be able to revoke access
  9. I want the recipient to get updates to the data without me having to do any further work
  10. I want to be sure I am sharing the correct data in line with my organisation's policy
Azure Data Share is a Platform as a Service (PaaS) component, and as such, there is no infrastructure to deploy and no code to write, so you can be up and running very quickly.  It is a “share at source” technology, so it does not take an internal copy of the data in order to publish it.  This saves the time and storage space otherwise spent on copies of data that you must then manage, which is especially useful if you are sharing large data sets.
Use cases for this service include inter-organisation sharing, external sharing with suppliers/customers, and data marketplaces.  A REST API is also available to allow you to build custom data sharing applications using this technology.  Azure Data Share is currently in Public Preview.

How does it work?

Azure Data Share is managed through the Azure Portal web interface.  It currently allows you to share data sets hosted in Azure Blob Storage and in Azure Data Lake Store (Gen1 and Gen2).  More supported data providers will be added in the future.
Once you have identified the data you wish to share, you create a data share for it and add recipients (internal or external to your organisation) via an email invitation.  Azure Data Share allows you to specify a name, a description and terms of use for the data share, and the recipient must agree to those terms before they can access the data.  There is no size limitation on data shared with Azure Data Share.  The sharer can also decide whether a recipient receives only a point-in-time copy, or whether they are also allowed to receive updates to the data set.
Once the recipient accepts the invitation and terms of use, the data is transferred to a target store, either Azure Blob Storage or Azure Data Lake Store, in the recipient’s Azure subscription.  This transfer is completed by a cloud-to-cloud copy.  The recipient can also choose to subscribe to updates of the data (if allowed by the sharer), which are delivered as snapshots at a frequency decided by the sharer (Hourly/Daily).
The whole process is managed by the Azure Data Share service, which gives the sharer and recipient a single pane of glass for all data sharing done through the service.  It allows the sharer to see what data has been shared, when it was shared, who it has been shared with, the status of all invites and whether updates to the data are subscribed to.  At any point in time the sharer can revoke access to the data updates.
(Figure above reproduced from Microsoft documentation)
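To make the workflow described above easier to follow, here is a minimal in-memory sketch of the share, invite, accept and revoke steps. It is a toy model only: it does not call the Azure Data Share REST API or any SDK, and every name in it (DataShare, Invitation, InviteStatus, overview and so on) is hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Tuple


class InviteStatus(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REVOKED = "revoked"


@dataclass
class Invitation:
    recipient_email: str
    status: InviteStatus = InviteStatus.PENDING
    receives_updates: bool = False


@dataclass
class DataShare:
    """Hypothetical stand-in for a data share; not an Azure type."""
    name: str
    description: str
    terms_of_use: str
    datasets: List[str]
    # Sharer's choice: None means point-in-time copy only,
    # "Hourly" or "Daily" means recipients may subscribe to snapshots.
    snapshot_frequency: Optional[str] = None
    invitations: Dict[str, Invitation] = field(default_factory=dict)

    def invite(self, email: str) -> None:
        """Sharer adds a recipient; the invitation starts as pending."""
        self.invitations[email] = Invitation(email)

    def accept(self, email: str, agrees_to_terms: bool, subscribe: bool = False) -> None:
        """Recipient accepts; the terms of use must be agreed to first."""
        if not agrees_to_terms:
            raise PermissionError("terms of use must be accepted before access")
        invitation = self.invitations[email]
        if invitation.status is not InviteStatus.PENDING:
            raise ValueError("invitation is not pending")
        invitation.status = InviteStatus.ACCEPTED
        # Updates are only available if the sharer set a snapshot schedule.
        invitation.receives_updates = subscribe and self.snapshot_frequency is not None

    def revoke(self, email: str) -> None:
        """Sharer withdraws access; no further updates are delivered."""
        invitation = self.invitations[email]
        invitation.status = InviteStatus.REVOKED
        invitation.receives_updates = False

    def overview(self) -> List[Tuple[str, str, bool]]:
        """Sharer's view: recipient, invite status, update subscription."""
        return [(i.recipient_email, i.status.value, i.receives_updates)
                for i in self.invitations.values()]


share = DataShare(
    name="Monthly sales",
    description="Aggregated sales figures",
    terms_of_use="For internal analysis only",
    datasets=["sales/2019-06.csv"],
    snapshot_frequency="Daily",
)
share.invite("partner@example.com")
share.accept("partner@example.com", agrees_to_terms=True, subscribe=True)
print(share.overview())  # [('partner@example.com', 'accepted', True)]
share.revoke("partner@example.com")
print(share.overview())  # [('partner@example.com', 'revoked', False)]
```

The point of the sketch is the ordering of the steps: nothing is delivered until the terms are accepted, updates depend on the sharer having allowed them, and revoking stops any further updates.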

What are the benefits?

Some of the benefits provided by Azure Data Share include:
  1. You know who each data set has been shared with
  2. You know exactly what has been shared
  3. Simple management.  Share data in minutes
  4. No infrastructure required.  No setting up SFTP servers, no emailing uncontrolled data
  5. No open access.  You can only access the data if you have been invited
  6. The data is transferred using encrypted data snapshots that never leave Azure
  7. Increase data governance by specifying terms of use of the data
  8. Access can be revoked at any time
  9. Provide updates to the data without any additional work
  10. Simple single pane of glass web interface to manage sharing

How much does it cost?

While in preview, Azure Data Share charges only for dataset movement compute, which is the compute resource required to move a dataset from the source to the destination. Dataset movement compute charges are prorated by the minute and rounded up.  Currently the rate is £0.373 per vCore-hour.
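As a rough worked example of the pricing above, the sketch below estimates the charge for a single dataset movement. It assumes that "prorated by the minute and rounded up" means the run time is rounded up to the next whole minute and billed at one sixtieth of the hourly rate per vCore. The function name, the 12.5-minute duration and the 4 vCores are illustrative assumptions, not figures from Microsoft.

```python
import math

# Preview rate quoted in this article, in GBP per vCore-hour.
RATE_GBP_PER_VCORE_HOUR = 0.373


def movement_cost_gbp(duration_seconds: float, vcores: int,
                      rate: float = RATE_GBP_PER_VCORE_HOUR) -> float:
    """Estimate the charge for one dataset movement.

    Assumption: the duration is rounded up to a whole minute, then billed
    at rate / 60 per vCore-minute.
    """
    if duration_seconds < 0 or vcores < 0:
        raise ValueError("duration and vCore count must be non-negative")
    billable_minutes = math.ceil(duration_seconds / 60)
    return billable_minutes * vcores * rate / 60


# A hypothetical snapshot running for 12 min 30 s on 4 vCores
# is billed as 13 minutes: 13 * 4 * 0.373 / 60.
print(f"GBP {movement_cost_gbp(750, 4):.4f}")  # GBP 0.3233
```

Check the current Azure pricing page before relying on these numbers, as preview rates and the exact rounding rule may change.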

Would you like to know more?

Would you like to know more about how your organisation can take advantage of a modern cloud technology approach to solving your data problems?  Contact us using the link below.

About the author