Modern Data Platform: Data Catalogue

This is the final part in a series of blog posts that outline a vision of a Modern Data Platform, its components, and the benefits that can be realised from taking a holistic view of your data assets.

In this blog, I will expand further on the Data Catalogue component of the Modern Data Platform. Although the components of the Modern Data Platform can be subdivided and thought of separately, they work together to deliver additional synergies for the organisation.

Data Catalogue

The purpose of the Data Catalogue is to produce an inventory of data assets within the Modern Data Platform, bringing together all the information about each data asset and allowing easy discovery and reuse of data within the organisation.

Most of the data assets within the Consolidated Data Store will be automatically registered in the Data Catalogue once the data asset has been ingested or created in one of the physical stores. For other data assets, a crawler process will automatically find and register them, collecting as much metadata about them as possible. The crawler populates the metadata store using existing metadata, analytics, and classification by machine learning algorithms. Data assets can also be manually registered through the Data Catalogue interface.
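The crawler step described above can be illustrated with a minimal sketch. It assumes the physical store is a local directory and the metadata store is an in-memory dictionary; `MetadataStore`, `register`, and `crawl` are hypothetical names chosen for illustration, not part of any specific product. The machine learning classification is left out, so the crawler only records metadata it can read directly from each file.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class MetadataStore:
    """Hypothetical in-memory stand-in for the metadata store."""
    assets: dict = field(default_factory=dict)

    def register(self, asset_id: str, metadata: dict, source: str) -> bool:
        """Register an asset and return False if it is already known."""
        if asset_id in self.assets:
            return False
        self.assets[asset_id] = {**metadata, "registered_by": source}
        return True


def crawl(root: Path, store: MetadataStore) -> int:
    """Walk a directory and register every file not yet in the store."""
    registered = 0
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Collect whatever metadata is available without opening the file.
        metadata = {
            "format": path.suffix.lstrip(".").lower() or "unknown",
            "size_bytes": path.stat().st_size,
            "physical_store": str(root),
        }
        asset_id = path.relative_to(root).as_posix()
        if store.register(asset_id, metadata, source="crawler"):
            registered += 1
    return registered
```

Assets registered at ingestion time or by hand would call `register` with a different `source`, so the crawler skips them on its next pass and only picks up assets that are not yet known.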

The metadata store combines with the security store, audit store and source control to deliver all metadata known about any type of data asset. This is surfaced to the user through several different technologies, depending on the audience:

  • Data Catalogue Interface
  • Bot & Natural Language Search
  • Exception Reporting
  • Data Owner Dashboard
  • RESTful APIs

The Data Catalogue presents all known data about a data asset, including:

  • Whether it forms part of a larger group of data assets, or Data Asset Group
  • Who owns the data asset
  • Where the data asset is in the data lifecycle
  • A description of the asset, maintained by the data owner
  • Metadata derived from the data asset, for example asset type, format, physical store and change history
  • Security for the data asset
  • Audit information for the data asset
  • Any applicable data standards or policies
  • The data lineage for the data asset
  • A preview of the data
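As a rough illustration of how these items might be brought together, the sketch below models a single catalogue entry and assembles it from separate stores. It assumes each store is a plain dictionary keyed by asset identifier; `CatalogueEntry`, `build_entry`, and all field names are hypothetical and simply mirror the list above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CatalogueEntry:
    """Hypothetical view of everything the catalogue knows about one asset."""
    asset_id: str
    owner: str
    lifecycle_stage: str
    description: str
    asset_group: Optional[str] = None                     # Data Asset Group, if any
    derived_metadata: dict = field(default_factory=dict)  # type, format, store...
    security: list = field(default_factory=list)          # who may access the asset
    audit: list = field(default_factory=list)             # who accessed it, and when
    policies: list = field(default_factory=list)          # applicable standards
    lineage: list = field(default_factory=list)           # upstream asset ids
    preview: list = field(default_factory=list)           # first few rows


def build_entry(asset_id: str, metadata_store: dict,
                security_store: dict, audit_store: dict) -> CatalogueEntry:
    """Combine the separate stores into a single catalogue entry."""
    meta = metadata_store[asset_id]
    return CatalogueEntry(
        asset_id=asset_id,
        owner=meta["owner"],
        lifecycle_stage=meta["lifecycle_stage"],
        description=meta.get("description", ""),
        asset_group=meta.get("asset_group"),
        derived_metadata=meta.get("derived", {}),
        security=security_store.get(asset_id, []),
        audit=audit_store.get(asset_id, []),
        policies=meta.get("policies", []),
        lineage=meta.get("lineage", []),
        preview=meta.get("preview", []),
    )
```

Keeping security and audit information in their own stores and joining them at read time means each store can be maintained independently, while the interface, dashboards and APIs can all present the same combined view.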

Would you like to know more?

Would you like to find out how a Modern Data Platform can be applied within your own organisation to bring back control of your data? Contact us using the link below.

About the author