Modern Data Platform: Operation

This is the sixth in a series of blog posts outlining a vision of a Modern Data Platform, its components, and the benefits that can be realised from taking a holistic view of your data assets.

In this blog, I will expand further on the Operational components of the Modern Data Platform. It is important to understand that although the components of the Modern Data Platform can be subdivided and considered separately, they work together to deliver additional synergies for the organisation.

Data Orchestration

The data orchestration component of the platform controls the physical and logical movement of data within the Consolidated Data Store. It is built around five concepts: data sources, datasets, activities, pipelines and triggers.

Data sources are the data stores within the Consolidated Data Store. A dataset is a specific data asset within a data store. An activity is a specific task or operation performed on a data asset; activities both create and consume datasets, execute against a data source, and are the building blocks of pipelines. Pipelines are logical groupings of activities that execute a specific workflow or process, and activities can be reused across multiple pipelines. Pipelines are started by triggers, which can be scheduled, event-based or manual.

Data orchestration may need to be performed by different technologies, depending on the task or the underlying data source, but all orchestration should be defined and documented in a standard way using the terminology of data source, dataset, activity, pipeline and trigger.
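The relationships between these five concepts can be sketched in code. The following is a minimal, hypothetical model, not a real platform API: the class names and the two example activities are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Dataset:
    source: str        # name of the data store it lives in
    name: str
    rows: list = field(default_factory=list)

@dataclass
class Activity:
    name: str
    run: Callable[[Dataset], Dataset]  # consumes one dataset, produces another

@dataclass
class Pipeline:
    name: str
    activities: List[Activity]

    def execute(self, ds: Dataset) -> Dataset:
        # Run each activity in order; each output feeds the next activity.
        for act in self.activities:
            ds = act.run(ds)
        return ds

def manual_trigger(pipeline: Pipeline, ds: Dataset) -> Dataset:
    # A manual trigger starts the pipeline on demand; scheduled and
    # event-based triggers would call execute() in exactly the same way.
    return pipeline.execute(ds)

# Example: a two-activity pipeline that cleans and then de-duplicates rows.
raw = Dataset(source="landing_zone", name="orders_raw",
              rows=[" a ", "b", "  ", "c ", "b"])
clean = Activity("strip_blanks",
                 lambda d: Dataset(d.source, d.name + "_clean",
                                   [r.strip() for r in d.rows if r.strip()]))
dedupe = Activity("dedupe",
                  lambda d: Dataset(d.source, d.name + "_deduped",
                                    sorted(set(d.rows))))
pipe = Pipeline("orders_standardise", [clean, dedupe])
result = manual_trigger(pipe, raw)
print(result.name)   # orders_raw_clean_deduped
print(result.rows)   # ['a', 'b', 'c']
```

Note that the pipeline itself knows nothing about what each activity does, which is what lets activities be reused across multiple pipelines.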

Standard Functions & Processes

Supporting the Consolidated Data Store is a library of standard functions and processes that can be used and reused in data orchestration activities and pipelines. To achieve operational efficiency, and for data to comply with the defined data standards, it is important that common tasks and processes are performed in the same way, and automated wherever possible. For example, ingesting different CSV files into a relational data store should use the same process, calling the same functions wherever possible.
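As a sketch of that CSV example: one shared ingest function, assumed here to be the platform's standard routine (the function name and behaviour are illustrative), so that header handling and value trimming are identical for every file loaded.

```python
import csv
import io

def ingest_csv(text: str, delimiter: str = ",") -> list[dict]:
    """Parse CSV text into a list of row dicts with trimmed keys and values.

    Every CSV ingest on the platform would go through this one routine,
    so the handling is the same regardless of which file is being loaded.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [{k.strip(): (v or "").strip() for k, v in row.items()}
            for row in reader]

# Two different source files, one shared process.
orders = ingest_csv("id,amount\n1, 9.99 \n2,15.00\n")
customers = ingest_csv("id;name\n1;Alice\n2;Bob\n", delimiter=";")
print(orders[0])     # {'id': '1', 'amount': '9.99'}
print(customers[1])  # {'id': '2', 'name': 'Bob'}
```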

The data platform places no limits on the languages or tools used to define these functions and processes, but standardisation should be applied wherever possible. Each function or process is documented to an agreed standard, its code placed in source control, and then registered in the Data Catalogue as an asset so it can be linked to the assets that use it. This way, when you want to change a function, the metadata exists to show which processes, workflows and data assets are affected by the change.
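That impact-analysis idea can be illustrated with a toy, in-memory catalogue. The asset names and the `register`/`impacted_by` helpers are hypothetical; a real Data Catalogue would hold this lineage metadata in a managed service.

```python
# A toy Data Catalogue: each asset records which other assets it uses,
# so the impact of changing an asset can be looked up before the change.
catalogue: dict[str, dict] = {}

def register(asset: str, kind: str, uses: list = ()) -> None:
    """Register an asset (function, process, pipeline...) and its dependencies."""
    catalogue[asset] = {"kind": kind, "uses": list(uses)}

def impacted_by(asset: str) -> set:
    """Return every registered asset that depends, directly or
    indirectly, on the given asset."""
    hit = {a for a, meta in catalogue.items() if asset in meta["uses"]}
    for dependent in list(hit):
        hit |= impacted_by(dependent)
    return hit

register("fn.ingest_csv", "function")
register("proc.load_orders", "process", uses=["fn.ingest_csv"])
register("pipe.daily_orders", "pipeline", uses=["proc.load_orders"])

print(impacted_by("fn.ingest_csv"))
# Changing the function affects the process that calls it and,
# transitively, the pipeline built on that process.
```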

Workflows & Notifications

The workflow component builds on the standard functions and processes by placing them in a more organisational context (a business layer). A workflow may contain multiple standard processes, which in turn comprise multiple standard functions. Workflows use notifications to inform users when events have happened, or when users need to take action within a workflow.
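As a minimal sketch of that layering: a workflow chains standard processes and emits a notification after each step, plus a final call to action. The workflow name, process names and `notify` callback are all illustrative assumptions, not platform APIs.

```python
def notify(log: list, message: str) -> None:
    # Illustrative: a real platform would send an email, chat message
    # or ticket here rather than appending to an in-memory log.
    log.append(message)

def run_workflow(name: str, processes: list) -> list:
    """Run each (name, callable) process in order, notifying as we go."""
    log = []
    for proc_name, proc in processes:
        proc()
        notify(log, f"{name}: {proc_name} completed")
    # Final notification asking a user to take action in the workflow.
    notify(log, f"{name}: action required - review results")
    return log

log = run_workflow("month_end_close", [
    ("load_ledger", lambda: None),   # each step would be a standard process
    ("reconcile", lambda: None),
])
print(log)
```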

Analytics & Presentation Tooling

The Modern Data Platform vision is not only technology agnostic when it comes to the platform itself, but also to any analytic or data management tooling used to process data on the platform. To enable this, the platform supports a wide variety of connection methods to allow legacy, current and future tooling to connect to its components. Examples include ODBC, ADO.NET, JDBC and REST APIs.
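To make the idea concrete, the same logical data store might be reached through two different connection methods. The server, database and URL path below are made-up examples, and the ODBC driver name would depend on the actual store.

```python
def odbc_connection_string(server: str, database: str) -> str:
    # A generic ODBC-style connection string; the Driver value is an
    # assumption and would match whatever store actually sits behind it.
    return (f"Driver={{ODBC Driver 18 for SQL Server}};"
            f"Server={server};Database={database};Encrypt=yes;")

def rest_endpoint(server: str, database: str, dataset: str) -> str:
    # The /api/... path shape is hypothetical, not a real platform API.
    return f"https://{server}/api/databases/{database}/datasets/{dataset}"

# A desktop BI tool might use the ODBC string, while a web app
# or future tool calls the REST endpoint - same data, two routes.
print(odbc_connection_string("cds.example.net", "sales"))
print(rest_endpoint("cds.example.net", "sales", "orders"))
```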

Would you like to know more?

Would you like to know more about how a Modern Data Platform can be applied within your own organisation to bring your data back under control? Contact us via the link below.

About the author