The Consolidated Data Store
- SQL – a relational data store for traditional table-held data
- NoSQL – a data store for key/value pairs, JSON document-based data and graph data
- Data Lake – an organisation-wide, Hadoop-based storage platform for big data sets
- Document – an enterprise document management platform for Office documents
Note that the physical data stores within the Consolidated Data Store are not limited to the four in the example. They could also include a cloud service, file shares, Hadoop clusters, cloud storage buckets or other technologies for storing data.
What makes this a consolidated data store is that all of the physical stores are managed in the same way. Data in each store follows the same taxonomy and uses a consistent naming convention, so users know where to find and store the assets they use. The stores all support the organisation's data lifecycle, and they are all secured using the same Common Security Model and audited using the same Common Audit Model.
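As a rough illustration of what a consistent naming convention across stores might look like, the sketch below builds every asset name from the same taxonomy parts regardless of which physical store holds it. The class name `AssetName`, the `domain/stage/name` layout and the `store://` prefix are assumptions made for this example; they are not prescribed by the Modern Data Platform.

```python
from dataclasses import dataclass

# Store kinds taken from the four-store example; a real platform may have more.
STORE_KINDS = {"sql", "nosql", "datalake", "document"}


@dataclass(frozen=True)
class AssetName:
    """Hypothetical taxonomy-based name shared by every physical store."""

    store: str   # which physical store holds the asset
    domain: str  # business area in the taxonomy (assumed level)
    stage: str   # data lifecycle stage, e.g. "plan" or "analyse"
    name: str    # short asset name

    def __post_init__(self):
        if self.store not in STORE_KINDS:
            raise ValueError(f"unknown store kind: {self.store!r}")
        # Assumed rule: lower-case letters, digits and underscores only.
        for part in (self.domain, self.stage, self.name):
            if not part.replace("_", "").isalnum() or part != part.lower():
                raise ValueError(f"invalid name part: {part!r}")

    def path(self) -> str:
        """Render the same domain/stage/name layout for any store."""
        return f"{self.store}://{self.domain}/{self.stage}/{self.name}"


brief = AssetName("document", "finance", "plan", "dataset_brief")
summary = AssetName("sql", "finance", "analyse", "sales_summary")
print(brief.path())
print(summary.path())
```

Because the layout is identical in every store, a user who knows the taxonomy can predict where an asset lives without knowing the details of the underlying technology.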
The physical stores can be delivered as on-premises services, Cloud Infrastructure as a Service (IaaS), Cloud Platform as a Service (PaaS), Cloud Software as a Service (SaaS) or a mixture of these.
The Consolidated Data Store approach allows metadata to be collected about the data assets stored in it, and its consistent structure allows those assets to be catalogued automatically in the Data Catalogue component of the Modern Data Platform.
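To show why a predictable structure makes automated cataloguing possible, here is a minimal sketch that derives catalogue metadata purely from where an asset is stored. It assumes a hypothetical path layout of `store://domain/stage/name`; the `CatalogueEntry` fields and both function names are illustrative and do not describe the actual Data Catalogue component.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueEntry:
    """Metadata recovered from an asset's location (hypothetical fields)."""

    store: str
    domain: str
    stage: str
    name: str


def catalogue_entry(path: str) -> CatalogueEntry:
    """Parse an assumed 'store://domain/stage/name' path into metadata."""
    store, sep, rest = path.partition("://")
    parts = rest.split("/")
    if not sep or not store or len(parts) != 3 or not all(parts):
        raise ValueError(f"path does not follow the convention: {path!r}")
    domain, stage, name = parts
    return CatalogueEntry(store, domain, stage, name)


def build_catalogue(paths):
    """Group entries by lifecycle stage so they can be browsed."""
    catalogue = {}
    for path in paths:
        entry = catalogue_entry(path)
        catalogue.setdefault(entry.stage, []).append(entry)
    return catalogue


catalogue = build_catalogue([
    "document://finance/plan/dataset_brief",
    "sql://finance/analyse/sales_summary",
])
for stage, entries in catalogue.items():
    print(stage, [e.name for e in entries])
```

An asset stored outside the convention raises an error rather than being silently skipped, which is one way a platform could surface assets that need to be renamed or moved.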
The Consolidated Data Store in Action
The following graphic depicts how data assets are stored in the different physical stores within the Consolidated Data Store and how they map onto the lifecycle for the development of a data asset. It also gives examples of the types of processing involved, whether through automated standard functions and processes or through manual manipulation.
In this example, the lifecycle starts in the Plan stage with a business analyst planning a new dataset. This user would capture details of the audience, formats, data sources and analysis required in a document. This data asset is stored in the document data store for peer review.
A data analyst then extracts the data from the SQL data store into an analytical tool and produces further analysis by imputing and/or aggregating the data. The result is stored back in the SQL data store as a new data asset in the Analyse lifecycle stage.
This data asset is then used in a presentation tool for consumption by the end user community and stored as a file in the document store.
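The analyst's step in the walk-through above (extract, impute and/or aggregate, store as a new asset) can be sketched as follows. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the SQL data store, the table names `raw_sales` and `analyse_sales_by_region` and the sample figures are invented, and mean imputation is just one of many possible imputation methods.

```python
import sqlite3


def analyse(conn: sqlite3.Connection):
    """Extract raw rows, impute missing amounts, aggregate, store result."""
    # Extract: pull the source data out of the SQL store.
    rows = conn.execute("SELECT region, amount FROM raw_sales").fetchall()

    # Impute: replace missing amounts with the mean of the known amounts.
    known = [amount for _, amount in rows if amount is not None]
    if not known:
        raise ValueError("no known amounts to impute from")
    mean = sum(known) / len(known)
    imputed = [(r, a if a is not None else mean) for r, a in rows]

    # Aggregate: total amount per region.
    totals = {}
    for region, amount in imputed:
        totals[region] = totals.get(region, 0.0) + amount

    # Store: write the result back as a new data asset.
    conn.execute(
        "CREATE TABLE analyse_sales_by_region "
        "(region TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany(
        "INSERT INTO analyse_sales_by_region VALUES (?, ?)",
        sorted(totals.items()),
    )
    conn.commit()
    return conn.execute(
        "SELECT region, total FROM analyse_sales_by_region ORDER BY region"
    ).fetchall()


conn = sqlite3.connect(":memory:")  # stand-in for the SQL data store
conn.execute("CREATE TABLE raw_sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO raw_sales VALUES (?, ?)",
    [("north", 10.0), ("north", None), ("south", 30.0), ("south", 20.0)],
)
result = analyse(conn)
print(result)
```

Keeping the source table untouched and writing the output to a separate table mirrors the idea that each lifecycle stage produces its own data asset.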