The Importance of Describing Analytical Data

Why is it important to describe analytical data?

Data analytics is a critical business function. If your organisation isn’t gaining insight from the data it collects and stores, you are missing the opportunity to optimise it. Your competitors will be doing so, which puts your organisation at a disadvantage if it can’t gain insight from the data it has, whether that means understanding what happened, looking at trends to predict what may happen, or automating your processing to take action when certain conditions occur.

Whether you are performing descriptive (what happened?), predictive (what will happen?) or prescriptive (take this action) analytics, data analytics can be complex, processing multiple data sources to perform a single analysis. How do you know you are using the correct data item and how can you understand the provenance of it? The answer is to start collecting metadata on your data.

It is important to describe analytical data to improve the quality and accuracy of your analytics. It also helps to remove data duplication to give you a conformed model of data dimensions across your organisation, so that ERP data can be linked with Sales and HR data, and your individual line of business systems are not just islands of information.

How do we describe data?

What we are attempting to do is create a Business Glossary of the analytical data held by the organisation, to allow:

  1. the users of the analytical data to understand the analytics given to them.
  2. the creators of the analytical data to choose the correct sources of the data.
  3. the data governance function in your organisation to review and correct the descriptions.

A good starting point is to compile a data dictionary for the tables used in your analytics. Start with your data warehouse if you have one; otherwise start with the tables in your line of business systems that are used for analytics (there is no need to cover every table, as there may be hundreds!).

What information (metadata) should you be collecting? Here is an example of one organisation’s metadata schema for data tables:

  • Data Taxonomy Classification – if you have a data taxonomy, link this here. Is it HR/Sales/Operational data etc?
  • Table Name – The name of the data table
  • Column Name – The name of the data item
  • Source Line of Business System – Where has the data come from?
  • Type Dimension/Measure/Fact – What type of data is it?
  • Data Type – How is the data stored?
  • Data Range – What is the range of valid values?
  • NULLs allowed – Are NULL values allowed in the data?
  • Geographic Information – Does the data contain geographic information?
  • Personally Identifiable Information – Does the data contain personally identifiable information?
  • Sensitive – Is the data sensitive?
  • Licensed – Is the data from a licensed dataset?
  • How Derived – If a value is computed, how is the data derived?
  • Short Description – What is the data?
  • Long Description – Longer text description

How you describe the data is up to you and it should reflect your business requirements.
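
As a rough illustration, the example schema above could be captured as a simple record type. This is a minimal sketch in Python, not a prescribed format: the class name, field names, defaults and the sample entry (fact_sales, net_amount and so on) are all assumptions made for illustration.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ColumnMetadata:
    """One data dictionary entry, mirroring the example schema above."""
    taxonomy_classification: str      # e.g. "HR", "Sales", "Operational"
    table_name: str
    column_name: str
    source_system: str                # source line of business system
    kind: str                         # "Dimension", "Measure" or "Fact"
    data_type: str                    # how the data is stored
    data_range: Optional[str] = None  # valid values, if restricted
    nulls_allowed: bool = True
    geographic: bool = False
    pii: bool = False                 # personally identifiable information
    sensitive: bool = False
    licensed: bool = False
    how_derived: Optional[str] = None
    short_description: str = ""
    long_description: str = ""


# Hypothetical entry, for illustration only.
entry = ColumnMetadata(
    taxonomy_classification="Sales",
    table_name="fact_sales",
    column_name="net_amount",
    source_system="ERP",
    kind="Measure",
    data_type="DECIMAL(18,2)",
    data_range=">= 0",
    nulls_allowed=False,
    how_derived="gross_amount - discount_amount",
    short_description="Net value of the sale",
)

# Convert to a plain dict, ready to write to a spreadsheet row or a table.
record = asdict(entry)
```

Keeping one field per schema item makes it easy to add or drop attributes as your own business requirements dictate.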

Where should we store this?

Give careful consideration to where you store the metadata and how you make it accessible. The right answer depends on the scale of your organisation. It may be acceptable to store this information in an Excel spreadsheet (many do). A better idea is to store the metadata in a database table, so that it sits alongside the data it describes. For larger organisations, the overhead of maintaining this metadata by hand may be too great, so you may wish to consider a governance solution such as Azure Purview (now Microsoft Purview).

The important thing is that the users of the metadata can find and search it easily, improving the quality and efficiency of the analytics. It should be published to a central portal and reviewed regularly for accuracy.
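
To make the database option concrete, here is a minimal sketch of a metadata table with a simple keyword search, using Python's built-in sqlite3 module. The table name data_dictionary, its columns, the sample rows and the search helper are all hypothetical; a real implementation would live in your own warehouse and carry your full schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for the sketch only
conn.execute(
    """
    CREATE TABLE data_dictionary (
        table_name        TEXT NOT NULL,
        column_name       TEXT NOT NULL,
        source_system     TEXT,
        data_type         TEXT,
        sensitive         INTEGER DEFAULT 0,
        short_description TEXT,
        PRIMARY KEY (table_name, column_name)
    )
    """
)

# Hypothetical entries, for illustration only.
rows = [
    ("fact_sales", "net_amount", "ERP", "DECIMAL(18,2)", 0, "Net value of the sale"),
    ("dim_employee", "salary_band", "HR", "VARCHAR(10)", 1, "Employee salary band"),
]
conn.executemany("INSERT INTO data_dictionary VALUES (?, ?, ?, ?, ?, ?)", rows)


def search(term):
    """Return (table, column, description) rows whose column name or description contains term."""
    pattern = f"%{term}%"
    cursor = conn.execute(
        "SELECT table_name, column_name, short_description FROM data_dictionary "
        "WHERE column_name LIKE ? OR short_description LIKE ? "
        "ORDER BY table_name, column_name",
        (pattern, pattern),
    )
    return cursor.fetchall()


matches = search("salary")
```

Because the metadata is an ordinary table, the same query can sit behind a portal search box or be joined to the analytical tables themselves.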

How do we use the metadata?

We use the metadata:

  1. When creating analytical data: to make sure we are using the correct source, that the data is in the correct format, and that we are displaying data appropriate to the audience (e.g. is sensitive data allowed?).
  2. To increase your organisation’s data governance capability. Bring all the metadata together into a Business Glossary/Data Catalogue, so that consumers and creators can refer to the catalogue to understand what the data is and whether they are using the correct definition.
  3. To pull the descriptions through into your analytics (e.g. as tooltips in Power BI).
  4. To measure data quality against it. Is everything described? Are the data types correct? Is the correct source being used? Is the data in range?
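
The data quality checks in the last point can be sketched in a few lines of Python. This is an illustrative example only: the metadata layout, the sample rows and the check_quality function are assumptions, and it covers only the "described", NULL, data type and range checks.

```python
# Hypothetical metadata, keyed by column name.
metadata = {
    "net_amount": {"type": float, "nulls_allowed": False, "min": 0.0, "max": None},
    "region": {"type": str, "nulls_allowed": True, "min": None, "max": None},
}

# Hypothetical rows from an analytical table.
rows = [
    {"net_amount": 120.5, "region": "North", "channel": "Web"},
    {"net_amount": -5.0, "region": None, "channel": "Store"},
    {"net_amount": None, "region": "South", "channel": "Web"},
]


def check_quality(rows, metadata):
    """Return a list of human-readable data quality issues."""
    issues = []

    # Is everything described? Report each undescribed column once.
    seen = {column for row in rows for column in row}
    for column in sorted(seen - set(metadata)):
        issues.append(f"{column}: no metadata entry")

    for index, row in enumerate(rows):
        for column, rule in metadata.items():
            value = row.get(column)
            if value is None:
                # Are NULLs allowed for this column?
                if not rule["nulls_allowed"]:
                    issues.append(f"row {index}, {column}: NULL not allowed")
                continue
            # Is the data type correct?
            if not isinstance(value, rule["type"]):
                issues.append(f"row {index}, {column}: wrong data type")
                continue
            # Is the data in range?
            if rule["min"] is not None and value < rule["min"]:
                issues.append(f"row {index}, {column}: below valid range")
            if rule["max"] is not None and value > rule["max"]:
                issues.append(f"row {index}, {column}: above valid range")
    return issues


issues = check_quality(rows, metadata)
```

With the sample data, the list flags the undescribed channel column, the negative net_amount in row 1 and the missing net_amount in row 2.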

