Machine Learning: Managing predicted data

As organisations start to create datasets containing predicted data, using technologies like Azure Machine Learning, we need to think carefully about how we store this type of data. Predicted data is an estimate, so it will rarely match the eventual actual value exactly, especially when it is a continuous value produced by regression analysis. The problem with storing predicted values is that the user of the dataset needs to understand that the piece of data was predicted, and what was used to predict it. Was it a human or a machine-generated prediction? If we don’t label this type of data clearly, it could be confused with reported data, and reports, dashboards or further analytics built using it may be misleading. This problem is compounded if your data governance capability is immature.

This problem should be tackled at multiple levels within the organisation. At the lowest level, one approach is to store the data in the data table together with additional metadata. Consider a data table in which the predicted value is held in a column called [SalesForecast]. When browsing this data, it is not apparent whether the value is predicted, when it was predicted, what was used to predict it, or how much confidence can be placed in it.

A better approach is to clearly mark the data as predicted and to store some additional metadata about the predicted value, either in the same table or in a more normalised form.
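As a rough illustration of storing the metadata in the same table, the sketch below uses Python's built-in sqlite3 module. Only the [SalesForecast] column comes from the example above; the table name and every other column (IsPredicted, PredictedOn, PredictionSource, PredictionMethod, ConfidenceLevel) are assumptions made for illustration, not a recommended standard.

```python
import sqlite3

# Hypothetical schema: only the SalesForecast column name comes from the article.
# All other names are illustrative assumptions.
SCHEMA = """
CREATE TABLE Sales (
    Region            TEXT NOT NULL,
    Period            TEXT NOT NULL,
    SalesForecast     REAL NOT NULL,
    IsPredicted       INTEGER NOT NULL CHECK (IsPredicted IN (0, 1)),
    PredictedOn       TEXT,   -- ISO date the prediction was made
    PredictionSource  TEXT CHECK (PredictionSource IN ('human', 'machine')),
    PredictionMethod  TEXT,   -- e.g. the model or process that produced the value
    ConfidenceLevel   REAL CHECK (ConfidenceLevel BETWEEN 0 AND 1)
)
"""


def build_example_db():
    """Create an in-memory database holding one clearly labelled prediction."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.execute(
        "INSERT INTO Sales VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        ("North", "2024-Q3", 125000.0, 1, "2024-06-30",
         "machine", "regression model v1", 0.8),
    )
    conn.commit()
    return conn


conn = build_example_db()
row = conn.execute(
    "SELECT SalesForecast, IsPredicted, PredictionSource, ConfidenceLevel FROM Sales"
).fetchone()
print(row)
```

Anyone browsing the table can now see, alongside the value itself, that it is a prediction, who or what produced it, and how much confidence to place in it. A more normalised design would move the metadata columns into a separate table keyed on the prediction run.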

To complement this, the data dictionary/catalogue for the table should be maintained to include a description of the data item, how it was predicted and any other relevant details, so that analysts looking at the dataset can understand how it was created.
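A data dictionary entry for a predicted item might capture something like the following. This is a minimal sketch in Python; the field names and example values are hypothetical, and a real catalogue tool will have its own structure.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DataDictionaryEntry:
    """One catalogue record for a column. All field names are illustrative."""
    table: str
    column: str
    description: str
    is_predicted: bool
    prediction_method: str  # how the value was produced
    owner: str              # who to ask about it


def describe(entry):
    """Return a one-line summary an analyst could read at a glance."""
    kind = "predicted" if entry.is_predicted else "reported"
    return f"{entry.table}.{entry.column} ({kind}): {entry.description}"


# Hypothetical example values.
entry = DataDictionaryEntry(
    table="Sales",
    column="SalesForecast",
    description="Forecast sales value for the period.",
    is_predicted=True,
    prediction_method="Machine generated using regression analysis",
    owner="Data team",
)

print(describe(entry))
print(asdict(entry))
```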

Finally, the approach for dealing with predicted data should be described in a data handling policy, which should be reviewed regularly, and the data should be checked for compliance with it. A high-level description could also be included in your data strategy.
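Part of a compliance check could be automated. The sketch below, in Python, assumes a hypothetical policy rule that every row flagged as predicted must carry a prediction date, source, method and confidence level; the column names are illustrative and the rule itself would come from your own data handling policy.

```python
# Hypothetical metadata columns that the policy requires on every predicted row.
REQUIRED_METADATA = (
    "PredictedOn",
    "PredictionSource",
    "PredictionMethod",
    "ConfidenceLevel",
)


def find_non_compliant(rows):
    """Return (row index, missing column names) for each predicted row
    that lacks one or more required metadata values."""
    problems = []
    for index, row in enumerate(rows):
        if not row.get("IsPredicted"):
            continue  # reported data needs no prediction metadata
        missing = [
            name for name in REQUIRED_METADATA
            if row.get(name) in (None, "")
        ]
        if missing:
            problems.append((index, missing))
    return problems


# Illustrative rows: one compliant prediction, one incomplete, one reported value.
rows = [
    {"SalesForecast": 125000.0, "IsPredicted": True, "PredictedOn": "2024-06-30",
     "PredictionSource": "machine", "PredictionMethod": "regression model v1",
     "ConfidenceLevel": 0.8},
    {"SalesForecast": 98000.0, "IsPredicted": True, "PredictedOn": "2024-06-30",
     "PredictionSource": "human", "PredictionMethod": "", "ConfidenceLevel": None},
    {"SalesForecast": 110000.0, "IsPredicted": False},
]

print(find_non_compliant(rows))
```

A check like this could run on a schedule, with any rows it reports being fed back to the data owner for correction.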

As AI and machine learning technologies are applied to more and more use cases, this approach helps the organisation understand its data better and produce more meaningful output.

Would you like to know more?

To find out how your organisation can take advantage of a modern cloud technology approach to solving your data problems, contact us using the link below.

https://www.risual.com/contact-us
