Azure Machine Learning Compute Costs Management

Recently I had a project to build a Responsible AI Dashboard for a client, which first required building a machine learning model in the Azure Machine Learning workspace. The customer wanted to assess their data scientists’ and machine learning operators’ algorithms for fairness and responsibility. The resolution risual proposed was the Microsoft Responsible AI Dashboard.

This tool empowers businesses, data scientists and machine learning operators with responsible AI tools for model debugging and decision making. Responsible AI ensures that an AI product that has been developed will be used ethically, fairly, and transparently. It also ensures the well-being of individuals and society.

We can build the machine learning model, job, and dashboard as part of our Machine Learning analysis process through the following steps:

  • Prepare data: Identify the features and label in a dataset. Pre-process, clean and transform the data.
  • Train the model: Split the data into a training data set and a validation data set. Train the newly created machine learning model using the training data set, then test it using the validation data set.
  • Evaluate: Compare the model’s predictions with the known labels.
  • Deploy: After training a machine learning model, deploy it as an application on a server so that others can use it.
  • Create a Responsible AI Dashboard using the trained model and analyse the dashboard.
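
The split, train and evaluate steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the toy dataset, the threshold "model" and the function names `split_train_validation` and `accuracy` are hypothetical and are not taken from the project or from the Azure Machine Learning SDK.

```python
import random


def split_train_validation(rows, validation_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (training, validation) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


def accuracy(predictions, labels):
    """Fraction of predictions that match the known labels."""
    matches = sum(1 for p, y in zip(predictions, labels) if p == y)
    return matches / len(labels)


# Hypothetical toy dataset: (feature, label) pairs, label is 1 when feature >= 5.
data = [(x, int(x >= 5)) for x in range(20)]
train, validation = split_train_validation(data)

# "Training": use the smallest feature labelled 1 in the training set as a threshold.
threshold = min(x for x, y in train if y == 1)

# Evaluation: compare predictions on the validation set with the known labels.
predictions = [int(x >= threshold) for x, _ in validation]
score = accuracy(predictions, [y for _, y in validation])
print(f"validation accuracy: {score:.2f}")
```

In a real workspace the training and evaluation would be done by the chosen algorithm and its metrics; the point here is only the order of the steps.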

Once the model is completed, we create the Online Endpoint. We then deploy a machine learning service using the newly registered model.

Responsible AI Dashboard

Create the dashboard from Jobs in the left-hand menu.

The compute costs, however, are stated as follows:

Thus, while running, the compute should cost $0.35 per hour. However, when the jobs were stopped, they were still costing $0.35 per hour. It seems Microsoft has not resolved the cost issue for stopped Machine Learning compute resources, unlike other compute services across the platform. I contacted them and they said the VM is still running even when stopped:

This got me thinking: if this appears on my low-cost subscription, how much could it affect a bigger organisation with much higher Machine Learning compute costs than mine? How much is Microsoft gaining collectively across all users of this resource worldwide?
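
To put a rough number on it, the arithmetic can be sketched as below. The $0.35 hourly rate is the one quoted above; the usage pattern (8 hours a day for 22 working days, against a 30-day month billed around the clock) is purely an assumption for illustration.

```python
from decimal import Decimal

HOURLY_RATE = Decimal("0.35")  # USD per hour, the rate quoted in this article


def compute_cost(hours_billed, hourly_rate=HOURLY_RATE):
    """Cost in USD for the given number of billed hours."""
    return hourly_rate * hours_billed


# Assumed usage: the compute is only needed 8 hours a day, 22 days a month.
expected = compute_cost(8 * 22)

# If "stopped" compute keeps billing, every hour of a 30-day month is charged.
always_on = compute_cost(24 * 30)

print(f"expected:  ${expected}")              # $61.60
print(f"always on: ${always_on}")             # $252.00
print(f"overspend: ${always_on - expected}")  # $190.40
```

`Decimal` is used instead of floats so that the currency amounts stay exact.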

The emails continued back and forth, with Microsoft unwilling to acknowledge that there is a problem. We then moved on to calls via Teams for them to explain the issue from their side and for me to let them know my expectations as a user.

Microsoft support says that a VM that is no longer running can be in one of two states: Stopped or Stopped (deallocated). This means that Stopped is not really stopped: the VM is still allocated, and so is still charged to the bill. Stopped (deallocated) is the true stopped state, but reaching it requires further steps beyond the portal.

The deallocation request is as follows:

POST example:{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Compute/virtualMachines/{vmName}/deallocate?api-version=2023-03-01
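
As a minimal sketch, the request could be assembled in Python with the standard library only. The host `https://management.azure.com` and the `/subscriptions/` prefix are assumptions on my part (the request line above does not show them), the function name `build_deallocate_request` is hypothetical, and obtaining the bearer token is left out entirely.

```python
import urllib.parse
import urllib.request

# Assumption: the public Azure Resource Manager endpoint.
ARM_HOST = "https://management.azure.com"


def build_deallocate_request(subscription_id, resource_group, vm_name,
                             bearer_token, api_version="2023-03-01"):
    """Build (but do not send) the POST request that deallocates a VM."""
    # Percent-encode each path segment so unusual names cannot break the URL.
    sub, group, vm = (urllib.parse.quote(part, safe="")
                      for part in (subscription_id, resource_group, vm_name))
    url = (f"{ARM_HOST}/subscriptions/{sub}/resourceGroups/{group}"
           f"/providers/Microsoft.Compute/virtualMachines/{vm}"
           f"/deallocate?api-version={api_version}")
    return urllib.request.Request(
        url,
        data=b"",  # the call carries no request body
        method="POST",
        headers={"Authorization": f"Bearer {bearer_token}"},
    )


# Hypothetical identifiers, for illustration only.
request = build_deallocate_request("sub-123", "my-rg", "my-vm", "TOKEN")
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send the call; that step is omitted here so the sketch can be run without credentials.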


To avoid further costs with your Azure compute resources, risual recommends that users always run the deallocation request as part of their build process. This reduces costs and saves the time otherwise spent deleting resources and regenerating them every time they are needed. This is particularly useful for Compute Instances. A job can be scheduled to run periodically, find any compute that is stopped but still allocated, and make sure that it is deallocated.
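
The core of such a scheduled job might look like the sketch below. It assumes the VM power states are reported as strings such as `PowerState/stopped` and `PowerState/deallocated`; the two callables `list_vm_states` and `deallocate` are hypothetical stand-ins for whatever SDK or REST calls your environment uses.

```python
STOPPED_BUT_ALLOCATED = "PowerState/stopped"  # assumed power state code


def find_vms_to_deallocate(vm_states):
    """Return the names of VMs that are stopped but still allocated."""
    return sorted(name for name, state in vm_states.items()
                  if state == STOPPED_BUT_ALLOCATED)


def sweep(list_vm_states, deallocate):
    """One pass of the job: deallocate every stopped-but-allocated VM."""
    targets = find_vms_to_deallocate(list_vm_states())
    for name in targets:
        deallocate(name)
    return targets


# Stand-in data and a stand-in deallocate function, for illustration only.
fake_states = {
    "vm-a": "PowerState/running",
    "vm-b": "PowerState/stopped",
    "vm-c": "PowerState/deallocated",
}
deallocated = []
print(sweep(lambda: fake_states, deallocated.append))  # ['vm-b']
```

Running VMs are deliberately left alone; the sweep only touches compute that someone has already stopped. Injecting the two callables keeps the logic testable without touching a real subscription.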

If you are using Azure Machine Learning Compute Clusters, you can pause or stop the cluster when you don’t need it. Pausing the cluster allows you to keep the allocated resources without incurring additional costs. To do this, go to the Azure portal, find the compute cluster, and choose the appropriate option to pause or stop it. A job can be scheduled to run periodically to make sure that your compute clusters are in a paused state.
