Global Outage – Azure Virtual Machines

13/10/2021 at 13:21 FINAL UPDATE

Virtual Machines – Mitigated (Tracking ID 0NC_-L9G)

Summary of impact: Between 05:12 UTC and 11:45 UTC on 13 Oct 2021, a subset of customers using Windows Virtual Machines may have received failure notifications when performing service management operations – such as start, create, update, delete. Deployments of new VMs and any updates to extensions may have failed. Non-Windows Virtual Machines, and existing running Windows Virtual Machines should not have been impacted by this issue. Additionally, services with dependencies on Windows VMs may have also experienced similar failures when creating resources.
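If you are triaging failed operations from your own logs, a small check against the stated impact window can help separate failures that are likely related to this incident from unrelated ones. The following is a minimal sketch: the window bounds come from the summary above, while the function name `in_impact_window` is an assumption made for illustration.

```python
from datetime import datetime, timezone

# Impact window as stated in the summary of impact (UTC).
IMPACT_START = datetime(2021, 10, 13, 5, 12, tzinfo=timezone.utc)
IMPACT_END = datetime(2021, 10, 13, 11, 45, tzinfo=timezone.utc)


def in_impact_window(ts: datetime) -> bool:
    """Return True if a timezone-aware timestamp falls inside the stated window.

    Hypothetical helper: it only compares timestamps and says nothing about
    whether a given failure was actually caused by the incident.
    """
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return IMPACT_START <= ts.astimezone(timezone.utc) <= IMPACT_END
```

A failure that falls inside the window is only a candidate; per the summary, non-Windows VMs and Windows VMs that were already running should not have been affected.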

Preliminary root cause: We identified that calls made during service management operations were failing because required artifact version data could not be queried. Our investigation focused on the backend compute resource provider (CRP) to determine why the calls were failing, and identified that a required VMGuestAgent could not be queried from the repository.

The VM Guest Agent Extension publishing architecture was being migrated (as part of RDFE migration) to a new platform which leverages the latest Azure Resource Manager (ARM) capabilities.

Mitigation: We mitigated impact by marking the appropriate extensions to the correct expected level (in this case, public). Engineers proactively verified the return to full success rate for operations after the updates were completed.
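To make the failure mode easier to picture, here is a toy model of a repository lookup that only returns artifacts marked at the public level. This is not Azure's implementation: `ArtifactRepository`, `Level`, `start_vm`, the `INTERNAL` level and the version string are all hypothetical names chosen for illustration, based only on the description above that the extension had to be marked public before queries succeeded again.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    INTERNAL = "internal"  # hypothetical non-public level, not named in the source
    PUBLIC = "public"


@dataclass
class Artifact:
    name: str
    version: str
    level: Level


class ArtifactRepository:
    """Toy in-memory repository; a stand-in, not the real publishing platform."""

    def __init__(self):
        self._items = {}

    def publish(self, artifact):
        self._items[artifact.name] = artifact

    def query_public(self, name):
        # Only artifacts marked PUBLIC are visible to callers.
        artifact = self._items.get(name)
        if artifact is None or artifact.level is not Level.PUBLIC:
            raise LookupError(f"{name}: no public version found")
        return artifact

    def mark_public(self, name):
        # Models the mitigation: set the artifact to the expected level.
        self._items[name].level = Level.PUBLIC


def start_vm(repo):
    # A management operation that depends on the guest agent being queryable.
    agent = repo.query_public("VMGuestAgent")
    return f"started with {agent.name} {agent.version}"
```

In this sketch, publishing the agent at the wrong level makes `start_vm` raise `LookupError` even though the artifact exists, and calling `mark_public("VMGuestAgent")` restores the operation without republishing anything.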

13/10/2021 13:04 UPDATE:

STATUS:
Mitigating 10/13/2021, 2:13:04 PM UTC

SUMMARY OF IMPACT:

Starting as early as 05:12 UTC on 13 Oct 2021, a subset of customers using Windows Virtual Machines may experience failure notifications when performing service management operations – such as start, create, update, delete. Deployments of new VMs and any updates to extensions may fail. Non-Windows Virtual Machines, and existing running Windows Virtual Machines should not be impacted by this issue. Services with dependencies on Windows VMs may also experience similar failures when creating resources.

CURRENT STATUS:

We have identified that calls made during service management operations are failing due to a required artifact version not returning as expected during query. The failure shows that a required extension cannot be located. We are currently implementing a mitigation option to force a refresh of the extension and are seeing signs of recovery where mitigation has been deployed. We expect recovery to be observed as the mitigation progresses across regions. The next update will be provided within 60 minutes, or as events warrant.

Hello, 

There is currently a global outage regarding Azure Virtual Machines.   

When starting a VM you may receive the following message: 

Standard remediation does not appear to resolve the problem. 

Microsoft have acknowledged an issue on their status page:

Virtual Machines – Investigating 

Impact Statement: Starting at 07:00 UTC on 13 Oct 2021, a subset of customers using Windows Virtual Machines may experience failure notifications when performing service management operations – such as create, update, delete. Deployments of new VMs and any updates to extensions may fail. Non-Windows Virtual Machines, and existing running Windows Virtual Machines should not be impacted by this issue.

Current Status: We are aware of this issue and are actively investigating. The next update will be provided within 60 minutes, or as events warrant.

This message was last updated at 08:35 UTC on 13 October 2021 
