Partial Hypervisor Outage in Ashburn Datacenter

Incident Report for Atlantic.Net

Resolved

Our Cloud Operations team has confirmed that all VMs on the affected Hypervisor compute nodes are back online and operational. Our initial review found evidence of a potential hardware failure, and we have started internal discussions and a post-mortem review to scope and schedule maintenance to prevent future outages.

Posted Aug 30, 2023 - 20:12 EDT

Monitoring

Our remote hands team has implemented the fix provided by our Cloud Operations and Systems Engineering teams. The Hypervisor nodes affected by this outage are now responsive, and VMs are coming back online. We are monitoring to ensure that all VMs on the affected nodes are online and operational again.

Posted Aug 30, 2023 - 19:53 EDT

Identified

Our Systems Engineering and Cloud Operations teams have identified the root cause of the issue and are working with our remote hands team to resolve the outage.

Posted Aug 30, 2023 - 19:31 EDT

Investigating

We have received monitoring alerts indicating a potential issue with several Hypervisor compute nodes in our Ashburn datacenter. The issue has been escalated to our Cloud Operations and Systems Engineering teams, who are investigating further.

Posted Aug 30, 2023 - 18:52 EDT

This incident affected: Regions (Ashburn, Virginia (USA-East-3)) and Services (Cloud Services - Servers).