Incident Report
Start Date: 03/07/2022 10:40 AM (CT) / 07 March 2022 16:40 (UTC)
Finish Date: 03/07/2022 12:00 PM (CT) / 07 March 2022 18:00 (UTC)
Description:
Total outage of all services.
Impacted Services: All customer-facing, support, communication, and administrative services.
Impacted Customers:
Cause:
All internal DNS resolution stopped functioning as a result of a failure of the primary domain controller.
Detection:
Internal monitoring systems alerted staff that multiple services were failing concurrently. All staff were brought into an incident call to investigate.
Scope of Incident:
This affected all customer-facing services as well as support, communication, and administrative services.
Corrective Actions:
Individual services were checked first; however, once it was determined that a full outage was occurring, the infrastructure team launched a full-scale investigation. The DNS issues were traced to the primary domain controller, which was acting as the default gateway. Attempts were made to resolve the issue on the primary controller; when this proved impossible, disaster recovery procedures were executed to fail over to the secondary domain controller.
Preventative Actions:
The primary domain controller is completely offline and appears to be corrupt. We are raising a task to build a new domain controller using the last known configuration. Until the replacement is in place, we will run on the single remaining (secondary) domain controller with additional alerting and monitoring.