USSUP-428 Multiple Service Outage
Incident Report for 2sms LLC
Postmortem

Start Date: 10/20/2023 6:00 PM (EDT) / 20th October 2023 22:00 (UTC)

Finish Date: 10/23/2023 6:40 AM (EDT) / 23rd October 2023 10:40 (UTC)

Description:

Partial to Major service disruption across multiple endpoints.

Impacted Services:

  1. Email2sms SMTP
  2. Email2sms Standard
  3. SNPP
  4. SMPP
  5. MT traffic

Impacted Customers:

  1. All customers of the mentioned services

Cause:

On Friday, non-HTTP-based services began failing to accept traffic consistently because the host machines on which these services collectively run were short of resources. The issue was traced to the SMSC endpoint, which was consuming a large amount of resources as a result of multiple connections being spawned. The cause was determined to be malfunctioning software at a 2sms client, which failed to connect to the endpoint and immediately attempted to reconnect each time. This amounted to a denial of service against the SMSC endpoint. 2sms operates redundant pairs of hosts for its services; however, the issue spread to all available hosts.
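
For illustration only, the reconnect behaviour described above is usually avoided on the client side by waiting longer after each failed attempt. The sketch below shows capped exponential backoff with random jitter. It is not the affected client's software or any 2sms code; the function names and the default values (1 second base, 60 second cap, 6 attempts) are assumptions chosen for the example.

```python
import random


def backoff_ceiling(attempt, base=1.0, cap=60.0):
    """Upper bound, in seconds, on the wait before retry number `attempt` (0-based)."""
    # Double the bound on every failed attempt, but never exceed `cap`.
    return min(cap, base * (2 ** attempt))


def backoff_delays(attempts=6, base=1.0, cap=60.0, rng=None):
    """Yield one randomised wait per retry, each between 0 and its ceiling."""
    rng = rng or random.Random()
    for attempt in range(attempts):
        # Jitter spreads retries out so many clients do not reconnect in lockstep.
        yield rng.uniform(0.0, backoff_ceiling(attempt, base, cap))
```

A client would sleep for each yielded delay before its next connection attempt and stop retrying once a connection succeeds, rather than reconnecting immediately after every failure.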

Detection:

Our outsourced on-call team did not escalate customer-raised issues to the appropriate department, so we were not notified of the ongoing incident.

Corrective Actions:

2sms investigated the available metrics to determine the cause of the resource consumption, then restarted the affected hosts and isolated the problematic SMSC endpoint so that other services could resume processing. Following that, 2sms identified the high volume of traffic and blocked the originating IP address. Once services were confirmed to be restored, we contacted the blocked client to make them aware of the situation so that they could fix their malfunctioning software.

Preventative actions:

2sms will be implementing multiple improvements to the SMSC endpoint, such as rejecting invalid connections, automatically blocking clients that make excessive connections, and isolating services to prevent cascading issues. We are no longer using an external on-call service as the primary point of contact, and we are seeking alternative providers.
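
As an illustration of what automatically blocking excessive client connections can look like, the sketch below counts connection attempts per source IP in a sliding time window and temporarily blocks an address that exceeds the limit. It is a minimal example under stated assumptions, not the 2sms implementation: the class name, the thresholds and the block duration are all hypothetical.

```python
from collections import defaultdict, deque


class ConnectionLimiter:
    """Per-IP sliding-window limiter with a temporary block (hypothetical sketch)."""

    def __init__(self, max_attempts=10, window_seconds=60.0, block_seconds=300.0):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self.block_seconds = block_seconds
        self._attempts = defaultdict(deque)  # ip -> timestamps of recent attempts
        self._blocked_until = {}             # ip -> time at which the block ends

    def allow(self, ip, now):
        """Return True if a connection attempt from `ip` at time `now` is accepted."""
        blocked_until = self._blocked_until.get(ip)
        if blocked_until is not None:
            if now < blocked_until:
                return False
            del self._blocked_until[ip]  # block has expired

        attempts = self._attempts[ip]
        # Discard attempts that have fallen out of the window.
        while attempts and now - attempts[0] >= self.window_seconds:
            attempts.popleft()
        attempts.append(now)

        if len(attempts) > self.max_attempts:
            # Too many attempts in the window: block this address for a while.
            self._blocked_until[ip] = now + self.block_seconds
            attempts.clear()
            return False
        return True
```

In this sketch, the endpoint would call allow() with the source address and a monotonic timestamp before doing any further work on a new connection, so that a client stuck in a reconnect loop is rejected cheaply instead of consuming host resources.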

Internal audit:

The security incident has been recorded in the ISMS (Information Security Management System) and will form part of the review cycle documents for the August 2024 surveillance audit process.

External audit:

The security incident will be reported to the external accredited ISO 27001:2013 auditor, Certification Europe, and will be part of the review cycle for the August 2024 surveillance audit process.

GDPR:

This incident did not compromise PII (Personally Identifiable Information).

Posted Oct 25, 2023 - 10:17 UTC

Resolved
Partial to Major service disruption across multiple endpoints impacting:
-Email2sms SMTP
-Email2sms Standard
-SNPP
-SMPP
-MT traffic
Posted Oct 23, 2023 - 11:45 UTC