Incident Response

Practical incident playbook for detection, triage, containment, communication, and recovery.

Use this playbook for high-impact delivery failures or broad degradation.

Purpose

This guide provides a repeatable incident flow:

  • Detect and declare.
  • Scope and triage.
  • Contain impact.
  • Communicate updates.
  • Recover and review.

Prerequisites and permissions

  • Access to Activity, endpoint controls, and target controls.
  • On-call ownership for affected endpoints.

Step-by-step workflow

1. Detect and declare

Trigger incident mode for patterns such as:

  • A sustained increase in DELIVERY_FAILED outcomes.
  • Widespread auth or validation rejections.
  • Interruption of a business-critical flow.

Record the incident start time and the IDs of the affected endpoints.
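One way to make "sustained" concrete is to require several consecutive monitoring windows whose failure rate stays above a multiple of the normal rate before you declare. The sketch below assumes you can export per-window counts of DELIVERY_FAILED and total outcomes from Activity, in chronological order. The `Window` shape, the 2x factor, and the three-window minimum are illustrative assumptions, not product defaults. The returned timestamp is a candidate incident start time.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass
class Window:
    """Outcome counts for one monitoring window (hypothetical export shape)."""
    start: datetime
    failed: int  # DELIVERY_FAILED outcomes in this window
    total: int   # all delivery outcomes in this window


def sustained_failure_start(
    windows: Sequence[Window],
    baseline_rate: float,
    factor: float = 2.0,
    min_windows: int = 3,
) -> Optional[datetime]:
    """Return the start of the first run of `min_windows` consecutive windows
    (given in chronological order) whose failure rate exceeds `factor` times
    the baseline rate, or None if no such run exists."""
    run_start: Optional[datetime] = None
    run_len = 0
    for w in windows:
        rate = w.failed / w.total if w.total else 0.0
        if rate > baseline_rate * factor:
            if run_len == 0:
                run_start = w.start  # first elevated window of this run
            run_len += 1
            if run_len >= min_windows:
                return run_start
        else:
            # A healthy window breaks the run, so a single spike is not "sustained".
            run_len = 0
            run_start = None
    return None
```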

2. Scope impact

  1. Filter Activity to affected endpoints.
  2. Classify blast radius:
     • Single endpoint
     • Single provider/destination
     • Multi-endpoint/systemic
  3. Estimate volume and customer impact.
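The blast-radius classes above can be sketched as a simple rule over the set of failing deliveries: one endpoint, several endpoints that share one provider/destination, or anything broader. The `FailingDelivery` record and its `destination` field are hypothetical; map them to whatever endpoint and target identifiers your Activity export provides.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class BlastRadius(Enum):
    SINGLE_ENDPOINT = "single endpoint"
    SINGLE_PROVIDER = "single provider/destination"
    SYSTEMIC = "multi-endpoint/systemic"


@dataclass(frozen=True)
class FailingDelivery:
    """One failed delivery seen in Activity (hypothetical fields)."""
    endpoint_id: str
    destination: str  # provider or destination the target delivers to


def classify_blast_radius(failures: Iterable[FailingDelivery]) -> BlastRadius:
    """Classify impact from the narrowest scope to the broadest."""
    items = list(failures)
    if not items:
        raise ValueError("no failing deliveries to classify")
    endpoints = {f.endpoint_id for f in items}
    destinations = {f.destination for f in items}
    if len(endpoints) == 1:
        return BlastRadius.SINGLE_ENDPOINT
    if len(destinations) == 1:
        # Several endpoints fail, but all against the same provider/destination.
        return BlastRadius.SINGLE_PROVIDER
    return BlastRadius.SYSTEMIC
```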

3. Contain quickly

Prefer the smallest safe change first:

  • Disable the problematic endpoint.
  • Detach or fix the failing target.
  • Roll back recent configuration changes.

UI cue: the endpoint enable toggle and the target edit actions are the fastest containment tools.
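The "smallest safe change first" rule can be read as: among candidate actions you can undo quickly, pick the one that touches the fewest endpoints. This is a hypothetical decision aid, not a product feature; `ContainmentAction`, `endpoints_touched`, and the use of reversibility as the safety criterion are all assumptions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ContainmentAction:
    """A candidate containment step (hypothetical model)."""
    name: str
    endpoints_touched: int  # how many endpoints the change affects
    reversible: bool        # can be undone quickly, e.g. by flipping a toggle back


def pick_smallest_safe(actions: Sequence[ContainmentAction]) -> ContainmentAction:
    """Among reversible actions, return the one touching the fewest endpoints.
    Ties go to the earliest action in the list."""
    safe = [a for a in actions if a.reversible]
    if not safe:
        raise ValueError("no reversible containment action available")
    return min(safe, key=lambda a: a.endpoints_touched)
```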

4. Communicate updates

At minimum, share:

  • What is failing (outcomes/endpoints)
  • Mitigation in progress
  • Next update time
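A fixed template helps every update carry the three minimum fields. The sketch below is a hypothetical `StatusUpdate` record; the 30-minute default cadence and the assumption that timestamps are already in UTC are illustrative choices, not policy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Sequence


@dataclass
class StatusUpdate:
    """One incident status update (hypothetical template)."""
    failing: str                  # what is failing (outcomes/endpoints)
    endpoint_ids: Sequence[str]
    mitigation: str               # mitigation in progress
    sent_at: datetime             # assumed to be a UTC timestamp
    next_update_in: timedelta = timedelta(minutes=30)

    def render(self) -> str:
        """Format the update as plain text with an explicit next-update time."""
        next_at = self.sent_at + self.next_update_in
        return (
            f"[{self.sent_at:%H:%M} UTC] Failing: {self.failing} "
            f"(endpoints: {', '.join(self.endpoint_ids)})\n"
            f"Mitigation in progress: {self.mitigation}\n"
            f"Next update: {next_at:%H:%M} UTC"
        )
```

Stating the next update as a clock time rather than "in 30 minutes" keeps the message unambiguous when it is forwarded or read late.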

For external help, open a Support request that includes timestamps, endpoint IDs, and error details.

5. Recover and verify

  1. Validate the fix with controlled traffic.
  2. Confirm that success outcomes recover.
  3. Keep monitoring elevated until delivery is stable.
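Recovery is easier to judge consistently when "stable" has a concrete definition. This sketch assumes you track a per-window success rate and know the pre-incident baseline; the six-window requirement and the one-percentage-point tolerance are placeholder values to tune for your traffic.

```python
from typing import Sequence


def is_recovered(
    success_rates: Sequence[float],
    baseline: float,
    tolerance: float = 0.01,
    required_windows: int = 6,
) -> bool:
    """True when each of the most recent `required_windows` windows has a
    success rate no lower than `baseline - tolerance`."""
    if len(success_rates) < required_windows:
        return False  # not enough post-fix data to call it stable yet
    recent = success_rates[-required_windows:]
    return all(rate >= baseline - tolerance for rate in recent)
```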

6. Post-incident review

Document:

  • Root cause
  • Detection gap
  • Recovery timeline
  • Preventive actions and owners
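Keeping the review as structured data makes it easy to check that every preventive action has an owner before the review is closed. The `PostIncidentReview` and `PreventiveAction` types below are a hypothetical sketch of the four items listed above, not a format the product requires.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class PreventiveAction:
    description: str
    owner: Optional[str] = None  # None until someone is assigned


@dataclass
class PostIncidentReview:
    root_cause: str
    detection_gap: str
    timeline: List[Tuple[datetime, str]]  # (timestamp, event) pairs
    actions: List[PreventiveAction] = field(default_factory=list)

    def unassigned_actions(self) -> List[str]:
        """Descriptions of preventive actions that still lack an owner."""
        return [a.description for a in self.actions if not a.owner]

    def is_complete(self) -> bool:
        """All four sections are filled in and every action has an owner."""
        return bool(
            self.root_cause
            and self.detection_gap
            and self.timeline
            and self.actions
            and not self.unassigned_actions()
        )
```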

Expected result and verification checks

  • Impact is contained quickly.
  • Delivery health returns to baseline.
  • Follow-up actions are assigned.

Common issues and fixes

  • Over-scoping: isolate the problem by endpoint first.
  • Early closure: require sustained healthy outcomes before closing the incident.
  • Weak evidence: capture outcome and error snapshots before making edits.
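Evidence snapshots are most useful when they are timestamped and written before any containment change. The sketch below saves one JSON file per endpoint; the function name, file naming scheme, and payload fields are assumptions, and the outcome counts and error strings would come from whatever you copy out of Activity.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Mapping, Sequence


def save_evidence_snapshot(
    directory: Path,
    endpoint_id: str,
    outcomes: Mapping[str, int],
    errors: Sequence[str],
) -> Path:
    """Write a timestamped JSON snapshot of outcome counts and error details
    for one endpoint and return the file path (hypothetical helper)."""
    directory.mkdir(parents=True, exist_ok=True)
    captured_at = datetime.now(timezone.utc)
    # Microsecond precision makes collisions between repeated snapshots unlikely.
    path = directory / f"{endpoint_id}-{captured_at:%Y%m%dT%H%M%S%fZ}.json"
    payload = {
        "endpoint_id": endpoint_id,
        "captured_at": captured_at.isoformat(),
        "outcomes": dict(outcomes),
        "errors": list(errors),
    }
    path.write_text(json.dumps(payload, indent=2))
    return path
```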
