We run three tests at least annually and after major changes:
  1. Accidental data loss (staging): delete a known record → recover via Aurora point‑in‑time restore. Verify data gap ≤ 15 minutes.
  2. Instance/AZ failure: force an Aurora DB cluster failover. Verify service recovery ≤ 30 minutes.
  3. S3 rollback: delete an object → restore prior version (S3 Versioning).
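
The S3 rollback in test 3 could be scripted roughly as follows. This is a minimal sketch, not the team's actual tooling. It assumes the object was deleted without a version ID in a versioning-enabled bucket, so S3 placed a delete marker on top of it; removing that marker makes the prior version current again. The function name `restore_deleted_object` is hypothetical, and `s3` is assumed to be a boto3 S3 client (or anything exposing the same `list_object_versions` and `delete_object` calls).

```python
from typing import Optional


def restore_deleted_object(s3, bucket: str, key: str) -> Optional[str]:
    """Remove the current delete marker for `key`, if there is one.

    Returns the VersionId of the removed delete marker, or None when the
    key has no current delete marker (nothing to roll back).
    Simplification: only the first page of results is inspected.
    """
    resp = s3.list_object_versions(Bucket=bucket, Prefix=key)
    for marker in resp.get("DeleteMarkers", []):
        # Prefix matching can return other keys, so compare the key exactly.
        if marker["Key"] == key and marker["IsLatest"]:
            # Deleting the marker itself (by VersionId) un-deletes the object.
            s3.delete_object(Bucket=bucket, Key=key, VersionId=marker["VersionId"])
            return marker["VersionId"]
    return None
```

With boto3 installed, a drill might call it as `restore_deleted_object(boto3.client("s3"), "<bucket>", "<key>")` and then fetch the object to confirm its contents match the pre-delete copy.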
Targets
Within region: RPO 15 minutes; RTO 30 minutes (AZ/instance failure). Region DR: RPO 24 hours, RTO 4 hours.
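
To make drill results easy to compare against these targets, the numbers can be encoded once and checked mechanically. The sketch below is illustrative only: the scope names `in_region` and `region_dr`, the `TARGETS` table layout, and the `check_drill` helper are assumptions, not part of any existing tooling.

```python
from datetime import timedelta

# Targets from this section; the scope keys are hypothetical labels.
TARGETS = {
    "in_region": {"rpo": timedelta(minutes=15), "rto": timedelta(minutes=30)},
    "region_dr": {"rpo": timedelta(hours=24), "rto": timedelta(hours=4)},
}


def check_drill(scope: str, data_gap: timedelta, recovery_time: timedelta) -> dict:
    """Compare one drill's measured data gap and recovery time to its targets."""
    target = TARGETS[scope]
    return {
        "rpo_met": data_gap <= target["rpo"],
        "rto_met": recovery_time <= target["rto"],
    }


# Example: a failover drill that lost 12 minutes of data and took 31 minutes.
result = check_drill("in_region", timedelta(minutes=12), timedelta(minutes=31))
print(result)
```

In this example the data gap is inside the 15-minute RPO, but the 31-minute recovery misses the 30-minute RTO, so the drill would be recorded as a partial pass.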