Pre-flight every replica
Track replication lag, backup freshness, and promotion eligibility as Watch.Dog checks.
Label replicas by workload fit so you know which node can safely become primary.
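The eligibility rule above can be sketched in a few lines of Python. This is a minimal illustration, not a Watch.Dog API: the ReplicaStatus fields, the workload labels, and the thresholds (5 seconds of lag, 24 hours of backup age) are all assumptions chosen for the example, and you would feed in values from whatever your checks actually report.

```python
from dataclasses import dataclass


@dataclass
class ReplicaStatus:
    # All fields are hypothetical; populate them from your own monitoring data.
    name: str
    lag_seconds: float        # replication lag behind the primary
    backup_age_hours: float   # age of the most recent successful backup
    workload_label: str       # e.g. "oltp" or "analytics" (illustrative labels)


def promotion_candidates(replicas, max_lag_seconds=5.0,
                         max_backup_age_hours=24.0, required_label="oltp"):
    """Return replicas that pass every pre-flight check, least-lagged first."""
    eligible = [
        r for r in replicas
        if r.lag_seconds <= max_lag_seconds
        and r.backup_age_hours <= max_backup_age_hours
        and r.workload_label == required_label
    ]
    return sorted(eligible, key=lambda r: r.lag_seconds)


replicas = [
    ReplicaStatus("replica-a", 1.2, 6.0, "oltp"),
    ReplicaStatus("replica-b", 0.4, 30.0, "oltp"),       # backup too old
    ReplicaStatus("replica-c", 0.8, 2.0, "analytics"),   # wrong workload fit
    ReplicaStatus("replica-d", 0.9, 3.0, "oltp"),
]
print([r.name for r in promotion_candidates(replicas)])
# prints ['replica-d', 'replica-a']
```

Sorting by lag is one possible tie-breaker; you might instead prefer the replica in a particular region or on larger hardware.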
Runbook essentials
- Promotion command with validation flags
- Application connection string switch
- Rollback path if promotion stalls
Drill promotion steps
Practice failovers quarterly with read-only traffic first, then write traffic.
Measure the recovery time (RTO) and data-loss window (RPO) achieved in each drill against your uptime SLOs, and adjust buffers or capacity where a drill misses its target.
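To make the drill measurement concrete, here is a small sketch that derives the achieved RTO and RPO from three timestamps and compares them with targets. The function name, the timestamps, and the targets (5 minutes and 5 seconds) are hypothetical; it assumes you can record when the failure was injected, the time of the last transaction replicated before it, and when writes were accepted again.

```python
from datetime import datetime, timedelta


def drill_result(failure_at, last_replicated_at, writes_restored_at,
                 rto_target, rpo_target):
    """Compare the RTO and RPO achieved in one drill against their targets."""
    rto = writes_restored_at - failure_at     # how long writes were unavailable
    rpo = failure_at - last_replicated_at     # window of potentially lost writes
    return {
        "rto": rto,
        "rpo": rpo,
        "rto_met": rto <= rto_target,
        "rpo_met": rpo <= rpo_target,
    }


# Illustrative timestamps for a single drill.
result = drill_result(
    failure_at=datetime(2024, 1, 15, 12, 0, 0),
    last_replicated_at=datetime(2024, 1, 15, 11, 59, 57),
    writes_restored_at=datetime(2024, 1, 15, 12, 4, 30),
    rto_target=timedelta(minutes=5),
    rpo_target=timedelta(seconds=5),
)
print(result["rto"], result["rto_met"])  # prints 0:04:30 True
print(result["rpo"], result["rpo_met"])  # prints 0:00:03 True
```

Keeping the results of each quarterly drill lets you see whether the margin against the targets is shrinking over time.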
Verify and communicate
After promotion, run synthetic checks against both read and write endpoints, and alert on how much of the error budget has been consumed.
Update status pages and incident channels with the promotion timeline and the next scheduled checks.
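For the error-budget alert on post-promotion synthetic checks, one common way to express consumption is the ratio of observed failures to the failures the SLO allows. The sketch below assumes a request-based availability SLO and a hypothetical alert threshold of 50 percent; the function names and numbers are illustrative and not part of any specific tool.

```python
def error_budget_consumed(total_requests, failed_requests, slo_target):
    """Return the fraction of the error budget used (1.0 means fully spent)."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1, e.g. 0.999")
    allowed_failures = (1 - slo_target) * total_requests
    return failed_requests / allowed_failures


def should_alert(consumed, threshold=0.5):
    """Alert once consumption reaches the (hypothetical) threshold."""
    return consumed >= threshold


# Example: 10,000 synthetic requests, 2 failures, 99.9% SLO.
consumed = error_budget_consumed(10_000, 2, 0.999)
print(f"{consumed:.0%} of error budget consumed, alert={should_alert(consumed)}")
```

With these inputs the SLO allows about 10 failures, so 2 failures use roughly a fifth of the budget and stay under the alert threshold.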
