Pick predictive signals
Alert on replica lag, connection saturation, and failed transactions, not every slow query.
Track storage growth against runway to avoid noisy threshold flips.
Separate maintenance from incidents
Pause non critical alerts during backups and migrations.
Use change windows to annotate DB dashboards and status pages.
Signals that matter
- Replica lag over threshold for 5 minutes
- Connection pool exhaustion
- Disk growth hitting 80% runway
Route smartly
Send DB alerts to a dedicated on call path with runbooks for failover or throttle.
Escalate to product only when user facing endpoints start failing.
