Start with user journeys
Define SLIs that describe availability from the user's point of view, not from host metrics.
Tie each SLI to a single status page component and owner.
Common SLIs
- HTTP 200 response rate over the last 30 days
- p95 request latency (target: under 500 ms)
- Background job success rate (target: above 99%)
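The three indicators above can be computed from raw request and job counts. The following is a minimal sketch, not a definitive implementation: the `Request` record, the function names, the nearest-rank percentile method, and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical record of one user-facing request."""
    status: int        # HTTP status code returned to the user
    latency_ms: float  # time taken to serve the request


def http_200_rate(requests):
    """Fraction of requests in the window that returned HTTP 200."""
    if not requests:
        raise ValueError("no requests in the window")
    ok = sum(1 for r in requests if r.status == 200)
    return ok / len(requests)


def percentile_latency(requests, percent=95):
    """Nearest-rank latency percentile in milliseconds (assumed method)."""
    if not requests:
        raise ValueError("no requests in the window")
    ordered = sorted(r.latency_ms for r in requests)
    # Integer form of ceil(n * percent / 100), avoiding float rounding.
    rank = (len(ordered) * percent + 99) // 100
    return ordered[max(rank, 1) - 1]


def job_success_rate(succeeded, total):
    """Fraction of background jobs that finished successfully."""
    if total <= 0:
        raise ValueError("no jobs in the window")
    return succeeded / total


# Made-up sample window: 19 successful requests and one server error.
window = [Request(200, 10.0 * i) for i in range(1, 20)] + [Request(500, 200.0)]

print("HTTP 200 rate:", http_200_rate(window))
print("p95 latency (ms):", percentile_latency(window))
print("job success rate:", job_success_rate(995, 1000))
```

In practice the window would be fed from request logs or a metrics store covering the last 30 days. The point of the sketch is that each SLI is a ratio or percentile over user-visible events, not a host-level gauge.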
Draw the contract lines
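Use SLOs as the internal targets that guide the team's work, and SLAs as the customer contract that determines refunds. Keep SLA thresholds looser than SLO targets so that a missed SLO warns the team before an SLA is breached.
Show both on the same dashboard so the gap between them stays visible.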
When 50% of the error budget is burned partway through the SLO period, freeze deploys and review runbooks.
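The 50% burn rule can be checked mechanically. The sketch below assumes a time-based error budget (allowed downtime minutes over a 30-day period) and reads "burn" as the share of that budget already consumed. The function names, the 99.9% target, and the downtime figures are hypothetical, chosen only to illustrate the arithmetic.

```python
def error_budget_minutes(slo_target, period_days=30):
    """Allowed downtime for the period, e.g. 99.9% over 30 days."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1, exclusive")
    return (1.0 - slo_target) * period_days * 24 * 60


def budget_burned(downtime_minutes, slo_target, period_days=30):
    """Fraction of the error budget consumed so far in the period."""
    return downtime_minutes / error_budget_minutes(slo_target, period_days)


def should_freeze_deploys(downtime_minutes, slo_target,
                          period_days=30, threshold=0.5):
    """True once the burned share of the budget reaches the threshold."""
    return budget_burned(downtime_minutes, slo_target, period_days) >= threshold


# Assumed example: a 99.9% SLO over 30 days allows about 43.2 minutes of downtime.
print("budget (min):", round(error_budget_minutes(0.999), 1))
print("freeze after 10 min down:", should_freeze_deploys(10, 0.999))
print("freeze after 25 min down:", should_freeze_deploys(25, 0.999))
```

A request-based budget (allowed failed requests instead of downtime minutes) follows the same pattern, with event counts in place of minutes.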
Review with stakeholders
Publish monthly reports that show uptime, SLO attainment, and incidents at a single shareable URL.
Keep a FAQ that clarifies maintenance windows and excluded events.
