

SLO (Service Level Objective)

A target value for a service level indicator (SLI) over a period, such as '99.9% of requests succeed over 30 days.' SLOs define how reliable a service should be and drive error budgets, alerting, and engineering priorities.

In depth

A Service Level Objective is an internal reliability target expressed as a goal for an SLI over a time window. For example: 99.9% of HTTP requests return successfully within 300 ms over a rolling 30-day window. SLOs sit between SLIs, which are the raw measurements, and SLAs, which are external contracts with penalties. Good SLOs are based on what users actually notice rather than what is easy to measure. They are deliberately set below 100% because perfect reliability is impossible and each additional nine is increasingly expensive to achieve. The gap between the SLO and 100% becomes the error budget. Teams use SLOs to drive three things:

- Alerting: page only when the budget is burning fast.
- Prioritization: invest in reliability when objectives are missed.
- Architecture decisions: over 30 days, a 99.99% target allows roughly 4.3 minutes of total outage versus roughly 43 minutes for 99.9%, which demands far more redundancy.

Setting SLOs forces honest conversations about how reliable a service truly needs to be.
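The error-budget arithmetic above can be sketched in Python. This is a minimal illustration, not any tool's API, and the function names are hypothetical. The target is passed as a decimal string such as "99.9" so that `fractions.Fraction` keeps the math exact. The sketch expresses the budget two ways: as time the service may be out of SLO, and as a count of bad requests.

```python
import math
from datetime import timedelta
from fractions import Fraction


def error_budget(slo_percent: str) -> Fraction:
    """Fraction of events (or time) allowed to be bad: 1 - SLO.

    The target is a decimal string (e.g. "99.9") so the arithmetic stays
    exact instead of picking up float rounding error.
    """
    target = Fraction(slo_percent) / 100
    if not 0 < target < 1:
        raise ValueError("an SLO must be strictly between 0% and 100%")
    return 1 - target


def allowed_bad_time(slo_percent: str, window: timedelta) -> timedelta:
    """Time-based budget: how long the service may be out of SLO in the window."""
    window_us = window // timedelta(microseconds=1)  # exact integer microseconds
    return timedelta(microseconds=math.floor(window_us * error_budget(slo_percent)))


def allowed_bad_requests(slo_percent: str, total_requests: int) -> int:
    """Request-based budget: how many requests may fail (or be too slow)."""
    return math.floor(total_requests * error_budget(slo_percent))


if __name__ == "__main__":
    window = timedelta(days=30)
    for slo in ("99.9", "99.99"):
        print(slo, allowed_bad_time(slo, window), allowed_bad_requests(slo, 10_000_000))
```

Running it shows the cost of one extra nine. Over 30 days, 99.9% leaves 43 minutes 12 seconds of budget, while 99.99% leaves 4 minutes 19.2 seconds. The request budget for 10 million requests shrinks tenfold in the same way, from 10,000 to 1,000.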

Why it matters

Without SLOs, teams either over-invest in reliability nobody needs or discover problems only when customers complain. SLOs align engineering, product, and business on a shared definition of 'reliable enough' and make alerting meaningful by tying pages to user impact instead of CPU graphs.

Real-world example


An e-commerce checkout team defines an SLO: 99.95% of checkout API calls succeed in under 500 ms over 28 days. Their dashboards show budget burn in real time, and a fast-burn alert pages on-call only when the failure rate threatens the objective, eliminating noisy 3 a.m. pages for harmless blips.
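The checkout team's fast-burn alert can be sketched the same way. Everything here is an assumption for illustration:

- The `Request` record is hypothetical. Production systems usually query aggregated counters from a metrics store such as Prometheus rather than per-request objects.
- The paging rule borrows the multiwindow burn-rate approach described in Google's SRE Workbook. It pages only when both the last hour and the last 5 minutes are burning fast enough to spend 2% of the 28-day budget in one hour.

A request counts as good only if it succeeded and finished in under 500 ms, so slow successes also consume budget.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Iterable

SLO_TARGET = Fraction(9995, 10000)  # 99.95% of checkout calls, from the example
LATENCY_THRESHOLD_MS = 500          # "succeed in under 500 ms"
SLO_PERIOD_HOURS = 28 * 24          # 28-day SLO window


@dataclass(frozen=True)
class Request:
    """Hypothetical per-request record (real systems aggregate counters)."""
    ok: bool
    latency_ms: float


def is_good(r: Request) -> bool:
    """A good event must both succeed and be fast enough."""
    return r.ok and r.latency_ms < LATENCY_THRESHOLD_MS


def burn_rate(requests: Iterable[Request]) -> float:
    """Observed bad-event ratio divided by the budgeted bad-event ratio.

    1.0 means the budget would be used up exactly at the end of the SLO period.
    """
    reqs = list(requests)
    if not reqs:
        return 0.0
    bad = sum(1 for r in reqs if not is_good(r))
    return float(Fraction(bad, len(reqs)) / (1 - SLO_TARGET))


def burn_threshold(budget_fraction: float, window_hours: float) -> float:
    """Burn rate at which `budget_fraction` of the budget is spent in `window_hours`."""
    return budget_fraction * SLO_PERIOD_HOURS / window_hours


def should_page(
    last_hour: Iterable[Request],
    last_5_min: Iterable[Request],
    budget_fraction: float = 0.02,  # assumed: page if 2% of budget burns in 1 hour
) -> bool:
    """Fast-burn page: the long and short windows must both exceed the threshold."""
    threshold = burn_threshold(budget_fraction, window_hours=1.0)
    return burn_rate(last_hour) > threshold and burn_rate(last_5_min) > threshold
```

The two windows behave differently:

- A burst in which 1% of checkout calls fail gives a burn rate of 20. That exceeds the 13.44 threshold (0.02 × 672 hours / 1 hour), so it pages.
- A single slow call in a thousand gives a burn rate of 2, so it does not page.
- Because the 5-minute window must also agree, the alert stops firing soon after the problem is fixed, rather than lingering for the rest of the hour.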

Tools related to SLO (Service Level Objective)

Prometheus, Grafana, Datadog, Nobl9, New Relic

Interview questions

  1. What is the difference between an SLI, SLO, and SLA?
  2. How do you choose an appropriate SLO target for a new service?
  3. Why should SLOs be less than 100%?
  4. How do SLOs change the way you design alerts?
  5. What time window would you pick for an SLO and why?
  6. Describe how you would introduce SLOs to a team that has none.