# SLOs (Service Level Objectives)

This project treats SLOs as **first-class configuration**.

- SLO definitions live in `observability/slo/sloth/` (human-edited)
- Prometheus recording + alerting rules are **generated** into `observability/prometheus/rules/` (machine-generated)

We use [Sloth](https://github.com/slok/sloth) to generate multi-window, multi-burn-rate alerting rules.

## Current SLOs

The initial SLOs focus on the most critical endpoint: `POST /oauth/token`.

- **Token availability** (30d): objective **99.9%**
  - *Error* = HTTP **5xx** responses
- **Token latency (500ms)** (30d): objective **99%**
  - *Error* = requests slower than **0.5s**
  - Excludes `5xx` from the latency SLI so “the server is broken” doesn’t also count as “the server is slow”

The spec is defined in:

- `observability/slo/sloth/oauth2-server.yml`

## How SLO rules are generated

Run:

```bash
./scripts/generate_slo_rules.sh
```

This script:

1. Validates the Sloth spec.
2. Generates Prometheus rules into:

   - `observability/prometheus/rules/oauth2_server_slos.yml`

The generated file is meant to be committed, so Prometheus can load SLO rules without running Sloth continuously.

### Requirements

- Docker (the script runs `ghcr.io/slok/sloth` in a container)

## How Prometheus loads SLO rules

The local Prometheus configuration includes:

```yaml
rule_files:
  - /etc/prometheus/rules/*.yml
```

And `docker-compose.observability.yml` mounts the repo rule directory:

- `./observability/prometheus/rules` → `/etc/prometheus/rules`

So once you regenerate the rules and restart Prometheus, they’ll be picked up.

## Metric labels used by SLOs

SLOs rely on the labeled HTTP metrics:

- `oauth2_server_http_requests_total_by_route{route,method,status}`
- `oauth2_server_http_request_duration_seconds_by_route_bucket{route,method,status,le}`
- `oauth2_server_http_request_duration_seconds_by_route_count{route,method,status}`

The `route` label comes from Actix route patterns (`match_pattern()`), so for the token endpoint it should match:

- `route="/oauth/token"`

## Next steps (easy additions)

Typical follow-up SLO candidates:

- `GET /oauth/authorize` availability
- Introspection/revocation endpoints (if enabled)
- Admin endpoints (if used in production)