# Observability

## For AI Agents
Prompt: "Set up observability for the rust-oauth2-server with Prometheus metrics, health checks, and OpenTelemetry tracing"
Common observability tasks:
| Task | Prompt Example |
|---|---|
| Check health | "Verify the OAuth2 server is healthy and responding" |
| View metrics | "Show me the current Prometheus metrics from the server" |
| Set up monitoring | "Configure Prometheus to scrape metrics from the OAuth2 server" |
| Enable tracing | "Enable OpenTelemetry tracing and export to Jaeger" |
| Debug performance | "Check the metrics for slow token endpoint responses" |
| Monitor rate limits | "Show metrics for rate limit rejections and current usage" |
| Event health | "Check the health status of the event backend" |
Key endpoints:

- Health check: `GET /health` - basic liveness probe
- Readiness check: `GET /ready` - storage readiness (for K8s)
- Metrics: `GET /metrics` - Prometheus-format metrics
- Event health: `GET /events/health` - event backend status
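A quick way to probe these endpoints from a shell. The base URL is an assumption (a local instance on `localhost:8080`); adjust host and port for your deployment, and note these commands need a running server:

```shell
# Assumed base URL for a local instance; adjust for your deployment.
BASE_URL="http://localhost:8080"

# Liveness: expect HTTP 200 while the process is up.
curl -fsS "$BASE_URL/health"

# Readiness: expect HTTP 200 once the storage layer is reachable.
curl -fsS "$BASE_URL/ready"

# Prometheus metrics in text exposition format (first 20 lines).
curl -fsS "$BASE_URL/metrics" | head -n 20

# Event backend status (when eventing is enabled).
curl -fsS "$BASE_URL/events/health"
```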
What's monitored:

- HTTP request volume, latency, and status codes
- Token operations (issuance, introspection, revocation)
- Actor performance and message processing
- Cache hit/miss rates (when the Redis cache is enabled)
- Rate limiting rejections
- Circuit breaker states and resilience metrics
The observability surface is intentionally small and code-backed:

- liveness: `/health`
- readiness: `/ready`
- metrics: `/metrics`
- eventing health: `/events/health`
- traces: OpenTelemetry export
## Endpoints
| Endpoint | Purpose | Typical use |
|---|---|---|
| `/health` | basic process health | load balancer or uptime probe |
| `/ready` | storage readiness | Kubernetes readiness probe |
| `/metrics` | Prometheus text endpoint | scrape target |
| `/events/health` | event backend/plugin health | eventing triage |
| `/swagger-ui` | generated API surface | validation and operator checks |
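The `/health` and `/ready` split maps directly onto Kubernetes probes. A sketch of the pod spec fragment, assuming the server listens on port 8080 (the port and timings are illustrative, not taken from the repo):

```yaml
# Illustrative probe configuration; container port and periods are assumptions.
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5
```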
Example health response:
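The exact payload depends on the server version; as a rough sketch, a liveness response is typically a small JSON status object along these lines (the field name is illustrative, not confirmed from the code):

```json
{
  "status": "healthy"
}
```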
## Metrics
The server exposes Prometheus metrics for:
- HTTP request volume and latency
- token issuance and revocation
- actor- and route-level behavior
- cache efficiency when cache features are enabled
- rate-limit rejections when rate limiting is enabled
- resilience middleware state (circuit breaker transitions, back-pressure rejections, bulkhead utilization) when `OAUTH2_RESILIENCE_ENABLED=true`
Use `/metrics` directly for raw output, or wire up the bundled observability stack.
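To scrape the server, point Prometheus at `/metrics`. A minimal `prometheus.yml` fragment as a sketch, assuming the server is reachable at `localhost:8080` (job name, target, and interval are illustrative):

```yaml
# prometheus.yml fragment; the target address is an assumption.
scrape_configs:
  - job_name: "rust-oauth2-server"
    metrics_path: /metrics
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:8080"]
```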
## Local observability stack
Bring up the bundled observability services:
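Assuming the stack is defined by a Docker Compose file under `observability/` (the file name here is a guess; check the repo for the actual path), bringing it up looks like:

```shell
# Assumes a compose file under observability/; verify the actual path in the repo.
docker compose -f observability/docker-compose.yml up -d

# Confirm the services are running.
docker compose -f observability/docker-compose.yml ps
```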
Common local endpoints:
- Grafana: http://localhost:3000
- Prometheus: http://localhost:9090
- Jaeger: http://localhost:16686
To export traces from the server locally:
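A sketch using the standard `OTEL_*` environment variables; the endpoint assumes a local OTLP collector listening on the default gRPC port, and the service name is illustrative:

```shell
# Standard OpenTelemetry exporter settings; the endpoint assumes a local
# OTLP collector (e.g. the bundled Jaeger) on the default gRPC port 4317.
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
export OTEL_SERVICE_NAME="rust-oauth2-server"

# Start the server in the same shell so it inherits these variables.
```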
**Note:** The repo still supports `OAUTH2_OTLP_ENDPOINT` as a compatibility alias in `.env.example`, but the standard `OTEL_*` variables are the preferred long-term path.
## What to watch first
If you are operating this service, start with these checks:
- `/ready` - confirms the storage layer is alive
- 5xx rate from `/metrics`
- request latency percentiles
- event backend health when eventing is enabled
- deployment rollouts and recent config changes
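The 5xx-rate and latency checks translate into PromQL along these lines; the metric and label names are illustrative, so check the actual `/metrics` output for the names this server exposes:

```
# Illustrative PromQL; substitute the real metric names from /metrics.

# Rate of 5xx responses over the last 5 minutes.
sum(rate(http_requests_total{status=~"5.."}[5m]))

# 95th-percentile request latency from a histogram metric.
histogram_quantile(0.95,
  sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
```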
## Dashboards and SLO assets
The repo includes ready-to-edit assets under:
- `observability/grafana/`
- `observability/prometheus/`
- `observability/otel/`
- `observability/slo/`
These are the best places to start if you want dashboards or alert rules rather than prose.
## Benchmarks and performance reports
Performance comparisons are generated into:
- `benchmarks/results/comparison-report.md`
- `benchmarks/results/comparison-data.csv`
The benchmark harness guide lives in the repo-local `benchmarks/README.md` file.