fleet-rlm is a single FastAPI process that serves both the API and the prebuilt frontend SPA. This guide covers running it outside of a developer laptop.
Architecture in production
A production deploy needs:
- fleet-rlm process: uv run fleet-rlm serve-api or uv run fleet web.
- Daytona: sandbox provider (DAYTONA_API_KEY, DAYTONA_API_URL).
- Neon/Postgres: canonical multi-tenant state (DATABASE_URL).
- Entra ID: bearer-token auth (AUTH_MODE=entra).
- MLflow (optional): tracing backend.
Environment configuration
# App
APP_ENV=production
AUTH_MODE=entra
AUTH_REQUIRED=true
DATABASE_REQUIRED=true
# LLM
DSPY_LM_MODEL=openai/gpt-4o
DSPY_LLM_API_KEY=sk-...
# Daytona
DAYTONA_API_KEY=...
DAYTONA_API_URL=https://app.daytona.io/api
# Database
DATABASE_URL=postgresql://USER:PASS@HOST/DB?sslmode=require
# MLflow
MLFLOW_ENABLED=true
MLFLOW_TRACKING_URI=https://mlflow.example.com
Runtime settings writes (PATCH /api/v1/runtime/settings) are intentionally limited to APP_ENV=local. In staging and production, settings are read-only via the API.
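A pre-flight check can catch a missing variable before the server starts. The following is a minimal sketch, not part of fleet-rlm: the helper names are hypothetical, and treating every variable from the example configuration as mandatory is an assumption (MLflow is left out because it is optional).

```python
import os
from typing import Mapping

# Variables set in the example production configuration above.
# Assumption: all of them are required; adjust the tuple to your deploy.
REQUIRED_VARS = (
    "APP_ENV",
    "AUTH_MODE",
    "DSPY_LM_MODEL",
    "DSPY_LLM_API_KEY",
    "DAYTONA_API_KEY",
    "DAYTONA_API_URL",
    "DATABASE_URL",
)


def missing_vars(env: Mapping[str, str]) -> list[str]:
    """Return the required variable names that are unset or blank in env."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def check_current_environment() -> list[str]:
    """Run the check against the environment of the current process."""
    return missing_vars(os.environ)
```

Calling check_current_environment() from a wrapper script and refusing to launch when the list is non-empty keeps configuration mistakes out of the server logs.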
Authentication
When AUTH_MODE=entra, both HTTP and WebSocket access use real Entra bearer-token validation plus Neon-backed tenant admission.
| Mode | Behavior |
|---|---|
| dev | Debug headers, local HS256 tokens, optional identity |
| entra | JWKS-backed Entra tokens, Neon tenant admission required |
Token validation in both modes is performed by joserfc. Entra deployments use a built-in JWKS cache with a 5-minute TTL. If the JWKS endpoint is temporarily unreachable, the cache falls back to the last-known keyset, so short outages of Entra’s metadata service do not interrupt authenticated traffic.
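The cache behavior described above (a TTL plus a last-known fallback) can be illustrated with a small generic sketch. This is not fleet-rlm's implementation: the class name, the injected fetch callable, and the injectable clock are all assumptions made for illustration.

```python
import time
from typing import Any, Callable, Optional

Keyset = dict[str, Any]


class StaleFallbackCache:
    """TTL cache that keeps serving the last good value if a refresh fails."""

    def __init__(
        self,
        fetch: Callable[[], Keyset],
        ttl_seconds: float = 300.0,  # 5 minutes, matching the TTL described above
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[Keyset] = None
        self._expires_at = 0.0

    def get(self) -> Keyset:
        now = self._clock()
        if self._value is not None and now < self._expires_at:
            return self._value  # still fresh, no network call
        try:
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        except Exception:
            if self._value is None:
                raise  # nothing cached yet, so the failure must surface
            # Refresh failed: keep the last-known keyset and retry on the next call.
        return self._value
```

The trade-off in this design is that a signing key revoked during an outage stays trusted until the endpoint is reachable again, in exchange for not rejecting valid tokens while it is down.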
See the Auth Modes reference for the full configuration matrix.
Run the server
Direct invocation:
uv run fleet-rlm serve-api --host 0.0.0.0 --port 8000
Behind a reverse proxy (recommended), terminate TLS at the proxy and forward to fleet-rlm on a private interface. fleet-rlm serves the SPA at / and the API under /api/v1/* from the same port.
Health probes
Two unauthenticated endpoints are designed for orchestrators and load balancers:
| Endpoint | Purpose |
|---|---|
| GET /health | Liveness: returns {"ok": true, "version": "..."} |
| GET /ready | Readiness: reports planner, database, and sandbox provider status |
Sample readiness response:
{
"ready": true,
"planner_configured": true,
"planner": "ready",
"database": "ready",
"database_required": true,
"sandbox_provider": "daytona"
}
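A deploy script can turn that response into a list of actionable problems. The sketch below uses only the field names from the sample; the helper name is hypothetical, and two rules are assumptions rather than documented behavior: any status value other than "ready" counts as a problem, and the database status only matters when database_required is true.

```python
import json
from typing import Any

# The sample readiness response from above, used here as test input.
SAMPLE = json.loads("""
{
  "ready": true,
  "planner_configured": true,
  "planner": "ready",
  "database": "ready",
  "database_required": true,
  "sandbox_provider": "daytona"
}
""")


def readiness_problems(payload: dict[str, Any]) -> list[str]:
    """Return human-readable reasons the payload does not look ready."""
    problems = []
    if payload.get("ready") is not True:
        problems.append("ready flag is not true")
    if payload.get("planner") != "ready":
        problems.append(f"planner: {payload.get('planner')!r}")
    # Assumption: an optional database that is down should not block the deploy.
    if payload.get("database_required") and payload.get("database") != "ready":
        problems.append(f"database: {payload.get('database')!r}")
    return problems
```

An empty list means the payload matches the healthy sample; anything else can be printed as the failure reason.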
WebSocket transport
Two streams power the live UI:
/api/v1/ws/execution — chat stream events.
/api/v1/ws/execution/events — execution graph events.
Both require the same auth as HTTP endpoints when AUTH_REQUIRED=true. Configure your reverse proxy to pass the WebSocket upgrade (the HTTP/1.1 Upgrade and Connection headers) and to disable buffering on these paths.
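When testing the proxy configuration, it helps to derive the exact WebSocket URLs from the public base URL. This is a small sketch under two assumptions: the deploy is served from the root path of the host, and the function and dictionary key names are hypothetical. How the bearer token is attached to the connection is not shown here.

```python
from urllib.parse import urlsplit, urlunsplit

# Stream paths as listed above; the short names are illustrative only.
WS_PATHS = {
    "chat": "/api/v1/ws/execution",
    "graph": "/api/v1/ws/execution/events",
}

_SCHEMES = {"https": "wss", "http": "ws"}


def websocket_urls(base_url: str) -> dict[str, str]:
    """Map an http(s) base URL to the ws(s) URLs of both streams."""
    parts = urlsplit(base_url)
    if parts.scheme not in _SCHEMES:
        raise ValueError(f"expected an http or https URL, got {base_url!r}")
    scheme = _SCHEMES[parts.scheme]
    return {
        name: urlunsplit((scheme, parts.netloc, path, "", ""))
        for name, path in WS_PATHS.items()
    }
```

A TLS-terminating proxy should expose both URLs over wss; plain ws is only expected when talking to fleet-rlm directly on the private interface.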
Frontend assets
Published installs of fleet-rlm include built frontend assets — you do not need pnpm or a separate build step in production. If you build from source, run pnpm run build in src/frontend/ before launching the server.
Smoke-test the deploy
Validate Daytona connectivity from the host:
uv run fleet-rlm daytona-smoke \
--repo https://github.com/Qredence/fleet-rlm.git \
--ref main
Hit the readiness endpoint:
curl https://your-deploy.example.com/ready
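In a deploy pipeline, a single curl call is often replaced by a short polling loop that waits for the service to come up. The following is a minimal sketch using only the Python standard library; the function names, retry count, and delay are assumptions, and the only field it relies on is the ready flag from the sample response.

```python
import json
import time
import urllib.request
from typing import Any, Callable


def fetch_ready(url: str, timeout: float = 5.0) -> dict[str, Any]:
    """GET the readiness endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def wait_until_ready(
    fetch: Callable[[], dict[str, Any]],
    attempts: int = 10,
    delay: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll fetch() until it reports ready or the attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch().get("ready") is True:
                return True
        except (OSError, ValueError):
            # Connection errors, HTTP error statuses, and invalid JSON
            # are all treated as "not ready yet".
            pass
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

For the deploy above, the call would look like wait_until_ready(lambda: fetch_ready("https://your-deploy.example.com/ready")), with a False result failing the pipeline step.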
See also