Deploying the API server

fleet-rlm is a single FastAPI process that serves both the API and the prebuilt frontend SPA. This guide covers running it outside a developer laptop, for example in staging or production.

Architecture in production

A production deploy needs:

fleet-rlm process — uv run fleet-rlm serve-api or uv run fleet web.
Daytona — sandbox provider (DAYTONA_API_KEY, DAYTONA_API_URL).
Neon/Postgres — canonical multi-tenant state (DATABASE_URL).
Entra ID — bearer-token auth (AUTH_MODE=entra).
MLflow (optional) — tracing backend.

Environment configuration

.env.production

# App
APP_ENV=production
AUTH_MODE=entra
AUTH_REQUIRED=true
DATABASE_REQUIRED=true

# LLM
DSPY_LM_MODEL=openai/gpt-4o
DSPY_LLM_API_KEY=sk-...

# Daytona
DAYTONA_API_KEY=...
DAYTONA_API_URL=https://app.daytona.io/api

# Database
DATABASE_URL=postgresql://USER:PASS@HOST/DB?sslmode=require

# MLflow
MLFLOW_ENABLED=true
MLFLOW_TRACKING_URI=https://mlflow.example.com
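A missing variable is easier to catch before the process starts than from a failed request later. The sketch below is a hypothetical pre-flight check, not part of fleet-rlm. It parses a simple KEY=VALUE file like the one above (blank lines and `#` comments skipped, no quoting or expansion) and reports which of the variables named in this guide are absent or empty. The `REQUIRED` grouping, the MLflow rule, and the function names are assumptions for illustration.

```python
from pathlib import Path

# Hypothetical grouping of the variables named in this guide, by component.
# MLflow is optional, so its variables are not in this list.
REQUIRED = {
    "app": ["APP_ENV", "AUTH_MODE", "AUTH_REQUIRED", "DATABASE_REQUIRED"],
    "llm": ["DSPY_LM_MODEL", "DSPY_LLM_API_KEY"],
    "daytona": ["DAYTONA_API_KEY", "DAYTONA_API_URL"],
    "database": ["DATABASE_URL"],
}


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and '#' comments.

    Deliberately minimal: no quoting, escaping, or ${VAR} expansion.
    """
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so URLs with '=' in the query survive.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_vars(values: dict[str, str]) -> list[str]:
    """Return required variable names that are absent or empty."""
    missing = [n for names in REQUIRED.values() for n in names if not values.get(n)]
    # Assumed rule: if MLflow tracing is switched on, it needs a tracking URI.
    if values.get("MLFLOW_ENABLED", "").lower() == "true" and not values.get("MLFLOW_TRACKING_URI"):
        missing.append("MLFLOW_TRACKING_URI")
    return missing


def check_file(path: str = ".env.production") -> int:
    """Print a summary and return a shell-style exit code (0 = ok)."""
    problems = missing_vars(parse_env(Path(path).read_text()))
    print("missing: " + ", ".join(problems) if problems else "ok")
    return 1 if problems else 0
```

A pre-start hook could call `sys.exit(check_file())` so the deploy stops before `serve-api` runs with an incomplete environment.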

Writes to runtime settings (PATCH /api/v1/runtime/settings) are intentionally limited to APP_ENV=local. In staging and production, the API exposes settings as read-only.

Authentication

When AUTH_MODE=entra, both HTTP and WebSocket requests are authenticated by validating real Entra ID bearer tokens, and each caller must also pass Neon-backed tenant admission.

| Mode | Behavior |
|---|---|
| `dev` | Debug headers, local HS256 tokens, optional identity |
| `entra` | JWKS-backed Entra tokens, Neon tenant admission required |

See the Auth Modes reference for the full configuration matrix.
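Scripted clients in `entra` mode send the token as a standard `Authorization: Bearer` header. The sketch below builds such a request with the standard library only. It assumes you already hold a valid Entra access token for this API; acquiring one (for example with MSAL or azure-identity) depends on your app registration and is not shown. The `FLEET_RLM_BASE_URL` variable and the helper names are hypothetical, and `path` should be one of the documented `/api/v1` endpoints.

```python
import json
import os
import urllib.request

# Hypothetical: where this deployment lives. Override via the environment.
BASE_URL = os.environ.get("FLEET_RLM_BASE_URL", "https://your-deploy.example.com")


def authed_request(path: str, token: str, method: str = "GET") -> urllib.request.Request:
    """Build a request for `path` with the Entra access token as a bearer credential."""
    req = urllib.request.Request(BASE_URL.rstrip("/") + path, method=method)
    req.add_header("Authorization", f"Bearer {token}")
    req.add_header("Accept", "application/json")
    return req


def call_api(path: str, token: str) -> dict:
    """Send the request and decode the JSON body.

    urlopen raises urllib.error.HTTPError for non-2xx responses, which is
    where a rejected token or a non-admitted tenant would likely surface.
    """
    with urllib.request.urlopen(authed_request(path, token), timeout=10) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes it easy to inspect the headers (for instance in a test) without a live deployment.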

Run the server

Direct invocation:

uv run fleet-rlm serve-api --host 0.0.0.0 --port 8000

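Behind a reverse proxy (recommended), terminate TLS at the proxy and forward traffic to fleet-rlm on a private interface. Note that --host 0.0.0.0 above listens on all interfaces; in this setup, bind to a private address instead (for example --host 127.0.0.1 when the proxy runs on the same machine). fleet-rlm serves the SPA at / and the API under /api/v1/* from the same port, so the proxy needs only one upstream.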

Health probes

Two unauthenticated endpoints are designed for orchestrators and load balancers:

| Endpoint | Purpose |
|---|---|
| `GET /health` | Liveness: returns `{"ok": true, "version": "..."}` |
| `GET /ready` | Readiness: reports planner, database, and sandbox provider status |
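Most orchestrators can probe `/health` over HTTP directly, but a container health check or cron job often wants a command that exits non-zero on failure. The sketch below relies only on the documented `/health` response shape; the base URL, the 5-second timeout, and the function names are illustrative assumptions.

```python
import json
import urllib.request


def is_live(payload: object) -> bool:
    """Interpret a /health body: live only if it is an object with ok == true."""
    return isinstance(payload, dict) and payload.get("ok") is True


def probe_liveness(base_url: str, timeout: float = 5.0) -> int:
    """Return a process exit code: 0 if /health reports ok, 1 otherwise."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts, and HTTP error
        # statuses (HTTPError); ValueError covers a non-JSON body.
        return 1
    return 0 if is_live(payload) else 1
```

For example, `sys.exit(probe_liveness("http://127.0.0.1:8000"))` turns the probe into a shell-friendly check.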

Sample readiness response:

{
  "ready": true,
  "planner_configured": true,
  "planner": "ready",
  "database": "ready",
  "database_required": true,
  "sandbox_provider": "daytona"
}
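During a rollout you usually want to wait until a new instance reports ready before sending traffic to it. The sketch below polls `/ready`, treats the top-level `ready` flag as authoritative, and lists components whose status is not `"ready"` when it gives up. It uses only fields from the sample response above; the deadline, polling interval, and function names are assumptions, and if the server answers "not ready" with a non-2xx status, this sketch treats that the same as unreachable.

```python
import json
import time
import urllib.request

# Fields from the sample response whose value is a status string.
STATUS_FIELDS = ("planner", "database")


def not_ready_components(payload: dict) -> list[str]:
    """List status fields whose value is something other than 'ready'."""
    return [f"{name}={payload.get(name)!r}" for name in STATUS_FIELDS
            if payload.get(name) != "ready"]


def wait_until_ready(base_url: str, deadline_s: float = 120.0, interval_s: float = 5.0) -> dict:
    """Poll /ready until it reports ready=true, or raise TimeoutError."""
    url = base_url.rstrip("/") + "/ready"
    stop = time.monotonic() + deadline_s
    last: dict = {}
    while time.monotonic() < stop:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                body = json.load(resp)
            last = body if isinstance(body, dict) else {}
        except (OSError, ValueError):
            last = {}  # unreachable, HTTP error status, or non-JSON body
        if last.get("ready") is True:
            return last
        time.sleep(interval_s)
    raise TimeoutError(f"not ready after {deadline_s}s: {not_ready_components(last)}")
```

Surfacing the per-component statuses in the error message makes it easier to tell a database outage from a misconfigured planner in deploy logs.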

WebSocket transport

Two streams power the live UI:

/api/v1/ws/execution — chat stream events.
/api/v1/ws/execution/events — execution graph events.

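Both streams require the same authentication as the HTTP endpoints when AUTH_REQUIRED=true. Configure your reverse proxy to forward the HTTP/1.1 Upgrade and Connection headers on these paths and to disable response buffering, so the connection can switch to WebSocket and events reach the UI without delay.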
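One way to confirm that the proxy really upgrades these paths is to perform the opening handshake by hand and look for a `101 Switching Protocols` response. The sketch below uses only the standard library and the handshake defined in RFC 6455. It assumes the server accepts the bearer token in an `Authorization` header on the upgrade request; browsers cannot set that header on WebSockets, so the UI may authenticate differently (check the Auth Modes reference). The function names are hypothetical.

```python
import base64
import hashlib
import os
import socket
import ssl
from typing import Optional

# Fixed GUID from RFC 6455, section 1.3.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(key: str) -> str:
    """Compute the Sec-WebSocket-Accept value a server must return for `key`."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_upgrade_request(host: str, path: str, key: str, token: Optional[str] = None) -> bytes:
    """Build a raw HTTP/1.1 WebSocket upgrade request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
    ]
    if token:
        lines.append(f"Authorization: Bearer {token}")  # assumed auth transport
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def check_upgrade(host: str, path: str = "/api/v1/ws/execution", token: Optional[str] = None,
                  port: int = 443, use_tls: bool = True) -> bool:
    """Return True if the server answers 101 with the correct accept value."""
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    host_header = host if port in (80, 443) else f"{host}:{port}"
    raw = socket.create_connection((host, port), timeout=10)
    sock = ssl.create_default_context().wrap_socket(raw, server_hostname=host) if use_tls else raw
    with sock:
        sock.sendall(build_upgrade_request(host_header, path, key, token))
        # Reads a single chunk; the response headers normally fit in one.
        head = sock.recv(4096).decode("latin-1")
    status_line, *header_lines = head.split("\r\n")
    parts = status_line.split()
    headers = {k.strip().lower(): v.strip()
               for k, v in (h.split(":", 1) for h in header_lines if ":" in h)}
    return len(parts) > 1 and parts[1] == "101" and \
        headers.get("sec-websocket-accept") == expected_accept(key)
```

Run it against both stream paths, for example `check_upgrade("your-deploy.example.com", "/api/v1/ws/execution/events", token=...)`. A `False` result with a 200 or 400 status line usually points at a proxy that is not forwarding the upgrade headers.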

Frontend assets

Published installs of fleet-rlm include the built frontend assets, so production hosts need neither pnpm nor a separate build step. If you build from source, run pnpm run build in src/frontend/ before launching the server.

Smoke-test the deploy

Validate Daytona connectivity from the host:

uv run fleet-rlm daytona-smoke \
  --repo https://github.com/Qredence/fleet-rlm.git \
  --ref main

Hit the readiness endpoint:

curl https://your-deploy.example.com/ready
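The two checks above can be chained into one script for CI or a post-deploy hook. This sketch shells out to the documented `daytona-smoke` command on the host and then requests `/ready`, stopping at the first failure. The script structure, its parameters, and the function names are illustrative, not a fleet-rlm feature.

```python
import json
import subprocess
import urllib.request

REPO = "https://github.com/Qredence/fleet-rlm.git"


def smoke_command(repo: str = REPO, ref: str = "main") -> list[str]:
    """Argument vector for the documented Daytona smoke test."""
    return ["uv", "run", "fleet-rlm", "daytona-smoke", "--repo", repo, "--ref", ref]


def run_smoke(base_url: str, ref: str = "main") -> dict:
    """Run the Daytona smoke test, then fetch /ready; raise on any failure."""
    # check=True raises CalledProcessError if the smoke test exits non-zero.
    subprocess.run(smoke_command(ref=ref), check=True)
    with urllib.request.urlopen(base_url.rstrip("/") + "/ready", timeout=10) as resp:
        payload = json.load(resp)
    if payload.get("ready") is not True:
        raise RuntimeError(f"deploy not ready: {payload}")
    return payload
```

Passing the argument vector as a list (rather than a shell string) avoids quoting issues with the repository URL or ref.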
