# Deploy the fleet-rlm API server

> Run fleet-rlm in production as a single FastAPI process with Entra ID bearer auth, Neon-backed Postgres persistence, and Daytona sandbox execution.

fleet-rlm is a single FastAPI process that serves both the API and the prebuilt frontend SPA. This guide covers running it in staging and production rather than on a developer laptop.

## Architecture in production

A production deploy needs:

* **fleet-rlm process** — `uv run fleet-rlm serve-api` or `uv run fleet web`.
* **Daytona** — sandbox provider (`DAYTONA_API_KEY`, `DAYTONA_API_URL`).
* **Neon/Postgres** — canonical multi-tenant state (`DATABASE_URL`).
* **Entra ID** — bearer-token auth (`AUTH_MODE=entra`).
* **MLflow** (optional) — tracing backend.

## Environment configuration

```bash .env.production
# App
APP_ENV=production
AUTH_MODE=entra
AUTH_REQUIRED=true
DATABASE_REQUIRED=true

# LLM
DSPY_LM_MODEL=openai/gpt-4o
DSPY_LLM_API_KEY=sk-...

# Daytona
DAYTONA_API_KEY=...
DAYTONA_API_URL=https://app.daytona.io/api

# Database
DATABASE_URL=postgresql://USER:PASS@HOST/DB?sslmode=require

# MLflow
MLFLOW_ENABLED=true
MLFLOW_TRACKING_URI=https://mlflow.example.com
```

<Warning>
  Runtime settings writes (`PATCH /api/v1/runtime/settings`) are intentionally limited to `APP_ENV=local`. In staging and production, settings are read-only via the API.
</Warning>
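The variables in the `.env.production` example can be checked before the server starts. The following is a minimal preflight sketch, not part of fleet-rlm: the `preflight` helper is hypothetical, treating every listed variable as mandatory is an assumption, and so is rejecting `AUTH_MODE=dev` in production.

```python
from typing import Mapping

# Names copied from the .env.production example. Treating all of them as
# mandatory is an assumption of this sketch, not documented fleet-rlm behavior.
REQUIRED = (
    "APP_ENV",
    "AUTH_MODE",
    "DATABASE_URL",
    "DAYTONA_API_KEY",
    "DAYTONA_API_URL",
    "DSPY_LM_MODEL",
    "DSPY_LLM_API_KEY",
)


def preflight(env: Mapping[str, str]) -> list[str]:
    """Return human-readable configuration problems; an empty list means OK."""
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]
    # Assumption: dev auth (debug headers, local HS256 tokens) is never wanted
    # when APP_ENV=production.
    if env.get("APP_ENV") == "production" and env.get("AUTH_MODE") == "dev":
        problems.append("AUTH_MODE=dev should not be combined with APP_ENV=production")
    return problems
```

Calling `preflight(os.environ)` from a deploy script and aborting on a non-empty result surfaces a missing secret before the process is launched.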

## Authentication

When `AUTH_MODE=entra`, both HTTP and WebSocket requests must carry an Entra bearer token, which is validated against Entra's JWKS, and the caller's tenant must pass Neon-backed tenant admission.

| Mode    | Behavior                                                 |
| ------- | -------------------------------------------------------- |
| `dev`   | Debug headers, local HS256 tokens, optional identity     |
| `entra` | JWKS-backed Entra tokens, Neon tenant admission required |

Token validation in both modes is performed by [`joserfc`](https://jose.authlib.org/). Entra deployments use a built-in JWKS cache with a 5-minute TTL that falls back to the last-known keyset if the JWKS endpoint is temporarily unreachable — short outages of Entra's metadata service do not interrupt authenticated traffic.
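The caching behavior described above (a TTL plus fallback to the last-known keyset) can be illustrated with a small generic sketch. This is not fleet-rlm's implementation: the `StaleOnErrorCache` class, its `fetch` callable, and the injectable `clock` are all hypothetical names chosen for illustration.

```python
import time
from typing import Any, Callable, Optional

KeySet = dict[str, Any]


class StaleOnErrorCache:
    """Cache a fetched value for `ttl` seconds; if a refresh fails,
    keep serving the last value that was fetched successfully."""

    def __init__(
        self,
        fetch: Callable[[], KeySet],
        ttl: float = 300.0,  # 5 minutes, matching the TTL described above
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._value: Optional[KeySet] = None
        self._fetched_at = 0.0

    def get(self) -> KeySet:
        now = self._clock()
        if self._value is not None and now - self._fetched_at < self._ttl:
            return self._value  # still fresh, no network call
        try:
            self._value = self._fetch()
            self._fetched_at = now
        except Exception:
            # Refresh failed. With no previous keyset there is nothing to
            # fall back to, so the error propagates. Otherwise the stale
            # value is served and the next call retries the fetch.
            if self._value is None:
                raise
        return self._value
```

One consequence of this design is that the fallback only helps once a keyset has been fetched at least once: a process that starts while the JWKS endpoint is unreachable has nothing cached to serve.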

See the [Auth Modes reference](https://github.com/Qredence/fleet-rlm/blob/main/docs/reference/auth.md) for the full configuration matrix.

## Run the server

Direct invocation:

```bash
uv run fleet-rlm serve-api --host 0.0.0.0 --port 8000
```

Running behind a reverse proxy is recommended: terminate TLS at the proxy and forward traffic to fleet-rlm on a private interface, for example by passing `--host 127.0.0.1` instead of `0.0.0.0`. fleet-rlm serves the SPA at `/` and the API under `/api/v1/*` from the same port.

## Health probes

Two unauthenticated endpoints are designed for orchestrators and load balancers:

| Endpoint      | Purpose                                                            |
| ------------- | ------------------------------------------------------------------ |
| `GET /health` | Liveness — returns `{"ok": true, "version": "..."}`                |
| `GET /ready`  | Readiness — reports planner, database, and sandbox provider status |

Sample readiness response:

```json
{
  "ready": true,
  "planner_configured": true,
  "planner": "ready",
  "database": "ready",
  "database_required": true,
  "sandbox_provider": "daytona"
}
```
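When a deploy is not ready, it helps to know which component is holding it back. The sketch below reads a payload shaped like the sample above and names the fields that look unhealthy. The `not_ready_components` helper is hypothetical, and two things are assumptions: that `"ready"` is the only healthy status string, and that a non-ready database matters only when `database_required` is true.

```python
from typing import Any, Mapping


def not_ready_components(payload: Mapping[str, Any]) -> list[str]:
    """Name the fields of a /ready payload that look unhealthy."""
    failing: list[str] = []
    if not payload.get("planner_configured", False):
        failing.append("planner_configured")
    if payload.get("planner") != "ready":
        failing.append("planner")
    # Assumption: the database only gates readiness when it is required.
    if payload.get("database_required", False) and payload.get("database") != "ready":
        failing.append("database")
    # If the server says it is not ready but no field explains why,
    # report the top-level flag itself.
    if not failing and payload.get("ready") is not True:
        failing.append("ready")
    return failing
```

For the sample response the function returns an empty list. A payload with `"database": "unavailable"` and `"database_required": true` would yield `["database"]`.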

## WebSocket transport

Two streams power the live UI:

* `/api/v1/ws/execution` — chat stream events.
* `/api/v1/ws/execution/events` — execution graph events.

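Both require the same auth as HTTP endpoints when `AUTH_REQUIRED=true`. Make sure your reverse proxy forwards the HTTP/1.1 `Upgrade` and `Connection` headers on these routes and does not buffer their responses; otherwise the WebSocket handshake fails or events arrive late.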
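Clients that sit outside the proxy need the `wss://` form of these paths when TLS is terminated there. As a small sketch, the two stream URLs can be derived from the deploy's base URL; the `websocket_urls` helper is hypothetical and only the two paths come from this page.

```python
from urllib.parse import urlsplit, urlunsplit

# The two stream paths listed above.
WS_PATHS = ("/api/v1/ws/execution", "/api/v1/ws/execution/events")


def websocket_urls(base_url: str) -> list[str]:
    """Map an http(s) base URL to the matching ws(s) stream URLs."""
    parts = urlsplit(base_url)
    schemes = {"https": "wss", "http": "ws"}
    if parts.scheme not in schemes:
        raise ValueError(f"expected an http or https URL, got {base_url!r}")
    # Keep any path prefix the proxy mounts the app under.
    prefix = parts.path.rstrip("/")
    return [
        urlunsplit((schemes[parts.scheme], parts.netloc, prefix + path, "", ""))
        for path in WS_PATHS
    ]
```

How the bearer token is attached to the WebSocket handshake is not covered here; check the Auth Modes reference for that.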

## Frontend assets

Published installs of `fleet-rlm` include built frontend assets — you do **not** need `pnpm` or a separate build step in production. If you build from source, run `pnpm run build` in `src/frontend/` before launching the server.

## Smoke-test the deploy

Validate Daytona connectivity from the host:

```bash
uv run fleet-rlm daytona-smoke \
  --repo https://github.com/Qredence/fleet-rlm.git \
  --ref main
```

Hit the readiness endpoint:

```bash
curl https://your-deploy.example.com/ready
```
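For a CI step, the `curl` check can be extended into a script that probes both `/health` and `/ready` and reports every failure. This is a minimal sketch using only the standard library: `fetch_json` and `smoke` are hypothetical helpers, and the `ok` and `ready` field names are taken from the responses shown in the health probes section.

```python
import json
import urllib.request
from typing import Any, Callable


def fetch_json(url: str, timeout: float = 5.0) -> dict[str, Any]:
    """GET a URL and decode the JSON body (raises on HTTP or network errors)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def smoke(
    base_url: str,
    fetch: Callable[[str], dict[str, Any]] = fetch_json,
) -> list[str]:
    """Probe /health and /ready; return a list of failures (empty means pass)."""
    base = base_url.rstrip("/")
    failures: list[str] = []
    for path, field in (("/health", "ok"), ("/ready", "ready")):
        try:
            body = fetch(base + path)
        except Exception as exc:  # network error, non-2xx status, or bad JSON
            failures.append(f"{path}: {exc}")
            continue
        if body.get(field) is not True:
            failures.append(f"{path}: expected {field}=true, got {body.get(field)!r}")
    return failures
```

A pipeline could call `smoke("https://your-deploy.example.com")` and fail the job when the returned list is non-empty. The `fetch` parameter exists so the logic can be exercised without a network connection.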

## See also

* [Configuration reference](/fleet-rlm/reference/configuration) — full env var list.
* [HTTP API reference](/fleet-rlm/reference/http-api) — endpoint surface.