fleet-rlm HTTP and WebSocket API reference

The canonical schema lives in openapi.yaml. This page is a high-level map of the surface.

Categories

Category	Prefix	Description
Health	`/`	Unprefixed health and readiness probes
Auth	`/api/v1/auth`	Identity endpoints
Runtime	`/api/v1/runtime`	Settings, diagnostics, volume access
Sessions	`/api/v1/sessions`	History, turns, stats, export, restore
Sandboxes	`/api/v1/sandboxes`	Daytona sandbox management
Runs	`/api/v1/runs`	Execution run steps
Optimization	`/api/v1/optimization`	GEPA optimization, skill optimization, datasets, runs, promotion drafts
Traces	`/api/v1/traces`	MLflow trace feedback
WebSocket	`/api/v1/ws`	Real-time chat and execution streams

Authentication

All /api/v1/* endpoints require authentication when AUTH_REQUIRED=true. Behavior depends on AUTH_MODE:

Mode	Behavior
`dev`	Debug headers, local HS256 tokens, optional identity
`entra`	JWKS-backed Entra ID tokens, Neon tenant admission required

Canonical error envelope

Every error response from /api/v1/* — including FastAPI validation errors and Starlette 404s for unknown routes — uses the same JSON shape:

{
  "code": "not_found",
  "message": "The requested resource was not found.",
  "detail": null
}

Field	Type	Description
`code`	string	Stable machine-readable error code (for example `not_found`, `unauthorized`, `validation_error`, `forbidden`).
`message`	string	Human-readable, non-secret error summary safe to surface in a UI.
`detail`	object \| null	Structured non-secret details when available — for example the per-field issues for validation errors.

Branch on code rather than parsing message, and treat the HTTP status code as the source of truth for severity.
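As a minimal client-side sketch of that advice, the Python below decodes the envelope and branches on code only. The `ApiError` class, the `next_action` policy, its return strings, and the `"unknown"` fallback are all hypothetical names introduced for illustration; they are not part of fleet-rlm.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ApiError:
    status: int            # HTTP status code of the response
    code: str              # stable machine-readable code from the envelope
    message: str           # human-readable summary, safe to display
    detail: Optional[Any]  # structured details, or None


def parse_error(status: int, body: dict) -> ApiError:
    """Turn a decoded error-envelope JSON object into an ApiError."""
    return ApiError(
        status=status,
        # "unknown" is a local fallback (assumption), not a documented code.
        code=str(body.get("code", "unknown")),
        message=str(body.get("message", "")),
        detail=body.get("detail"),
    )


def next_action(err: ApiError) -> str:
    """Hypothetical client policy: branch on `code`, never on `message`."""
    if err.code == "unauthorized":
        return "reauthenticate"
    if err.code == "forbidden":
        return "show_permission_error"
    if err.code == "validation_error":
        return "highlight_fields"
    if err.code == "not_found":
        return "show_not_found"
    # Unrecognized codes: fall back to the HTTP status for severity.
    return "retry_later" if err.status >= 500 else "show_generic_error"
```

Keeping the HTTP status on the parsed object lets the fallback branch use it for severity when a code is not recognized.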

Health endpoints

Unauthenticated probes for load balancers and orchestrators.

`GET /health`

Liveness check.

{
  "status": "live",
  "version": "0.5.40"
}

Field	Values	Description
`status`	`live`	Unambiguous liveness state. Replaces the legacy `ok: true` field.
`version`	string	Package version currently serving the API.

`GET /ready`

Readiness check with component status. Returns 503 with the same ReadyResponse body when a critical runtime dependency is unavailable, so probes can read component state from a failing response.

{
  "ready": true,
  "planner": "ready",
  "database": "ready",
  "database_required": true,
  "sandbox_provider": "daytona"
}

Field	Values	Description
`ready`	boolean	Overall readiness
`planner`	`ready`, `missing`	Planner LM status
`database`	`ready`, `missing`, `disabled`, `degraded`	Database connectivity
`sandbox_provider`	string	Configured sandbox backend
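Because a 503 carries the same body as a 200, a probe can report which component is failing. The sketch below assumes the body has already been decoded from JSON; `probe_result` and `COMPONENT_FIELDS` are illustrative names, not part of the API.

```python
# Component fields of ReadyResponse that report a status string.
COMPONENT_FIELDS = ("planner", "database")


def probe_result(status_code: int, body: dict) -> tuple[bool, dict]:
    """Return (ready, components_not_reporting_ready) for a /ready response.

    Works for both 200 and 503 responses, since both carry the same body.
    """
    not_ready = {
        name: body.get(name)
        for name in COMPONENT_FIELDS
        if body.get(name) != "ready"
    }
    ready = status_code == 200 and body.get("ready") is True
    return ready, not_ready
```

The second element only lists components whose status is something other than `ready`; whether a value such as `disabled` matters is left to the caller.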

Auth

`GET /api/v1/auth/me`

Returns the authenticated user’s identity envelope.

{
  "tenant_claim": "tenant-123",
  "user_claim": "user-456",
  "email": "user@example.com",
  "name": "Jane Doe",
  "tenant_id": "uuid-...",
  "user_id": "uuid-..."
}

Runtime

Endpoint	Method	Purpose
`/api/v1/runtime/settings`	`GET`	Read runtime settings
`/api/v1/runtime/settings`	`PATCH`	Mutate settings (local-only)
`/api/v1/runtime/tests/daytona`	`POST`	Connectivity probe against Daytona
`/api/v1/runtime/tests/lm`	`POST`	Connectivity probe against the planner LM
`/api/v1/runtime/status`	`GET`	Composite runtime status
`/api/v1/runtime/volume/tree`	`GET`	List durable volume contents
`/api/v1/runtime/volume/file`	`GET`	Read a single durable-volume file
`/api/v1/runtime/llm-profiles`	`GET` / `POST`	List or create provider profiles
`/api/v1/runtime/llm-profiles/{profile_id}`	`PATCH` / `DELETE`	Update or remove a provider profile
`/api/v1/runtime/llm-profiles/{profile_id}/models`	`GET`	Fetch the model catalog for a profile
`/api/v1/runtime/llm-profiles/{profile_id}/test`	`POST`	Run a connectivity probe against a profile
`/api/v1/runtime/llm-profiles/import-env`	`POST`	Create a profile from current `DSPY_*` env values
`/api/v1/runtime/llm-roles`	`GET` / `PATCH`	Read or update planner / delegate / delegate_small role bindings

PATCH /api/v1/runtime/settings, POST/PATCH/DELETE /api/v1/runtime/llm-profiles*, and PATCH /api/v1/runtime/llm-roles are intentionally restricted to APP_ENV=local. In staging and production these resources are read-only, and the mutating calls return 403 with code forbidden.

LLM provider profiles

Provider profiles store credentials and connection details for an LLM vendor (OpenAI, Anthropic, Google Gemini, or any OpenAI-compatible proxy). Each profile can be bound to one or more runtime roles (planner, delegate, or delegate_small) to select which model serves that role at runtime. Profiles persist in Postgres when DATABASE_URL is configured, or in .fleet/llm-profiles.json for SQLite-only local development. API keys are encrypted at rest and only returned in masked form.

Create a profile:

curl -X POST http://localhost:8000/api/v1/runtime/llm-profiles \
  -H "Content-Type: application/json" \
  -d '{
    "name": "OpenAI production",
    "provider_type": "openai",
    "api_key": "sk-..."
  }'

provider_type accepts openai, anthropic, google, or openai_compatible. Gemini profiles use the OpenAI-compatible endpoint at https://generativelanguage.googleapis.com/v1beta/openai/; set api_base on openai_compatible profiles to point at a LiteLLM proxy or self-hosted gateway.

List the models a profile exposes (cached; pass ?refresh=true to bypass the cache):

curl http://localhost:8000/api/v1/runtime/llm-profiles/{profile_id}/models

Anthropic catalogs are curated statically because Anthropic does not publish a /v1/models endpoint. Provider fetch failures return the static fallback list with an error string explaining the cause.

Bind a profile to a role:

curl -X PATCH http://localhost:8000/api/v1/runtime/llm-roles \
  -H "Content-Type: application/json" \
  -d '{
    "planner": { "profile_id": "uuid-...", "model_id": "gpt-4o" },
    "delegate": { "profile_id": "uuid-...", "model_id": "gpt-4o-mini" }
  }'

The runtime resolves planner, delegate, and delegate_small bindings on every turn, so changes take effect on the next message without a server restart.

Import a profile from current environment variables (useful when migrating from a .env-only setup):

curl -X POST http://localhost:8000/api/v1/runtime/llm-profiles/import-env

The response includes the created profile plus any planner / delegate / delegate_small bindings inferred from DSPY_LM_MODEL, DSPY_DELEGATE_LM_MODEL, and DSPY_DELEGATE_LM_SMALL_MODEL.
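For clients that prefer Python over curl, the role-binding call can be sketched with the standard library. This is an illustration under assumptions: `build_role_binding_request` is a hypothetical helper, and the requirement that each binding carries both profile_id and model_id is inferred from the curl example rather than from the schema. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Role names documented for /api/v1/runtime/llm-roles.
ROLES = ("planner", "delegate", "delegate_small")


def build_role_binding_request(base_url: str, bindings: dict) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that binds profiles to roles."""
    unknown = set(bindings) - set(ROLES)
    if unknown:
        raise ValueError(f"unknown roles: {sorted(unknown)}")
    for role, binding in bindings.items():
        # Assumption: both keys are needed, mirroring the curl example.
        if not {"profile_id", "model_id"} <= set(binding):
            raise ValueError(f"{role} needs profile_id and model_id")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/runtime/llm-roles",
        data=json.dumps(bindings).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )
```

Passing the result to `urllib.request.urlopen` would send it; outside APP_ENV=local the server is documented to answer 403 for this call.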

Volume access boundaries

GET /api/v1/runtime/volume/tree and GET /api/v1/runtime/volume/file enforce explicit security boundaries against the canonical runtime volume roots returned in allowed_roots. Any path outside those roots returns 403 with code: "forbidden".

GET /api/v1/runtime/volume/tree accepts:

Parameter	Default	Limit	Description
`max_depth`	`3`	—	Maximum directory depth to traverse.
`max_entries`	`200`	`1000`	Maximum total node entries returned. Prevents oversized listings on large volumes.

The response includes max_depth, max_entries, and entries_returned so clients can detect when limits clipped the listing.

GET /api/v1/runtime/volume/file previews include:

Field	Description
`sha256`	SHA-256 hex digest of the full file bytes before truncation.
`encoding`	`utf-8` for clean text, `utf-8-lossy` when replacement characters were introduced, `binary` for non-text files.
`binary`	`true` when the file is binary; `content` is empty in that case.
`truncated`	`true` when the preview was cut off by the provider’s size cap.

Use sha256 to deduplicate or verify cached previews without re-reading the file body.
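The tree limits and the sha256 field can be handled on the client roughly as follows. This is a sketch under assumptions: `tree_limits`, `listing_may_be_clipped`, and `PreviewCache` are hypothetical helpers, and treating "entries_returned reached max_entries" as a sign of clipping is a heuristic, not documented server behavior. Clipping caused by max_depth is not detected here.

```python
MAX_ENTRIES_LIMIT = 1000  # documented upper bound for max_entries


def tree_limits(max_depth: int = 3, max_entries: int = 200) -> dict:
    """Query parameters for the tree endpoint, clamped to the documented limit."""
    return {"max_depth": max_depth, "max_entries": min(max_entries, MAX_ENTRIES_LIMIT)}


def listing_may_be_clipped(response: dict) -> bool:
    """Heuristic: the listing filled its entry budget, so more entries may exist."""
    return response["entries_returned"] >= response["max_entries"]


class PreviewCache:
    """Keeps one preview per distinct file content, keyed by sha256."""

    def __init__(self) -> None:
        self._by_digest: dict = {}

    def store(self, preview: dict) -> bool:
        """Store a preview; return False when the same digest is already cached."""
        digest = preview["sha256"]
        if digest in self._by_digest:
            return False
        self._by_digest[digest] = preview
        return True
```

Because the digest covers the full file bytes before truncation, two previews with the same sha256 refer to identical file contents even when their truncated bodies were cut at the size cap.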

Sessions

Endpoint	Method	Purpose
`/api/v1/sessions/state`	`GET`	Active session envelope
`/api/v1/sessions`	`GET`	List sessions
`/api/v1/sessions/{id}`	`GET`	Session detail
`/api/v1/sessions/{id}`	`DELETE`	Delete a session
`/api/v1/sessions/{id}/turns`	`GET`	Conversation turns
`/api/v1/sessions/{id}/traces`	`GET`	Paginated MLflow traces linked to a session
`/api/v1/sessions/{id}/export`	`POST`	Export session manifest

GET /api/v1/sessions/{id}/traces returns external traces (MLflow runs, including child delegations) associated with a session. It supports limit (default 50, max 200) and offset query parameters and returns has_more so the Workbench can paginate without re-counting.
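The limit/offset/has_more contract lends itself to a simple pagination loop. The sketch below takes a caller-supplied `fetch_page(limit, offset)` function (hypothetical) that performs the HTTP call and returns the page's items together with has_more; the name of the list field in the real response is not stated on this page, so it is deliberately left to that function.

```python
from typing import Callable, Iterator

MAX_LIMIT = 200  # documented maximum for the limit parameter


def iter_traces(
    fetch_page: Callable[[int, int], tuple[list, bool]],
    limit: int = 50,
) -> Iterator:
    """Yield every trace for a session by following has_more."""
    limit = max(1, min(limit, MAX_LIMIT))
    offset = 0
    while True:
        items, has_more = fetch_page(limit, offset)
        yield from items
        # Stop when the server reports no more pages, or on an empty page
        # (a defensive guard against looping forever).
        if not has_more or not items:
            break
        offset += len(items)
```

Advancing the offset by the number of items actually received, rather than by limit, keeps the loop correct if the server returns a short page.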

Sandboxes

Sandbox detail responses redact environment variables before returning them — the env_vars field on SandboxResponse only contains keys safe for display, not raw secrets.

Optimization

Endpoint	Method	Purpose
`/api/v1/optimization/status`	`GET`	Background runner status
`/api/v1/optimization/run`	`POST`	Trigger an optimization run
`/api/v1/optimization/modules`	`GET`	List registered DSPy modules
`/api/v1/optimization/runs`	`GET` / `POST`	List or create runs
`/api/v1/optimization/runs/{run_id}`	`GET`	Single run detail
`/api/v1/optimization/runs/{run_id}/results`	`GET`	Run results
`/api/v1/optimization/runs/{run_id}/promotion-drafts`	`POST`	Create or load a tenant-scoped promotion draft for a run
`/api/v1/optimization/runs/compare`	`GET`	Baseline vs optimized comparison
`/api/v1/optimization/datasets`	`GET` / `POST`	List or upload datasets
`/api/v1/optimization/datasets/{dataset_id}`	`GET`	Dataset detail

Each entry in GET /api/v1/optimization/modules carries an offline_only flag (default true) indicating that the module can only be optimized through the offline optimization endpoints — not from live request traffic.

Optimization request shape

POST /api/v1/optimization/runs (async) and POST /api/v1/optimization/run (blocking) share the same request schema. Specify exactly one optimization target:

Field	Description
`module_slug`	Registered module slug. Resolves `program_spec` automatically.
`program_spec`	Ad-hoc `module:attr` callable returning a `dspy.Module`.
`skill_name`	Bundled or mounted Fleet skill name. Optimizes the SKILL.md artifact.
`skill_path`	Relative path to a SKILL.md-compatible markdown file.

Common fields:

Field	Default	Description
`dataset_id` / `dataset_path`	—	Registered dataset id, or a path inside `OPTIMIZATION_DATA_ROOT`.
`auto`	`"light"`	Optimization intensity: `light`, `medium`, or `heavy`.
`train_ratio`	`0.8`	Train/validation split ratio.
`output_path`	—	Optional path to save the optimized artifact.
`trace_bundle_paths`	`[]`	Distilled trace bundle paths passed to the RLM-GEPA instruction proposer.
`reflection_profile_id` / `reflection_model_id`	—	Optional LLM profile + model for the proposer/reflection step. Provide together.
`optimizer`	`"gepa"`	Only `gepa` is supported.

# Skill optimization with offline trace context
curl -X POST http://localhost:8000/api/v1/optimization/runs \
  -H "Content-Type: application/json" \
  -d '{
    "skill_name": "optimization",
    "dataset_path": "skill_cases.jsonl",
    "trace_bundle_paths": ["artifacts/traces/optimization-failures.jsonl"],
    "auto": "medium"
  }'

Skill optimization runs the RLM-GEPA instruction proposer inside a Daytona sandbox. The endpoint returns 503 with code service_unavailable when DAYTONA_API_KEY is not configured. Registered module optimization has no Daytona requirement.
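The request rules above (exactly one target, paired reflection fields, a fixed set of auto levels) can be checked before a run is submitted. The following is a client-side sketch only: `validate_optimization_request` is a hypothetical helper, and the train_ratio bounds are an assumption, since the page gives only the default.

```python
TARGET_FIELDS = ("module_slug", "program_spec", "skill_name", "skill_path")
AUTO_LEVELS = ("light", "medium", "heavy")


def validate_optimization_request(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []

    targets = [field for field in TARGET_FIELDS if payload.get(field)]
    if len(targets) != 1:
        problems.append(f"expected exactly one target, got {len(targets)}")

    if payload.get("auto", "light") not in AUTO_LEVELS:
        problems.append("auto must be light, medium, or heavy")

    if payload.get("optimizer", "gepa") != "gepa":
        problems.append("only the gepa optimizer is supported")

    has_profile = payload.get("reflection_profile_id") is not None
    has_model = payload.get("reflection_model_id") is not None
    if has_profile != has_model:
        problems.append("reflection_profile_id and reflection_model_id go together")

    # Assumption: a split ratio only makes sense strictly between 0 and 1.
    if not 0 < payload.get("train_ratio", 0.8) < 1:
        problems.append("train_ratio must be between 0 and 1")

    return problems
```

Returning all problems at once, rather than raising on the first, gives a form-style UI something to display per field; the server remains the authority on validity.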

Promotion drafts

POST /api/v1/optimization/runs/{run_id}/promotion-drafts creates (or loads, if it already exists) a non-mutating draft promotion record for the targeted run. Drafts are scoped to the caller’s tenant and workspace and serve as the review handoff between an optimization run and an explicit promotion decision. The endpoint does not change the live runtime — it only stages the optimized artifact for review.

Traces

`POST /api/v1/traces/feedback`

Submit thumbs-up / thumbs-down feedback for an MLflow trace. Used by the Workbench feedback UI.

WebSocket

Two WebSocket endpoints power the live UI. Both require the same authentication as the HTTP endpoints when AUTH_REQUIRED=true.

Endpoint	Stream
`/api/v1/ws/execution`	Chat stream events — user-facing turn-taking
`/api/v1/ws/execution/events`	Execution graph events — tool calls, sandbox steps, recursive delegation

Behind a reverse proxy, ensure HTTP/1.1 upgrade is supported and response buffering is disabled.
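A client usually derives the WebSocket URLs from the HTTP base URL it already has. The sketch below shows one way to do that with the standard library; `ws_url` and the stream labels `chat` and `events` are hypothetical names, and how credentials are attached to the connection is not covered here.

```python
from urllib.parse import urlsplit, urlunsplit

# Documented WebSocket paths, keyed by illustrative labels.
WS_PATHS = {
    "chat": "/api/v1/ws/execution",
    "events": "/api/v1/ws/execution/events",
}


def ws_url(base_url: str, stream: str) -> str:
    """Map an http(s) base URL to the ws(s) URL of the given stream."""
    parts = urlsplit(base_url)
    # https upgrades to wss; anything else is treated as plain ws.
    scheme = "wss" if parts.scheme == "https" else "ws"
    return urlunsplit((scheme, parts.netloc, WS_PATHS[stream], "", ""))
```

Any path on the base URL is discarded, so this assumes the API is served from the root of the host.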

Source of truth

When this page disagrees with the code, trust:

src/fleet_rlm/api/main.py — app factory and route mounting.
src/fleet_rlm/api/routers/ — individual routers.
openapi.yaml — generated schema.
