The canonical schema lives in openapi.yaml. This page is a high-level map of the surface.
Categories
| Category | Prefix | Description |
|---|---|---|
| Health | / | Unprefixed health and readiness probes |
| Auth | /api/v1/auth | Identity endpoints |
| Runtime | /api/v1/runtime | Settings, diagnostics, volume access |
| Sessions | /api/v1/sessions | History, turns, stats, export, restore |
| Sandboxes | /api/v1/sandboxes | Daytona sandbox management |
| Runs | /api/v1/runs | Execution run steps |
| Optimization | /api/v1/optimization | GEPA optimization, skill optimization, datasets, runs, promotion drafts |
| Traces | /api/v1/traces | MLflow trace feedback |
| WebSocket | /api/v1/ws | Real-time chat and execution streams |
The /api/v1/memory* router was retired in 0.5.40. Memory item browsing is no longer part of the supported HTTP surface.
Authentication
All /api/v1/* endpoints require authentication when AUTH_REQUIRED=true. Behavior depends on AUTH_MODE:
| Mode | Behavior |
|---|---|
| dev | Debug headers, local HS256 tokens, optional identity |
| entra | JWKS-backed Entra ID tokens, Neon tenant admission required |
Canonical error envelope
Every error response from /api/v1/* — including FastAPI validation errors and Starlette 404s for unknown routes — uses the same JSON shape:
{
"code": "not_found",
"message": "The requested resource was not found.",
"detail": null
}
| Field | Type | Description |
|---|---|---|
| code | string | Stable machine-readable error code (for example not_found, unauthorized, validation_error, forbidden). |
| message | string | Human-readable, non-secret error summary safe to surface in a UI. |
| detail | object or null | Structured non-secret details when available, for example the per-field issues for validation errors. |
Branch on code rather than parsing message, and treat the HTTP status code as the source of truth for severity.
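As a rough illustration of branching on code, the sketch below parses the envelope and maps it to a client action. The ErrorEnvelope class, the parse_error and next_action helpers, and the action names are hypothetical and not part of the API; only the three envelope fields and the listed codes come from this page.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ErrorEnvelope:
    status: int            # HTTP status code of the response
    code: str              # stable machine-readable code, e.g. "not_found"
    message: str           # human-readable summary, safe to show in a UI
    detail: Optional[Any]  # structured details, or None


def parse_error(status: int, body: str) -> ErrorEnvelope:
    """Decode the canonical error envelope from a response body."""
    payload = json.loads(body)
    return ErrorEnvelope(
        status=status,
        code=payload["code"],
        message=payload["message"],
        detail=payload.get("detail"),
    )


def next_action(err: ErrorEnvelope) -> str:
    """Hypothetical client policy: branch on code, never on message text."""
    if err.code == "unauthorized":
        return "reauthenticate"
    if err.code == "validation_error":
        return "fix_request"
    if err.code in ("not_found", "forbidden"):
        return "give_up"
    # Unknown codes: fall back to the HTTP status for severity.
    return "retry" if err.status >= 500 else "give_up"
```

Falling back to the status code for unknown codes keeps the client working if new codes are added later.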
Health endpoints
Unauthenticated probes for load balancers and orchestrators.
GET /health
Liveness check.
{
"status": "live",
"version": "0.5.40"
}
| Field | Values | Description |
|---|---|---|
| status | live | Unambiguous liveness state. Replaces the legacy ok: true field. |
| version | string | Package version currently serving the API. |
GET /ready
Readiness check with component status. Returns 503 with the same ReadyResponse body when a critical runtime dependency is unavailable, so probes can read component state from a failing response.
{
"ready": true,
"planner": "ready",
"database": "ready",
"database_required": true,
"sandbox_provider": "daytona"
}
| Field | Values | Description |
|---|---|---|
| ready | boolean | Overall readiness |
| planner | ready, missing | Planner LM status |
| database | ready, missing, disabled, degraded | Database connectivity |
| sandbox_provider | string | Configured sandbox backend |
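Because a 503 from /ready carries the same body as a 200, a probe can report which component is failing. The following is a minimal sketch of that idea; the unhealthy_components helper is hypothetical, and treating a disabled database as healthy is an assumption rather than documented behavior.

```python
from typing import Any, Dict, List

# Component fields from the table above that report a per-component state.
COMPONENT_KEYS = ("planner", "database")


def unhealthy_components(ready_body: Dict[str, Any]) -> List[str]:
    """Return "name=state" strings for components that are not ready."""
    problems = []
    for key in COMPONENT_KEYS:
        state = ready_body.get(key, "missing")
        # Assumption: a "disabled" database is intentional, not a failure.
        if state == "ready" or (key == "database" and state == "disabled"):
            continue
        problems.append(f"{key}={state}")
    return problems
```

The function takes the decoded JSON body, so it applies equally to a 200 and a 503 response.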
Auth
GET /api/v1/auth/me
Returns the authenticated user’s identity envelope.
{
"tenant_claim": "tenant-123",
"user_claim": "user-456",
"email": "user@example.com",
"name": "Jane Doe",
"tenant_id": "uuid-...",
"user_id": "uuid-..."
}
Runtime
| Endpoint | Method | Purpose |
|---|---|---|
| /api/v1/runtime/settings | GET | Read runtime settings |
| /api/v1/runtime/settings | PATCH | Mutate settings (local-only) |
| /api/v1/runtime/tests/daytona | POST | Connectivity probe against Daytona |
| /api/v1/runtime/tests/lm | POST | Connectivity probe against the planner LM |
| /api/v1/runtime/status | GET | Composite runtime status |
| /api/v1/runtime/volume/tree | GET | List durable volume contents |
| /api/v1/runtime/volume/file | GET | Read a single durable-volume file |
| /api/v1/runtime/llm-profiles | GET / POST | List or create provider profiles |
| /api/v1/runtime/llm-profiles/{profile_id} | PATCH / DELETE | Update or remove a provider profile |
| /api/v1/runtime/llm-profiles/{profile_id}/models | GET | Fetch the model catalog for a profile |
| /api/v1/runtime/llm-profiles/{profile_id}/test | POST | Run a connectivity probe against a profile |
| /api/v1/runtime/llm-profiles/import-env | POST | Create a profile from current DSPY_* env values |
| /api/v1/runtime/llm-roles | GET / PATCH | Read or update planner / delegate / delegate_small role bindings |
PATCH /api/v1/runtime/settings, POST/PATCH/DELETE /api/v1/runtime/llm-profiles*, and PATCH /api/v1/runtime/llm-roles are intentionally restricted to APP_ENV=local. In staging and production these write operations are rejected with 403 and code forbidden, so settings, profiles, and role bindings are read-only there.
LLM provider profiles
Provider profiles store credentials and connection details for an LLM vendor (OpenAI, Anthropic, Google Gemini, or any OpenAI-compatible proxy). Each profile can be bound to one or more runtime roles — planner, delegate, or delegate_small — to select which model serves that role at runtime.
Profiles persist in Postgres when DATABASE_URL is configured, or in .fleet/llm-profiles.json for SQLite-only local development. API keys are encrypted at rest and only returned in masked form.
Create a profile:
curl -X POST http://localhost:8000/api/v1/runtime/llm-profiles \
-H "Content-Type: application/json" \
-d '{
"name": "OpenAI production",
"provider_type": "openai",
"api_key": "sk-..."
}'
provider_type accepts openai, anthropic, google, or openai_compatible. Gemini profiles use the OpenAI-compatible endpoint at https://generativelanguage.googleapis.com/v1beta/openai/; set api_base on openai_compatible profiles to point at a LiteLLM proxy or self-hosted gateway.
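A client can reject an unsupported provider_type before sending the request. The sketch below builds the request body for POST /api/v1/runtime/llm-profiles; the build_profile_payload helper is hypothetical, and the example proxy URL in the usage note is a placeholder rather than a documented default.

```python
from typing import Dict, Optional

# Values that provider_type accepts, per the paragraph above.
PROVIDER_TYPES = {"openai", "anthropic", "google", "openai_compatible"}


def build_profile_payload(
    name: str,
    provider_type: str,
    api_key: str,
    api_base: Optional[str] = None,
) -> Dict[str, str]:
    """Build the JSON body for creating a provider profile."""
    if provider_type not in PROVIDER_TYPES:
        raise ValueError(f"unsupported provider_type: {provider_type!r}")
    payload = {"name": name, "provider_type": provider_type, "api_key": api_key}
    # api_base is only sent when given, e.g. for an OpenAI-compatible proxy.
    if api_base is not None:
        payload["api_base"] = api_base
    return payload
```

For example, build_profile_payload("Local proxy", "openai_compatible", "sk-...", api_base="http://localhost:4000") would produce a body that points the profile at a gateway on that placeholder address.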
List the models a profile exposes (cached; pass ?refresh=true to bypass the cache):
curl http://localhost:8000/api/v1/runtime/llm-profiles/{profile_id}/models
Anthropic catalogs are curated statically rather than fetched from the provider. Provider fetch failures return the static fallback list with an error string explaining the cause.
Bind a profile to a role:
curl -X PATCH http://localhost:8000/api/v1/runtime/llm-roles \
-H "Content-Type: application/json" \
-d '{
"planner": { "profile_id": "uuid-...", "model_id": "gpt-4o" },
"delegate": { "profile_id": "uuid-...", "model_id": "gpt-4o-mini" }
}'
The runtime resolves planner, delegate, and delegate_small bindings on every turn, so changes take effect on the next message without a server restart.
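The PATCH body maps each role to a profile and model pair. A minimal sketch of assembling it follows; the build_role_bindings helper is hypothetical, and the only role names it accepts are the three named on this page.

```python
from typing import Dict, Tuple

# Role names taken from this page.
ROLES = ("planner", "delegate", "delegate_small")


def build_role_bindings(
    bindings: Dict[str, Tuple[str, str]],
) -> Dict[str, Dict[str, str]]:
    """Map role -> (profile_id, model_id) into the PATCH request body."""
    unknown = sorted(set(bindings) - set(ROLES))
    if unknown:
        raise ValueError(f"unknown roles: {unknown}")
    return {
        role: {"profile_id": profile_id, "model_id": model_id}
        for role, (profile_id, model_id) in bindings.items()
    }
```

As in the curl example above, the body may name only the roles being changed.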
Import a profile from current environment variables (useful when migrating from a .env-only setup):
curl -X POST http://localhost:8000/api/v1/runtime/llm-profiles/import-env
The response includes the created profile plus any planner / delegate / delegate_small bindings inferred from DSPY_LM_MODEL, DSPY_DELEGATE_LM_MODEL, and DSPY_DELEGATE_LM_SMALL_MODEL.
Volume access boundaries
GET /api/v1/runtime/volume/tree and GET /api/v1/runtime/volume/file only serve paths inside the canonical runtime volume roots, which are returned in allowed_roots. Any path outside those roots returns 403 with code: "forbidden".
GET /api/v1/runtime/volume/tree accepts:
| Parameter | Default | Limit | Description |
|---|---|---|---|
| max_depth | 3 | (none) | Maximum directory depth to traverse. |
| max_entries | 200 | 1000 | Maximum total node entries returned. Prevents oversized listings on large volumes. |
The response includes max_depth, max_entries, and entries_returned so clients can detect when limits clipped the listing.
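One way a client might use these fields is sketched below. The tree_params and listing_may_be_clipped helpers are hypothetical, and the clipping heuristic (entries_returned reaching max_entries) is an assumption about how the limits interact, not documented behavior. Clipping caused by max_depth cannot be inferred from the counts alone.

```python
from typing import Any, Dict

MAX_ENTRIES_LIMIT = 1000  # documented upper limit for max_entries


def tree_params(max_depth: int = 3, max_entries: int = 200) -> Dict[str, int]:
    """Build query parameters, clamping max_entries to the documented limit."""
    if max_depth < 0 or max_entries < 1:
        raise ValueError("max_depth must be >= 0 and max_entries >= 1")
    return {
        "max_depth": max_depth,
        "max_entries": min(max_entries, MAX_ENTRIES_LIMIT),
    }


def listing_may_be_clipped(response: Dict[str, Any]) -> bool:
    """Heuristic: hitting the entry cap suggests some nodes were dropped."""
    # Assumption: the server stops adding nodes once max_entries is reached.
    return response["entries_returned"] >= response["max_entries"]
```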
GET /api/v1/runtime/volume/file previews include:
| Field | Description |
|---|---|
| sha256 | SHA-256 hex digest of the full file bytes before truncation. |
| encoding | utf-8 for clean text, utf-8-lossy when replacement characters were introduced, binary for non-text files. |
| binary | true when the file is binary; content is empty in that case. |
| truncated | true when the preview was cut off by the provider’s size cap. |
Use sha256 to deduplicate or verify cached previews without re-reading the file body.
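The sketch below shows one way to key a preview cache by digest. PreviewCache and matches_full_file are hypothetical names. Since the digest covers the full file before truncation, it should be compared against the complete file bytes, not against a truncated content field.

```python
import hashlib
from typing import Any, Dict, Optional


class PreviewCache:
    """In-memory cache of file previews keyed by their sha256 field."""

    def __init__(self) -> None:
        self._by_digest: Dict[str, Dict[str, Any]] = {}

    def store(self, preview: Dict[str, Any]) -> None:
        # Identical files at different paths share one cache entry.
        self._by_digest[preview["sha256"]] = preview

    def lookup(self, sha256: str) -> Optional[Dict[str, Any]]:
        return self._by_digest.get(sha256)


def matches_full_file(preview: Dict[str, Any], full_bytes: bytes) -> bool:
    """Check a preview's digest against the complete, untruncated bytes."""
    return preview["sha256"] == hashlib.sha256(full_bytes).hexdigest()
```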
Sessions
| Endpoint | Method | Purpose |
|---|---|---|
| /api/v1/sessions/state | GET | Active session envelope |
| /api/v1/sessions | GET | List sessions |
| /api/v1/sessions/{id} | GET | Session detail |
| /api/v1/sessions/{id} | DELETE | Delete a session |
| /api/v1/sessions/{id}/turns | GET | Conversation turns |
| /api/v1/sessions/{id}/traces | GET | Paginated MLflow traces linked to a session |
| /api/v1/sessions/{id}/export | POST | Export session manifest |
GET /api/v1/sessions/{id}/traces returns external traces (MLflow runs, including child delegations) associated with a session. Supports limit (default 50, max 200) and offset query parameters and returns has_more so the Workbench can paginate without re-counting.
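Paging with limit, offset, and has_more can be sketched as a generator. The iter_traces helper and the injected fetch_page callable are hypothetical, and the "items" key for the list of traces is an assumption; check openapi.yaml for the actual response field name.

```python
from typing import Any, Callable, Dict, Iterator

MAX_LIMIT = 200  # documented maximum for the limit parameter


def iter_traces(
    fetch_page: Callable[[int, int], Dict[str, Any]],
    limit: int = 50,
) -> Iterator[Any]:
    """Yield every trace by following has_more until the listing ends.

    fetch_page(limit, offset) performs the HTTP call and returns the
    decoded JSON body.
    """
    limit = max(1, min(limit, MAX_LIMIT))
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        items = page.get("items", [])  # assumed field name
        yield from items
        # Stop when the server reports no more pages, or on an empty page
        # so a misbehaving response cannot cause an endless loop.
        if not page.get("has_more") or not items:
            break
        offset += len(items)
```

Injecting fetch_page keeps the paging logic independent of the HTTP client and the auth mode in use.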
Sandboxes
Sandbox detail responses redact environment variables before returning them — the env_vars field on SandboxResponse only contains keys safe for display, not raw secrets.
Optimization
| Endpoint | Method | Purpose |
|---|---|---|
| /api/v1/optimization/status | GET | Background runner status |
| /api/v1/optimization/run | POST | Trigger an optimization run |
| /api/v1/optimization/modules | GET | List registered DSPy modules |
| /api/v1/optimization/runs | GET / POST | List or create runs |
| /api/v1/optimization/runs/{run_id} | GET | Single run detail |
| /api/v1/optimization/runs/{run_id}/results | GET | Run results |
| /api/v1/optimization/runs/{run_id}/promotion-drafts | POST | Create or load a tenant-scoped promotion draft for a run |
| /api/v1/optimization/runs/compare | GET | Baseline vs optimized comparison |
| /api/v1/optimization/datasets | GET / POST | List or upload datasets |
| /api/v1/optimization/datasets/{dataset_id} | GET | Dataset detail |
Each entry in GET /api/v1/optimization/modules carries an offline_only flag (default true) indicating that the module can only be optimized through the offline optimization endpoints — not from live request traffic.
Optimization request shape
POST /api/v1/optimization/runs (async) and POST /api/v1/optimization/run (blocking) share the same request schema. Specify exactly one optimization target:
| Field | Description |
|---|---|
| module_slug | Registered module slug. Resolves program_spec automatically. |
| program_spec | Ad-hoc module:attr callable returning a dspy.Module. |
| skill_name | Bundled or mounted Fleet skill name. Optimizes the SKILL.md artifact. |
| skill_path | Relative path to a SKILL.md-compatible markdown file. |
Common fields:
| Field | Default | Description |
|---|---|---|
| dataset_id / dataset_path | (none) | Registered dataset id, or a path inside OPTIMIZATION_DATA_ROOT. |
| auto | "light" | Optimization intensity: light, medium, or heavy. |
| train_ratio | 0.8 | Train/validation split ratio. |
| output_path | (none) | Optional path to save the optimized artifact. |
| trace_bundle_paths | [] | Distilled trace bundle paths passed to the RLM-GEPA instruction proposer. |
| reflection_profile_id / reflection_model_id | (none) | Optional LLM profile + model for the proposer/reflection step. Provide together. |
| optimizer | "gepa" | Only gepa is supported. |
# Skill optimization with offline trace context
curl -X POST http://localhost:8000/api/v1/optimization/runs \
-H "Content-Type: application/json" \
-d '{
"skill_name": "optimization",
"dataset_path": "skill_cases.jsonl",
"trace_bundle_paths": ["artifacts/traces/optimization-failures.jsonl"],
"auto": "medium"
}'
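The request rules above (exactly one target, paired reflection fields, a fixed set of auto levels) can be checked client-side before submitting. This is a minimal sketch: validate_optimization_request is a hypothetical helper, the server's own validation is authoritative, and the open interval used for train_ratio is an assumption.

```python
from typing import Any, Dict, List

TARGET_FIELDS = ("module_slug", "program_spec", "skill_name", "skill_path")
AUTO_LEVELS = ("light", "medium", "heavy")


def validate_optimization_request(req: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the body looks valid."""
    errors: List[str] = []

    # Exactly one optimization target must be present.
    targets = [field for field in TARGET_FIELDS if req.get(field)]
    if len(targets) != 1:
        errors.append(f"expected exactly one target, got {targets}")

    if req.get("auto", "light") not in AUTO_LEVELS:
        errors.append("auto must be light, medium, or heavy")

    if req.get("optimizer", "gepa") != "gepa":
        errors.append("only the gepa optimizer is supported")

    # The reflection profile and model must be provided together.
    has_profile = bool(req.get("reflection_profile_id"))
    has_model = bool(req.get("reflection_model_id"))
    if has_profile != has_model:
        errors.append("reflection_profile_id and reflection_model_id go together")

    # Assumption: a usable split ratio lies strictly between 0 and 1.
    ratio = req.get("train_ratio", 0.8)
    if not (isinstance(ratio, (int, float)) and 0 < ratio < 1):
        errors.append("train_ratio must be between 0 and 1")

    return errors
```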
Skill optimization runs the RLM-GEPA instruction proposer inside a Daytona sandbox. The endpoint returns 503 with code service_unavailable when DAYTONA_API_KEY is not configured. Registered module optimization has no Daytona requirement.
POST /api/v1/optimization/runs/{run_id}/promotion-drafts creates (or loads, if it already exists) a non-mutating draft promotion record for the targeted run. Drafts are scoped to the caller’s tenant and workspace and serve as the review handoff between an optimization run and an explicit promotion decision. The endpoint does not change the live runtime — it only stages the optimized artifact for review.
Traces
POST /api/v1/traces/feedback
Submit thumbs-up / thumbs-down feedback for an MLflow trace. Used by the Workbench feedback UI.
WebSocket
Two WebSocket endpoints power the live UI. Both require the same auth as HTTP endpoints when AUTH_REQUIRED=true.
| Endpoint | Stream |
|---|---|
| /api/v1/ws/execution | Chat stream events: user-facing turn-taking |
| /api/v1/ws/execution/events | Execution graph events: tool calls, sandbox steps, recursive delegation |
Behind a reverse proxy, ensure HTTP/1.1 upgrade is supported and response buffering is disabled.
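Clients usually derive the WebSocket URL from the HTTP base URL by swapping the scheme. A small sketch follows; the ws_url helper and the "chat" and "events" stream labels are hypothetical, and any path prefix on the base URL is ignored. How credentials are attached to the handshake depends on AUTH_MODE and is not shown.

```python
from urllib.parse import urlsplit, urlunsplit

# Endpoint paths from the table above; the dictionary keys are local labels.
WS_PATHS = {
    "chat": "/api/v1/ws/execution",
    "events": "/api/v1/ws/execution/events",
}


def ws_url(base_url: str, stream: str) -> str:
    """Convert an http(s) base URL into the ws(s) URL for a stream."""
    parts = urlsplit(base_url)
    schemes = {"http": "ws", "https": "wss"}
    if parts.scheme not in schemes:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    return urlunsplit((schemes[parts.scheme], parts.netloc, WS_PATHS[stream], "", ""))
```

Using wss whenever the API is served over https keeps the stream on the same TLS termination as the HTTP traffic.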
Source of truth
When this page disagrees with the code, trust:
- src/fleet_rlm/api/main.py: app factory and route mounting.
- src/fleet_rlm/api/routers/: individual routers.
- openapi.yaml: generated schema.