The canonical schema lives in openapi.yaml. This page is a high-level map of the surface.

Categories

| Category | Prefix | Description |
| --- | --- | --- |
| Health | / | Unprefixed health and readiness probes |
| Auth | /api/v1/auth | Identity endpoints |
| Runtime | /api/v1/runtime | Settings, diagnostics, volume access |
| Sessions | /api/v1/sessions | History, turns, stats, export, restore |
| Sandboxes | /api/v1/sandboxes | Daytona sandbox management |
| Runs | /api/v1/runs | Execution run steps |
| Optimization | /api/v1/optimization | GEPA optimization, skill optimization, datasets, runs, promotion drafts |
| Traces | /api/v1/traces | MLflow trace feedback |
| WebSocket | /api/v1/ws | Real-time chat and execution streams |
The /api/v1/memory* router was retired in 0.5.40. Memory item browsing is no longer part of the supported HTTP surface.

Authentication

All /api/v1/* endpoints require authentication when AUTH_REQUIRED=true. Behavior depends on AUTH_MODE:
| Mode | Behavior |
| --- | --- |
| dev | Debug headers, local HS256 tokens, optional identity |
| entra | JWKS-backed Entra ID tokens, Neon tenant admission required |

Canonical error envelope

Every error response from /api/v1/* — including FastAPI validation errors and Starlette 404s for unknown routes — uses the same JSON shape:
{
  "code": "not_found",
  "message": "The requested resource was not found.",
  "detail": null
}
| Field | Type | Description |
| --- | --- | --- |
| code | string | Stable machine-readable error code (for example not_found, unauthorized, validation_error, forbidden). |
| message | string | Human-readable, non-secret error summary safe to surface in a UI. |
| detail | object or null | Structured non-secret details when available, for example the per-field issues for validation errors. |
Branch on code rather than parsing message, and treat the HTTP status code as the source of truth for severity.
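A client could decode the envelope as in the minimal Python sketch below. The `ErrorEnvelope` class and the `parse_error` and `should_reauthenticate` helpers are hypothetical names chosen for illustration; only the `code`, `message`, and `detail` fields come from the envelope described above.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ErrorEnvelope:
    """Hypothetical client-side view of the canonical error envelope."""
    status: int          # HTTP status code of the response
    code: str            # stable machine-readable error code
    message: str         # human-readable summary
    detail: Optional[Any] = None


def parse_error(status: int, body: str) -> ErrorEnvelope:
    """Decode an error response body into an ErrorEnvelope."""
    payload = json.loads(body)
    return ErrorEnvelope(status, payload["code"], payload["message"], payload.get("detail"))


def should_reauthenticate(err: ErrorEnvelope) -> bool:
    # Branch on the stable code, never on the message text.
    return err.code == "unauthorized"
```

Keeping the HTTP status alongside the decoded fields lets callers use the status for severity and `code` for dispatch.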

Health endpoints

Unauthenticated probes for load balancers and orchestrators.

GET /health

Liveness check.
{
  "status": "live",
  "version": "0.5.40"
}
| Field | Values | Description |
| --- | --- | --- |
| status | live | Unambiguous liveness state. Replaces the legacy ok: true field. |
| version | string | Package version currently serving the API. |

GET /ready

Readiness check with component status. Returns 503 with the same ReadyResponse body when a critical runtime dependency is unavailable, so probes can read component state from a failing response.
{
  "ready": true,
  "planner": "ready",
  "database": "ready",
  "database_required": true,
  "sandbox_provider": "daytona"
}
| Field | Values | Description |
| --- | --- | --- |
| ready | boolean | Overall readiness |
| planner | ready, missing | Planner LM status |
| database | ready, missing, disabled, degraded | Database connectivity |
| sandbox_provider | string | Configured sandbox backend |
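Because a 503 carries the same body as a 200, a probe can report which component is failing. The sketch below is one possible reading of the response, not the service's own logic: the helper names are hypothetical, and it assumes that any component status other than "ready" is worth reporting when the overall check fails.

```python
from typing import Any, Dict, List

# Component status fields taken from the ReadyResponse example.
COMPONENT_FIELDS = ("planner", "database")


def unhealthy_components(body: Dict[str, Any]) -> List[str]:
    """Return the component fields whose status is anything other than "ready"."""
    return [name for name in COMPONENT_FIELDS if body.get(name) != "ready"]


def describe_readiness(status_code: int, body: Dict[str, Any]) -> str:
    """Summarize a /ready response; 200 and 503 share the same body shape."""
    if status_code == 200 and body.get("ready") is True:
        return "ready"
    failing = unhealthy_components(body)
    return "not ready: " + (", ".join(failing) or "unknown component")
```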

Auth

GET /api/v1/auth/me

Returns the authenticated user’s identity envelope.
{
  "tenant_claim": "tenant-123",
  "user_claim": "user-456",
  "email": "user@example.com",
  "name": "Jane Doe",
  "tenant_id": "uuid-...",
  "user_id": "uuid-..."
}

Runtime

| Endpoint | Method | Purpose |
| --- | --- | --- |
| /api/v1/runtime/settings | GET | Read runtime settings |
| /api/v1/runtime/settings | PATCH | Mutate settings (local-only) |
| /api/v1/runtime/tests/daytona | POST | Connectivity probe against Daytona |
| /api/v1/runtime/tests/lm | POST | Connectivity probe against the planner LM |
| /api/v1/runtime/status | GET | Composite runtime status |
| /api/v1/runtime/volume/tree | GET | List durable volume contents |
| /api/v1/runtime/volume/file | GET | Read a single durable-volume file |
| /api/v1/runtime/llm-profiles | GET / POST | List or create provider profiles |
| /api/v1/runtime/llm-profiles/{profile_id} | PATCH / DELETE | Update or remove a provider profile |
| /api/v1/runtime/llm-profiles/{profile_id}/models | GET | Fetch the model catalog for a profile |
| /api/v1/runtime/llm-profiles/{profile_id}/test | POST | Run a connectivity probe against a profile |
| /api/v1/runtime/llm-profiles/import-env | POST | Create a profile from current DSPY_* env values |
| /api/v1/runtime/llm-roles | GET / PATCH | Read or update planner / delegate / delegate_small role bindings |
PATCH /api/v1/runtime/settings, POST/PATCH/DELETE /api/v1/runtime/llm-profiles*, and PATCH /api/v1/runtime/llm-roles are intentionally restricted to APP_ENV=local. In staging and production they are read-only and return 403 forbidden.

LLM provider profiles

Provider profiles store credentials and connection details for an LLM vendor (OpenAI, Anthropic, Google Gemini, or any OpenAI-compatible proxy). Each profile can be bound to one or more runtime roles (planner, delegate, or delegate_small) to select which model serves that role at runtime.

Profiles persist in Postgres when DATABASE_URL is configured, or in .fleet/llm-profiles.json for SQLite-only local development. API keys are encrypted at rest and only returned in masked form.

Create a profile:
curl -X POST http://localhost:8000/api/v1/runtime/llm-profiles \
  -H "Content-Type: application/json" \
  -d '{
    "name": "OpenAI production",
    "provider_type": "openai",
    "api_key": "sk-..."
  }'
provider_type accepts openai, anthropic, google, or openai_compatible. Gemini profiles use the OpenAI-compatible endpoint at https://generativelanguage.googleapis.com/v1beta/openai/; set api_base on openai_compatible profiles to point at a LiteLLM proxy or self-hosted gateway.

List the models a profile exposes (cached; pass ?refresh=true to bypass the cache):
curl http://localhost:8000/api/v1/runtime/llm-profiles/{profile_id}/models
Anthropic catalogs are curated statically because Anthropic does not publish a /v1/models endpoint. Provider fetch failures return the static fallback list with an error string explaining the cause.

Bind a profile to a role:
curl -X PATCH http://localhost:8000/api/v1/runtime/llm-roles \
  -H "Content-Type: application/json" \
  -d '{
    "planner": { "profile_id": "uuid-...", "model_id": "gpt-4o" },
    "delegate": { "profile_id": "uuid-...", "model_id": "gpt-4o-mini" }
  }'
The runtime resolves planner, delegate, and delegate_small bindings on every turn, so changes take effect on the next message without a server restart.

Import a profile from current environment variables (useful when migrating from a .env-only setup):
curl -X POST http://localhost:8000/api/v1/runtime/llm-profiles/import-env
The response includes the created profile plus any planner / delegate / delegate_small bindings inferred from DSPY_LM_MODEL, DSPY_DELEGATE_LM_MODEL, and DSPY_DELEGATE_LM_SMALL_MODEL.
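The same create and bind calls can be prepared from Python with only the standard library. This is a sketch under assumptions: `BASE_URL` mirrors the local server used in the curl examples, the helper names are hypothetical, and no authentication header is attached because the curl examples do not show one. The requests are built but not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: same local server as the curl examples


def json_request(method: str, path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON request against the runtime API."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def create_profile_request(name: str, provider_type: str, api_key: str) -> urllib.request.Request:
    """Mirror of the POST /api/v1/runtime/llm-profiles curl example."""
    body = {"name": name, "provider_type": provider_type, "api_key": api_key}
    return json_request("POST", "/api/v1/runtime/llm-profiles", body)


def bind_role_request(role: str, profile_id: str, model_id: str) -> urllib.request.Request:
    """Mirror of the PATCH /api/v1/runtime/llm-roles curl example, for one role."""
    body = {role: {"profile_id": profile_id, "model_id": model_id}}
    return json_request("PATCH", "/api/v1/runtime/llm-roles", body)
```

Passing a built request to `urllib.request.urlopen` sends it. Outside APP_ENV=local these mutating calls are expected to fail with 403, per the restriction described earlier.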

Volume access boundaries

GET /api/v1/runtime/volume/tree and GET /api/v1/runtime/volume/file enforce explicit security boundaries against the canonical runtime volume roots returned in allowed_roots. Any path outside those roots returns 403 with code: "forbidden".

GET /api/v1/runtime/volume/tree accepts:
| Parameter | Default | Limit | Description |
| --- | --- | --- | --- |
| max_depth | 3 | | Maximum directory depth to traverse. |
| max_entries | 200 | 1000 | Maximum total node entries returned. Prevents oversized listings on large volumes. |
The response includes max_depth, max_entries, and entries_returned so clients can detect when limits clipped the listing.

GET /api/v1/runtime/volume/file previews include:
| Field | Description |
| --- | --- |
| sha256 | SHA-256 hex digest of the full file bytes before truncation. |
| encoding | utf-8 for clean text, utf-8-lossy when replacement characters were introduced, binary for non-text files. |
| binary | true when the file is binary; content is empty in that case. |
| truncated | true when the preview was cut off by the provider’s size cap. |
Use sha256 to deduplicate or verify cached previews without re-reading the file body.
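As an illustration, a client-side cache check could compare digests as sketched below. The helper names are hypothetical, and the clipping check assumes that a listing whose entries_returned reaches max_entries may be incomplete; the source does not spell out that rule.

```python
import hashlib
from typing import Any, Dict, Optional


def preview_is_current(cached_sha256: Optional[str], preview: Dict[str, Any]) -> bool:
    """True when a cached preview carries the digest the server reports now."""
    return cached_sha256 is not None and cached_sha256 == preview.get("sha256")


def digest_matches(full_bytes: bytes, preview: Dict[str, Any]) -> bool:
    """Check locally held full file bytes against the reported SHA-256 hex digest.

    The digest covers the whole file, so a truncated preview body will not match.
    """
    return hashlib.sha256(full_bytes).hexdigest() == preview.get("sha256")


def listing_may_be_clipped(tree: Dict[str, Any]) -> bool:
    # Assumption: reaching max_entries means more nodes may exist than were returned.
    return tree["entries_returned"] >= tree["max_entries"]
```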

Sessions

| Endpoint | Method | Purpose |
| --- | --- | --- |
| /api/v1/sessions/state | GET | Active session envelope |
| /api/v1/sessions | GET | List sessions |
| /api/v1/sessions/{id} | GET | Session detail |
| /api/v1/sessions/{id} | DELETE | Delete a session |
| /api/v1/sessions/{id}/turns | GET | Conversation turns |
| /api/v1/sessions/{id}/traces | GET | Paginated MLflow traces linked to a session |
| /api/v1/sessions/{id}/export | POST | Export session manifest |
GET /api/v1/sessions/{id}/traces returns external traces (MLflow runs, including child delegations) associated with a session. Supports limit (default 50, max 200) and offset query parameters and returns has_more so the Workbench can paginate without re-counting.
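A pagination loop over limit, offset, and has_more might look like the following sketch. The transport is injected as a hypothetical `fetch_page(limit, offset)` callable returning the decoded JSON body, and the `items` key holding the traces is an assumed name; check openapi.yaml for the real field.

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical transport: fetch_page(limit, offset) returns the decoded JSON body.
FetchPage = Callable[[int, int], Dict[str, Any]]


def iter_traces(fetch_page: FetchPage, limit: int = 50) -> Iterator[Any]:
    """Yield traces page by page until the server reports has_more is false."""
    limit = max(1, min(limit, 200))  # documented default is 50, maximum is 200
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        items = page.get("items", [])  # "items" is an assumed key name
        yield from items
        # Stop on the last page, or on an empty page to avoid looping forever.
        if not page.get("has_more") or not items:
            return
        offset += len(items)
```

Advancing the offset by the number of items actually received, rather than by limit, keeps the loop correct if the server returns a short page.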

Sandboxes

Sandbox detail responses redact environment variables before returning them — the env_vars field on SandboxResponse only contains keys safe for display, not raw secrets.

Optimization

| Endpoint | Method | Purpose |
| --- | --- | --- |
| /api/v1/optimization/status | GET | Background runner status |
| /api/v1/optimization/run | POST | Trigger an optimization run |
| /api/v1/optimization/modules | GET | List registered DSPy modules |
| /api/v1/optimization/runs | GET / POST | List or create runs |
| /api/v1/optimization/runs/{run_id} | GET | Single run detail |
| /api/v1/optimization/runs/{run_id}/results | GET | Run results |
| /api/v1/optimization/runs/{run_id}/promotion-drafts | POST | Create or load a tenant-scoped promotion draft for a run |
| /api/v1/optimization/runs/compare | GET | Baseline vs optimized comparison |
| /api/v1/optimization/datasets | GET / POST | List or upload datasets |
| /api/v1/optimization/datasets/{dataset_id} | GET | Dataset detail |
Each entry in GET /api/v1/optimization/modules carries an offline_only flag (default true) indicating that the module can only be optimized through the offline optimization endpoints — not from live request traffic.

Optimization request shape

POST /api/v1/optimization/runs (async) and POST /api/v1/optimization/run (blocking) share the same request schema. Specify exactly one optimization target:
| Field | Description |
| --- | --- |
| module_slug | Registered module slug. Resolves program_spec automatically. |
| program_spec | Ad-hoc module:attr callable returning a dspy.Module. |
| skill_name | Bundled or mounted Fleet skill name. Optimizes the SKILL.md artifact. |
| skill_path | Relative path to a SKILL.md-compatible markdown file. |
Common fields:
| Field | Default | Description |
| --- | --- | --- |
| dataset_id / dataset_path | | Registered dataset id, or a path inside OPTIMIZATION_DATA_ROOT. |
| auto | "light" | Optimization intensity: light, medium, or heavy. |
| train_ratio | 0.8 | Train/validation split ratio. |
| output_path | | Optional path to save the optimized artifact. |
| trace_bundle_paths | [] | Distilled trace bundle paths passed to the RLM-GEPA instruction proposer. |
| reflection_profile_id / reflection_model_id | | Optional LLM profile + model for the proposer/reflection step. Provide together. |
| optimizer | "gepa" | Only gepa is supported. |
# Skill optimization with offline trace context
curl -X POST http://localhost:8000/api/v1/optimization/runs \
  -H "Content-Type: application/json" \
  -d '{
    "skill_name": "optimization",
    "dataset_path": "skill_cases.jsonl",
    "trace_bundle_paths": ["artifacts/traces/optimization-failures.jsonl"],
    "auto": "medium"
  }'
Skill optimization runs the RLM-GEPA instruction proposer inside a Daytona sandbox. The endpoint returns 503 with code service_unavailable when DAYTONA_API_KEY is not configured. Registered module optimization has no Daytona requirement.
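The request rules above (exactly one target, a fixed set of auto levels, gepa only, reflection fields supplied together) can be checked client-side before submitting. The sketch below is a hypothetical pre-flight validator, not the server's own validation; the function name and message strings are illustrative.

```python
from typing import Any, Dict, List

TARGET_FIELDS = ("module_slug", "program_spec", "skill_name", "skill_path")
AUTO_LEVELS = ("light", "medium", "heavy")


def validate_run_request(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the payload looks well-formed."""
    problems: List[str] = []
    targets = [field for field in TARGET_FIELDS if payload.get(field)]
    if len(targets) != 1:
        problems.append(f"expected exactly one target, got {len(targets)}")
    if payload.get("auto", "light") not in AUTO_LEVELS:
        problems.append("auto must be light, medium, or heavy")
    if payload.get("optimizer", "gepa") != "gepa":
        problems.append("only the gepa optimizer is supported")
    # The reflection profile and model must be supplied together or not at all.
    has_profile = bool(payload.get("reflection_profile_id"))
    has_model = bool(payload.get("reflection_model_id"))
    if has_profile != has_model:
        problems.append("reflection_profile_id and reflection_model_id must be provided together")
    return problems
```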

Promotion drafts

POST /api/v1/optimization/runs/{run_id}/promotion-drafts creates (or loads, if it already exists) a non-mutating draft promotion record for the targeted run. Drafts are scoped to the caller’s tenant and workspace and serve as the review handoff between an optimization run and an explicit promotion decision. The endpoint does not change the live runtime — it only stages the optimized artifact for review.

Traces

POST /api/v1/traces/feedback

Submit thumbs-up / thumbs-down feedback for an MLflow trace. Used by the Workbench feedback UI.

WebSocket

Two WebSocket endpoints power the live UI. Both require the same auth as HTTP endpoints when AUTH_REQUIRED=true.
| Endpoint | Stream |
| --- | --- |
| /api/v1/ws/execution | Chat stream events: user-facing turn-taking |
| /api/v1/ws/execution/events | Execution graph events: tool calls, sandbox steps, recursive delegation |
Behind a reverse proxy, ensure HTTP/1.1 upgrade is supported and response buffering is disabled.
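Clients that already hold the HTTP base URL can derive the WebSocket URLs from it. The sketch below assumes the API is mounted at the root of the host, so any path on the base URL is discarded. The `ws_url` helper and the `chat` and `events` stream labels are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Paths from the endpoint table; the dictionary keys are illustrative labels.
WS_PATHS = {
    "chat": "/api/v1/ws/execution",
    "events": "/api/v1/ws/execution/events",
}


def ws_url(base_url: str, stream: str) -> str:
    """Map an http(s) API base URL to the ws(s) URL for the given stream."""
    parts = urlsplit(base_url)
    # https upgrades to wss; anything else falls back to plain ws.
    scheme = "wss" if parts.scheme == "https" else "ws"
    return urlunsplit((scheme, parts.netloc, WS_PATHS[stream], "", ""))
```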

Source of truth

When this page disagrees with the code, trust:
  • src/fleet_rlm/api/main.py — app factory and route mounting.
  • src/fleet_rlm/api/routers/ — individual routers.
  • openapi.yaml — generated schema.