# fleet-rlm HTTP and WebSocket API reference

> Reference for the fleet-rlm FastAPI surface: health probes, auth, runtime, sessions, optimization, promotion drafts, traces, and WebSocket execution streams.

The canonical schema lives in [`openapi.yaml`](https://github.com/Qredence/fleet-rlm/blob/main/openapi.yaml). This page is a high-level map of the surface.

## Categories

| Category     | Prefix                 | Description                                                             |
| ------------ | ---------------------- | ----------------------------------------------------------------------- |
| Health       | `/`                    | Unprefixed health and readiness probes                                  |
| Auth         | `/api/v1/auth`         | Identity endpoints                                                      |
| Runtime      | `/api/v1/runtime`      | Settings, diagnostics, volume access                                    |
| Sessions     | `/api/v1/sessions`     | History, turns, stats, export, restore                                  |
| Sandboxes    | `/api/v1/sandboxes`    | Daytona sandbox management                                              |
| Runs         | `/api/v1/runs`         | Execution run steps                                                     |
| Optimization | `/api/v1/optimization` | GEPA optimization, skill optimization, datasets, runs, promotion drafts |
| Traces       | `/api/v1/traces`       | MLflow trace feedback                                                   |
| WebSocket    | `/api/v1/ws`           | Real-time chat and execution streams                                    |

<Note>
  The `/api/v1/memory*` router was retired in 0.5.40. Memory item browsing is no longer part of the supported HTTP surface.
</Note>

## Authentication

All `/api/v1/*` endpoints require authentication when `AUTH_REQUIRED=true`. Behavior depends on `AUTH_MODE`:

| Mode    | Behavior                                                    |
| ------- | ----------------------------------------------------------- |
| `dev`   | Debug headers, local HS256 tokens, optional identity        |
| `entra` | JWKS-backed Entra ID tokens, Neon tenant admission required |

## Canonical error envelope

Every error response from `/api/v1/*` — including FastAPI validation errors and Starlette 404s for unknown routes — uses the same JSON shape:

```json theme={null}
{
  "code": "not_found",
  "message": "The requested resource was not found.",
  "detail": null
}
```

| Field     | Type           | Description                                                                                                    |
| --------- | -------------- | -------------------------------------------------------------------------------------------------------------- |
| `code`    | string         | Stable machine-readable error code (for example `not_found`, `unauthorized`, `validation_error`, `forbidden`). |
| `message` | string         | Human-readable, non-secret error summary safe to surface in a UI.                                              |
| `detail`  | object \| null | Structured non-secret details when available — for example the per-field issues for validation errors.         |

Branch on `code` rather than parsing `message`, and treat the HTTP status code as the source of truth for severity.
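Here is a minimal Python sketch of that advice, using only the standard library. `ApiError`, `parse_error`, and `describe` are hypothetical client-side names, not part of fleet-rlm. The fallback branch handles bodies that do not follow the envelope, such as an HTML error page from a proxy in front of the API.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ApiError:
    """Client-side view of the canonical error envelope (hypothetical helper)."""
    status: int                   # HTTP status: the source of truth for severity
    code: str                     # stable machine-readable code to branch on
    message: str                  # human-readable, safe to show in a UI
    detail: Optional[Any] = None  # structured details, e.g. per-field validation issues


def parse_error(status: int, body: bytes) -> ApiError:
    """Decode a {code, message, detail} body, tolerating non-envelope responses."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers json.JSONDecodeError and UnicodeDecodeError
        payload = None
    if not isinstance(payload, dict) or "code" not in payload:
        # Not produced by fleet-rlm (for example a proxy error page).
        return ApiError(status, "unknown", body.decode("utf-8", "replace"))
    return ApiError(status, payload["code"], payload.get("message", ""), payload.get("detail"))


def describe(err: ApiError) -> str:
    """Branch on `code`, never on the wording of `message`."""
    if err.code == "unauthorized":
        return "sign in again"
    if err.code == "forbidden":
        return "not permitted in this environment or tenant"
    if err.code == "validation_error":
        return f"invalid request: {err.detail}"
    if err.code == "not_found":
        return "resource not found"
    return f"HTTP {err.status}: {err.message}"
```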

## Health endpoints

Unauthenticated probes for load balancers and orchestrators.

### `GET /health`

Liveness check.

```json theme={null}
{
  "status": "live",
  "version": "0.5.40"
}
```

| Field     | Values | Description                                                       |
| --------- | ------ | ----------------------------------------------------------------- |
| `status`  | `live` | Unambiguous liveness state. Replaces the legacy `ok: true` field. |
| `version` | string | Package version currently serving the API.                        |

### `GET /ready`

Readiness check with component status. Returns `503` with the same `ReadyResponse` body when a critical runtime dependency is unavailable, so probes can read component state from a failing response.

```json theme={null}
{
  "ready": true,
  "planner": "ready",
  "database": "ready",
  "database_required": true,
  "sandbox_provider": "daytona"
}
```

| Field              | Values                                     | Description                |
| ------------------ | ------------------------------------------ | -------------------------- |
| `ready`            | boolean                                    | Overall readiness          |
| `planner`          | `ready`, `missing`                         | Planner LM status          |
| `database`         | `ready`, `missing`, `disabled`, `degraded` | Database connectivity      |
| `database_required` | boolean                                   | Whether this deployment requires the database |
| `sandbox_provider` | string                                     | Configured sandbox backend |
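Because a failing `/ready` still returns a `ReadyResponse`, a client should read the body on `503` instead of treating it as an opaque failure. The following is a minimal standard-library sketch. `probe_ready` and `not_ready_components` are hypothetical names, and the base URL is whichever host serves the API.

```python
import json
import urllib.error
import urllib.request


def probe_ready(base_url: str, timeout: float = 5.0) -> tuple:
    """GET /ready and return (http_status, ReadyResponse dict) for both 200 and 503."""
    try:
        with urllib.request.urlopen(f"{base_url.rstrip('/')}/ready", timeout=timeout) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        # urllib raises on 503, but the error object still exposes the response body.
        return err.code, json.load(err)


def not_ready_components(ready: dict) -> list:
    """List the component fields whose state is anything other than "ready"."""
    return [f"{name}={ready.get(name)}" for name in ("planner", "database") if ready.get(name) != "ready"]
```

An orchestrator probe needs only the status code. The component list is useful in logs and dashboards to show which dependency is failing.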

## Auth

### `GET /api/v1/auth/me`

Returns the authenticated user's identity envelope.

```json theme={null}
{
  "tenant_claim": "tenant-123",
  "user_claim": "user-456",
  "email": "user@example.com",
  "name": "Jane Doe",
  "tenant_id": "uuid-...",
  "user_id": "uuid-..."
}
```
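The following sketch calls this endpoint from Python. Sending the token as an `Authorization: Bearer` header is an assumption: this page does not say how tokens are transmitted, and `dev` mode can also use debug headers whose names are not documented here. `auth_me_request` and `fetch_identity` are hypothetical helpers.

```python
import json
import urllib.request
from typing import Optional


def auth_me_request(base_url: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build GET /api/v1/auth/me; the Bearer scheme for the token is an assumption."""
    headers = {"Accept": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(f"{base_url.rstrip('/')}/api/v1/auth/me", headers=headers)


def fetch_identity(base_url: str, token: Optional[str] = None) -> dict:
    """Return the identity envelope (tenant_claim, user_claim, email, name, tenant_id, user_id)."""
    with urllib.request.urlopen(auth_me_request(base_url, token), timeout=10) as resp:
        return json.load(resp)
```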

## Runtime

| Endpoint                                           | Method             | Purpose                                                           |
| -------------------------------------------------- | ------------------ | ----------------------------------------------------------------- |
| `/api/v1/runtime/settings`                         | `GET`              | Read runtime settings                                             |
| `/api/v1/runtime/settings`                         | `PATCH`            | Mutate settings (local-only)                                      |
| `/api/v1/runtime/tests/daytona`                    | `POST`             | Connectivity probe against Daytona                                |
| `/api/v1/runtime/tests/lm`                         | `POST`             | Connectivity probe against the planner LM                         |
| `/api/v1/runtime/status`                           | `GET`              | Composite runtime status                                          |
| `/api/v1/runtime/volume/tree`                      | `GET`              | List durable volume contents                                      |
| `/api/v1/runtime/volume/file`                      | `GET`              | Read a single durable-volume file                                 |
| `/api/v1/runtime/llm-profiles`                     | `GET` / `POST`     | List or create provider profiles                                  |
| `/api/v1/runtime/llm-profiles/{profile_id}`        | `PATCH` / `DELETE` | Update or remove a provider profile                               |
| `/api/v1/runtime/llm-profiles/{profile_id}/models` | `GET`              | Fetch the model catalog for a profile                             |
| `/api/v1/runtime/llm-profiles/{profile_id}/test`   | `POST`             | Run a connectivity probe against a profile                        |
| `/api/v1/runtime/llm-profiles/import-env`          | `POST`             | Create a profile from current `DSPY_*` env values                 |
| `/api/v1/runtime/llm-roles`                        | `GET` / `PATCH`    | Read or update planner / delegate / delegate\_small role bindings |

<Warning>
  `PATCH /api/v1/runtime/settings`, `POST/PATCH/DELETE /api/v1/runtime/llm-profiles*`, and `PATCH /api/v1/runtime/llm-roles` are intentionally restricted to `APP_ENV=local`. In staging and production these mutations are rejected with `403` and code `forbidden`, so runtime configuration is read-only there.
</Warning>

### LLM provider profiles

Provider profiles store credentials and connection details for an LLM vendor (OpenAI, Anthropic, Google Gemini, or any OpenAI-compatible proxy). Each profile can be bound to one or more runtime roles — `planner`, `delegate`, or `delegate_small` — to select which model serves that role at runtime.

Profiles persist in Postgres when `DATABASE_URL` is configured, or in `.fleet/llm-profiles.json` for SQLite-only local development. API keys are encrypted at rest and only returned in masked form.

Create a profile:

```bash theme={null}
curl -X POST http://localhost:8000/api/v1/runtime/llm-profiles \
  -H "Content-Type: application/json" \
  -d '{
    "name": "OpenAI production",
    "provider_type": "openai",
    "api_key": "sk-..."
  }'
```

`provider_type` accepts `openai`, `anthropic`, `google`, or `openai_compatible`. Gemini profiles use the OpenAI-compatible endpoint at `https://generativelanguage.googleapis.com/v1beta/openai/`; set `api_base` on `openai_compatible` profiles to point at a LiteLLM proxy or self-hosted gateway.
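The same request can be sent from Python, shown here for an `openai_compatible` profile that points at a gateway. The helper names are hypothetical, and the client-side `provider_type` check only mirrors the four documented values; the server remains the authority on validation. The `APP_ENV=local` restriction above applies.

```python
import json
import urllib.request
from typing import Optional

PROVIDER_TYPES = ("openai", "anthropic", "google", "openai_compatible")


def profile_payload(name: str, provider_type: str, api_key: str, api_base: Optional[str] = None) -> dict:
    """Build a POST /api/v1/runtime/llm-profiles body."""
    if provider_type not in PROVIDER_TYPES:
        raise ValueError(f"provider_type must be one of {PROVIDER_TYPES}")
    body = {"name": name, "provider_type": provider_type, "api_key": api_key}
    if api_base is not None:
        body["api_base"] = api_base  # e.g. a LiteLLM proxy for openai_compatible profiles
    return body


def create_profile(base_url: str, body: dict) -> dict:
    """POST the profile and return the decoded JSON response (API keys come back masked)."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/runtime/llm-profiles",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

For example, `profile_payload("LiteLLM gateway", "openai_compatible", "sk-...", api_base="http://localhost:4000/v1")` uses a hypothetical gateway URL. Substitute your own proxy address.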

List the models a profile exposes (cached; pass `?refresh=true` to bypass the cache):

```bash theme={null}
curl http://localhost:8000/api/v1/runtime/llm-profiles/{profile_id}/models
```

Anthropic catalogs are curated statically rather than fetched from the provider. When a provider fetch fails, the endpoint returns the static fallback list together with an `error` string explaining the cause.

Bind a profile to a role:

```bash theme={null}
curl -X PATCH http://localhost:8000/api/v1/runtime/llm-roles \
  -H "Content-Type: application/json" \
  -d '{
    "planner": { "profile_id": "uuid-...", "model_id": "gpt-4o" },
    "delegate": { "profile_id": "uuid-...", "model_id": "gpt-4o-mini" }
  }'
```

The runtime resolves planner, delegate, and delegate\_small bindings on every turn, so changes take effect on the next message without a server restart.
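If you script role changes, a small builder can restrict the body to the three documented role names. `role_bindings` and `patch_roles` are hypothetical helpers. Like the curl example, the builder includes only the roles you pass; `patch_roles` assumes the endpoint returns a JSON body.

```python
import json
import urllib.request

ROLES = ("planner", "delegate", "delegate_small")


def role_bindings(**bindings: tuple) -> dict:
    """Build a PATCH /api/v1/runtime/llm-roles body from role=(profile_id, model_id) keywords."""
    body = {}
    for role, (profile_id, model_id) in bindings.items():
        if role not in ROLES:
            raise ValueError(f"unknown role {role!r}; expected one of {ROLES}")
        body[role] = {"profile_id": profile_id, "model_id": model_id}
    return body


def patch_roles(base_url: str, body: dict) -> dict:
    """Send the bindings; they apply from the next turn, with no restart."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/runtime/llm-roles",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```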

Import a profile from current environment variables (useful when migrating from a `.env`-only setup):

```bash theme={null}
curl -X POST http://localhost:8000/api/v1/runtime/llm-profiles/import-env
```

The response includes the created profile plus any planner / delegate / delegate\_small bindings inferred from `DSPY_LM_MODEL`, `DSPY_DELEGATE_LM_MODEL`, and `DSPY_DELEGATE_LM_SMALL_MODEL`.

### Volume access boundaries

`GET /api/v1/runtime/volume/tree` and `GET /api/v1/runtime/volume/file` only serve paths under the canonical runtime volume roots returned in `allowed_roots`. Any path outside those roots returns `403` with `code: "forbidden"`.
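A client can pre-check paths against `allowed_roots` to avoid a round trip that would end in `403`. The following sketch uses hypothetical root paths and a hypothetical helper name; read the real roots from the API response. The server check remains authoritative, not least because `posixpath.normpath` collapses `..` lexically and does not resolve symlinks.

```python
import posixpath


def within_allowed_roots(path: str, allowed_roots: list) -> bool:
    """Client-side pre-check of the volume boundary; the server's 403 stays authoritative."""
    target = posixpath.normpath(path)  # collapses "..", "." and repeated slashes
    for root in allowed_roots:
        base = posixpath.normpath(root)
        # Match the root itself or anything beneath it, but not sibling prefixes like "/data/memory-old".
        if target == base or target.startswith(base.rstrip("/") + "/"):
            return True
    return False
```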

`GET /api/v1/runtime/volume/tree` accepts:

| Parameter     | Default | Limit  | Description                                                                        |
| ------------- | ------- | ------ | ---------------------------------------------------------------------------------- |
| `max_depth`   | `3`     | —      | Maximum directory depth to traverse.                                               |
| `max_entries` | `200`   | `1000` | Maximum total node entries returned. Prevents oversized listings on large volumes. |

The response includes `max_depth`, `max_entries`, and `entries_returned` so clients can detect when limits clipped the listing.
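The following sketch builds a tree URL that respects the documented `1000` cap and flags listings that may have been clipped. The helper names are hypothetical. `entries_returned` equal to `max_entries` only means the listing *may* be incomplete, and these three fields alone cannot reveal clipping caused by `max_depth`.

```python
import urllib.parse

MAX_ENTRIES_CAP = 1000  # documented upper limit for max_entries


def tree_url(base_url: str, max_depth: int = 3, max_entries: int = 200) -> str:
    """Build a GET /api/v1/runtime/volume/tree URL, clamping max_entries to the cap."""
    query = urllib.parse.urlencode({"max_depth": max_depth, "max_entries": min(max_entries, MAX_ENTRIES_CAP)})
    return f"{base_url.rstrip('/')}/api/v1/runtime/volume/tree?{query}"


def maybe_clipped(tree: dict) -> bool:
    """True when the listing reached max_entries and may be missing nodes."""
    return tree["entries_returned"] >= tree["max_entries"]
```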

`GET /api/v1/runtime/volume/file` previews include:

| Field       | Description                                                                                                     |
| ----------- | --------------------------------------------------------------------------------------------------------------- |
| `sha256`    | SHA-256 hex digest of the full file bytes before truncation.                                                    |
| `encoding`  | `utf-8` for clean text, `utf-8-lossy` when replacement characters were introduced, `binary` for non-text files. |
| `binary`    | `true` when the file is binary; `content` is empty in that case.                                                |
| `truncated` | `true` when the preview was cut off by the provider's size cap.                                                 |

Use `sha256` to deduplicate or verify cached previews without re-reading the file body.
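The following sketch shows both uses. The digest covers the full file before truncation, so compare it only against complete file bytes (for example a locally cached copy), never against a truncated `content`. The helper names are hypothetical, and the case-insensitive comparison is a defensive assumption about the hex format.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Lowercase hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def cache_is_current(preview: dict, cached_file: bytes) -> bool:
    """Check a locally cached full copy against a preview's digest."""
    return preview["sha256"].lower() == sha256_hex(cached_file)


def dedupe_previews(previews: list) -> dict:
    """Keep the first preview seen for each distinct file body, keyed by digest."""
    unique = {}
    for preview in previews:
        unique.setdefault(preview["sha256"], preview)
    return unique
```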

## Sessions

| Endpoint                       | Method   | Purpose                                     |
| ------------------------------ | -------- | ------------------------------------------- |
| `/api/v1/sessions/state`       | `GET`    | Active session envelope                     |
| `/api/v1/sessions`             | `GET`    | List sessions                               |
| `/api/v1/sessions/{id}`        | `GET`    | Session detail                              |
| `/api/v1/sessions/{id}`        | `DELETE` | Delete a session                            |
| `/api/v1/sessions/{id}/turns`  | `GET`    | Conversation turns                          |
| `/api/v1/sessions/{id}/traces` | `GET`    | Paginated MLflow traces linked to a session |
| `/api/v1/sessions/{id}/export` | `POST`   | Export session manifest                     |

`GET /api/v1/sessions/{id}/traces` returns external traces (MLflow runs, including child delegations) associated with a session. Supports `limit` (default `50`, max `200`) and `offset` query parameters and returns `has_more` so the Workbench can paginate without re-counting.
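A client can walk every page with `limit`/`offset` until `has_more` is false. In the following sketch, the page is injected as a `fetch_page(limit, offset)` callable so the loop can be tested without a server. The key holding the page's items is not documented on this page, so `"traces"` is a hypothetical name. The helper names are also hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

MAX_LIMIT = 200  # documented maximum page size


def iter_session_traces(fetch_page: Callable[[int, int], dict], limit: int = 50) -> Iterator[dict]:
    """Yield every trace by requesting successive limit/offset pages until has_more is false."""
    limit = max(1, min(limit, MAX_LIMIT))
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        items = page.get("traces", [])  # hypothetical key for the page's item list
        yield from items
        if not page.get("has_more") or not items:
            break  # also stops if the server reports has_more but sends nothing
        offset += len(items)


def http_fetcher(base_url: str, session_id: str) -> Callable[[int, int], dict]:
    """Return a fetch_page callable backed by GET /api/v1/sessions/{id}/traces."""
    def fetch(limit: int, offset: int) -> dict:
        query = urllib.parse.urlencode({"limit": limit, "offset": offset})
        sid = urllib.parse.quote(session_id, safe="")
        with urllib.request.urlopen(f"{base_url.rstrip('/')}/api/v1/sessions/{sid}/traces?{query}", timeout=30) as resp:
            return json.load(resp)
    return fetch
```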

## Sandboxes

Sandbox detail responses redact environment variables before returning them — the `env_vars` field on `SandboxResponse` only contains keys safe for display, not raw secrets.

## Optimization

| Endpoint                                              | Method         | Purpose                                                  |
| ----------------------------------------------------- | -------------- | -------------------------------------------------------- |
| `/api/v1/optimization/status`                         | `GET`          | Background runner status                                 |
| `/api/v1/optimization/run`                            | `POST`         | Trigger an optimization run                              |
| `/api/v1/optimization/modules`                        | `GET`          | List registered DSPy modules                             |
| `/api/v1/optimization/runs`                           | `GET` / `POST` | List or create runs                                      |
| `/api/v1/optimization/runs/{run_id}`                  | `GET`          | Single run detail                                        |
| `/api/v1/optimization/runs/{run_id}/results`          | `GET`          | Run results                                              |
| `/api/v1/optimization/runs/{run_id}/promotion-drafts` | `POST`         | Create or load a tenant-scoped promotion draft for a run |
| `/api/v1/optimization/runs/compare`                   | `GET`          | Baseline vs optimized comparison                         |
| `/api/v1/optimization/datasets`                       | `GET` / `POST` | List or upload datasets                                  |
| `/api/v1/optimization/datasets/{dataset_id}`          | `GET`          | Dataset detail                                           |

Each entry in `GET /api/v1/optimization/modules` carries an `offline_only` flag (default `true`) indicating that the module can only be optimized through the offline optimization endpoints — not from live request traffic.

### Optimization request shape

`POST /api/v1/optimization/runs` (async) and `POST /api/v1/optimization/run` (blocking) share the same request schema. Specify exactly one optimization target:

| Field          | Description                                                           |
| -------------- | --------------------------------------------------------------------- |
| `module_slug`  | Registered module slug. Resolves `program_spec` automatically.        |
| `program_spec` | Ad-hoc `module:attr` callable returning a `dspy.Module`.              |
| `skill_name`   | Bundled or mounted Fleet skill name. Optimizes the SKILL.md artifact. |
| `skill_path`   | Relative path to a SKILL.md-compatible markdown file.                 |

Common fields:

| Field                                           | Default   | Description                                                                      |
| ----------------------------------------------- | --------- | -------------------------------------------------------------------------------- |
| `dataset_id` / `dataset_path`                   | —         | Registered dataset id, or a path inside `OPTIMIZATION_DATA_ROOT`.                |
| `auto`                                          | `"light"` | Optimization intensity: `light`, `medium`, or `heavy`.                           |
| `train_ratio`                                   | `0.8`     | Train/validation split ratio.                                                    |
| `output_path`                                   | —         | Optional path to save the optimized artifact.                                    |
| `trace_bundle_paths`                            | `[]`      | Distilled trace bundle paths passed to the RLM-GEPA instruction proposer.        |
| `reflection_profile_id` / `reflection_model_id` | —         | Optional LLM profile + model for the proposer/reflection step. Provide together. |
| `optimizer`                                     | `"gepa"`  | Only `gepa` is supported.                                                        |

```bash theme={null}
# Skill optimization with offline trace context
curl -X POST http://localhost:8000/api/v1/optimization/runs \
  -H "Content-Type: application/json" \
  -d '{
    "skill_name": "optimization",
    "dataset_path": "skill_cases.jsonl",
    "trace_bundle_paths": ["artifacts/traces/optimization-failures.jsonl"],
    "auto": "medium"
  }'
```
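You can check the defaults and cross-field rules from the common-fields table before sending a body like the one above. The server applies the defaults itself, so filling them in client-side is optional; this sketch mainly makes the documented rules explicit. `with_defaults` is a hypothetical helper.

```python
ALLOWED_AUTO = ("light", "medium", "heavy")
DEFAULTS = {"auto": "light", "train_ratio": 0.8, "optimizer": "gepa"}


def with_defaults(body: dict) -> dict:
    """Return a copy of an optimization request body with documented defaults filled in."""
    merged = {**DEFAULTS, "trace_bundle_paths": [], **body}  # fresh list on every call
    if merged["auto"] not in ALLOWED_AUTO:
        raise ValueError(f"auto must be one of {ALLOWED_AUTO}")
    if merged["optimizer"] != "gepa":
        raise ValueError("only the gepa optimizer is supported")
    # reflection_profile_id and reflection_model_id must be provided together.
    if (merged.get("reflection_profile_id") is None) != (merged.get("reflection_model_id") is None):
        raise ValueError("provide reflection_profile_id and reflection_model_id together")
    return merged
```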

<Warning>
  Skill optimization runs the RLM-GEPA instruction proposer inside a Daytona sandbox. The endpoint returns `503` with code `service_unavailable` when `DAYTONA_API_KEY` is not configured. Registered module optimization has no Daytona requirement.
</Warning>

### Promotion drafts

`POST /api/v1/optimization/runs/{run_id}/promotion-drafts` creates (or loads, if it already exists) a non-mutating draft promotion record for the targeted run. Drafts are scoped to the caller's tenant and workspace and serve as the review handoff between an optimization run and an explicit promotion decision. The endpoint does not change the live runtime — it only stages the optimized artifact for review.

## Traces

### `POST /api/v1/traces/feedback`

Submit thumbs-up / thumbs-down feedback for an MLflow trace. Used by the Workbench feedback UI.

## WebSocket

Two WebSocket endpoints power the live UI. Both require the same auth as HTTP endpoints when `AUTH_REQUIRED=true`.

| Endpoint                      | Stream                                                                   |
| ----------------------------- | ------------------------------------------------------------------------ |
| `/api/v1/ws/execution`        | Chat stream events — user-facing turn-taking                             |
| `/api/v1/ws/execution/events` | Execution graph events — tool calls, sandbox steps, recursive delegation |

Behind a reverse proxy, make sure the proxy uses HTTP/1.1 to the upstream, forwards the `Upgrade` and `Connection` headers required for the WebSocket handshake, and disables response buffering.

## Source of truth

When this page disagrees with the code, trust:

* `src/fleet_rlm/api/main.py` — app factory and route mounting.
* `src/fleet_rlm/api/routers/` — individual routers.
* [`openapi.yaml`](https://github.com/Qredence/fleet-rlm/blob/main/openapi.yaml) — generated schema.
