# fleet-rlm architecture

> The three live layers of fleet-rlm (a thin FastAPI transport, the dspy.ReAct and dspy.RLM runtime core, and the Daytona sandbox substrate), their boundaries, and the separate offline quality lane.

`fleet-rlm` is a Daytona-backed recursive runtime wrapped by a thin transport shell. Three intentional design choices drive the shape of the codebase:

1. **The backend is intentionally thin.** The Python layer is a transport and orchestration shell over `dspy.ReAct`, `dspy.RLM`, and Daytona sandboxes. Intelligence lives in DSPy (upstream) and in the recursive scheduling policy (this repo). Plumbing belongs in `src/fleet_rlm/api/`; runtime policy belongs in `src/fleet_rlm/runtime/`.
2. **The UI is treated as core, not peripheral.** The runtime emits streaming events, code-execution results, and artifacts that only make sense in an interactive surface. `src/frontend/` is comparable in line count to `src/fleet_rlm/` because it is surfacing work the runtime is already doing, not duplicating it.
3. **Two agent layers, both `dspy.*`, both real.**
   * **Chat surface** — `dspy.ReAct` at `src/fleet_rlm/runtime/agent/agent.py` (`FleetAgent`).
   * **Recursive engine** — `dspy.RLM` at `src/fleet_rlm/runtime/models/builders.py` with delegation at `src/fleet_rlm/runtime/tools/rlm_delegate.py`.

## High-level layering

```mermaid
graph TB
    CLIENTS["CLI / Web UI / WebSocket clients"] --> API["FastAPI transport<br/>api/main.py · api/routers/* · api/runtime_services/*"]
    API --> RUNTIME["Runtime core<br/>runtime/agent · runtime/execution · runtime/models · runtime/tools"]
    RUNTIME --> DAYTONA["Daytona substrate<br/>integrations/daytona/*"]
    API --> EVENTS["Event shaping<br/>api/events/*"]
    API --> PERSIST["Persistence<br/>integrations/local_store.py · integrations/database/*"]
    RUNTIME --> QUALITY["Offline quality<br/>runtime/quality/*"]
    RUNTIME --> OBS["Observability<br/>integrations/observability/*"]
```

The runtime core is the live center. The transport shell, persistence layer, and offline quality lane all attach to it; they do not replace it.

## Layer 1 — Transport shell

A thin FastAPI app with two WebSocket endpoints and a small set of REST routers.

**Primary files:**

| File                                       | Role                                                     |
| ------------------------------------------ | -------------------------------------------------------- |
| `api/main.py`                              | App factory, lifespan, route mounting, SPA asset serving |
| `api/bootstrap.py`                         | Critical startup wiring + optional warmup                |
| `api/routers/ws/endpoint.py`               | Two WebSocket surfaces: chat stream + execution events   |
| `api/runtime_services/chat_runtime.py`     | Per-turn runtime preparation                             |
| `api/runtime_services/chat_persistence.py` | Turn and session lifecycle writes                        |
| `api/runtime_services/diagnostics.py`      | Connectivity probes, status composition                  |
| `api/runtime_services/settings.py`         | Runtime settings read/mutate (local-only writes)         |
| `api/runtime_services/volumes.py`          | Daytona volume browsing                                  |

**Responsibilities:**

* App factory, lifespan, route mounting, and SPA asset serving (the SPA is served from `src/fleet_rlm/ui/dist` in published installs, or `src/frontend/dist` in source checkouts).
* Auth-derived HTTP and WebSocket identity (dev or Entra).
* Session lookup, runtime preparation, and service orchestration.
* WebSocket lifecycle and execution-event envelope delivery.
* Runtime settings, diagnostics, and Daytona volume browsing.

The transport layer should not contain business logic. If you find yourself reaching for DSPy primitives or Daytona SDK calls inside a router, that code belongs one layer down, in the runtime core or the Daytona integration.
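
As a rough illustration of that boundary, the sketch below shows a transport-level handler that only validates input, delegates, and shapes a response. Every name here (`ChatService`, `EchoChatService`, `handle_chat_request`, the payload keys) is hypothetical and not taken from the repo; the real routers are FastAPI endpoints.

```python
from dataclasses import dataclass
from typing import Protocol


class ChatService(Protocol):
    """Hypothetical runtime-facing interface the transport depends on."""

    def run_turn(self, session_id: str, message: str) -> str: ...


@dataclass
class EchoChatService:
    """Stand-in for the runtime core so the sketch runs on its own."""

    prefix: str = "echo"

    def run_turn(self, session_id: str, message: str) -> str:
        return f"{self.prefix}[{session_id}]: {message}"


def handle_chat_request(payload: dict, service: ChatService) -> dict:
    """Transport-level handler: validate, delegate, shape the response."""
    session_id = payload.get("session_id")
    message = payload.get("message")
    if not session_id or not message:
        return {"status": 400, "error": "session_id and message are required"}
    # No DSPy or Daytona calls here; the injected service owns that work.
    reply = service.run_turn(session_id, message)
    return {"status": 200, "reply": reply}
```

Because the handler depends only on the small `ChatService` interface, the runtime behind it can change without touching transport code.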

## Layer 2 — Runtime core

The live cognition loop. This is where the ReAct agent runs, tools are dispatched, and per-turn execution events are assembled.

**Primary files:**

| File                          | Role                                                            |
| ----------------------------- | --------------------------------------------------------------- |
| `api/routers/ws/stream.py`    | Streaming turn execution coordination                           |
| `runtime/factory.py`          | Builds the canonical Daytona-backed chat agent                  |
| `runtime/agent/agent.py`      | `FleetAgent` — the `dspy.ReAct` chat module                     |
| `runtime/agent/runtime.py`    | `AgentRuntime` — wraps `FleetAgent` with sandbox + memory state |
| `runtime/agent/signatures.py` | DSPy input/output signatures                                    |
| `runtime/execution/*`         | Execution-event assembly, streaming helpers, citation tracking  |
| `runtime/models/builders.py`  | RLM construction (`build_recursive_subquery_rlm`)               |
| `runtime/models/registry.py`  | Runtime model registry                                          |
| `runtime/tools/*`             | Tool registry, including `rlm_delegate.py`                      |

**Responsibilities:**

* Shared chat and runtime execution (CLI, HTTP, WebSocket all converge here).
* Recursive delegation policy and tool execution.
* Execution-event assembly and workbench hydration inputs.
* Runtime model assembly and registry management.

`AgentRuntime` owns the per-session state envelope: the DSPy `LM` configuration, the active `DaytonaInterpreter` instance, the conversation `dspy.History`, loaded document paths, and core memory blocks.
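
One way to picture that envelope is as a plain dataclass. This is a minimal sketch under stated assumptions: the class name `SessionEnvelope`, the field names, and the placeholder types are illustrative and do not mirror the actual `AgentRuntime` attributes.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SessionEnvelope:
    """Hypothetical per-session state; field names are assumptions."""

    # Stands in for the DSPy LM configuration.
    lm_config: dict[str, Any] = field(default_factory=dict)
    # Stands in for the active DaytonaInterpreter instance.
    interpreter: Optional[Any] = None
    # Stands in for the dspy.History conversation turns.
    history: list[dict[str, str]] = field(default_factory=list)
    # Paths of documents loaded during this session.
    loaded_documents: list[str] = field(default_factory=list)
    # Core memory blocks keyed by block name.
    core_memory: dict[str, str] = field(default_factory=dict)

    def record_turn(self, user: str, assistant: str) -> None:
        """Append one conversation turn to this session's history."""
        self.history.append({"user": user, "assistant": assistant})
```

Using `default_factory` for every mutable field keeps two sessions from sharing a list or dict by accident, which is the property a per-session envelope needs.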

## Layer 3 — Daytona substrate

The execution backend. Daytona is the maintained sandbox provider; the runtime contract is intentionally Daytona-only.

**Primary files:**

| File                                        | Role                                                                                           |
| ------------------------------------------- | ---------------------------------------------------------------------------------------------- |
| `integrations/daytona/interpreter.py`       | Public `DaytonaInterpreter` facade                                                             |
| `integrations/daytona/workspace_manager.py` | Workspace config, session lifecycle, persistent state, import/export                           |
| `integrations/daytona/sandbox_executor.py`  | Code execution, sanitization, tool dispatch, result finalization                               |
| `integrations/daytona/isolation.py`         | Recursive child policy, host-mediated evidence persistence, and context staging                |
| `integrations/daytona/models.py`            | Sandbox specs, workspace config, staged-context records, smoke results, chat/session contracts |
| `integrations/daytona/runtime.py`           | Runtime facade around workspace bootstrap and session creation                                 |
| `integrations/daytona/workspace_runtime.py` | Workspace path, repo checkout, and session reconciliation helpers                              |
| `integrations/daytona/sdk_ops.py`           | Volume, snapshot, lifecycle, and lower-level Daytona SDK helpers                               |
| `integrations/daytona/bridge.py`            | Host-callback broker (sandbox → host)                                                          |
| `integrations/daytona/diagnostics.py`       | Structured diagnostics + smoke validation                                                      |
| `integrations/daytona/errors.py`            | Provider-local error types                                                                     |

**Responsibilities:**

* Sandbox and interpreter lifecycle (creation, resume, shutdown).
* Repo checkout (`sandbox.git.clone(...)`), workspace path staging, and durable mounted volumes.
* Persistent Python execution context via `sandbox.code_interpreter.create_context(...)` and `run_code(...)`.
* Provider-specific diagnostics and volume normalization.

The Fleet-facing provider contract is async-first. `DaytonaInterpreter.astart()`, `ashutdown()`, `aexecute()`, `aconfigure_workspace()`, `aimport_session_state()`, and the `DaytonaSandboxSession` file/lifecycle `a*` methods are real coroutines. Sync helpers remain as public compatibility shims for notebooks, tests, and direct Python API users.
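
The async-first shape with sync shims can be sketched as follows. `SketchInterpreter` is a hypothetical stand-in, not the real `DaytonaInterpreter`; the sync method names (`start`, `execute`) and the use of `asyncio.run` are assumptions about how such shims might be written.

```python
import asyncio


class SketchInterpreter:
    """Hypothetical stand-in; not the real DaytonaInterpreter."""

    def __init__(self) -> None:
        self.started = False

    async def astart(self) -> None:
        await asyncio.sleep(0)  # placeholder for real async sandbox I/O
        self.started = True

    async def aexecute(self, code: str) -> str:
        if not self.started:
            raise RuntimeError("call astart() first")
        await asyncio.sleep(0)  # placeholder for remote execution
        return f"ran {len(code)} chars"

    # Sync shims: usable only when no event loop is already running
    # in the calling thread, because asyncio.run() creates its own loop.
    def start(self) -> None:
        asyncio.run(self.astart())

    def execute(self, code: str) -> str:
        return asyncio.run(self.aexecute(code))
```

Async callers (such as a WebSocket handler) would `await` the `a*` methods directly. The shims are for plain scripts and tests; in environments that already run an event loop, such as many notebooks, an `asyncio.run`-based shim raises `RuntimeError`, so a real implementation would need a different strategy there.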

See [Daytona runtime](/fleet-rlm/concepts/daytona-runtime) for the full substrate model: snapshots, volumes, session continuity, and the broker bridge.

## Layer 4 — Offline quality

DSPy evaluation, GEPA optimization, and offline scoring run in their own lane.

**Primary files:** `src/fleet_rlm/runtime/quality/*` — `dspy_evaluation.py`, `gepa_optimization.py`, `mlflow_evaluation.py`, `mlflow_optimization.py`, `workspace_metrics.py`, `scorers.py`, `module_registry.py`.

**Responsibilities:**

* DSPy evaluation against registered modules.
* GEPA optimization runs.
* Offline scoring, datasets, and module registry management.

Optimized artifacts are persisted for manual review; they are not auto-loaded into the live runtime. See [DSPy integration](/fleet-rlm/guides/dspy-integration) for the optimization workflow.

## Module map

```mermaid
graph LR
    REQUEST["Workspace task request"] --> STREAM["api/routers/ws/stream.py"]
    STREAM --> FACTORY["runtime/factory.py"]
    FACTORY --> AGENT["runtime/agent/agent.py<br/>FleetAgent (dspy.ReAct)"]
    AGENT --> TOOLS["runtime/tools/*"]
    TOOLS --> DELEGATE["runtime/tools/rlm_delegate.py"]
    AGENT --> EXEC["runtime/execution/*"]
    EXEC --> INTERPRETER["integrations/daytona/interpreter.py"]
    DELEGATE --> ISOLATION["integrations/daytona/isolation.py"]
    INTERPRETER --> EXECUTOR["integrations/daytona/sandbox_executor.py"]
    EXECUTOR --> BRIDGE["integrations/daytona/bridge.py"]
    AGENT --> MODELS["runtime/models/*"]
    MODELS --> RLM["build_recursive_subquery_rlm()"]
```

## Stateful restore

Session manifests on durable Daytona storage are the authoritative source for restoring a session after a restart. The manifest `state` payload restores:

* `dspy.History` conversation turns.
* `AgentRuntime` core memory — default core memory plus persisted keys.
* Session-local loaded document paths.
* Daytona interpreter state — sandbox ID, workspace path, repo URL/ref, context paths, volume name, volume subpath.

Importing a session **replaces** session-local memory and document state instead of merging into the active runtime. Empty or missing state resets history, core memory, loaded documents, and sandbox buffers so switching sessions cannot leak stale agent context.
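
The replace-not-merge rule can be sketched like this. The names (`SessionState`, `RuntimeSketch`, `restore_state`), the payload keys, and the default core memory blocks are all hypothetical; only the behavior (rebuild from the payload, fall back to defaults when it is empty) follows the description above.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical default core memory blocks.
DEFAULT_CORE_MEMORY = {"persona": "", "notes": ""}


@dataclass
class SessionState:
    history: list = field(default_factory=list)
    core_memory: dict = field(default_factory=lambda: dict(DEFAULT_CORE_MEMORY))
    loaded_documents: list = field(default_factory=list)


class RuntimeSketch:
    """Holds one session's state at a time."""

    def __init__(self) -> None:
        self.state = SessionState()

    def restore_state(self, payload: Optional[dict]) -> None:
        """Rebuild state from the payload alone; nothing is merged in."""
        payload = payload or {}
        core = dict(DEFAULT_CORE_MEMORY)
        core.update(payload.get("core_memory", {}))  # defaults plus persisted keys
        # Assigning a fresh object discards whatever the previous session left.
        self.state = SessionState(
            history=list(payload.get("history", [])),
            core_memory=core,
            loaded_documents=list(payload.get("loaded_documents", [])),
        )
```

Building a new `SessionState` on every import, rather than updating the existing one in place, makes the "no stale context" guarantee structural: there is no code path through which old keys can survive.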

Manifests live under `meta/workspaces/<workspace_id>/users/<user_id>/react-session-<session_id>.json` on the mounted Daytona volume.
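
For illustration, the path layout above can be assembled with a small helper. The function name `manifest_path` and the component validation are assumptions added for this sketch; only the path template comes from the layout described here.

```python
from pathlib import PurePosixPath


def manifest_path(workspace_id: str, user_id: str, session_id: str) -> PurePosixPath:
    """Build the volume-relative manifest path for one session."""
    for part in (workspace_id, user_id, session_id):
        # Assumed safeguard: reject empty values and path traversal.
        if not part or "/" in part or part in {".", ".."}:
            raise ValueError(f"unsafe path component: {part!r}")
    return (
        PurePosixPath("meta/workspaces")
        / workspace_id
        / "users"
        / user_id
        / f"react-session-{session_id}.json"
    )
```

For example, `manifest_path("ws1", "u1", "abc")` yields `meta/workspaces/ws1/users/u1/react-session-abc.json`, relative to wherever the Daytona volume is mounted.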

## Reading order

When you need to understand the live backend, read in this order:

1. `src/fleet_rlm/api/main.py`
2. `src/fleet_rlm/api/routers/ws/endpoint.py`
3. `src/fleet_rlm/api/routers/ws/stream.py`
4. `src/fleet_rlm/runtime/factory.py`
5. `src/fleet_rlm/runtime/agent/agent.py`
6. `src/fleet_rlm/runtime/tools/rlm_delegate.py`
7. `src/fleet_rlm/integrations/daytona/interpreter.py`
8. `src/fleet_rlm/integrations/daytona/runtime.py`

## Source of truth

When the docs disagree with the code, trust the code and the generated contracts:

* Backend routes and WebSocket behavior: `src/fleet_rlm/api/`.
* Runtime cognition and Daytona execution: `src/fleet_rlm/runtime/` and `src/fleet_rlm/integrations/daytona/`.
* Canonical HTTP schema: [`openapi.yaml`](https://github.com/Qredence/fleet-rlm/blob/main/openapi.yaml).

<Note>
  Historical transition notes may mention `orchestration_app/` or `api/orchestration/`. Those labels are not part of the current tree.
</Note>