# DSPy integration in fleet-rlm: ReAct, RLM, and GEPA

> How `dspy.ReAct`, `dspy.RLM`, `SandboxSerializable` inputs, and offline GEPA optimization of modules and skills fit together in fleet-rlm, with code examples.

fleet-rlm is a **transport and orchestration shell** around DSPy. Both agent layers, the chat surface and the recursive engine, are stock DSPy modules (`dspy.ReAct` and `dspy.RLM` respectively). fleet-rlm pins DSPy to `3.3.0b1` and uses the upstream `RLM` and `SandboxSerializable` contracts directly, with no local monkeypatches.

## Where DSPy lives in the codebase

| Concern                | DSPy module  | fleet-rlm location                                                          |
| ---------------------- | ------------ | --------------------------------------------------------------------------- |
| Chat orchestration     | `dspy.ReAct` | `src/fleet_rlm/runtime/agent/agent.py` (`FleetAgent`)                       |
| Recursive long-context | `dspy.RLM`   | `src/fleet_rlm/runtime/models/builders.py` (`build_recursive_subquery_rlm`) |
| Tool delegation        | ReAct tool   | `src/fleet_rlm/runtime/tools/rlm_delegate.py`                               |
| Offline optimization   | `dspy.GEPA`  | `src/fleet_rlm/runtime/quality/*`                                           |

## Configure the planner LM

The planner LM drives both ReAct and RLM. Configure it once via environment variables — fleet-rlm wires it into DSPy automatically.

```bash .env theme={null}
DSPY_LM_MODEL=openai/gpt-4o-mini
DSPY_LLM_API_KEY=sk-...
# Optional LiteLLM proxy:
# DSPY_LM_API_BASE=https://your-litellm-proxy/v1
```

Any [LiteLLM-supported model](https://docs.litellm.ai/docs/providers) works. See the [LiteLLM models reference](https://github.com/Qredence/fleet-rlm/blob/main/docs/reference/litellm-models.md) for the curated list.
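As a rough mental model, the environment variables above map onto the keyword arguments of DSPy's `dspy.LM` constructor. The sketch below is an assumption about that mapping, not fleet-rlm's actual wiring code; the helper name `lm_kwargs_from_env` is hypothetical, and it only builds the arguments so it runs without DSPy installed.

```python
import os
from typing import Mapping, Optional


def lm_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: map the planner-LM env vars to dspy.LM keyword args."""
    env = os.environ if env is None else env
    model = env.get("DSPY_LM_MODEL")
    if not model:
        raise ValueError("DSPY_LM_MODEL must be set, e.g. 'openai/gpt-4o-mini'")

    kwargs = {"model": model}  # LiteLLM model string, usually '<provider>/<model>'
    api_key = env.get("DSPY_LLM_API_KEY")
    if api_key:
        kwargs["api_key"] = api_key
    api_base = env.get("DSPY_LM_API_BASE")
    if api_base:
        # Only needed when routing through a LiteLLM proxy.
        kwargs["api_base"] = api_base
    return kwargs
```

With DSPy installed, `dspy.configure(lm=dspy.LM(**lm_kwargs_from_env()))` would set the planner LM globally. fleet-rlm already does the equivalent for you, so a helper like this is only useful for understanding or debugging the mapping.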

## Use the chat agent from Python

```python theme={null}
from fleet_rlm.runtime.factory import build_runtime

runtime = build_runtime()        # builds AgentRuntime + Daytona interpreter
agent = runtime.agent             # the dspy.ReAct-backed FleetAgent

response = agent.respond(
    "Summarize the architecture of this repo",
    document_paths=["README.md"],
)
print(response.text)
```

`AgentRuntime` owns:

* The DSPy LM configuration.
* The Daytona interpreter instance.
* Conversation history (`dspy.History`).
* Loaded document paths and core memory.
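To make that ownership concrete, here is a hypothetical, stdlib-only model of the state such a runtime holds. The class `RuntimeStateSketch` and its field and method names are illustrative assumptions, not the real `AgentRuntime` API; the `history` list loosely mirrors `dspy.History`, which stores prior turns as a list of message dicts.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class RuntimeStateSketch:
    """Hypothetical picture of what an AgentRuntime-like object owns."""

    lm_model: str                        # planner LM, e.g. "openai/gpt-4o-mini"
    interpreter: Optional[Any] = None    # sandbox interpreter handle (Daytona in fleet-rlm)
    history: list[dict[str, Any]] = field(default_factory=list)  # one dict per prior turn
    document_paths: list[str] = field(default_factory=list)
    core_memory: dict[str, str] = field(default_factory=dict)

    def record_turn(self, user: str, assistant: str) -> None:
        # Append one exchange so later turns can see it as context.
        self.history.append({"user": user, "assistant": assistant})

    def load_document(self, path: str) -> None:
        # Track each loaded path once, preserving load order.
        if path not in self.document_paths:
            self.document_paths.append(path)
```

Keeping all of this on one object is what lets a single `respond()` call see the configured LM, the sandbox, prior turns, and loaded documents together.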

## Build a recursive sub-query RLM

To build a bounded recursive runner by hand:

```python theme={null}
from fleet_rlm.runtime.models.builders import build_recursive_subquery_rlm
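
# `child_interpreter` is assumed to be an existing sandbox interpreter
# (for example, a Daytona interpreter) dedicated to the child run.
rlm = build_recursive_subquery_rlm(
    interpreter=child_interpreter,
    max_iterations=8,   # upper bound on iterations for this runner
    max_llm_calls=40,   # upper bound on LM calls for this runner
)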

result = rlm(prompt="Extract every TODO from these logs", context="...")
```

In production code, prefer the `delegate_to_rlm` tool from inside the ReAct agent — it enforces the shared call budget for you. See [Recursive RLM](/fleet-rlm/concepts/recursive-rlm) for the full delegation flow.
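The idea behind a shared call budget is that the parent agent and every delegated child draw from one counter, so recursion cannot multiply the total number of LM calls. The sketch below illustrates that idea only; `SharedCallBudget` and `BudgetExhausted` are hypothetical names, and fleet-rlm's actual enforcement inside `delegate_to_rlm` may work differently.

```python
import threading


class BudgetExhausted(RuntimeError):
    """Raised when no LM calls remain in the shared budget."""


class SharedCallBudget:
    """Hypothetical thread-safe counter shared by a parent and its child runners."""

    def __init__(self, max_calls: int) -> None:
        self.max_calls = max_calls
        self._used = 0
        self._lock = threading.Lock()

    @property
    def remaining(self) -> int:
        with self._lock:
            return self.max_calls - self._used

    def consume(self, n: int = 1) -> None:
        # Reserve n calls atomically, or fail without consuming anything.
        with self._lock:
            if self._used + n > self.max_calls:
                raise BudgetExhausted(
                    f"requested {n}, only {self.max_calls - self._used} left"
                )
            self._used += n

    def child_limit(self, requested: int) -> int:
        # A child may ask for more, but never gets more than what is left.
        return max(0, min(requested, self.remaining))
```

For example, with a budget of 40 where the parent has already used 35 calls, a child that asks for 40 gets `child_limit(40) == 5`.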

## Pass large inputs to `dspy.RLM`

Long documents, workspace bundles, and other payloads that would exceed the planner LM's context window belong on the sandbox side, not in the prompt. For this case, fleet-rlm ships two ready-to-use `SandboxSerializable` models in `fleet_rlm.runtime.sandbox_types`:

| Model              | What lands in the REPL                                                         |
| ------------------ | ------------------------------------------------------------------------------ |
| `LargeDocument`    | A dict with `text`, `source_url`, and `metadata` keys.                         |
| `WorkspaceContext` | A dict with `document_text`, `context_paths`, `manifest`, and `metadata` keys. |

DSPy injects these into the sandbox natively — the LM sees only a short `rlm_preview()` summary while the full value is reconstructed inside the REPL through `sandbox_assignment()`. Declare them on signature input fields just like any other typed input:

```python theme={null}
from pathlib import Path

import dspy
from fleet_rlm.runtime.sandbox_types import LargeDocument

class SummarizeLongDoc(dspy.Signature):
    """Summarize a long document with a controllable focus."""

    document: LargeDocument = dspy.InputField(desc="Source document to summarize")
    focus: str = dspy.InputField(desc="Topic or aspect to focus on")
    response: str = dspy.OutputField(desc="Final summary")

# Build an RLM over this signature (assumes the planner LM is already configured).
rlm = dspy.RLM(SummarizeLongDoc)

doc = LargeDocument(
    text=Path("incident-report.md").read_text(),
    source_url="incident-report.md",
)
result = rlm(document=doc, focus="root cause")
print(result.response)
```
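The preview-versus-payload split can be illustrated without DSPy. The toy class below borrows the method names `rlm_preview()` and `sandbox_assignment()` from the description above, but its signatures and return values are assumptions made for illustration; DSPy's real `SandboxSerializable` protocol may differ, so use fleet-rlm's `LargeDocument` and `WorkspaceContext` rather than copying this.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyLargeDocument:
    """Illustrative stand-in for a sandbox-serializable input (not DSPy's API)."""

    text: str
    source_url: str = ""
    metadata: dict[str, Any] = field(default_factory=dict)

    def rlm_preview(self, max_chars: int = 200) -> str:
        # Short summary the planner LM sees instead of the full text.
        head = self.text[:max_chars]
        suffix = "..." if len(self.text) > max_chars else ""
        return f"Document({self.source_url!r}, {len(self.text)} chars): {head}{suffix}"

    def sandbox_assignment(self, var_name: str) -> str:
        # Python source that rebuilds the full value as a dict inside the REPL.
        # repr() round-trips only if metadata holds literal values (str, int, ...).
        payload = {"text": self.text, "source_url": self.source_url, "metadata": self.metadata}
        return f"{var_name} = {payload!r}"
```

The design point is that the prompt grows with `max_chars`, not with the document, while the REPL still receives every character.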

Custom signatures written against the old Fleet-maintained variable-mode wrappers should migrate to these types, because the wrappers were removed in 0.5.50.

## Offline GEPA optimization

fleet-rlm's optimization layer registers DSPy modules and runs them through a single pipeline: dataset → `dspy.Example` → GEPA → `dspy.Evaluate` → saved artifact + manifest. GEPA is the only supported optimizer.

List registered modules:

```bash theme={null}
uv run fleet-rlm optimize list
```

Run a default GEPA optimization:

```bash theme={null}
uv run fleet-rlm optimize my-module dataset.jsonl \
  --auto medium \
  --output-path ./artifacts/optimized.pkl \
  --report
```
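The first stages of that pipeline (dataset → `dspy.Example` → train/validation split) can be sketched with the standard library. This assumes `dataset.jsonl` holds one JSON object per line whose keys match the module's fields; the helper names and the seeded 0.8 split are illustrative assumptions, not fleet-rlm's loader. With DSPy installed, each row would become `dspy.Example(**row).with_inputs(...)`, naming the input keys.

```python
import json
import random
from typing import Any


def load_jsonl(lines: list[str]) -> list[dict[str, Any]]:
    """Parse JSONL text lines, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def split_train_val(
    rows: list[dict[str, Any]], train_ratio: float = 0.8, seed: int = 0
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Shuffle with a fixed seed, then cut at round(len * train_ratio)."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

A ratio of 0.8 on 80 rows yields a 64/16 split, matching the LongCoT run described below; the fixed seed keeps the split reproducible across reruns.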

### Optimize a Fleet skill

Beyond registered DSPy modules, you can optimize a Fleet skill (a SKILL.md-compatible markdown artifact) using the RLM-GEPA instruction proposer. Pass either `--skill-name` to select a bundled skill or `--skill-path` to point at a markdown file, and optionally pass distilled offline trace bundles with `--trace-bundle-path` as extra context for the proposer:

```bash theme={null}
uv run fleet-rlm optimize skill data.jsonl \
  --skill-name optimization \
  --trace-bundle-path artifacts/traces/optimization-failures.jsonl \
  --auto medium \
  --report
```

<Warning>
  Skill optimization runs the RLM-GEPA proposer inside an isolated Daytona sandbox and requires `DAYTONA_API_KEY`. The async API returns `503` if Daytona is not configured. Registered module optimization runs locally and has no Daytona requirement.
</Warning>

### LongCoT reasoner

The optimization registry includes `longcot-reasoner`, a LongCoT reasoning module evaluated with a continuous, answer-dominant 0.6/0.4 metric and tiered feedback. A one-time offline GEPA optimization was run against an 80-row LongCoT dataset containing only answered examples (64 train / 16 validation), producing a reviewable optimized artifact bundle.
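As a worked illustration of an answer-dominant 0.6/0.4 metric with tiered feedback, the sketch below assumes the 0.6 weight applies to answer correctness and the 0.4 weight to a secondary component in [0, 1]. What that secondary component measures, the tier boundaries, and the feedback strings are all hypothetical, not the registry's actual metric. GEPA metrics in DSPy can return textual feedback alongside the score, which is why this function returns a pair.

```python
def answer_dominant_score(answer_score: float, secondary_score: float) -> tuple[float, str]:
    """Hypothetical continuous metric: 0.6 * answer + 0.4 * secondary, plus tiered feedback."""
    for name, value in (("answer_score", answer_score), ("secondary_score", secondary_score)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")

    score = 0.6 * answer_score + 0.4 * secondary_score

    # Tiered feedback keyed on the answer term, so the reflection model
    # sees a coarse, readable verdict in addition to the number.
    if answer_score >= 1.0:
        feedback = "Correct final answer."
    elif answer_score > 0.0:
        feedback = "Partially correct answer; check the final step."
    else:
        feedback = "Wrong final answer; revisit the reasoning."
    return score, feedback
```

Because the answer term carries 0.6, a correct answer with no secondary credit (0.6) still outscores a wrong answer with full secondary credit (0.4).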

The pipeline persists:

* Baseline-vs-optimized holdout evidence.
* Prompt snapshots.
* Reflection-model provenance.
* MLflow metadata.

### Optimization workflow in the web app

The `/app/optimization` surface in the Web UI exposes the same pipeline as the CLI through two tabs:

* **New Run**: a module picker that auto-populates the program spec and shows the required dataset keys. Select a registered module, choose a dataset (an existing one, a new upload, or a file path), optionally attach distilled trace bundle paths, and start an async run.
* **Run History**: lists every run with status badges and a detail panel showing program spec, intensity, train ratio, dataset, phase, duration, validation score, and any error. The list polls automatically while runs are active.

Submitting a run calls `POST /api/v1/optimization/runs`, which returns immediately while the run executes in the background. If the server restarts mid-run, the run is recovered as `failed` with a "Server restarted" message.
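That restart behavior amounts to a recovery pass at startup: a run still marked as in progress cannot have a live worker after a restart, so it is marked failed. A minimal sketch, assuming runs are plain dicts with `status` and `error` keys and that `pending` and `running` are the in-flight statuses (both assumptions, not the server's real schema):

```python
from typing import Any

IN_FLIGHT = {"pending", "running"}  # assumed in-progress statuses


def recover_interrupted_runs(runs: list[dict[str, Any]]) -> int:
    """Mark in-flight runs as failed after a restart; return how many changed."""
    recovered = 0
    for run in runs:
        if run.get("status") in IN_FLIGHT:
            run["status"] = "failed"
            run["error"] = "Server restarted"
            recovered += 1
    return recovered
```

Running the pass twice is harmless: the second call finds nothing in flight and changes no records.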

### Promoting an optimized artifact

Optimized artifacts are saved for **manual review** and are **not** auto-loaded into the live runtime. To stage a candidate for review, create a tenant-scoped promotion draft from the run:

```bash theme={null}
curl -X POST http://localhost:8000/api/v1/optimization/runs/<run_id>/promotion-drafts
```

The draft is a non-mutating record scoped to your tenant and workspace. It does not change the live runtime — it captures the optimized artifact and metadata for a deliberate promotion decision.
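The same request can be built from Python with only the standard library. The base URL mirrors the local curl example; since that example shows no authentication, any tenant or auth headers your deployment requires are left out here too. `promotion_draft_request` is a hypothetical helper name.

```python
import urllib.parse
import urllib.request


def promotion_draft_request(
    run_id: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build (but do not send) the POST that creates a promotion draft for a run."""
    safe_id = urllib.parse.quote(run_id, safe="")  # escape '/' and other reserved chars
    path = f"/api/v1/optimization/runs/{safe_id}/promotion-drafts"
    return urllib.request.Request(base_url.rstrip("/") + path, method="POST")


# To actually send it against a running server:
# with urllib.request.urlopen(promotion_draft_request("<run_id>")) as resp:
#     print(resp.status, resp.read().decode())
```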

## MLflow tracing

When `MLFLOW_ENABLED=true` (default), every DSPy call emits a trace to `MLFLOW_TRACKING_URI` under the `MLFLOW_EXPERIMENT` experiment (default: `fleet-rlm`).

In `APP_ENV=local`, the API server auto-starts a localhost MLflow target on port 5001 unless you set `MLFLOW_AUTO_START=false`. To run it yourself:

```bash theme={null}
make mlflow-server
```
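The defaults described above can be summarized as a small resolution function. This is a sketch of the documented behavior, not fleet-rlm's settings code: the function name, the accepted truthy strings, the rule that auto-start applies only when tracing is enabled, and the `http://localhost:5001` URI for the auto-started server are assumptions. In application code, values like these typically feed `mlflow.set_tracking_uri`, `mlflow.set_experiment`, and `mlflow.dspy.autolog()`.

```python
from typing import Mapping, Optional

_TRUE = {"1", "true", "yes", "on"}


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    raw = env.get(name)
    return default if raw is None else raw.strip().lower() in _TRUE


def mlflow_settings(env: Mapping[str, str]) -> dict:
    """Resolve MLflow tracing settings from env vars (illustrative sketch)."""
    enabled = _flag(env, "MLFLOW_ENABLED", default=True)
    is_local = env.get("APP_ENV") == "local"
    # Assumed: auto-start only matters when tracing is enabled.
    auto_start = enabled and is_local and _flag(env, "MLFLOW_AUTO_START", default=True)
    tracking_uri: Optional[str] = env.get("MLFLOW_TRACKING_URI")
    if tracking_uri is None and auto_start:
        tracking_uri = "http://localhost:5001"  # assumed address of the auto-started server
    return {
        "enabled": enabled,
        "auto_start": auto_start,
        "tracking_uri": tracking_uri,
        "experiment": env.get("MLFLOW_EXPERIMENT", "fleet-rlm"),
    }
```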

Trace feedback is submitted via `POST /api/v1/traces/feedback`.

## See also

* [CLI reference](/fleet-rlm/reference/cli) — the `optimize`, `chat`, and `serve-api` subcommands.
* [Recursive RLM](/fleet-rlm/concepts/recursive-rlm) — delegation policy and shared call budget.
