fleet-rlm is a transport and orchestration shell around DSPy. Both agent layers — the chat surface and the recursive engine — are real dspy.* modules. fleet-rlm pins DSPy to 3.3.0b1 and uses the upstream RLM and SandboxSerializable contracts directly — no local monkeypatches.
Where DSPy lives in the codebase
| Concern | DSPy module | fleet-rlm location |
|---|---|---|
| Chat orchestration | dspy.ReAct | src/fleet_rlm/runtime/agent/agent.py (FleetAgent) |
| Recursive long-context | dspy.RLM | src/fleet_rlm/runtime/models/builders.py (build_recursive_subquery_rlm) |
| Tool delegation | ReAct tool | src/fleet_rlm/runtime/tools/rlm_delegate.py |
| Offline optimization | dspy.GEPA | src/fleet_rlm/runtime/quality/* |
The planner LM drives both ReAct and RLM. Configure it once via environment variables — fleet-rlm wires it into DSPy automatically.
DSPY_LM_MODEL=openai/gpt-4o-mini
DSPY_LLM_API_KEY=sk-...
# Optional LiteLLM proxy:
# DSPY_LM_API_BASE=https://your-litellm-proxy/v1
Any LiteLLM-supported model works. See the LiteLLM models reference for the curated list.
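As a rough illustration of the wiring described above, the sketch below collects the three environment variables into keyword arguments. The helper name `planner_lm_kwargs` is hypothetical and not part of fleet-rlm, and treating the API key as optional is an assumption of this sketch. Passing the result to `dspy.LM(**kwargs)` and then `dspy.configure(lm=...)` is one plausible way to consume it, not a description of fleet-rlm's internals.

```python
import os
from typing import Mapping, Optional


def planner_lm_kwargs(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect planner LM settings from environment-style variables.

    Hypothetical helper for illustration; not a fleet-rlm API.
    """
    env = os.environ if env is None else env
    model = env.get("DSPY_LM_MODEL")
    if not model:
        raise ValueError("DSPY_LM_MODEL must be set, e.g. openai/gpt-4o-mini")

    kwargs = {"model": model}
    # The API key and proxy base URL are included only when they are set.
    if env.get("DSPY_LLM_API_KEY"):
        kwargs["api_key"] = env["DSPY_LLM_API_KEY"]
    if env.get("DSPY_LM_API_BASE"):
        kwargs["api_base"] = env["DSPY_LM_API_BASE"]
    return kwargs


example = planner_lm_kwargs(
    {"DSPY_LM_MODEL": "openai/gpt-4o-mini", "DSPY_LLM_API_KEY": "sk-test"}
)
print(example)
```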
Use the chat agent from Python
from fleet_rlm.runtime.factory import build_runtime
runtime = build_runtime() # builds AgentRuntime + Daytona interpreter
agent = runtime.agent # the dspy.ReAct-backed FleetAgent
response = agent.respond(
    "Summarize the architecture of this repo",
    document_paths=["README.md"],
)
print(response.text)
AgentRuntime owns:
- The DSPy LM configuration.
- The Daytona interpreter instance.
- Conversation history (dspy.History).
- Loaded document paths and core memory.
Build a recursive sub-query RLM
When you need a bounded recursive runner manually:
from fleet_rlm.runtime.models.builders import build_recursive_subquery_rlm
rlm = build_recursive_subquery_rlm(
    interpreter=child_interpreter,
    max_iterations=8,
    max_llm_calls=40,
)
result = rlm(prompt="Extract every TODO from these logs", context="...")
In production code, prefer the delegate_to_rlm tool from inside the ReAct agent — it enforces the shared call budget for you. See Recursive RLM for the full delegation flow.
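To make the idea of a shared call budget concrete, here is a minimal sketch of a counter that a parent agent and its delegated runners could draw from. The `CallBudget` class and its method names are hypothetical; fleet-rlm's actual budget tracking may work differently.

```python
import threading


class CallBudget:
    """Thread-safe counter shared by a parent agent and its child runners.

    Hypothetical illustration; not the fleet-rlm implementation.
    """

    def __init__(self, max_llm_calls: int) -> None:
        self.max_llm_calls = max_llm_calls
        self._used = 0
        self._lock = threading.Lock()

    def consume(self, n: int = 1) -> None:
        """Reserve n calls, or raise if that would exceed the budget."""
        with self._lock:
            if self._used + n > self.max_llm_calls:
                raise RuntimeError(
                    f"LLM call budget exhausted ({self._used}/{self.max_llm_calls})"
                )
            self._used += n

    @property
    def remaining(self) -> int:
        with self._lock:
            return self.max_llm_calls - self._used


budget = CallBudget(max_llm_calls=40)
budget.consume(30)  # calls made by the parent agent
budget.consume(8)   # calls made by a delegated child runner
print(budget.remaining)
```

Because both layers draw from the same object, a child runner cannot exceed the limit even when it is invoked several times in one conversation turn.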
Long documents, workspace bundles, and other payloads that would blow past the planner LM’s context belong on the sandbox side, not in the prompt. fleet-rlm ships two ready-to-use SandboxSerializable models for that case from fleet_rlm.runtime.sandbox_types:
| Model | What lands in the REPL |
|---|---|
| LargeDocument | A dict with text, source_url, and metadata keys. |
| WorkspaceContext | A dict with document_text, context_paths, manifest, and metadata keys. |
DSPy injects these into the sandbox natively — the LM sees only a short rlm_preview() summary while the full value is reconstructed inside the REPL through sandbox_assignment(). Declare them on signature input fields just like any other typed input:
import dspy
from fleet_rlm.runtime.sandbox_types import LargeDocument
class SummarizeLongDoc(dspy.Signature):
    """Summarize a long document with a controllable focus."""

    document: LargeDocument = dspy.InputField(desc="Source document to summarize")
    focus: str = dspy.InputField(desc="Topic or aspect to focus on")
    response: str = dspy.OutputField(desc="Final summary")
doc = LargeDocument(
    text=open("incident-report.md").read(),
    source_url="incident-report.md",
)
# Assumes `rlm` is an RLM instance built for the SummarizeLongDoc signature.
result = rlm(document=doc, focus="root cause")
print(result.response)
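The split between a short preview for the LM and a full value rebuilt inside the REPL can be sketched with a toy class. This is not the DSPy SandboxSerializable contract or the fleet-rlm LargeDocument: the class name, the `preview` and `assignment` methods, and their signatures are all invented here purely to show the idea.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLargeDocument:
    """Stand-in for a sandbox-side payload; not the fleet-rlm or DSPy class."""

    text: str
    source_url: str = ""
    metadata: dict = field(default_factory=dict)

    def preview(self, max_chars: int = 80) -> str:
        """Short summary suitable for a prompt."""
        head = self.text[:max_chars].replace("\n", " ")
        return f"<document {self.source_url!r}, {len(self.text)} chars> {head}..."

    def assignment(self, name: str) -> str:
        """Python source that rebuilds the full payload as a dict named `name`."""
        payload = {
            "text": self.text,
            "source_url": self.source_url,
            "metadata": self.metadata,
        }
        return f"{name} = {payload!r}"


toy = ToyLargeDocument(text="line\n" * 10_000, source_url="incident-report.md")
print(toy.preview())  # stays short regardless of document size

namespace = {}
exec(toy.assignment("document"), namespace)  # roughly what a REPL would run
print(len(namespace["document"]["text"]))
```

The prompt only ever carries the preview string, while the code running in the sandbox works with the complete dict.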
Custom signatures previously written against Fleet-maintained variable-mode wrappers should migrate to these types — the wrappers were removed in 0.5.50.
Offline GEPA optimization
fleet-rlm’s optimization layer registers DSPy modules and runs them through a single pipeline: dataset → dspy.Example → GEPA → dspy.Evaluate → saved artifact + manifest. GEPA is the only supported optimizer.
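The first stage of that pipeline, turning a JSONL dataset into train and validation rows, might look roughly like the sketch below. The helper names, the `question`/`answer` keys, and the 0.8 train ratio are assumptions for illustration. In a DSPy pipeline each row would typically then be wrapped as `dspy.Example(**row).with_inputs(...)`.

```python
import json
import random
from typing import Iterable


def load_jsonl(lines: Iterable[str]) -> list[dict]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def split_rows(rows: list[dict], train_ratio: float = 0.8, seed: int = 0):
    """Shuffle deterministically, then split into (train, validation)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical in-memory dataset standing in for dataset.jsonl.
raw = [json.dumps({"question": f"q{i}", "answer": f"a{i}"}) for i in range(80)]
train, val = split_rows(load_jsonl(raw))
print(len(train), len(val))
```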
List registered modules:
uv run fleet-rlm optimize list
Run a default GEPA optimization:
uv run fleet-rlm optimize my-module dataset.jsonl \
  --auto medium \
  --output-path ./artifacts/optimized.pkl \
  --report
Optimize a Fleet skill
Beyond registered DSPy modules, you can optimize a Fleet skill — a SKILL.md-compatible markdown artifact — using the RLM-GEPA instruction proposer. Provide either a bundled --skill-name or a --skill-path to a markdown file, and optionally pass distilled offline trace bundles for proposer context:
uv run fleet-rlm optimize skill data.jsonl \
  --skill-name optimization \
  --trace-bundle-path artifacts/traces/optimization-failures.jsonl \
  --auto medium \
  --report
Skill optimization runs the RLM-GEPA proposer inside an isolated Daytona sandbox and requires DAYTONA_API_KEY. The async API returns 503 if Daytona is not configured. Registered module optimization runs locally and has no Daytona requirement.
LongCoT reasoner
The optimization registry includes longcot-reasoner — a LongCoT reasoning module evaluated with a continuous answer-dominant 0.6/0.4 metric and tiered feedback. A one-time offline GEPA optimization was run against an 80-row answered-only LongCoT dataset (64 train / 16 validation), producing a reviewable optimized artifact bundle.
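A continuous 0.6/0.4 metric with tiered feedback could be shaped along the lines of the sketch below. Only the 0.6/0.4 weighting comes from the text above. What the 0.4 component measures, the tier thresholds, and the feedback strings are assumptions, and both function names are hypothetical.

```python
def blended_score(answer_score: float, secondary_score: float) -> float:
    """Weighted blend: 0.6 for the answer, 0.4 for a secondary component.

    What the secondary component measures is an assumption of this sketch.
    """
    for value in (answer_score, secondary_score):
        if not 0.0 <= value <= 1.0:
            raise ValueError("scores must be in [0, 1]")
    return 0.6 * answer_score + 0.4 * secondary_score


def tiered_feedback(score: float) -> str:
    """Map a score to a coarse feedback tier (thresholds are illustrative)."""
    if score >= 0.9:
        return "strong: both components score well"
    if score >= 0.6:
        return "partial: the answer carries the score, the secondary component lags"
    return "weak: revisit the final answer first"


score = blended_score(answer_score=1.0, secondary_score=0.5)
print(round(score, 2), tiered_feedback(score))
```

Weighting the answer above 0.5 means a correct answer alone always reaches the middle tier in this sketch, which is one reading of "answer-dominant".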
The pipeline persists:
- Baseline-vs-optimized holdout evidence.
- Prompt snapshots.
- Reflection-model provenance.
- MLflow metadata.
Optimization workflow in the web app
The /app/optimization surface in the Web UI exposes the same pipeline as the CLI through two tabs:
- New Run: a module picker that auto-populates the program spec and shows the required dataset keys. Select a registered module, choose a dataset (existing, uploaded, or referenced by path), optionally attach distilled trace bundle paths, and start an async run.
- Run History: lists every run with status badges and a detail panel showing program spec, intensity, train ratio, dataset, phase, duration, validation score, and any error. The list polls automatically while runs are active.
Submitting a run calls POST /api/v1/optimization/runs and returns immediately; the run executes in the background. If the server restarts mid-run, the run is recovered as failed with a “Server restarted” message.
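The restart behavior can be pictured as a startup sweep over stored runs. The sketch below is a guess at the shape of such a sweep: the `Run` record, its status values, and `recover_interrupted` are hypothetical names, and only the “Server restarted” message is taken from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    run_id: str
    status: str  # assumed values: "running", "completed", "failed"
    error: Optional[str] = None


def recover_interrupted(runs: list[Run]) -> int:
    """Mark runs left in the running state as failed; return how many."""
    recovered = 0
    for run in runs:
        if run.status == "running":
            run.status = "failed"
            run.error = "Server restarted"
            recovered += 1
    return recovered


runs = [Run("a", "completed"), Run("b", "running"), Run("c", "failed", "bad data")]
print(recover_interrupted(runs), [(r.run_id, r.status) for r in runs])
```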
Optimized artifacts are saved for manual review and are not auto-loaded into the live runtime. To stage a candidate for review, create a tenant-scoped promotion draft from the run:
curl -X POST http://localhost:8000/api/v1/optimization/runs/<run_id>/promotion-drafts
The draft is a non-mutating record scoped to your tenant and workspace. It does not change the live runtime — it captures the optimized artifact and metadata for a deliberate promotion decision.
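For scripted workflows, the same promotion-draft request can be built from Python with the standard library. This sketch mirrors the curl command and nothing more: the helper name is hypothetical, the run ID is a placeholder, and any authentication headers your deployment requires are omitted because the source does not describe them.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:8000"


def promotion_draft_request(
    run_id: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) the POST request for a promotion draft."""
    path = f"/api/v1/optimization/runs/{quote(run_id, safe='')}/promotion-drafts"
    return urllib.request.Request(base_url + path, method="POST")


req = promotion_draft_request("run-123")  # placeholder run ID
print(req.get_method(), req.full_url)
# To send it against a running server: urllib.request.urlopen(req)
```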
MLflow tracing
When MLFLOW_ENABLED=true (default), every DSPy call emits a trace to MLFLOW_TRACKING_URI under the MLFLOW_EXPERIMENT experiment (default: fleet-rlm).
In APP_ENV=local, the API server auto-starts a localhost MLflow target on port 5001 unless you set MLFLOW_AUTO_START=false. To run it yourself:
Trace feedback is submitted via POST /api/v1/traces/feedback.
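The MLflow environment variables described in this section combine roughly as in the sketch below. The defaults for MLFLOW_ENABLED, MLFLOW_EXPERIMENT, and MLFLOW_AUTO_START follow the text. The accepted truthy strings, the rule that auto-start is skipped when tracing is disabled, and the function names are assumptions.

```python
import os
from typing import Mapping, Optional


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Parse a boolean env var; unset means `default`."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def mlflow_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Resolve MLflow tracing settings from environment-style variables."""
    env = os.environ if env is None else env
    enabled = _flag(env, "MLFLOW_ENABLED", True)
    return {
        "enabled": enabled,
        "tracking_uri": env.get("MLFLOW_TRACKING_URI"),
        "experiment": env.get("MLFLOW_EXPERIMENT", "fleet-rlm"),
        # Auto-start applies only to local runs and can be switched off.
        "auto_start": enabled
        and env.get("APP_ENV") == "local"
        and _flag(env, "MLFLOW_AUTO_START", True),
    }


print(mlflow_settings({"APP_ENV": "local"}))
```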
See also