DSPy integration

fleet-rlm is a transport and orchestration shell around DSPy. Both agent layers — the chat surface and the recursive engine — are real dspy.* modules.

Where DSPy lives in the codebase

Concern	DSPy module	fleet-rlm location
Chat orchestration	`dspy.ReAct`	`src/fleet_rlm/runtime/agent/agent.py` (`FleetAgent`)
Recursive long-context	`dspy.RLM`	`src/fleet_rlm/runtime/models/builders.py` (`build_recursive_subquery_rlm`)
Tool delegation	ReAct tool	`src/fleet_rlm/runtime/tools/rlm_delegate.py`
Offline optimization	`dspy.GEPA`	`src/fleet_rlm/runtime/quality/*`

Configure the planner LM

The planner LM drives both ReAct and RLM. Configure it once via environment variables — fleet-rlm wires it into DSPy automatically.

.env

DSPY_LM_MODEL=openai/gpt-4o-mini
DSPY_LLM_API_KEY=sk-...
# Optional LiteLLM proxy:
# DSPY_LM_API_BASE=https://your-litellm-proxy/v1

Any LiteLLM-supported model works. See the LiteLLM models reference for the curated list.
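
As a rough illustration of what that wiring involves, the sketch below maps the three variables above onto keyword arguments in the shape `dspy.LM` accepts (`model`, `api_key`, `api_base`). The helper name `planner_lm_kwargs` is hypothetical and this is not fleet-rlm's actual loader; it only shows which variable is required and which are optional.

```python
import os
from typing import Mapping, Optional


def planner_lm_kwargs(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect planner-LM settings from environment-style variables.

    Hypothetical helper for illustration; fleet-rlm's real loader may differ.
    """
    env = os.environ if env is None else env

    model = env.get("DSPY_LM_MODEL")
    if not model:
        raise ValueError("DSPY_LM_MODEL is required, e.g. openai/gpt-4o-mini")

    kwargs = {"model": model}

    # The API key and proxy base URL are only passed through when set.
    api_key = env.get("DSPY_LLM_API_KEY")
    if api_key:
        kwargs["api_key"] = api_key

    api_base = env.get("DSPY_LM_API_BASE")
    if api_base:
        kwargs["api_base"] = api_base

    return kwargs
```

With DSPy installed, `dspy.configure(lm=dspy.LM(**planner_lm_kwargs()))` would apply the result by hand. fleet-rlm does the equivalent for you, so this is only useful for debugging a misconfigured environment.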

Use the chat agent from Python

from fleet_rlm.runtime.factory import build_runtime

runtime = build_runtime()        # builds AgentRuntime + Daytona interpreter
agent = runtime.agent             # the dspy.ReAct-backed FleetAgent

response = agent.respond(
    "Summarize the architecture of this repo",
    document_paths=["README.md"],
)
print(response.text)

AgentRuntime owns:

The DSPy LM configuration.
The Daytona interpreter instance.
Conversation history (dspy.History).
Loaded document paths and core memory.

Build a recursive sub-query RLM

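To build a bounded recursive runner by hand:

from fleet_rlm.runtime.models.builders import build_recursive_subquery_rlm

# child_interpreter is assumed to be an interpreter instance you already hold;
# it is not created in this snippet.
rlm = build_recursive_subquery_rlm(
    interpreter=child_interpreter,
    max_iterations=8,    # upper bound on iterations for this runner
    max_llm_calls=40,    # upper bound on LM calls for this runner
)

result = rlm(prompt="Extract every TODO from these logs", context="...")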

In production code, prefer the delegate_to_rlm tool from inside the ReAct agent — it enforces the shared call budget for you. See Recursive RLM for the full delegation flow.
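
To make the idea of a shared call budget concrete, here is a minimal sketch of a counter that a parent agent and its delegated children could draw from. The names `CallBudget` and `BudgetExceeded` are hypothetical and are not taken from fleet-rlm; the real enforcement inside `delegate_to_rlm` may work differently.

```python
import threading


class BudgetExceeded(RuntimeError):
    """Raised when a run tries to spend more LM calls than it was allotted."""


class CallBudget:
    """Thread-safe LM-call counter shared by a parent run and its children."""

    def __init__(self, max_llm_calls: int) -> None:
        if max_llm_calls < 0:
            raise ValueError("max_llm_calls must be non-negative")
        self.max_llm_calls = max_llm_calls
        self._used = 0
        self._lock = threading.Lock()

    @property
    def remaining(self) -> int:
        with self._lock:
            return self.max_llm_calls - self._used

    def spend(self, n: int = 1) -> None:
        """Record n calls, refusing the spend if it would exceed the cap."""
        with self._lock:
            if self._used + n > self.max_llm_calls:
                raise BudgetExceeded(
                    f"requested {n} call(s), "
                    f"{self.max_llm_calls - self._used} remaining"
                )
            self._used += n


budget = CallBudget(max_llm_calls=40)
budget.spend(30)  # calls made by the parent agent
budget.spend(8)   # calls made by a delegated child run
```

Because both the parent and the child spend from the same object, a child cannot exceed the overall cap even when its own `max_llm_calls` setting is higher than what is left.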

Offline GEPA optimization

fleet-rlm’s optimization layer registers DSPy modules and runs GEPA against them. List registered modules:

uv run fleet-rlm optimize list

Run optimization on a module:

uv run fleet-rlm optimize my-module dataset.jsonl \
  --auto medium \
  --output-path ./artifacts/optimized.pkl \
  --report

LongCoT reasoner

The optimization registry includes longcot-reasoner, a LongCoT reasoning module. It is evaluated with a continuous, answer-dominant GEPA metric (0.6/0.4 weighting) and tiered feedback. A one-time offline GEPA optimization was run against an 80-row, answered-only LongCoT dataset (64 train / 16 validation) and produced an optimized artifact bundle for review. The pipeline persists:

Baseline-vs-optimized holdout evidence.
Prompt snapshots.
Reflection-model provenance.
MLflow metadata.

The optimized artifact is saved for manual review and is not auto-loaded into the live runtime. Promote artifacts deliberately.
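
For readers unfamiliar with answer-dominant metrics, the following sketch shows one way a 0.6/0.4 weighted score with tiered feedback could be computed. It assumes the 0.4 share goes to a reasoning-quality score in [0, 1]; this page does not say what the second component is, and the tier thresholds and feedback strings are invented for illustration. The function name `longcot_style_metric` is hypothetical.

```python
def longcot_style_metric(
    answer_correct: bool,
    reasoning_score: float,
    answer_weight: float = 0.6,
    reasoning_weight: float = 0.4,
) -> dict:
    """Blend answer correctness and a reasoning score into one value.

    Assumption: the second component is reasoning quality in [0, 1].
    """
    # Clamp the reasoning score so the blended result stays in [0, 1].
    reasoning_score = min(1.0, max(0.0, reasoning_score))
    score = answer_weight * float(answer_correct) + reasoning_weight * reasoning_score

    # Tiered feedback: the message depends on which component fell short.
    if answer_correct and reasoning_score >= 0.8:
        feedback = "Correct answer with sound reasoning."
    elif answer_correct:
        feedback = "Correct answer, but the reasoning has gaps to tighten."
    elif reasoning_score >= 0.5:
        feedback = "Reasoning is partly sound, but the final answer is wrong."
    else:
        feedback = "Both the reasoning and the final answer need rework."

    return {"score": score, "feedback": feedback}
```

With these weights a correct answer with no reasoning credit (0.6) still outranks a wrong answer with full reasoning credit (0.4), which is what "answer-dominant" means here. DSPy's GEPA accepts metrics that return a score together with textual feedback; the dict above stands in for that return value.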

MLflow tracing

When MLFLOW_ENABLED=true (default), every DSPy call emits a trace to MLFLOW_TRACKING_URI under the MLFLOW_EXPERIMENT experiment (default: fleet-rlm). In APP_ENV=local, the API server auto-starts a localhost MLflow target on port 5001 unless you set MLFLOW_AUTO_START=false. To run it yourself:

make mlflow-server
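
The interaction between these variables can be summarized in a small sketch. The helper `resolve_mlflow_settings` is hypothetical and does not come from fleet-rlm; it encodes only the defaults stated above, and the `http://localhost:5001` URI is an assumption about how the auto-started target is addressed.

```python
from typing import Mapping

_FALSE_VALUES = {"0", "false", "no", "off"}


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Read a boolean-style variable, falling back to default when unset."""
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    return raw.strip().lower() not in _FALSE_VALUES


def resolve_mlflow_settings(env: Mapping[str, str]) -> dict:
    """Resolve tracing settings from environment-style variables."""
    enabled = _flag(env, "MLFLOW_ENABLED", True)  # tracing is on by default
    is_local = env.get("APP_ENV") == "local"
    # Auto-start only applies to local runs with tracing enabled.
    auto_start = enabled and is_local and _flag(env, "MLFLOW_AUTO_START", True)

    tracking_uri = env.get("MLFLOW_TRACKING_URI")
    if tracking_uri is None and auto_start:
        tracking_uri = "http://localhost:5001"  # assumed form of the local target

    return {
        "enabled": enabled,
        "experiment": env.get("MLFLOW_EXPERIMENT", "fleet-rlm"),
        "auto_start": auto_start,
        "tracking_uri": tracking_uri,
    }
```

For example, `resolve_mlflow_settings({"APP_ENV": "local", "MLFLOW_AUTO_START": "false"})` leaves tracing enabled but reports no auto-started server, which is the case where you would run the command above yourself.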

Trace feedback is submitted via POST /api/v1/traces/feedback.
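
A minimal sketch of building such a request with the Python standard library follows. Only the path comes from this page. The host, port, and payload fields (`trace_id`, `comment`) are placeholders, since the request schema is not described here; check the API reference before relying on them.

```python
import json
import urllib.request


def build_feedback_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to the trace feedback endpoint."""
    url = base_url.rstrip("/") + "/api/v1/traces/feedback"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host and payload fields, not the documented schema.
request = build_feedback_request(
    "http://localhost:8000",
    {"trace_id": "example-trace-id", "comment": "helpful answer"},
)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```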
