Documentation Index

Fetch the complete documentation index at: https://docs.qredence.ai/llms.txt

Use this file to discover all available pages before exploring further.

fleet-rlm is a transport and orchestration shell around DSPy. Both agent layers — the chat surface and the recursive engine — are real dspy.* modules.

Where DSPy lives in the codebase

| Concern | DSPy module | fleet-rlm location |
| --- | --- | --- |
| Chat orchestration | dspy.ReAct | src/fleet_rlm/runtime/agent/agent.py (FleetAgent) |
| Recursive long-context | dspy.RLM | src/fleet_rlm/runtime/models/builders.py (build_recursive_subquery_rlm) |
| Tool delegation | ReAct tool | src/fleet_rlm/runtime/tools/rlm_delegate.py |
| Offline optimization | dspy.GEPA | src/fleet_rlm/runtime/quality/* |

Configure the planner LM

The planner LM drives both ReAct and RLM. Configure it once via environment variables — fleet-rlm wires it into DSPy automatically.
.env
DSPY_LM_MODEL=openai/gpt-4o-mini
DSPY_LLM_API_KEY=sk-...
# Optional LiteLLM proxy:
# DSPY_LM_API_BASE=https://your-litellm-proxy/v1
Any LiteLLM-supported model works. See the LiteLLM models reference for the curated list.
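Under the hood, this amounts to reading those variables and registering a planner LM with DSPy. The helper below is an illustrative sketch of that wiring, not fleet-rlm's actual code; only the variable names come from the .env example above:

```python
import os

def planner_lm_kwargs() -> dict:
    """Collect planner-LM settings from the environment.

    Illustrative helper -- the real wiring happens inside build_runtime().
    The variable names match the .env example above.
    """
    kwargs = {
        "model": os.environ.get("DSPY_LM_MODEL", "openai/gpt-4o-mini"),
        "api_key": os.environ.get("DSPY_LLM_API_KEY"),
    }
    api_base = os.environ.get("DSPY_LM_API_BASE")  # optional LiteLLM proxy
    if api_base:
        kwargs["api_base"] = api_base
    return kwargs

# The runtime then does something equivalent to:
#   import dspy
#   dspy.configure(lm=dspy.LM(**planner_lm_kwargs()))
```

Because the LM is configured globally, both the ReAct chat agent and any recursive RLM runners pick it up without further setup.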

Use the chat agent from Python

from fleet_rlm.runtime.factory import build_runtime

runtime = build_runtime()        # builds AgentRuntime + Daytona interpreter
agent = runtime.agent             # the dspy.ReAct-backed FleetAgent

response = agent.respond(
    "Summarize the architecture of this repo",
    document_paths=["README.md"],
)
print(response.text)
AgentRuntime owns:
  • The DSPy LM configuration.
  • The Daytona interpreter instance.
  • Conversation history (dspy.History).
  • Loaded document paths and core memory.
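The state listed above can be pictured as a small container that persists across respond() calls. This dataclass is a conceptual sketch only, not AgentRuntime's real definition (field and method names are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class AgentRuntimeState:
    """Conceptual sketch of what AgentRuntime owns (not the real class)."""
    lm_model: str                                   # DSPy LM configuration
    interpreter: object = None                      # Daytona interpreter instance
    history: list = field(default_factory=list)     # stands in for dspy.History
    document_paths: list = field(default_factory=list)
    core_memory: dict = field(default_factory=dict)

    def record_turn(self, user: str, assistant: str) -> None:
        # Each respond() call appends one turn to the conversation history,
        # so follow-up questions see the earlier exchange.
        self.history.append({"user": user, "assistant": assistant})
```

The practical consequence: calling agent.respond() twice on the same runtime is a multi-turn conversation, while building a fresh runtime starts from empty history.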

Build a recursive sub-query RLM

When you need to construct a bounded recursive runner yourself:
from fleet_rlm.runtime.models.builders import build_recursive_subquery_rlm

rlm = build_recursive_subquery_rlm(
    interpreter=child_interpreter,
    max_iterations=8,
    max_llm_calls=40,
)

result = rlm(prompt="Extract every TODO from these logs", context="...")
In production code, prefer the delegate_to_rlm tool from inside the ReAct agent — it enforces the shared call budget for you. See Recursive RLM for the full delegation flow.
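The shared call budget that delegate_to_rlm enforces can be sketched as a counter that parent and child runners draw from together. The class below is illustrative, not fleet-rlm's API; it only shows why delegation is safer than building an RLM with its own independent max_llm_calls:

```python
class CallBudget:
    """Illustrative shared LLM-call budget (not fleet-rlm's actual class)."""

    def __init__(self, max_llm_calls: int):
        self.remaining = max_llm_calls

    def spend(self, n: int = 1) -> None:
        # Both the ReAct planner and any delegated RLM draw from the
        # same counter, so total LLM usage stays bounded.
        if self.remaining < n:
            raise RuntimeError("LLM call budget exhausted")
        self.remaining -= n

budget = CallBudget(max_llm_calls=40)
budget.spend(5)   # e.g. ReAct planning steps
budget.spend(8)   # e.g. delegated recursive sub-queries
```

A manually built RLM, by contrast, gets a budget of its own, which is why the delegate tool is the preferred path in production.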

Offline GEPA optimization

fleet-rlm’s optimization layer registers DSPy modules and runs GEPA against them. List registered modules:
uv run fleet-rlm optimize list
Run optimization on a module:
uv run fleet-rlm optimize my-module dataset.jsonl \
  --auto medium \
  --output-path ./artifacts/optimized.pkl \
  --report
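Conceptually, the optimization layer is a name-to-builder registry: `optimize list` prints the registered names and `optimize <name>` resolves one to a DSPy module before running GEPA. A minimal sketch of that pattern (the decorator and function names are illustrative, not the actual registry implementation):

```python
OPTIMIZABLE_MODULES: dict = {}

def register(name: str):
    """Register a module builder under a CLI-visible name (illustrative)."""
    def decorator(builder):
        OPTIMIZABLE_MODULES[name] = builder
        return builder
    return decorator

@register("longcot-reasoner")
def build_longcot_reasoner():
    # The real builder returns a dspy.Module; a placeholder keeps
    # this sketch self-contained.
    return "dspy.Module placeholder"

def list_modules() -> list:
    """What `fleet-rlm optimize list` conceptually iterates over."""
    return sorted(OPTIMIZABLE_MODULES)
```

Registering a module this way is what makes it addressable from the CLI without the optimizer knowing anything about its internals.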

LongCoT reasoner

The optimization registry includes longcot-reasoner — a LongCoT reasoning module evaluated with a continuous answer-dominant 0.6/0.4 GEPA metric and tiered feedback. A one-time offline GEPA optimization was run against an 80-row answered-only LongCoT dataset (64 train / 16 validation), producing a reviewable optimized artifact bundle. The pipeline persists:
  • Baseline-vs-optimized holdout evidence.
  • Prompt snapshots.
  • Reflection-model provenance.
  • MLflow metadata.
The optimized artifact is saved for manual review and is not auto-loaded into the live runtime. Promote artifacts deliberately.
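The answer-dominant 0.6/0.4 metric weights answer correctness over reasoning quality. The combination itself is simple; how each component score and the tiered feedback are computed is not shown here, so the scorer inputs below are placeholders:

```python
def gepa_metric(answer_score: float, reasoning_score: float) -> float:
    """Continuous answer-dominant metric: 0.6 * answer + 0.4 * reasoning.

    Both inputs are assumed to lie in [0, 1]; the component scorers
    (answer correctness, reasoning quality) are placeholders here.
    """
    return 0.6 * answer_score + 0.4 * reasoning_score
```

A fully correct answer with worthless reasoning still scores 0.6, so GEPA's search pressure stays on answer quality first.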

MLflow tracing

When MLFLOW_ENABLED=true (default), every DSPy call emits a trace to MLFLOW_TRACKING_URI under the MLFLOW_EXPERIMENT experiment (default: fleet-rlm). In APP_ENV=local, the API server auto-starts a localhost MLflow target on port 5001 unless you set MLFLOW_AUTO_START=false. To run it yourself:
make mlflow-server
Trace feedback is submitted via POST /api/v1/traces/feedback.
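A sketch of building that feedback request from Python. The endpoint path comes from the line above, but the payload fields (trace_id, score, comment) are assumptions; check the API reference for the actual schema:

```python
import json
from urllib import request

def build_feedback_request(base_url: str, trace_id: str,
                           score: float, comment: str = ""):
    """Build a POST to /api/v1/traces/feedback.

    The JSON field names are guesses, not the documented schema.
    """
    payload = {"trace_id": trace_id, "score": score, "comment": comment}
    return request.Request(
        url=f"{base_url}/api/v1/traces/feedback",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# req = build_feedback_request("http://localhost:8000", "trace-123", 1.0)
# request.urlopen(req)  # would actually submit the feedback
```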

See also