The chat agent does not directly hand a task to a child RLM. Delegation is mediated by a specific ReAct tool, delegate_to_rlm, registered the same way as any other tool in the agent’s tool registry.

Delegation flow

User prompt

FleetAgent  (dspy.ReAct, host LLM)
   │   decides the task exceeds one context and picks the tool:

delegate_to_rlm(query, context="", document_url="")
   │   — src/fleet_rlm/runtime/tools/rlm_delegate.py
   │   — reads the active Daytona interpreter from a ContextVar
   │   — checks remaining LLM-call budget; returns error if exhausted
   │   — interpreter.build_delegate_child()   ← isolated child Daytona sandbox
   │   — optionally fetches document_url into the child's context

build_recursive_subquery_rlm(
    interpreter=child,
    max_iterations=min(child.rlm_max_iterations, remaining_budget),
    max_llm_calls=remaining_budget,
)
   │   constructs the dspy.RLM bound to the child sandbox

rlm(prompt=query, context=...)
   │   child RLM runs REPL-variable-mode: may call llm_query(),
   │   sub_rlm(), sub_rlm_batched() to recurse further inside its sandbox

{"status": "ok", "answer": "..."}        ← bubbles back into the ReAct trace

Two entry points, one budget

Recursive RLM work has two entry points, and they share one budget:
  1. delegate_to_rlm() — from the host ReAct agent’s tool registry.
  2. sub_rlm() / sub_rlm_batched() — from Python code already running inside a dspy.RLM sandbox, reaching back out through the Daytona bridge to spawn a further child.
Both go through DaytonaInterpreter.build_delegate_child(), so child creation follows one backend-owned policy. rlm_max_llm_calls is a single semantic-call budget shared across the entire recursive tree, and sub_rlm_batched() caps sibling parallelism at 4 while drawing on that same budget.
Sandbox code can call llm_query(), llm_query_batched(), sub_rlm(), and sub_rlm_batched() through the Daytona bridge. These callbacks dispatch to fleet-rlm’s interpreter methods rather than to DSPy’s per-forward injected counters, which is why budget enforcement is global rather than per-frame.
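
To make the "one budget, capped parallelism" idea concrete, here is a minimal sketch of a tree-wide counter shared by concurrent siblings. SharedCallBudget and run_batched are hypothetical names chosen for illustration, and the thread pool is an assumed mechanism; only the cap of 4 and the shared-budget behavior come from the text above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List

MAX_SIBLING_PARALLELISM = 4  # cap described for sub_rlm_batched()


class SharedCallBudget:
    """Hypothetical tree-wide counter; every entry point draws from it."""

    def __init__(self, max_llm_calls: int) -> None:
        self._remaining = max_llm_calls
        self._lock = threading.Lock()

    def try_consume(self) -> bool:
        # Check and decrement under one lock so siblings cannot overspend.
        with self._lock:
            if self._remaining <= 0:
                return False
            self._remaining -= 1
            return True

    @property
    def remaining(self) -> int:
        with self._lock:
            return self._remaining


def run_batched(budget: SharedCallBudget, queries: List[str]) -> List[dict]:
    """Run sibling queries with bounded parallelism against one budget."""

    def run_one(query: str) -> dict:
        if not budget.try_consume():
            return {"status": "error", "error": "budget exhausted"}
        # Placeholder for the real child RLM call.
        return {"status": "ok", "answer": f"answer to {query}"}

    with ThreadPoolExecutor(max_workers=MAX_SIBLING_PARALLELISM) as pool:
        return list(pool.map(run_one, queries))
```

Because the counter lives outside any single call frame, a batch of five siblings against a budget of three yields exactly three successful calls, whichever siblings happen to run first.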

Child isolation policy

The default is RLM_CHILD_ISOLATION_MODE=auto:
  • If the parent has no durable mounted volume, fork the parent Daytona sandbox into a child sandbox.
  • If a durable volume is mounted, create a clean child Daytona sandbox with the same repo_url, repo_ref, and context_paths, plus a child-specific volume_subpath.
  • If fork creation fails and RLM_CHILD_FORK_FALLBACK=clean, retry with a clean child sandbox.
  • Delete child sandboxes after each recursive task.
RLM_CHILD_ISOLATION_MODE=context is retained only as a backend/local debugging opt-out. It preserves the previous same-sandbox fresh-context behavior and should not be treated as the production isolation contract.
Child outputs return through the RLM answer. Child files and artifacts are not promoted to the parent automatically.
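
The selection rules above can be written as a small decision function. This is a sketch under stated assumptions: choose_strategy and create_child are hypothetical names, the fork and clean callables stand in for real sandbox creation, and treating mode "clean" as "always create a clean child" is an inference from the mode names rather than something the text spells out.

```python
from typing import Callable


def choose_strategy(mode: str, has_durable_volume: bool) -> str:
    """Map an isolation mode to a child-creation strategy (illustrative)."""
    if mode == "context":
        return "context"  # debug-only: same sandbox, fresh context
    if mode == "clean":
        return "clean"  # assumed meaning: always a clean child sandbox
    if mode == "auto":
        # Fork only when no durable volume is mounted on the parent.
        return "clean" if has_durable_volume else "fork"
    raise ValueError(f"unknown isolation mode: {mode!r}")


def create_child(
    mode: str,
    has_durable_volume: bool,
    fork_fallback: str,
    fork: Callable[[], str],
    clean: Callable[[], str],
) -> str:
    """Create a child via the chosen strategy, retrying clean if fork fails."""
    strategy = choose_strategy(mode, has_durable_volume)
    if strategy == "context":
        return "parent-sandbox"
    if strategy == "clean":
        return clean()
    try:
        return fork()
    except Exception:
        if fork_fallback == "clean":
            return clean()
        raise
```

Deleting the child after the task is left out here; in a fuller sketch it would sit in a try/finally around the recursive call.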

Local workspace snapshot fallback

When the parent turn is analyzing a local host checkout and no repo_url is available to recreate that checkout in a clean child sandbox, delegate_to_rlm():
  1. Writes a bounded text snapshot of relevant local repository files into the child sandbox under artifacts/rlm-inputs/local_workspace_snapshot.txt.
  2. Adds that path to the child context.
This preserves child-sandbox isolation while giving the child enough explicit evidence to inspect local code.
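
A bounded snapshot of this kind could be assembled as follows. The sketch assumes a simple character cap and a "=== path ===" section header; both are illustrative choices, as are the function names. Only the target path comes from the text above, and how fleet-rlm picks "relevant" files is not modeled.

```python
from pathlib import Path
from typing import Mapping

SNAPSHOT_RELATIVE_PATH = "artifacts/rlm-inputs/local_workspace_snapshot.txt"


def build_snapshot(files: Mapping[str, str], max_chars: int = 20000) -> str:
    """Concatenate file contents, stopping once max_chars is reached."""
    parts = []
    used = 0
    for path in sorted(files):
        remaining = max_chars - used
        if remaining <= 0:
            break
        # Hypothetical header format so the child can tell files apart.
        section = f"=== {path} ===\n{files[path]}\n"[:remaining]
        parts.append(section)
        used += len(section)
    return "".join(parts)


def write_snapshot(child_root: Path, files: Mapping[str, str],
                   max_chars: int = 20000) -> str:
    """Write the snapshot under child_root and return its relative path."""
    target = child_root / SNAPSHOT_RELATIVE_PATH
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(build_snapshot(files, max_chars), encoding="utf-8")
    return SNAPSHOT_RELATIVE_PATH
```

The returned relative path is what would be appended to the child context in step 2.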

Configuration

Variable                  Default          Purpose
RLM_CHILD_ISOLATION_MODE  auto             auto, clean, or context (debug only)
RLM_CHILD_FORK_FALLBACK   clean            Behavior when fork creation fails
rlm_max_iterations        runtime default  Per-RLM iteration cap
rlm_max_llm_calls         runtime default  Tree-wide semantic call budget
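
As an illustration of how the two uppercase settings might be read and validated, here is a minimal sketch. It assumes they are supplied as environment variables, which their naming suggests but the table does not state; read_child_isolation_config is a hypothetical helper, and the defaults are the ones listed in the table.

```python
import os
from typing import Mapping

VALID_ISOLATION_MODES = {"auto", "clean", "context"}


def read_child_isolation_config(env: Mapping[str, str] = os.environ) -> dict:
    """Read the child-isolation settings, applying the documented defaults."""
    mode = env.get("RLM_CHILD_ISOLATION_MODE", "auto").strip().lower()
    if mode not in VALID_ISOLATION_MODES:
        raise ValueError(f"unsupported RLM_CHILD_ISOLATION_MODE: {mode!r}")
    # Only "clean" is documented as a fallback value; others pass through.
    fallback = env.get("RLM_CHILD_FORK_FALLBACK", "clean").strip().lower()
    return {"isolation_mode": mode, "fork_fallback": fallback}
```

Rejecting unknown modes early keeps a typo from silently selecting a weaker isolation path.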
