delegate_to_rlm is the tool that spawns an isolated child Daytona sandbox running a bounded dspy.RLM.
This is fleet-rlm’s implementation of Algorithm 1 from arXiv 2512.24601v2: inputs stored as REPL variables, sub-queries dispatched recursively, bounded by a shared call budget.
Delegation flow
The tool resolves the active DaytonaInterpreter from a ContextVar, checks the remaining LLM-call budget, builds a child sandbox via interpreter.build_delegate_child(), and constructs a fresh dspy.RLM bound to that child.
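The four steps of the delegation flow can be sketched roughly as follows. This is an illustration only: StubInterpreter, StubRLM, current_interpreter, delegate_to_rlm_sketch, and the shape of the returned dictionary are hypothetical stand-ins, not the real DaytonaInterpreter, dspy.RLM, or fleet-rlm envelope.

```python
from contextvars import ContextVar
from dataclasses import dataclass


@dataclass
class StubInterpreter:
    """Hypothetical stand-in for DaytonaInterpreter; not the real class."""
    calls_remaining: int

    def build_delegate_child(self) -> "StubInterpreter":
        # The real method creates a child Daytona sandbox; this stub only
        # copies the parent's remaining-call figure into a new object.
        return StubInterpreter(calls_remaining=self.calls_remaining)


@dataclass
class StubRLM:
    """Hypothetical stand-in for a dspy.RLM bound to one interpreter."""
    interpreter: StubInterpreter

    def __call__(self, query: str) -> str:
        return f"answer to: {query}"


# The active interpreter is published through a ContextVar (name assumed).
current_interpreter = ContextVar("current_interpreter")


def delegate_to_rlm_sketch(query: str) -> dict:
    interpreter = current_interpreter.get()        # 1. resolve from the ContextVar
    if interpreter.calls_remaining <= 0:           # 2. check the remaining budget
        return {"status": "error", "error": "budget exhausted"}
    child = interpreter.build_delegate_child()     # 3. build the child sandbox
    rlm = StubRLM(interpreter=child)               # 4. fresh RLM bound to the child
    return {"status": "ok", "answer": rlm(query)}


current_interpreter.set(StubInterpreter(calls_remaining=3))
result = delegate_to_rlm_sketch("summarize the design doc")
```

Resolving the interpreter from a ContextVar rather than a global keeps concurrent turns from seeing each other's sandbox handle.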
REPL variable mode
Inside the child sandbox, the RLM does not see the entire context as raw tokens. Following Algorithm 1, the inputs are stored as REPL variables in the child’s persistent Python execution context (sandbox.code_interpreter.create_context(...)). The RLM works through them programmatically:
- Long documents are bound to named variables in the REPL.
- The RLM emits Python that inspects, slices, and reduces those variables.
- Sub-queries that need an LLM call are dispatched via the host-callback bridge, not by inflating the prompt.
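As a rough illustration of the three points above, the code an RLM emits might look like the following. The llm_query function here is a local stub standing in for the host-callback bridge, and the document contents are invented for the example.

```python
import re


def llm_query(prompt: str) -> str:
    """Hypothetical stub for the host-callback bridge; no model is called."""
    return f"[stub answer for {len(prompt)} chars]"


# A long document bound to a named REPL variable rather than placed in the prompt.
document = "\n".join(
    f"## Section {i}\n" + ("filler text. " * 50) + ("TODO: fix parser. " if i == 7 else "")
    for i in range(20)
)

# Inspect: split the variable into sections on its heading markers.
sections = re.split(r"(?m)^## ", document)[1:]

# Slice and reduce: keep only the sections that mention the term of interest.
relevant = [s for s in sections if "TODO" in s]

# Only a small slice of the relevant section is sent onward as a sub-query.
answer = llm_query("What needs fixing?\n\n" + relevant[0][:500])
```

The full document never enters a prompt; only the 500-character slice does.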
Since fleet-rlm 0.5.50, large-input injection rides on upstream DSPy’s native dspy.RLM SandboxSerializable contract (DSPy ≥ 3.3.0b1). Document and workspace context models implement SandboxSerializable, so REPL-variable mode is handled by DSPy directly rather than by Fleet-maintained variable-mode wrappers. The Fleet runtime keeps the private RLM hooks isolated in runtime/models/builders.py.
Two entry points, one budget
Recursive RLM work has two entry points, and they share one budget:
- delegate_to_rlm(): from the host FleetAgent’s ReAct tool registry. The chat agent calls this when it decides a sub-task exceeds its context.
- sub_rlm() / sub_rlm_batched(): from Python code already running inside a dspy.RLM sandbox, reaching back out through the Daytona bridge to spawn a further child.
Both entry points go through DaytonaInterpreter.build_delegate_child() so child creation follows one backend-owned policy.
rlm_max_llm_calls is a single shared semantic-call budget across the entire recursive tree — not per frame, not per RLM. sub_rlm_batched() caps sibling parallelism at 4 while sharing that same budget across siblings.
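A minimal sketch of a tree-wide budget with capped sibling parallelism is shown below. SharedBudget, sub_rlm_sketch, and sub_rlm_batched_sketch are hypothetical names, and the sketch reads the cap of 4 as a concurrency limit, which is an assumption about how fleet-rlm applies it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_SIBLINGS = 4  # sibling parallelism cap stated above


class SharedBudget:
    """One counter shared by every frame in the recursive tree."""

    def __init__(self, max_calls: int) -> None:
        self._remaining = max_calls
        self._lock = threading.Lock()

    def try_spend(self) -> bool:
        # Atomically take one call from the budget; False once it is spent.
        with self._lock:
            if self._remaining <= 0:
                return False
            self._remaining -= 1
            return True

    @property
    def remaining(self) -> int:
        with self._lock:
            return self._remaining


def sub_rlm_sketch(query: str, budget: SharedBudget) -> str:
    # Every child draws from the same budget object, whatever its depth.
    if not budget.try_spend():
        return f"budget exhausted: {query}"
    return f"done: {query}"


def sub_rlm_batched_sketch(queries: list[str], budget: SharedBudget) -> list[str]:
    # At most MAX_SIBLINGS children run at once; results keep input order.
    with ThreadPoolExecutor(max_workers=MAX_SIBLINGS) as pool:
        return list(pool.map(lambda q: sub_rlm_sketch(q, budget), queries))


budget = SharedBudget(max_calls=5)
results = sub_rlm_batched_sketch([f"q{i}" for i in range(8)], budget)
```

With eight queries and a budget of five, exactly five siblings succeed and the rest are refused, although which ones succeed depends on thread scheduling.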
The reason the budget is global rather than per-frame is that the sandbox callbacks (llm_query, sub_rlm, etc.) dispatch through fleet-rlm’s interpreter methods over the Daytona bridge, not through DSPy’s per-forward injected counters. Budget enforcement happens at the bridge boundary, which sees every recursive call regardless of depth.
Sandbox callbacks
Code running inside a child RLM sandbox can call these primitives via the Daytona bridge:
| Callback | Purpose |
|---|---|
| llm_query(prompt, ...) | Single semantic LLM call against the host’s planner LM |
| llm_query_batched(prompts, ...) | Batched semantic LLM calls (parallel) |
| sub_rlm(query, context="") | Recurse further: spawn a new child RLM |
| sub_rlm_batched(queries, contexts) | Batched recursion, capped at 4 sibling children |
| SUBMIT(answer) | Final-artifact capture for the parent |
rlm_query and rlm_query_batched are agent-level recursive entrypoints; they are not sandbox callbacks. Sandbox-authored code should use llm_query and sub_rlm instead.
Child isolation policy
The default is RLM_CHILD_ISOLATION_MODE=auto:
| Condition | Behavior |
|---|---|
| Parent has no durable mounted volume | Fork the parent Daytona sandbox into a child sandbox |
| Durable volume is mounted | Create a clean child Daytona sandbox with the same repo_url, repo_ref, and context_paths, plus a child-specific volume_subpath |
| Fork fails and RLM_CHILD_FORK_FALLBACK=clean | Retry with a clean child sandbox |
| After child run completes | Child sandbox is deleted |
memory/ or artifacts/ paths from inside the child.
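The auto-mode rows of the table can be read as a small decision procedure. The sketch below models only those rows; choose_child_strategy, create_child, and the fork and clean factories are hypothetical names, and a RuntimeError stands in for whatever a failed fork actually raises.

```python
def choose_child_strategy(has_durable_volume: bool) -> str:
    """Auto-mode choice from the table: fork unless a durable volume is mounted."""
    return "clean" if has_durable_volume else "fork"


def create_child(has_durable_volume: bool, fork, clean, fork_fallback: str = "clean"):
    """Create a child with the chosen strategy, retrying clean if a fork fails.

    `fork` and `clean` are hypothetical zero-argument factories.
    """
    if choose_child_strategy(has_durable_volume) == "clean":
        return clean()
    try:
        return fork()
    except RuntimeError:
        if fork_fallback == "clean":
            return clean()
        raise


def failing_fork():
    raise RuntimeError("fork unavailable")


# No durable volume, so a fork is attempted first; it fails and falls back.
child = create_child(False, fork=failing_fork, clean=lambda: "clean-child")
```

Deleting the child after its run completes is not modeled here.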
Local workspace snapshot fallback
When the parent turn is analyzing a local host checkout and no repo_url is available to recreate that checkout in a clean child sandbox, delegate_to_rlm() falls back:
- Writes a bounded text snapshot of relevant local repository files into the child sandbox at artifacts/rlm-inputs/local_workspace_snapshot.txt.
- Adds that path to the child’s context.
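A bounded snapshot of this kind could be produced along the following lines. Only the target path comes from the description above; the write_snapshot helper, the *.py file filter, the header format, and the max_chars bound are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

SNAPSHOT_PATH = "artifacts/rlm-inputs/local_workspace_snapshot.txt"


def write_snapshot(repo_root: Path, child_root: Path, max_chars: int = 20_000) -> Path:
    """Concatenate text files under repo_root into one bounded snapshot file."""
    parts, used = [], 0
    for path in sorted(repo_root.rglob("*.py")):  # file filter is an assumption
        block = f"=== {path.relative_to(repo_root)} ===\n{path.read_text()}\n"
        if used + len(block) > max_chars:
            break  # stop once the size bound would be exceeded
        parts.append(block)
        used += len(block)
    target = child_root / SNAPSHOT_PATH
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("".join(parts))
    return target


with tempfile.TemporaryDirectory() as tmp:
    repo, child = Path(tmp) / "repo", Path(tmp) / "child"
    repo.mkdir()
    (repo / "a.py").write_text("print('a')\n")
    (repo / "b.py").write_text("print('b')\n")
    snapshot = write_snapshot(repo, child)
    snapshot_text = snapshot.read_text()
    context_paths = [SNAPSHOT_PATH]  # the path is then added to the child's context
```

Bounding the snapshot by characters keeps the child's input size predictable even for large checkouts.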
Budget exhaustion
When the shared rlm_max_llm_calls budget is exhausted, the delegation tool returns an error envelope instead of starting a new child.
The bridge handles sandbox-side sub_rlm() / llm_query() calls the same way.
Automatic routing to native RLM
When a turn runs with execution_mode=auto, fleet-rlm previews the attached context (document path, repository, additional context paths, and inline content) and estimates its size in characters. If the estimate exceeds FLEET_RLM_LARGE_CONTEXT_THRESHOLD (default 32000), fleet-rlm routes the turn to the native RLM path instead of the ReAct chat agent. A clear URL-document task takes the same path.
Large inputs then ride on REPL-variable mode from the start instead of being inlined into the planner prompt. Set the threshold via the FLEET_RLM_LARGE_CONTEXT_THRESHOLD environment variable, for example in .env.
Explicit modes (execution_mode=rlm or rlm_only) bypass the threshold and always route to native RLM.
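The routing rule can be sketched as a single function. choose_route and the "rlm" and "react" return values are illustrative names; the size estimate here is a plain character count, and the URL-document shortcut and any other execution modes are not modeled.

```python
import os

DEFAULT_THRESHOLD = 32000  # documented default, in characters


def choose_route(execution_mode: str, context_parts: list[str]) -> str:
    """Return "rlm" or "react" for a turn (route names are illustrative)."""
    if execution_mode in ("rlm", "rlm_only"):
        return "rlm"  # explicit modes bypass the threshold
    if execution_mode != "auto":
        raise ValueError("mode not modeled in this sketch")
    threshold = int(os.environ.get("FLEET_RLM_LARGE_CONTEXT_THRESHOLD", DEFAULT_THRESHOLD))
    estimated_chars = sum(len(part) for part in context_parts)
    return "rlm" if estimated_chars > threshold else "react"


os.environ["FLEET_RLM_LARGE_CONTEXT_THRESHOLD"] = "32000"
small = choose_route("auto", ["short note"])
large = choose_route("auto", ["x" * 40_000])
forced = choose_route("rlm_only", [])
```

Note that the comparison is strict: a context of exactly 32000 characters stays on the ReAct path in this sketch, matching the "exceeds" wording above.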
Configuration
| Variable | Default | Purpose |
|---|---|---|
| RLM_CHILD_ISOLATION_MODE | auto | auto, clean, or context (debug only) |
| RLM_CHILD_FORK_FALLBACK | clean | Behavior when fork creation fails |
| rlm_max_iterations | runtime default | Per-RLM iteration cap |
| rlm_max_llm_calls | runtime default | Tree-wide semantic call budget |
| FLEET_RLM_LARGE_CONTEXT_THRESHOLD | 32000 | Character threshold for auto-mode native-RLM routing |
Implementation pointers
When you need to read the code:
- src/fleet_rlm/runtime/tools/rlm_delegate.py: the delegation tool itself.
- src/fleet_rlm/runtime/models/builders.py: build_recursive_subquery_rlm().
- src/fleet_rlm/integrations/daytona/isolation.py: child sandbox creation hooks and isolation-mode policy (auto / clean / context).
- src/fleet_rlm/integrations/daytona/bridge.py: host-callback broker that dispatches sandbox callbacks.
See also
- Architecture — where the delegation tool fits in the layering.
- Daytona runtime — sandbox lifecycle, volumes, and the bridge.
- Configuration — environment variables.