fleet-rlm exposes four overlapping observability surfaces. Pick the one that matches what you need to see.
| Surface | What it shows | When to use |
|---|---|---|
| MLflow tracing | DSPy program traces — spans, prompts, model calls | Debugging program logic, GEPA evaluation, prompt regressions |
| WebSocket execution events | Live tool calls, sandbox steps, recursive delegation | Live UI, real-time UX |
| Runtime status / diagnostics | LM + Daytona connectivity, readiness | Probing health, smoke validation |
| PostHog (optional) | Product analytics | Product usage telemetry |
## MLflow tracing
When MLFLOW_ENABLED=true (default), every DSPy call emits a trace to MLFLOW_TRACKING_URI under the MLFLOW_EXPERIMENT experiment (default: fleet-rlm).
### Local auto-start

In APP_ENV=local, the API server auto-starts a localhost MLflow target on port 5001 unless MLFLOW_AUTO_START=false. To run it explicitly, use the `make mlflow-server` target; this keeps the server aligned with the default MLFLOW_TRACKING_URI=http://127.0.0.1:5001.
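For client-side scripts that should point at the same local server, a minimal sketch of the relevant environment variables (defaults per the text above; the `mlflow_env` helper name is hypothetical):

```python
import os

def mlflow_env(tracking_uri="http://127.0.0.1:5001", experiment="fleet-rlm"):
    """Env vars fleet-rlm reads for MLflow tracing, with the documented defaults."""
    return {
        "MLFLOW_ENABLED": "true",
        "MLFLOW_TRACKING_URI": tracking_uri,
        "MLFLOW_EXPERIMENT": experiment,
    }

# Apply without clobbering values already set in the shell.
for key, value in mlflow_env().items():
    os.environ.setdefault(key, value)
```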
### What ends up in a trace
Each user turn produces one MLflow trace covering:
- The top-level `FleetAgent.respond(...)` call.
- All ReAct iterations — thought, action, observation.
- Every tool invocation, including `delegate_to_rlm`.
- Recursive child RLM runs as nested spans.
- All `llm_query` / `sub_rlm` callbacks from inside child sandboxes.
Because the host-callback bridge is the only path from sandbox to host, trace continuity is preserved across recursion — a child RLM’s calls show up as nested spans under their parent, not as orphan traces.
### Wiring location

| File | Role |
|---|---|
| `integrations/observability/mlflow_runtime.py` | MLflow client + tracing setup |
| `integrations/observability/mlflow_traces.py` | Trace context propagation |
| `integrations/observability/trace_context.py` | Cross-thread/async trace context |
| `runtime/quality/mlflow_evaluation.py` | DSPy evaluation under MLflow |
| `runtime/quality/mlflow_optimization.py` | GEPA optimization runs under MLflow |
### Trace feedback

The Workbench supports thumbs-up / thumbs-down feedback per turn. Feedback is submitted via `POST /api/v1/traces/feedback`.
The feedback is attached to the corresponding MLflow trace, so GEPA optimization runs can use it as a label signal downstream.
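A sketch of building that request from a script, using only the standard library. The payload field names (`trace_id`, `rating`) are assumptions — check the API schema for the real shape:

```python
import json
import urllib.request

def feedback_request(base_url, trace_id, positive, token=None):
    """Build a POST request for /api/v1/traces/feedback.

    Payload field names are illustrative assumptions, not the confirmed schema.
    """
    body = json.dumps({
        "trace_id": trace_id,
        "rating": "up" if positive else "down",
    }).encode()
    headers = {"Content-Type": "application/json"}
    if token:  # required when AUTH_REQUIRED=true
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/traces/feedback",
        data=body,
        headers=headers,
        method="POST",
    )

# Send with: urllib.request.urlopen(feedback_request(...))
```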
## WebSocket execution events

Two WebSocket endpoints power the live UI:

| Endpoint | Stream |
|---|---|
| `/api/v1/ws/execution` | Chat stream events — user-facing turn-taking |
| `/api/v1/ws/execution/events` | Execution graph events — tool calls, sandbox steps, recursive delegation |
Both require the same auth as HTTP endpoints when AUTH_REQUIRED=true.
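A small helper for building the connection URL and auth headers for either endpoint — the helper itself is hypothetical, and the result plugs into any WebSocket client library:

```python
def ws_connection_params(base_ws_url, events=False, token=None):
    """Return (url, headers) for the chat or execution-events WebSocket.

    Illustrative sketch; endpoint paths match the table above.
    """
    path = "/api/v1/ws/execution/events" if events else "/api/v1/ws/execution"
    headers = {}
    if token:  # same bearer auth as HTTP endpoints when AUTH_REQUIRED=true
        headers["Authorization"] = f"Bearer {token}"
    return base_ws_url.rstrip("/") + path, headers
```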
### Event shaping

Live UI events are shaped by `src/fleet_rlm/api/events/events.py` and `src/fleet_rlm/runtime/execution/streaming_events.py`. Each event is a JSON envelope with one of three types:

- `event` — the streamed event frame.
- `command_result` — tool/command outcomes.
- `error` — error envelopes.

Execution stream frames additionally carry sandbox steps and recursive-delegation events.
The two endpoints serve different concerns:
- Chat stream is bidirectional — clients send user messages and receive assistant frames.
- Execution stream is read-only — clients subscribe to receive artifact and execution events as they arrive from the chat runtime. Useful for opening a second UI surface (e.g., a workspace canvas) on the same session.
### Reverse-proxy requirements
Behind a reverse proxy:
- Upgrade HTTP/1.1 connections.
- Disable response buffering.
- Forward the auth bearer token through.
Misconfigured proxies are the most common reason for “connected but no events” in production.
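The three requirements above can be sketched as an nginx location block — a minimal example, assuming an upstream named `fleet_rlm_upstream`; equivalent settings exist for other proxies:

```nginx
location /api/v1/ws/ {
    proxy_pass http://fleet_rlm_upstream;                # assumption: upstream name
    proxy_http_version 1.1;                              # upgrade HTTP/1.1 connections
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Authorization $http_authorization;  # forward the bearer token
    proxy_buffering off;                                 # disable response buffering
    proxy_read_timeout 3600s;                            # keep long-lived streams open
}
```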
## Runtime status and diagnostics
The runtime exposes structured health and connectivity probes.
### Health

| Endpoint | Auth | Purpose |
|---|---|---|
| `GET /health` | None | Liveness — `{"ok": true, "version": "..."}` |
| `GET /ready` | None | Readiness — composite component status |
Sample readiness response:

```json
{
  "ready": true,
  "planner_configured": true,
  "planner": "ready",
  "database": "ready",
  "database_required": true,
  "sandbox_provider": "daytona"
}
```
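A sketch of evaluating that payload in a smoke script — the field names mirror the sample above, but the real field set may differ:

```python
def is_ready(payload):
    """Interpret a /ready response dict; based on the documented sample payload."""
    if not payload.get("ready"):
        return False
    if payload.get("planner") != "ready":
        return False
    # Only require the database when the deployment says it is required.
    if payload.get("database_required") and payload.get("database") != "ready":
        return False
    return True
```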
### Runtime status

`GET /api/v1/runtime/status` returns a composite snapshot:

```json
{
  "app_env": "local",
  "write_enabled": true,
  "ready": true,
  "sandbox_provider": "daytona",
  "active_models": {
    "planner": "openai/gpt-4o",
    "delegate": "openai/gpt-4o-mini",
    "delegate_small": ""
  },
  "llm": { "model_set": true, "api_key_set": true, "planner_configured": true },
  "mlflow": { "enabled": false, "startup_status": "pending", "startup_error": null },
  "daytona": { "api_key_set": true, "api_url_set": true, "target_set": true },
  "tests": {
    "lm": { "ok": true, "latency_ms": 850 },
    "daytona": { "ok": true, "latency_ms": 640 }
  },
  "guidance": []
}
```
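The cached probe results under `tests` are easy to turn into a one-line-per-component summary — an illustrative sketch over the snapshot shape shown above:

```python
def probe_summary(status):
    """Summarize the cached connectivity probes from /api/v1/runtime/status."""
    lines = []
    for name, result in status.get("tests", {}).items():
        state = "ok" if result.get("ok") else "FAIL"
        lines.append(f"{name}: {state} ({result.get('latency_ms', '?')} ms)")
    return lines
```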
### Connectivity probes

| Endpoint | Tests |
|---|---|
| `POST /api/v1/runtime/tests/lm` | Planner LM round-trip |
| `POST /api/v1/runtime/tests/daytona` | Daytona credentials and lifecycle |
Both return the latency and a preflight envelope, and they cache results into the runtime status payload.
### Daytona smoke

For sandbox-level validation without invoking an LM:

```bash
uv run fleet-rlm daytona-smoke \
  --repo https://github.com/Qredence/fleet-rlm.git \
  --ref main
```
This exercises credentials, network, sandbox lifecycle, repo clone, volume mount, and basic execution end-to-end.
## PostHog (optional)

| Variable | Default | Description |
|---|---|---|
| `POSTHOG_ENABLED` | `false` | Enable PostHog analytics |
| `POSTHOG_HOST` | `https://eu.i.posthog.com` | PostHog host |
When enabled, the runtime emits product-analytics events through integrations/observability/posthog_callback.py. This is not a tracing replacement — it is for product usage signals.
## Reading order

When you need to debug a misbehaving turn:

1. Open the MLflow trace for the turn — it has the full DSPy program execution.
2. If sandbox steps look wrong, check the execution-stream WebSocket frames for the same turn.
3. If startup looks wrong, hit `/ready` and `/api/v1/runtime/status`.
4. If Daytona is suspect, run `uv run fleet-rlm daytona-smoke`.
## See also