fleet-rlm exposes four overlapping observability surfaces. Pick the surface that matches what you need to see.
| Surface | What it shows | When to use |
| --- | --- | --- |
| MLflow tracing | DSPy program traces: spans, prompts, model calls | Debugging program logic, GEPA evaluation, prompt regressions |
| WebSocket execution events | Live tool calls, sandbox steps, recursive delegation | Live UI, real-time UX |
| Runtime status / diagnostics | LM + Daytona connectivity, readiness | Probing health, smoke validation |
| PostHog (optional) | Product analytics | Product usage telemetry |

MLflow tracing

When MLFLOW_ENABLED=true (default), every DSPy call emits a trace to MLFLOW_TRACKING_URI under the MLFLOW_EXPERIMENT experiment (default: fleet-rlm).

Local auto-start

In APP_ENV=local, the API server auto-starts a localhost MLflow target on port 5001 unless MLFLOW_AUTO_START=false. To run it explicitly:

```bash
make mlflow-server
```

The default MLFLOW_TRACKING_URI=http://127.0.0.1:5001 points at the same port, so traces land in the server this target starts.
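Pulling those settings together, a minimal environment sketch using only the variables and defaults documented on this page:

```bash
MLFLOW_ENABLED=true                          # tracing on (default)
MLFLOW_TRACKING_URI=http://127.0.0.1:5001    # matches make mlflow-server
MLFLOW_EXPERIMENT=fleet-rlm                  # default experiment name
MLFLOW_AUTO_START=false                      # set when running the server yourself
```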

What ends up in a trace

Each user turn produces one MLflow trace covering:
  • The top-level FleetAgent.respond(...) call.
  • All ReAct iterations — thought, action, observation.
  • Every tool invocation, including delegate_to_rlm.
  • Recursive child RLM runs as nested spans.
  • All llm_query / sub_rlm callbacks from inside child sandboxes.
Because the host-callback bridge is the only path from sandbox to host, trace continuity is preserved across recursion — a child RLM’s calls show up as nested spans under their parent, not as orphan traces.
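To inspect these traces outside the MLflow UI, you can query the tracking server from a script. A minimal sketch, assuming MLflow ≥ 2.14 (which provides mlflow.search_traces) and the local defaults described above:

```python
import mlflow

# Point the client at the local auto-started server and the default experiment.
mlflow.set_tracking_uri("http://127.0.0.1:5001")
exp = mlflow.get_experiment_by_name("fleet-rlm")

# Returns a pandas DataFrame: one row per trace, with spans, status, and timing.
traces = mlflow.search_traces(experiment_ids=[exp.experiment_id], max_results=5)
print(traces.head())
```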

Wiring location

| File | Role |
| --- | --- |
| integrations/observability/mlflow_runtime.py | MLflow client + tracing setup |
| integrations/observability/mlflow_traces.py | Trace context propagation |
| integrations/observability/trace_context.py | Cross-thread/async trace context |
| runtime/quality/mlflow_evaluation.py | DSPy evaluation under MLflow |
| runtime/quality/mlflow_optimization.py | GEPA optimization runs under MLflow |

Trace feedback

The Workbench supports thumbs-up / thumbs-down feedback per turn. Feedback is submitted via:
POST /api/v1/traces/feedback
The feedback is attached to the corresponding MLflow trace, so GEPA optimization runs can use it as a label signal downstream.
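For scripted submission, a sketch using httpx; the base URL is deployment-specific, and the payload field names (trace_id, rating) are illustrative rather than a confirmed schema:

```python
import httpx

resp = httpx.post(
    "http://localhost:8000/api/v1/traces/feedback",  # base URL is deployment-specific
    headers={"Authorization": "Bearer <token>"},     # needed when AUTH_REQUIRED=true
    json={"trace_id": "<mlflow-trace-id>", "rating": "up"},  # hypothetical field names
)
resp.raise_for_status()
```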

WebSocket execution events

Two WebSocket endpoints power the live UI:
| Endpoint | Stream |
| --- | --- |
| /api/v1/ws/execution | Chat stream events: user-facing turn-taking |
| /api/v1/ws/execution/events | Execution graph events: tool calls, sandbox steps, recursive delegation |
Both require the same auth as HTTP endpoints when AUTH_REQUIRED=true.

Event shaping

Live UI events are shaped by src/fleet_rlm/api/events/events.py and src/fleet_rlm/runtime/execution/streaming_events.py. Each message is a JSON envelope of one of three kinds:
  • event: a streamed event frame.
  • command_result: tool/command outcomes.
  • error: an error envelope.
Execution-stream frames additionally carry sandbox steps and recursive-delegation events.
The two endpoints serve different concerns:
  • Chat stream is bidirectional — clients send user messages and receive assistant frames.
  • Execution stream is read-only — clients subscribe to receive artifact and execution events as they arrive from the chat runtime. Useful for opening a second UI surface (e.g., a workspace canvas) on the same session.
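To illustrate the read-only side, here is a minimal subscriber sketch using the third-party websockets package. The base URL is deployment-specific, and the header kwarg is additional_headers on websockets ≥ 13 (extra_headers on older releases):

```python
import asyncio
import json
import websockets

async def watch_execution(base="ws://localhost:8000", token="<token>"):
    uri = f"{base}/api/v1/ws/execution/events"
    # Same bearer auth as the HTTP endpoints when AUTH_REQUIRED=true.
    async with websockets.connect(
        uri, additional_headers={"Authorization": f"Bearer {token}"}
    ) as ws:
        async for raw in ws:
            # Each frame is a JSON envelope: event, command_result, or error.
            print(json.loads(raw))

asyncio.run(watch_execution())
```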

Reverse-proxy requirements

Behind a reverse proxy:
  • Upgrade HTTP/1.1 connections.
  • Disable response buffering.
  • Forward the auth bearer token through.
Misconfigured proxies are the most common reason for “connected but no events” in production.
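As one concrete example, a minimal nginx location block covering all three requirements (the upstream name and path prefix are illustrative):

```nginx
location /api/v1/ws/ {
    proxy_pass http://fleet_rlm_backend;                 # illustrative upstream
    proxy_http_version 1.1;                              # required for Upgrade
    proxy_set_header Upgrade $http_upgrade;              # upgrade to WebSocket
    proxy_set_header Connection "upgrade";
    proxy_set_header Authorization $http_authorization;  # forward the bearer token
    proxy_buffering off;                                 # no response buffering
    proxy_read_timeout 1h;                               # keep long-lived sockets open
}
```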

Runtime status and diagnostics

The runtime exposes structured health and connectivity probes.

Health

| Endpoint | Auth | Purpose |
| --- | --- | --- |
| GET /health | None | Liveness: {"ok": true, "version": "..."} |
| GET /ready | None | Readiness: composite component status |
Sample readiness response:
```json
{
  "ready": true,
  "planner_configured": true,
  "planner": "ready",
  "database": "ready",
  "database_required": true,
  "sandbox_provider": "daytona"
}
```
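For example, a smoke test can gate on readiness in a couple of lines (httpx assumed; the port is deployment-specific):

```python
import httpx

ready = httpx.get("http://localhost:8000/ready").json()
if not ready["ready"]:
    raise SystemExit(f"runtime not ready: {ready}")
```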

Runtime status

GET /api/v1/runtime/status returns a composite snapshot:
```json
{
  "app_env": "local",
  "write_enabled": true,
  "ready": true,
  "sandbox_provider": "daytona",
  "active_models": {
    "planner": "openai/gpt-4o",
    "delegate": "openai/gpt-4o-mini",
    "delegate_small": ""
  },
  "llm": { "model_set": true, "api_key_set": true, "planner_configured": true },
  "mlflow": { "enabled": false, "startup_status": "pending", "startup_error": null },
  "daytona": { "api_key_set": true, "api_url_set": true, "target_set": true },
  "tests": {
    "lm": { "ok": true, "latency_ms": 850 },
    "daytona": { "ok": true, "latency_ms": 640 }
  },
  "guidance": []
}
```

Connectivity probes

| Endpoint | Tests |
| --- | --- |
| POST /api/v1/runtime/tests/lm | Planner LM round-trip |
| POST /api/v1/runtime/tests/daytona | Daytona credentials and lifecycle |
Both return the latency and a preflight envelope, and they cache results into the runtime status payload.
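A sketch that fires both probes and reads the cached results back from the status payload (httpx assumed; add the bearer header when AUTH_REQUIRED=true):

```python
import httpx

base = "http://localhost:8000"  # deployment-specific
for probe in ("lm", "daytona"):
    print(probe, httpx.post(f"{base}/api/v1/runtime/tests/{probe}", timeout=60).json())

# Probe results are cached into the composite runtime status payload.
print(httpx.get(f"{base}/api/v1/runtime/status").json()["tests"])
```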

Daytona smoke

For sandbox-level validation without invoking an LM:
```bash
uv run fleet-rlm daytona-smoke \
  --repo https://github.com/Qredence/fleet-rlm.git \
  --ref main
```
This exercises credentials, network, sandbox lifecycle, repo clone, volume mount, and basic execution end-to-end.

PostHog (optional)

| Variable | Default | Description |
| --- | --- | --- |
| POSTHOG_ENABLED | false | Enable PostHog analytics |
| POSTHOG_HOST | https://eu.i.posthog.com | PostHog host |
When enabled, the runtime emits product-analytics events through integrations/observability/posthog_callback.py. This is not a tracing replacement — it is for product usage signals.
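A minimal enablement sketch; POSTHOG_API_KEY is a hypothetical variable name, not documented on this page:

```bash
POSTHOG_ENABLED=true
POSTHOG_HOST=https://eu.i.posthog.com   # EU ingestion host (default)
POSTHOG_API_KEY=<project-api-key>       # hypothetical variable name
```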

Reading order

When you need to debug a misbehaving turn:
  1. Open the MLflow trace for the turn — it has the full DSPy program execution.
  2. If sandbox steps look wrong, check the execution-stream WebSocket frames for the same turn.
  3. If startup looks wrong, hit /ready and /api/v1/runtime/status.
  4. If Daytona is suspect, run uv run fleet-rlm daytona-smoke.

See also