Observability

fleet-rlm exposes four overlapping observability surfaces. Pick the one that matches what you need to see.

Surface	What it shows	When to use
MLflow tracing	DSPy program traces — spans, prompts, model calls	Debugging program logic, GEPA evaluation, prompt regressions
WebSocket execution events	Live tool calls, sandbox steps, recursive delegation	Live UI, real-time UX
Runtime status / diagnostics	LM + Daytona connectivity, readiness	Probing health, smoke validation
PostHog (optional)	Product analytics	Product usage telemetry

MLflow tracing

When MLFLOW_ENABLED=true (default), every DSPy call emits a trace to MLFLOW_TRACKING_URI under the MLFLOW_EXPERIMENT experiment (default: fleet-rlm).

Local auto-start

In APP_ENV=local, the API server auto-starts a local MLflow tracking server on port 5001 unless MLFLOW_AUTO_START=false. To run it explicitly:

make mlflow-server

The server started by make mlflow-server listens on the same address as MLFLOW_TRACKING_URI=http://127.0.0.1:5001, so the API server sends its traces there.
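The environment rules above can be summarized in a small resolver. This is a minimal sketch, not fleet-rlm's own settings loader: the helper names, the set of accepted truthy strings, treating auto-start as off when tracing is disabled, and falling back to http://127.0.0.1:5001 in APP_ENV=local are all assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical helper mirroring the documented MLflow environment rules.
# It is NOT fleet-rlm's config loader; the local fallback URI is an assumption.
LOCAL_TRACKING_URI = "http://127.0.0.1:5001"


def _truthy(value: str) -> bool:
    # Assumed set of strings that count as "true".
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class MlflowSettings:
    enabled: bool
    tracking_uri: Optional[str]
    experiment: str
    auto_start: bool


def resolve_mlflow_settings(env: Mapping[str, str] = os.environ) -> MlflowSettings:
    enabled = _truthy(env.get("MLFLOW_ENABLED", "true"))  # documented default: true
    is_local = env.get("APP_ENV", "") == "local"
    # Auto-start applies only in APP_ENV=local and can be switched off explicitly.
    auto_start = enabled and is_local and _truthy(env.get("MLFLOW_AUTO_START", "true"))
    # An explicit MLFLOW_TRACKING_URI always wins; otherwise assume the local server.
    uri = env.get("MLFLOW_TRACKING_URI") or (LOCAL_TRACKING_URI if is_local else None)
    return MlflowSettings(
        enabled=enabled,
        tracking_uri=uri,
        experiment=env.get("MLFLOW_EXPERIMENT", "fleet-rlm"),  # documented default
        auto_start=auto_start,
    )
```

Passing a plain dict instead of os.environ makes the rules easy to check in isolation, e.g. resolve_mlflow_settings({"APP_ENV": "local"}).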

What ends up in a trace

Each user turn produces one MLflow trace covering:

The top-level FleetAgent.respond(...) call.
All ReAct iterations — thought, action, observation.
Every tool invocation, including delegate_to_rlm.
Recursive child RLM runs as nested spans.
All llm_query / sub_rlm callbacks from inside child sandboxes.

Because the host-callback bridge is the only path from sandbox to host, trace continuity is preserved across recursion — a child RLM’s calls show up as nested spans under their parent, not as orphan traces.

Wiring location

File	Role
`integrations/observability/mlflow_runtime.py`	MLflow client + tracing setup
`integrations/observability/mlflow_traces.py`	Trace context propagation
`integrations/observability/trace_context.py`	Cross-thread/async trace context
`runtime/quality/mlflow_evaluation.py`	DSPy evaluation under MLflow
`runtime/quality/mlflow_optimization.py`	GEPA optimization runs under MLflow

Trace feedback

The Workbench supports thumbs-up / thumbs-down feedback per turn. Feedback is submitted via:

POST /api/v1/traces/feedback

The feedback is attached to the corresponding MLflow trace, so GEPA optimization runs can use it as a label signal downstream.
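A client-side call might look like the sketch below. Only the method and path come from this page; the base URL, the JSON field names (trace_id, thumbs_up), and the bearer-token header are assumptions, so check the API schema before relying on them.

```python
import json
import urllib.request
from typing import Optional


def build_feedback_request(
    base_url: str,
    trace_id: str,
    thumbs_up: bool,
    token: Optional[str] = None,
) -> urllib.request.Request:
    # Hypothetical payload shape; the real schema is not documented on this page.
    body = json.dumps({"trace_id": trace_id, "thumbs_up": thumbs_up}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token:
        # Needed when AUTH_REQUIRED=true (assumed bearer scheme).
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/traces/feedback",
        data=body,
        headers=headers,
        method="POST",
    )
```

Send it with urllib.request.urlopen(req, timeout=10) once the field names match the server's schema.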

WebSocket execution events

Two WebSocket endpoints power the live UI:

Endpoint	Stream
`/api/v1/ws/execution`	Chat stream events — user-facing turn-taking
`/api/v1/ws/execution/events`	Execution graph events — tool calls, sandbox steps, recursive delegation

Both require the same auth as HTTP endpoints when AUTH_REQUIRED=true.

Event shaping

Live UI events are shaped by src/fleet_rlm/api/events/events.py and src/fleet_rlm/runtime/execution/streaming_events.py. Each event is a JSON envelope of one of these kinds:

event: a streamed event frame.
command_result: a tool or command outcome.
error: an error envelope.

Frames on the execution stream carry sandbox steps and recursive-delegation events.
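A UI client typically routes each incoming frame by its envelope kind. The sketch below assumes, hypothetically, that the kind is carried in a top-level "type" field; the actual field name is defined in the event-shaping modules named above, so confirm it there.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical: assumes each frame is a JSON object whose "type" field names the
# envelope kind ("event", "command_result", "error").
Handler = Callable[[Dict[str, Any]], None]


def dispatch_frame(raw: str, handlers: Dict[str, Handler]) -> str:
    frame = json.loads(raw)
    kind = frame.get("type", "unknown")
    handler = handlers.get(kind)
    if handler is None:
        # Unknown or unhandled kinds are skipped instead of breaking the UI loop.
        return "ignored"
    handler(frame)
    return kind
```

Keeping unknown kinds non-fatal lets a client tolerate new envelope types added on the server side.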

The two endpoints serve different concerns:

Chat stream is bidirectional: clients send user messages and receive assistant frames.
Execution stream is read-only: clients subscribe to receive artifact and execution events as the chat runtime emits them. This is useful for opening a second UI surface (e.g., a workspace canvas) on the same session.

Reverse-proxy requirements

Behind a reverse proxy:

Pass the HTTP/1.1 Upgrade and Connection headers through so the WebSocket upgrade succeeds.
Disable response buffering.
Forward the Authorization bearer token to the API server.

Misconfigured proxies are the most common reason for “connected but no events” in production.
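One way to tell an upgrade problem from other proxy issues is to perform the WebSocket handshake by hand through the proxy. The stdlib sketch below sends an RFC 6455 upgrade request and checks for a 101 response with a correct Sec-WebSocket-Accept value. The URL and token are yours to supply; the probe covers only the upgrade and auth forwarding, not buffering, and it reads the response with a single recv, which is usually but not always enough for the headers.

```python
import base64
import hashlib
import os
import socket
import ssl
from typing import Optional
from urllib.parse import urlsplit

# Fixed GUID from RFC 6455 used to derive Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(key: str) -> str:
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_upgrade_request(url: str, key: str, token: Optional[str] = None) -> bytes:
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {parts.netloc}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
    ]
    if token:
        lines.append(f"Authorization: Bearer {token}")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def probe_upgrade(url: str, token: Optional[str] = None, timeout: float = 5.0) -> bool:
    """True if the endpoint (through any proxy) answers 101 with a valid accept key."""
    parts = urlsplit(url)
    secure = parts.scheme == "wss"
    port = parts.port or (443 if secure else 80)
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    raw = socket.create_connection((parts.hostname, port), timeout=timeout)
    sock = (
        ssl.create_default_context().wrap_socket(raw, server_hostname=parts.hostname)
        if secure
        else raw
    )
    try:
        sock.sendall(build_upgrade_request(url, key, token))
        response = sock.recv(4096).decode("latin-1")
    finally:
        sock.close()
    status_line, _, rest = response.partition("\r\n")
    headers = {}
    for line in rest.split("\r\n"):
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return " 101 " in status_line and headers.get("sec-websocket-accept") == expected_accept(key)
```

If probe_upgrade returns True against the API server directly but False through the proxy, the proxy is dropping the upgrade headers or the token.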

Runtime status and diagnostics

The runtime exposes structured health and connectivity probes.

Health

Endpoint	Auth	Purpose
`GET /health`	None	Liveness — `{"ok": true, "version": "..."}`
`GET /ready`	None	Readiness — composite component status

Sample readiness response:

{
  "ready": true,
  "planner_configured": true,
  "planner": "ready",
  "database": "ready",
  "database_required": true,
  "sandbox_provider": "daytona"
}
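Deployment scripts often poll /ready before routing traffic. The sketch below does that with the standard library; the base URL, timeout, and interval are placeholders, and treating error responses as "not ready yet" is an assumption about how the endpoint signals an unready state.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Any, Callable, Dict

Fetch = Callable[[str], Dict[str, Any]]


def fetch_json(url: str) -> Dict[str, Any]:
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def wait_until_ready(
    base_url: str,
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    fetch: Fetch = fetch_json,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll GET /ready until it reports "ready": true, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    last: Dict[str, Any] = {}
    while True:
        try:
            last = fetch(f"{base_url.rstrip('/')}/ready")
            if last.get("ready") is True:
                return last
        except (urllib.error.URLError, OSError, ValueError):
            pass  # server still starting, non-2xx status, or unparsable body
        if time.monotonic() >= deadline:
            raise TimeoutError(f"not ready after {timeout_s}s; last status: {last}")
        sleep(interval_s)
```

The fetch and sleep parameters exist so the loop can be exercised without a live server.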

Runtime status

GET /api/v1/runtime/status returns a composite snapshot:

{
  "app_env": "local",
  "write_enabled": true,
  "ready": true,
  "sandbox_provider": "daytona",
  "active_models": {
    "planner": "openai/gpt-4o",
    "delegate": "openai/gpt-4o-mini",
    "delegate_small": ""
  },
  "llm": { "model_set": true, "api_key_set": true, "planner_configured": true },
  "mlflow": { "enabled": false, "startup_status": "pending", "startup_error": null },
  "daytona": { "api_key_set": true, "api_url_set": true, "target_set": true },
  "tests": {
    "lm": { "ok": true, "latency_ms": 850 },
    "daytona": { "ok": true, "latency_ms": 640 }
  },
  "guidance": []
}
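For smoke scripts, the status snapshot can be reduced to a list of problems. This hypothetical helper reads only fields that appear in the sample above; which combinations matter in a given deployment is a judgment call, and null entries under tests are assumed to mean a probe has not run yet.

```python
from typing import Any, Dict, List


def summarize_runtime_status(status: Dict[str, Any]) -> List[str]:
    """Reduce a /api/v1/runtime/status payload to human-readable problems."""
    problems: List[str] = []
    if not status.get("ready", False):
        problems.append("runtime is not ready")
    # LM configuration flags from the "llm" block.
    llm = status.get("llm") or {}
    for flag in ("model_set", "api_key_set", "planner_configured"):
        if not llm.get(flag, False):
            problems.append(f"llm.{flag} is false")
    # Daytona flags only matter when Daytona is the sandbox provider.
    if status.get("sandbox_provider") == "daytona":
        daytona = status.get("daytona") or {}
        for flag in ("api_key_set", "api_url_set", "target_set"):
            if not daytona.get(flag, False):
                problems.append(f"daytona.{flag} is false")
    # Cached connectivity results; null entries are treated as "not run yet".
    for name, result in (status.get("tests") or {}).items():
        if isinstance(result, dict) and not result.get("ok", False):
            problems.append(f"connectivity test '{name}' failed")
    mlflow = status.get("mlflow") or {}
    if mlflow.get("startup_error"):
        problems.append(f"mlflow startup error: {mlflow['startup_error']}")
    # Guidance strings from the runtime are surfaced verbatim.
    problems.extend(f"guidance: {item}" for item in status.get("guidance") or [])
    return problems
```

An empty list means every field the helper checks looks healthy; it does not replace /ready.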

Connectivity probes

Endpoint	Tests
`POST /api/v1/runtime/tests/lm`	Planner LM round-trip
`POST /api/v1/runtime/tests/daytona`	Daytona credentials and lifecycle

Both return the measured latency and a preflight envelope, and both cache their results in the tests block of the runtime status payload.
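Both probes can be triggered from a script. The sketch below builds the POST requests with urllib; the base URL and bearer token are placeholders, and because the preflight envelope's fields are not documented here, run_probe returns the parsed JSON unchanged.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

PROBES = ("lm", "daytona")  # the two documented probe endpoints


def build_probe_request(base_url: str, probe: str, token: Optional[str] = None) -> urllib.request.Request:
    if probe not in PROBES:
        raise ValueError(f"unknown probe {probe!r}; expected one of {PROBES}")
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/runtime/tests/{probe}",
        headers=headers,
        method="POST",
    )


def run_probe(base_url: str, probe: str, token: Optional[str] = None) -> Dict[str, Any]:
    # The LM round-trip can be slow, so allow a generous timeout.
    req = build_probe_request(base_url, probe, token)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

After running both probes, GET /api/v1/runtime/status should show the refreshed results.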

Daytona smoke

For sandbox-level validation without invoking an LM:

uv run fleet-rlm daytona-smoke \
  --repo https://github.com/Qredence/fleet-rlm.git \
  --ref main

This exercises credentials, network, sandbox lifecycle, repo clone, volume mount, and basic execution end-to-end.
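To use the smoke test as a CI gate, a wrapper can run the documented command and fail the job on error. This sketch assumes daytona-smoke exits non-zero on failure; the 15-minute timeout is an arbitrary placeholder.

```python
import subprocess
import sys
from typing import List


def smoke_command(repo: str, ref: str = "main") -> List[str]:
    # Same invocation as the command above.
    return ["uv", "run", "fleet-rlm", "daytona-smoke", "--repo", repo, "--ref", ref]


def run_smoke(repo: str, ref: str = "main", timeout_s: float = 900) -> int:
    try:
        completed = subprocess.run(smoke_command(repo, ref), timeout=timeout_s)
    except subprocess.TimeoutExpired:
        print("daytona-smoke timed out", file=sys.stderr)
        return 1
    except FileNotFoundError:
        print("uv is not installed or not on PATH", file=sys.stderr)
        return 127
    return completed.returncode
```

In a CI step, sys.exit(run_smoke("https://github.com/Qredence/fleet-rlm.git")) propagates the result to the job.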

PostHog (optional)

Variable	Default	Description
`POSTHOG_ENABLED`	`false`	Enable PostHog analytics
`POSTHOG_HOST`	`https://eu.i.posthog.com`	PostHog host

When enabled, the runtime emits product-analytics events through integrations/observability/posthog_callback.py. This is not a tracing replacement — it is for product usage signals.

Reading order

When you need to debug a misbehaving turn:

Open the MLflow trace for the turn — it has the full DSPy program execution.
If sandbox steps look wrong, check the execution-stream WebSocket frames for the same turn.
If startup looks wrong, hit /ready and /api/v1/runtime/status.
If Daytona is suspect, run uv run fleet-rlm daytona-smoke.
