# Observability: tracing, events, and diagnostics

> Trace fleet-rlm with MLflow DSPy spans, subscribe to WebSocket execution events, and inspect runtime diagnostics and PostHog analytics.

`fleet-rlm` exposes four overlapping observability surfaces. Pick the one that matches what you need to see.

| Surface                      | What it shows                                        | When to use                                                  |
| ---------------------------- | ---------------------------------------------------- | ------------------------------------------------------------ |
| MLflow tracing               | DSPy program traces — spans, prompts, model calls    | Debugging program logic, GEPA evaluation, prompt regressions |
| WebSocket execution events   | Live tool calls, sandbox steps, recursive delegation | Live UI, real-time UX                                        |
| Runtime status / diagnostics | LM + Daytona connectivity, readiness                 | Probing health, smoke validation                             |
| PostHog (optional)           | Product analytics                                    | Product usage telemetry                                      |

## MLflow tracing

When `MLFLOW_ENABLED=true` (default), every DSPy call emits a trace to `MLFLOW_TRACKING_URI` under the `MLFLOW_EXPERIMENT` experiment (default: `fleet-rlm`).

### Local auto-start

In `APP_ENV=local`, the API server auto-starts a localhost MLflow target on port 5001 unless `MLFLOW_AUTO_START=false`. To run it explicitly:

```bash
make mlflow-server
```

The `make mlflow-server` target serves MLflow on port 5001, so `MLFLOW_TRACKING_URI=http://127.0.0.1:5001` keeps pointing at it.

### What ends up in a trace

Each user turn produces one MLflow trace covering:

* The top-level `FleetAgent.respond(...)` call.
* All ReAct iterations — thought, action, observation.
* Every tool invocation, including `delegate_to_rlm`.
* Recursive child RLM runs as nested spans.
* All `llm_query` / `sub_rlm` callbacks from inside child sandboxes.

Because the host-callback bridge is the only path from sandbox to host, **trace continuity is preserved across recursion** — a child RLM's calls show up as nested spans under their parent, not as orphan traces.

### Wiring location

| File                                           | Role                                                  |
| ---------------------------------------------- | ----------------------------------------------------- |
| `integrations/observability/callbacks.py`      | Centralized DSPy callback registry (MLflow + PostHog) |
| `integrations/observability/mlflow_runtime.py` | MLflow client + tracing setup                         |
| `integrations/observability/mlflow_traces.py`  | Trace context propagation                             |
| `integrations/observability/trace_context.py`  | Cross-thread/async trace context                      |
| `runtime/quality/mlflow_evaluation.py`         | DSPy evaluation under MLflow                          |
| `runtime/quality/mlflow_optimization.py`       | GEPA / MIPROv2 optimization runs under MLflow         |

<Note>
  Since `0.5.50`, MLflow and PostHog callbacks are installed through a single deduplicating registry. Observability startup remains lazy, and callbacks stay visible to worker-thread DSPy contexts when the global `dspy.configure` refuses non-owner threads or async tasks.
</Note>

### MLflow startup status

`/api/v1/runtime/status` reports an `mlflow.startup_status` field separate from the top-level `ready` flag (which also requires LM and Daytona checks). Use it to distinguish "MLflow is broken" from "everything else is broken":

| `startup_status` | Meaning                                                                                                                                         |
| ---------------- | ----------------------------------------------------------------------------------------------------------------------------------------------- |
| `ready`          | MLflow client initialized and `MLFLOW_TRACKING_URI` is reachable.                                                                               |
| `degraded`       | MLflow is enabled but startup failed — version mismatch, unreachable URI, or init error. Check `mlflow.startup_error` for the remediation hint. |
| `disabled`       | `MLFLOW_ENABLED=false`.                                                                                                                         |
| `pending`        | Background MLflow warmup has not finished yet.                                                                                                  |

If the local UI reports `Cannot query field 'effectiveTraceArchivalRetention'`, an older `mlflow server` process is still listening on the configured port. Stop the stale server, run `make mlflow-upgrade`, then restart with `make mlflow-server`. Chat turns call `initialize_mlflow()` on entry so tracing can recover on the first turn after the server is restarted, even if background startup is still racing.
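
The status table above can be folded into a small triage helper. This is a minimal sketch, assuming the payload has the same shape as the sample `/api/v1/runtime/status` response on this page; `describe_mlflow` is a hypothetical name and is not part of `fleet-rlm`.

```python
from typing import Any, Mapping

# Meanings paraphrased from the startup_status table above.
_MEANINGS = {
    "ready": "MLflow client initialized and tracking URI reachable",
    "degraded": "MLflow enabled but startup failed",
    "disabled": "MLFLOW_ENABLED=false",
    "pending": "background MLflow warmup has not finished",
}


def describe_mlflow(status: Mapping[str, Any]) -> str:
    """Summarize the `mlflow` block of a runtime status payload (hypothetical helper)."""
    mlflow = status.get("mlflow") or {}
    state = mlflow.get("startup_status", "unknown")
    summary = _MEANINGS.get(state, "unrecognized startup_status")
    error = mlflow.get("startup_error")
    # startup_error carries the remediation hint when the status is degraded.
    if state == "degraded" and error:
        summary = f"{summary}: {error}"
    return f"{state}: {summary}"
```

Feeding it the decoded status payload gives a one-line answer to "is MLflow the problem?" without consulting the top-level `ready` flag.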

### Session trace listing

`GET /api/v1/sessions/{session_id}/traces` returns paginated MLflow traces linked to a session, including child delegations spawned by recursive RLM turns. The Workbench uses this to render the trace strip alongside the conversation so reviewers can jump straight from a message to its full DSPy trace tree.
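
A client call against this endpoint might look like the sketch below. It uses only the standard library and assumes bearer-token auth when `AUTH_REQUIRED=true`. The pagination parameters are not documented on this page, so the sketch fetches a single page and returns the body undecoded beyond JSON. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Optional
from urllib.parse import quote


def session_traces_url(base_url: str, session_id: str) -> str:
    """Build the session trace listing URL, escaping the session id."""
    return f"{base_url.rstrip('/')}/api/v1/sessions/{quote(session_id, safe='')}/traces"


def fetch_session_traces(base_url: str, session_id: str, token: Optional[str] = None) -> Any:
    """GET one page of traces and return the decoded JSON body as-is."""
    request = urllib.request.Request(session_traces_url(base_url, session_id))
    if token:
        # Assumption: the API accepts a bearer token in the Authorization header.
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```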

### Trace feedback

The Workbench supports thumbs-up / thumbs-down feedback per turn. Feedback is submitted via:

```http
POST /api/v1/traces/feedback
```

The feedback is attached to the corresponding MLflow trace, so GEPA optimization runs can use it as a label signal downstream.

## WebSocket execution events

Two WebSocket endpoints power the live UI:

| Endpoint                      | Stream                                                                   |
| ----------------------------- | ------------------------------------------------------------------------ |
| `/api/v1/ws/execution`        | Chat stream events — user-facing turn-taking                             |
| `/api/v1/ws/execution/events` | Execution graph events — tool calls, sandbox steps, recursive delegation |

Both require the same auth as HTTP endpoints when `AUTH_REQUIRED=true`.

### Event shaping

Live UI events are shaped by `src/fleet_rlm/api/events/events.py` and `src/fleet_rlm/runtime/execution/streaming_events.py`. Since `0.5.50`, direct, tool-using, and RLM turns share a single `dspy.streamify` replay path, so every turn flows through one WebSocket event pipeline regardless of routing. Each event is a JSON envelope:

* `event` — the streamed event frame.
* `command_result` — tool/command outcomes.
* `error` — error envelopes.
* Execution stream frames carry sandbox steps and recursive-delegation events.
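
On the client side, the envelope kinds listed above suggest a simple dispatch table. The sketch below assumes each frame carries a `type` field whose value is one of those names; that field name is an assumption, so check the actual frames before relying on it. `dispatch_frame` is a hypothetical helper.

```python
import json
from typing import Callable, Dict

Handler = Callable[[dict], None]


def dispatch_frame(raw: str, handlers: Dict[str, Handler]) -> str:
    """Decode one JSON frame and route it by its (assumed) `type` field."""
    frame = json.loads(raw)
    kind = frame.get("type", "unknown")  # assumption: the discriminator key is "type"
    handler = handlers.get(kind)
    if handler is not None:
        handler(frame)
    # Returning the kind lets callers count or log frames they do not handle.
    return kind
```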

The two endpoints serve different concerns:

* **Chat stream** is bidirectional — clients send user messages and receive assistant frames.
* **Execution stream** is read-only — clients subscribe to receive artifact and execution events as they arrive from the chat runtime. Useful for opening a second UI surface (e.g., a workspace canvas) on the same session.

### Decoupled delivery

Turn execution runs in a background task with its own agent context, independent of the WebSocket that submitted the message. All events flow through a shared `ExecutionEventEmitter`, which fans out to every subscriber registered for the same `(workspace_id, user_id, session_id)` tuple.

Two consequences for clients:

* **Reconnect-safe turns.** If your WebSocket drops mid-turn, the turn keeps running. Re-subscribe with the same identity tuple to resume receiving frames.
* **Multiple viewers per session.** A second client (for example, a workspace canvas alongside the chat panel) can attach to `/api/v1/ws/execution/events` and observe the same stream without affecting the primary chat connection.

Frame shapes are unchanged — existing clients require no updates.
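
The fan-out behaviour can be pictured with a small in-memory model. This is an illustrative sketch of keyed publish/subscribe only. It is not the `ExecutionEventEmitter` implementation, and the class and method names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple

SessionKey = Tuple[str, str, str]  # (workspace_id, user_id, session_id)
Subscriber = Callable[[dict], None]


class FanOutEmitter:
    """Illustrative stand-in for the fan-out idea; not fleet-rlm's emitter."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[SessionKey, List[Subscriber]] = defaultdict(list)

    def subscribe(self, key: SessionKey, callback: Subscriber) -> Callable[[], None]:
        """Register a callback and return a function that unregisters it."""
        self._subscribers[key].append(callback)

        def unsubscribe() -> None:
            if callback in self._subscribers[key]:
                self._subscribers[key].remove(callback)

        return unsubscribe

    def emit(self, key: SessionKey, frame: dict) -> int:
        """Deliver a frame to every current subscriber; return how many received it."""
        targets = list(self._subscribers.get(key, ()))
        for callback in targets:
            callback(frame)
        return len(targets)
```

In this model the producer never holds a reference to a particular connection. A dropped viewer unsubscribes, the turn keeps emitting, and a later `subscribe` with the same key picks up subsequent frames.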

### Reverse-proxy requirements

Behind a reverse proxy:

* Upgrade HTTP/1.1 connections.
* Disable response buffering.
* Forward the auth bearer token through.

Misconfigured proxies are the most common reason for "connected but no events" in production.

## Runtime status and diagnostics

The runtime exposes structured health and connectivity probes.

### Health

| Endpoint      | Auth | Purpose                                     |
| ------------- | ---- | ------------------------------------------- |
| `GET /health` | None | Liveness — `{"ok": true, "version": "..."}` |
| `GET /ready`  | None | Readiness — composite component status      |

Sample readiness response:

```json
{
  "ready": true,
  "planner_configured": true,
  "planner": "ready",
  "database": "ready",
  "database_required": true,
  "sandbox_provider": "daytona"
}
```
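
A deployment script can poll `/ready` until the `ready` flag turns true. The following is a minimal sketch using the standard library; the attempt count, delay, and helper names are arbitrary choices for illustration. It treats connection failures and HTTP error statuses as "not ready yet", which is an assumption about how you want to handle startup.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_ready(base_url: str) -> dict:
    """GET /ready and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/ready", timeout=5) as response:
        return json.load(response)


def wait_until_ready(
    fetch: Callable[[], dict],
    attempts: int = 10,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the payload's `ready` flag is true or attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch().get("ready") is True:
                return True
        except OSError:
            # urllib's URLError and HTTPError subclass OSError: server not up yet.
            pass
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

A typical call would be `wait_until_ready(lambda: fetch_ready(base_url))`, where `base_url` is wherever your API server listens. Passing `fetch` and `sleep` as parameters keeps the loop testable without a running server.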

### Runtime status

`GET /api/v1/runtime/status` returns a composite snapshot:

```json
{
  "app_env": "local",
  "write_enabled": true,
  "ready": true,
  "sandbox_provider": "daytona",
  "active_models": {
    "planner": "openai/gpt-4o",
    "delegate": "openai/gpt-4o-mini",
    "delegate_small": ""
  },
  "llm": { "model_set": true, "api_key_set": true, "planner_configured": true },
  "mlflow": { "enabled": true, "startup_status": "ready", "startup_error": null },
  "daytona": { "api_key_set": true, "api_url_set": true, "target_set": true },
  "tests": {
    "lm": { "ok": true, "latency_ms": 850 },
    "daytona": { "ok": true, "latency_ms": 640 }
  },
  "guidance": []
}
```
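
To pull the failing probes out of a snapshot like the one above, a short helper is enough. This sketch assumes the `tests` block maps probe names to objects with an `ok` flag, as in the sample, and treats a missing or null entry as failing; `failing_probes` is a hypothetical name.

```python
from typing import Any, List, Mapping


def failing_probes(status: Mapping[str, Any]) -> List[str]:
    """Return the names of probe results whose `ok` flag is not true."""
    tests = status.get("tests") or {}
    # A null entry (assumed to mean "no result yet") is reported as failing.
    return sorted(name for name, result in tests.items() if not (result or {}).get("ok"))
```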

### Connectivity probes

| Endpoint                             | Tests                             |
| ------------------------------------ | --------------------------------- |
| `POST /api/v1/runtime/tests/lm`      | Planner LM round-trip             |
| `POST /api/v1/runtime/tests/daytona` | Daytona credentials and lifecycle |

Both return the measured latency and a preflight envelope, and both cache their results in the runtime status payload.
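
The probes can be triggered from a script as well as from the UI. The sketch below builds the `POST` requests with the standard library. It assumes the endpoints accept an empty request body and bearer-token auth, neither of which is confirmed on this page, and the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Optional

PROBE_PATHS = {
    "lm": "/api/v1/runtime/tests/lm",
    "daytona": "/api/v1/runtime/tests/daytona",
}


def build_probe_request(base_url: str, probe: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) the POST request for one connectivity probe."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}{PROBE_PATHS[probe]}",
        data=b"",  # assumption: the probes accept an empty body
        method="POST",
    )
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    return request


def run_probe(base_url: str, probe: str, token: Optional[str] = None) -> dict:
    """Send the probe and return the decoded JSON response."""
    request = build_probe_request(base_url, probe, token)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

The generous timeout is a guess: the Daytona probe exercises a sandbox lifecycle, so it may take noticeably longer than the LM round-trip.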

### Daytona smoke

For sandbox-level validation without invoking an LM:

```bash
uv run fleet-rlm daytona-smoke \
  --repo https://github.com/Qredence/fleet-rlm.git \
  --ref main
```

This exercises credentials, network, sandbox lifecycle, repo clone, volume mount, and basic execution end-to-end.

## PostHog (optional)

| Variable          | Default                    | Description              |
| ----------------- | -------------------------- | ------------------------ |
| `POSTHOG_ENABLED` | `false`                    | Enable PostHog analytics |
| `POSTHOG_HOST`    | `https://eu.i.posthog.com` | PostHog host             |

When enabled, the runtime emits product-analytics events through `integrations/observability/posthog_callback.py`. This is **not** a tracing replacement — it is for product usage signals.

## Reading order

When you need to debug a misbehaving turn:

1. Open the MLflow trace for the turn — it has the full DSPy program execution.
2. If sandbox steps look wrong, check the execution-stream WebSocket frames for the same turn.
3. If startup looks wrong, hit `/ready` and `/api/v1/runtime/status`.
4. If Daytona is suspect, run `uv run fleet-rlm daytona-smoke`.

## See also

* [HTTP & WebSocket API](/fleet-rlm/reference/http-api) — full endpoint surface.
* [Configuration](/fleet-rlm/reference/configuration) — MLflow and PostHog env vars.
* [Troubleshooting](/fleet-rlm/guides/troubleshooting) — common failure shapes.
