This page is operational reference material. New users should start with the Introduction and Quickstart.
Incident response
IR-1: Bedrock API outage (circuit breaker open)
Trigger: Users report chat returning “Bedrock API is temporarily unavailable” or all `/api/chat` requests fail with 500 errors.
- Verify the circuit breaker state
  - Check application logs for `bedrock-api` circuit breaker events
  - Look for `open` state transitions in logs with `requestId` correlation
  - Run `curl -sf http://localhost:3000/api/health` to confirm the web server is still healthy
- Check Bedrock service status
  - Verify AWS credentials are valid: `aws sts get-caller-identity`
  - Check Bedrock model access in the AWS Console for the configured region (default `us-east-1`)
  - Review AWS Service Health Dashboard for regional outages
- Inspect recent error patterns
  - Search logs for the last 30 minutes: `grep "bedrock-api"` or `grep "circuit breaker"`
  - Identify if errors are throttling (429), auth (403), or model-level (400)
  - Note the `errorThresholdPercentage` (50%) and `volumeThreshold` (5): with these values the breaker opens after 3 failures within 5 calls
- Wait for automatic recovery or force reset
  - The circuit breaker `resetTimeout` is 30 seconds; it will attempt a half-open call after that period
  - If Bedrock is confirmed restored but the breaker is still open, restart the dev server to reset the breaker state
- Communicate
  - Post in the incident channel: “Bedrock circuit breaker open: root cause under investigation”
  - If AWS is at fault, set status page to “degraded” and estimate recovery based on AWS status updates
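The error triage in the "Inspect recent error patterns" step can be sketched as a small tally over status codes collected from the logs. This is a minimal illustration, not code from the application: the function names and category labels are hypothetical, and it assumes the status codes have already been extracted from log entries.

```typescript
// Hypothetical helper names; only the 429/403/400 mapping comes from the runbook.
type BedrockErrorCategory = "throttling" | "auth" | "model" | "other";

// Map an HTTP status code to the triage category used in this runbook.
function classifyBedrockStatus(status: number): BedrockErrorCategory {
  switch (status) {
    case 429:
      return "throttling";
    case 403:
      return "auth";
    case 400:
      return "model";
    default:
      return "other";
  }
}

// Count how many errors fall into each category.
function tallyCategories(statuses: number[]): Record<BedrockErrorCategory, number> {
  const tally: Record<BedrockErrorCategory, number> = {
    throttling: 0,
    auth: 0,
    model: 0,
    other: 0,
  };
  for (const status of statuses) {
    tally[classifyBedrockStatus(status)] += 1;
  }
  return tally;
}
```

A tally dominated by one category points to the matching entry under "Bedrock errors" in the Troubleshooting section.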
IR-2: Chat session corruption or data loss
Trigger: Users refresh the page and see an empty transcript, or the chat UI shows “Session reset” repeatedly.
- Identify the affected session
  - Extract `sessionId` from browser `localStorage` or from the `start` event in recent `/api/chat` request logs
  - Locate the Pi session file path under `.fleet/sessions/` inside the repo root
- Check session file validity
  - Verify the session JSONL file exists and is readable
  - Ensure the file is inside the repo-scoped session directory (outside files are rejected by `isUsableSessionFile`)
  - Look for truncated or malformed JSONL lines at the end of the file
- Validate localStorage metadata
  - If `localStorage` contains an invalid `sessionFile` (e.g. pointing to `/etc/hosts` or a non-existent path), the app silently starts a fresh repo-scoped session; this is expected behavior
  - Instruct the user to clear `localStorage` for the site if the stored metadata is corrupt
- Attempt manual hydration
  - Call `POST /api/chat/session` with the `sessionId` to trigger `hydrateChatSession`
  - If the session file cannot be opened, the server returns an empty message list with `sessionReset: true`
- Recover or recreate
  - If the file is corrupt beyond repair, archive it and let the user start a new session
  - If the issue is widespread, check disk space and file system permissions on `.fleet/sessions/`
- Follow up
  - Document the root cause (disk full, permission issue, or Pi SDK bug)
  - Monitor `SessionManager.open` error rates for 24 hours
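When checking a session file for truncated or malformed lines, a small validator can report which lines fail to parse. The sketch below is illustrative only: `findMalformedJsonlLines` and `JsonlIssue` are hypothetical names, and it assumes each non-blank line of a session file is a standalone JSON value; the actual Pi session schema is not checked.

```typescript
// Hypothetical types and names; not part of the Pi SDK or the app.
interface JsonlIssue {
  lineNumber: number; // 1-based line number within the file
  reason: string; // parser error message
}

// Return one issue per non-blank line that is not valid JSON.
function findMalformedJsonlLines(content: string): JsonlIssue[] {
  const issues: JsonlIssue[] = [];
  const lines = content.split("\n");
  lines.forEach((line, index) => {
    const trimmed = line.trim();
    if (trimmed === "") {
      return; // skip blank lines, including the one after a trailing newline
    }
    try {
      JSON.parse(trimmed);
    } catch (error) {
      issues.push({
        lineNumber: index + 1,
        reason: error instanceof Error ? error.message : String(error),
      });
    }
  });
  return issues;
}
```

If the only reported issue is on the last line, the file was probably cut off mid-write, and removing that partial line may be enough. Issues in the middle of the file suggest deeper corruption, in which case archiving the file is the safer choice.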
Troubleshooting
Bedrock errors
Symptoms: Chat streams terminate with `error` events, model picker shows unavailable models, or diagnostics contain model registry errors.
- ThrottlingException (429): Bedrock is rate-limiting requests.
  - Check the `requestId` in logs to confirm it is the same across retries
  - The Pi SDK auto-retries with exponential backoff; do not manually retry
  - If sustained, enable request batching or switch to a lower-traffic model variant
- AccessDeniedException (403): IAM role or profile lacks `bedrock:*` permissions.
  - Verify `AWS_PROFILE` and `AWS_BEARER_TOKEN_BEDROCK` environment variables
  - Ensure the IAM policy includes `bedrock:InvokeModel` and `bedrock:InvokeModelWithResponseStream`
- ValidationException (400): The requested model ID is invalid.
  - Check `modelSelection` in the request body against the registry
  - Model IDs use region prefixes (e.g. `us.anthropic.claude-sonnet-4-6`); the backend normalizes candidates but a completely unknown ID will fail
- ModelNotReadyException: The model is not enabled in the AWS account.
  - Visit the Bedrock Console > Model access and enable the model for the current region
- Network / timeout errors: The circuit breaker `timeout` is 30 seconds.
  - If Bedrock does not respond within 30 seconds, the breaker counts it as a failure
  - Check VPC endpoints or corporate proxy settings if running in a restricted network
Session hydration failures
Symptoms: After refreshing the browser, prior messages are gone; the UI shows a blank chat; `sessionReset: true` appears in `/api/chat` responses.
- Invalid `sessionFile` in localStorage: The browser stores only Pi session metadata (`sessionFile` and `sessionId`). If `sessionFile` points outside the repo session directory, `isUsableSessionFile` returns `false` and a fresh repo-scoped session is created silently.
  - Remediation: Clear site `localStorage` and start a new chat
- Missing or moved session file: The session JSONL was deleted or moved after the metadata was stored.
  - Remediation: Check `.fleet/sessions/` for the file; if missing, the session is unrecoverable
- Corrupt session JSONL: A malformed line causes `SessionManager.open` to throw.
  - Remediation: Inspect the file with `head -n 20` and `tail -n 5`; remove trailing partial lines if safe, otherwise archive and start fresh
- Race condition during streaming: If a page refresh happens while the session is being compacted, the file may be in an inconsistent state.
  - Remediation: Wait 5 seconds and retry hydration; the compaction lock should release
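To make the "points outside the repo session directory" condition concrete, the sketch below shows one way a containment check could work. It is not the actual `isUsableSessionFile` implementation, which this page does not show. `normalizePosixPath` and `isInsideDirectory` are hypothetical names, the sketch handles only absolute POSIX-style paths, and it does not resolve symlinks. It normalizes paths by hand to stay dependency-free; a Node server would more typically use `path.resolve` and `path.relative`.

```typescript
// Split an absolute POSIX-style path into segments, resolving "." and "..".
function normalizePosixPath(input: string): string[] {
  const segments: string[] = [];
  for (const part of input.split("/")) {
    if (part === "" || part === ".") {
      continue; // ignore empty segments and current-directory markers
    }
    if (part === "..") {
      segments.pop(); // step up one level; popping at the root is a no-op
      continue;
    }
    segments.push(part);
  }
  return segments;
}

// True only when `candidate` is strictly inside `directory` after normalization.
function isInsideDirectory(candidate: string, directory: string): boolean {
  if (!candidate.startsWith("/") || !directory.startsWith("/")) {
    return false; // assumption: this sketch accepts absolute paths only
  }
  const dirSegments = normalizePosixPath(directory);
  const fileSegments = normalizePosixPath(candidate);
  if (fileSegments.length <= dirSegments.length) {
    return false; // the directory itself or a shallower path is not "inside"
  }
  return dirSegments.every((segment, i) => fileSegments[i] === segment);
}
```

Comparing whole segments rather than string prefixes matters here: a plain `startsWith` check would wrongly accept a sibling directory such as `.fleet/sessions-old/`.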
Circuit breaker states
The Bedrock API call is wrapped by `opossum` with the following configuration:
| Option | Value | Meaning |
|---|---|---|
| `errorThresholdPercentage` | 50% | Open after half of sampled calls fail |
| `resetTimeout` | 30,000 ms | Wait 30 s before trying half-open |
| `volumeThreshold` | 5 | Minimum 5 calls before breaker can open |
| `timeout` | 30,000 ms | Each call must complete within 30 s |
- Closed (normal): Requests flow to Bedrock. Failures are counted.
- Open: All calls are rejected immediately with the fallback error: `"Bedrock API is temporarily unavailable due to repeated failures. Please try again later."`
- Half-open: The next call is allowed through as a probe. If the probe succeeds, the breaker closes. If it fails, the breaker opens again for another `resetTimeout`.
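The three states and their transitions can be modeled with a small state machine. The sketch below is a simplified illustration and not how `opossum` is implemented: `SketchBreaker` is a hypothetical class, it counts calls cumulatively instead of over a rolling window, it does not limit half-open to a single in-flight probe, and it assumes the breaker trips when the failure rate exceeds the threshold (whether exactly 50% trips the real breaker should be checked against the library).

```typescript
type BreakerState = "closed" | "open" | "halfOpen";

interface BreakerOptions {
  errorThresholdPercentage: number; // e.g. 50
  resetTimeout: number; // milliseconds, e.g. 30_000
  volumeThreshold: number; // minimum calls before the breaker may open, e.g. 5
}

// Hypothetical simplified model of the states described above.
class SketchBreaker {
  state: BreakerState = "closed";
  private calls = 0;
  private failures = 0;
  private openedAt = 0;
  private readonly options: BreakerOptions;
  private readonly now: () => number;

  // `now` is injectable so the timing can be driven manually in tests.
  constructor(options: BreakerOptions, now: () => number = () => Date.now()) {
    this.options = options;
    this.now = now;
  }

  // Returns true when a call may be sent; moves open -> halfOpen after resetTimeout.
  canAttempt(): boolean {
    if (this.state === "open" && this.now() - this.openedAt >= this.options.resetTimeout) {
      this.state = "halfOpen";
    }
    return this.state !== "open";
  }

  recordSuccess(): void {
    if (this.state === "halfOpen") {
      // Successful probe: close and start counting from scratch.
      this.state = "closed";
      this.calls = 0;
      this.failures = 0;
      return;
    }
    this.calls += 1;
  }

  recordFailure(): void {
    if (this.state === "halfOpen") {
      this.trip(); // failed probe: reopen for another resetTimeout
      return;
    }
    this.calls += 1;
    this.failures += 1;
    const failureRate = (this.failures / this.calls) * 100;
    // Assumption: the rate must exceed the threshold, and only once enough calls were seen.
    if (this.calls >= this.options.volumeThreshold && failureRate > this.options.errorThresholdPercentage) {
      this.trip();
    }
  }

  private trip(): void {
    this.state = "open";
    this.openedAt = this.now();
  }
}
```

With the values from the table, two successes followed by three failures give a 60% failure rate over 5 calls, which opens the breaker in this model. After 30 seconds `canAttempt()` moves it to half-open, and the outcome of the next call decides whether it closes or reopens.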
Quick reference
| Command | Purpose |
|---|---|
| `curl -sf http://localhost:3000/api/health` | Verify web server health |
| `aws sts get-caller-identity` | Verify AWS credentials |
| `pnpm --filter web test` | Run unit tests (including circuit breaker tests) |
| `pnpm lint` | Check code quality |
| `pnpm knip` | Detect unused code |
Related files
- `apps/web/src/lib/pi/circuit-breaker.ts`: Breaker configuration and factory
- `apps/web/src/lib/pi/server.ts`: Bedrock invocation and session runtime
- `apps/web/src/lib/pii/sanitizer.ts`: Input redaction before logging
- `apps/web/src/lib/logger.ts`: Pino logger with redaction and `requestId` correlation