
Low-latency warm recall

Run a local recall service that keeps the store, vector index, embedder, and reranker warm, so that hook and agent-loop calls stay under the 500ms budget.

Start the service

The service binds to 127.0.0.1 by default and is intended for local use. Keep bearer auth enabled whenever agent tools can reach localhost.

powershell
$env:HEARTWOOD_RECALL_TOKEN = "replace-with-local-secret"

python -m heartwood.cli serve-recall `
  --db .\heartwood.db `
  --tenant tenant:ops `
  --warm-tenant tenant:acme-payments `
  --warm-tenant tenant:northwind-retail `
  --host 127.0.0.1 --port 8765 `
  --token $env:HEARTWOOD_RECALL_TOKEN

Recall

Both embedded one-shot recall and warm-service recall return JSON with recall_id, latency_ms, index_lag, result metadata, provenance validation, ranking signals, and source IDs.

powershell
python -m heartwood.cli recall `
  --url http://127.0.0.1:8765 `
  --token $env:HEARTWOOD_RECALL_TOKEN `
  --tenant tenant:acme-payments `
  --principal-id agent:orchestrator `
  --query "what guidance applies to Acme Payments audit details?" `
  --k 5 --json
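For callers that talk to the warm service over HTTP rather than through the CLI, the request might be assembled roughly as follows. This is a minimal sketch, not the documented wire format: the JSON field names (`tenant`, `principal_id`, `query`, `k`) are assumptions that mirror the CLI flags, and `build_recall_request` and `recall` are hypothetical helper names. Only the `POST /recall` path and the `Authorization: Bearer <token>` header come from this guide.

```python
import json
import urllib.request


def build_recall_request(base_url, token, tenant, principal_id, query, k=5):
    """Build (but do not send) a POST /recall request.

    ASSUMPTION: the JSON field names mirror the CLI flags; the service's
    actual request schema may differ.
    """
    payload = {
        "tenant": tenant,
        "principal_id": principal_id,
        "query": query,
        "k": k,
    }
    headers = {"Content-Type": "application/json"}
    if token:
        # The bearer header is only needed when the service was started with a token.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/recall",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def recall(request, timeout_s=0.5):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the headers and body easy to inspect in a unit test without a running service. The 0.5-second client timeout is an illustrative choice that matches the latency budget, not a value taken from the service.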

Prove the 500ms budget

Run the benchmark against the warm service before cutting over any latency-sensitive caller. It reports p50, p95, max latency, and pass status.

powershell
python -m heartwood.cli bench-recall `
  --url http://127.0.0.1:8765 `
  --token $env:HEARTWOOD_RECALL_TOKEN `
  --tenant tenant:acme-payments `
  --principal-id agent:orchestrator `
  --query "Acme Payments audit provenance guidance" `
  --repeat 10 --max-p95-ms 500 --require-pass
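To make the reported numbers concrete, the sketch below computes p50, p95, max, and a pass flag from a list of latency samples. It uses the nearest-rank percentile definition, which is an assumption: this guide does not say which percentile method `bench-recall` uses. `percentile` and `summarize` are hypothetical names, and the sample latencies are invented for illustration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms, max_p95_ms=500):
    """Summarize latency samples the way a p95 budget check would."""
    p95 = percentile(latencies_ms, 95)
    return {
        "p50": percentile(latencies_ms, 50),
        "p95": p95,
        "max": max(latencies_ms),
        "passed": p95 <= max_p95_ms,
    }


# Ten illustrative samples in milliseconds (made up for this example).
samples = [120, 95, 110, 130, 105, 480, 100, 115, 125, 620]
print(summarize(samples))
# {'p50': 115, 'p95': 620, 'max': 620, 'passed': False}
```

Under the nearest-rank definition, ten samples make p95 equal to the slowest call, so a single outlier fails the run. If the benchmark behaves similarly, a larger `--repeat` value would give a more stable p95.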

HTTP surface

• GET /health: readiness and model/index names
• POST /recall: governed recall (send Authorization: Bearer <token> when a token is set)
• GET /metrics: process-local recall latency counters and p95
• POST /warm: warm additional tenants
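A caller that starts the service itself may want to wait for readiness before sending recall traffic. The sketch below polls `GET /health` until it returns HTTP 200. It rests on two assumptions not confirmed by this guide: that any 200 response means the service is ready (the body is not inspected), and that `/health` is reachable without a bearer token. `wait_until_ready` is a hypothetical helper name.

```python
import time
import urllib.request


def wait_until_ready(base_url, timeout_s=30.0, interval_s=0.5,
                     opener=urllib.request.urlopen):
    """Poll GET /health until it answers HTTP 200 or the deadline passes.

    ASSUMPTION: a 200 status is treated as "ready"; the response body
    (model/index names) is ignored here.
    """
    url = base_url.rstrip("/") + "/health"
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with opener(url, timeout=interval_s) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Covers connection refused, timeouts, and urllib's URLError/HTTPError.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, `wait_until_ready("http://127.0.0.1:8765")` returns `True` once the service answers and `False` if it does not come up within 30 seconds. The `opener` parameter exists only so the loop can be tested without a live service.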

Adapted from docs/integrations/warm-recall.md in the open-source core.