substrate · verified · p1

https://docs.usejosh.com/operations/embedding-architecture/ — operator-facing reference

embedding-architecture-doc · updated — · owner rritz

Operators, contributors, and future agents need one canonical place
to read about the embedding pipeline: schema, protocol, worker
semantics, query-time path, model swap mechanics, and the full env
var matrix. Without this, the only source of truth is the code, which
scatters the architectural reasoning across 8 files and a migration.
The doc lives in the public docs/ tree alongside
ingestion-architecture.html so the pair tells the full ETL story.

As an operator tuning the worker, I want a single env var reference table per service so that I don't grep for getenv calls.

As a contributor extending the protocol, I want the failure-handling rules documented so that my new adapter wraps errors the way the worker expects.

  1. When a reader visits `https://docs.usejosh.com/operations/embedding-architecture/`, the system shall display the four moving parts (schema, protocol, worker, query-time singleton), the failure-semantics table, the model-swap walkthrough, and the env var matrix for both worker and query-time paths.
  2. Where the doc references files, the references shall use repo-relative paths matching the actual layout (`josh-embedder/`, `shared/josh_substrate/embedding/`, etc.).
  3. When `bin/sync-nav.py` runs, the doc shall appear in the Operations sidebar across all docs/**/*.html peers.
  4. Where the doc describes vec0 storage, the description shall match the actual migration 0002 schema (`embedding float[1024]` + `embedding_bq bit[1024]`) and mention `vec_quantize_binary(?)` as the binary-companion serialization path.
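
Criterion 4 can be spot-checked mechanically. The sketch below is one possible check, not part of the spec: the function name `doc_mentions_vec0_schema` is hypothetical, and it assumes each token appears in the rendered page as one contiguous literal string. A page that splits the tokens across separate tags would need a looser match.

```bash
# Hypothetical helper: succeed only if the page at $1 mentions all three
# vec0 storage details named in criterion 4.
doc_mentions_vec0_schema() {
  local doc="$1" needle
  for needle in 'embedding float[1024]' 'embedding_bq bit[1024]' 'vec_quantize_binary(?)'; do
    # -F matches the needle literally, so [ ] ( ) ? are not treated as regex.
    if ! grep -qF -- "$needle" "$doc"; then
      echo "missing: $needle" >&2
      return 1
    fi
  done
}
```

The function prints the first missing token to stderr and returns non-zero, so it can sit in a `set -e` script next to the other checks.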
kind: bash

Command

set -euo pipefail
test -f https://docs.usejosh.com/operations/embedding-architecture/
grep -q '<title>Embedding architecture' https://docs.usejosh.com/operations/embedding-architecture/
# Sidebar propagation: at least one peer outside operations/ should
# reference the new doc after sync-nav has run.
grep -lq embedding-architecture.html docs/index.html
grep -lq embedding-architecture.html https://docs.usejosh.com/sources/crs-reports/
echo OK

Expect

OK

Verifies the file exists, has a sane title, and is reachable via the canonical sidebar.
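
The command above samples two peers, while criterion 3 asks for the link across all docs/**/*.html pages. A broader check could look like the sketch below. The function name `peers_missing_link` and its defaults are assumptions for illustration, and it matches on the bare file name only.

```bash
# Hypothetical helper: print every HTML page under a docs root that does
# not mention the new doc. Empty output means every peer carries the link.
peers_missing_link() {
  local root="${1:-docs}"
  local needle="${2:-embedding-architecture.html}"
  # grep -L lists files with no match; -F treats the needle literally.
  find "$root" -type f -name '*.html' -exec grep -LF -- "$needle" {} + | sort
}
```

Run after `bin/sync-nav.py`, a guard such as `test -z "$(peers_missing_link docs)"` would fail if any page was skipped.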

None.

None.

Single HTML file at https://docs.usejosh.com/operations/embedding-architecture/,
matching the style of ingestion-architecture.html. Nine sections:
Three goals, Four moving parts, Schema, Protocol, Worker, Query path,
Model swaps, Configuration, End-to-end verification. bin/sync-nav.py
propagates the new entry into all peer sidebars.

4 of 4 done.

  • t1 Doc written matching ingestion-architecture's structure
  • t2 Added to canonical sidebar in docs/index.html
  • t3 Sidebar propagated to all peers via sync-nav
  • t4 Cross-referenced from CLAUDE.md's canonical-docs list
  • 2026-05-10T11:00:00Z planned → verified: Doc published; sync-nav propagated to 80 peer pages.

docs/spec/embedding-architecture-doc.html · generated by bin/build-spec.py