https://docs.usejosh.com/operations/embedding-architecture/ — operator-facing reference
Why
Operators, contributors, and future agents need one canonical place
to read about the embedding pipeline: schema, protocol, worker
semantics, query-time path, model swap mechanics, and the full env
var matrix. Without this, the only source of truth is the code, which
scatters the architectural reasoning across 8 files and a migration.
The doc lives in the public docs/ tree alongside ingestion-architecture.html so the pair tells the full ETL story.
User stories
As an operator tuning the worker, I want a single env var reference table per service so that I don't grep for getenv calls.
As a contributor extending the protocol, I want the failure-handling rules documented so that my new adapter wraps errors the way the worker expects.
Acceptance criteria (EARS)
- When a reader visits `https://docs.usejosh.com/operations/embedding-architecture/`, the system shall display the four moving parts (schema, protocol, worker, query-time singleton), the failure-semantics table, the model-swap walkthrough, and the env var matrix for both worker and query-time paths.
- Where the doc references files, the references shall use repo-relative paths matching the actual layout (`josh-embedder/`, `shared/josh_substrate/embedding/`, etc.).
- When `bin/sync-nav.py` runs, the doc shall appear in the Operations sidebar across all docs/**/*.html peers.
- Where the doc describes vec0 storage, the description shall match the actual migration 0002 schema (`embedding float[1024]` + `embedding_bq bit[1024]`) and mention `vec_quantize_binary(?)` as the binary-companion serialization path.
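To illustrate what the binary companion column holds: sqlite-vec's `vec_quantize_binary()` reduces each float element to one bit based on its sign, so a 1024-float vector shrinks to 128 bytes. The following is a minimal pure-Python sketch of that idea, not the project's code. The helper names `quantize_binary` and `hamming`, the treatment of zero as a 0 bit, and the bit order inside each byte are assumptions made for illustration and may not match sqlite-vec's actual byte layout.

```python
from typing import Sequence

DIM = 1024  # matches `embedding float[1024]` and `embedding_bq bit[1024]`


def quantize_binary(vec: Sequence[float]) -> bytes:
    """Pack one sign bit per element: 1 if the element is > 0, else 0.

    Assumption: element i lands in byte i // 8 at bit position i % 8
    (least significant bit first). The real layout may differ.
    """
    if len(vec) % 8 != 0:
        raise ValueError("vector length must be a multiple of 8")
    out = bytearray(len(vec) // 8)
    for i, x in enumerate(vec):
        if x > 0:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def hamming(a: bytes, b: bytes) -> int:
    """Count differing bits between two equal-length bit vectors."""
    if len(a) != len(b):
        raise ValueError("bit vectors must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Alternate positive and negative elements across all 1024 dimensions.
v = [1.0 if i % 2 == 0 else -1.0 for i in range(DIM)]
bq = quantize_binary(v)
print(len(bq))  # 128 bytes for 1024 bits
```

The point of the sketch is the size ratio: the bit companion is 32 times smaller than the float32 original, which is why it suits a coarse first-pass comparison by Hamming distance.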
Success determiner
Command
set -euo pipefail
test -f docs/operations/embedding-architecture.html
grep -q '<title>Embedding architecture' docs/operations/embedding-architecture.html
# Sidebar propagation: at least one peer outside operations/ should
# reference the new doc after sync-nav has run.
grep -lq embedding-architecture.html docs/index.html
grep -lq embedding-architecture.html docs/sources/crs-reports.html
echo OK
Expect
Verifies the file exists, has a sane title, and is reachable via the canonical sidebar.
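The two grep spot-checks in the command sample only two peers. A fuller propagation check could walk every HTML page under docs/ and report the ones that do not mention the new doc. This is a minimal sketch and is not part of bin/sync-nav.py: the helper name `peers_missing_link`, the plain substring match, and the choice to skip the doc itself are all assumptions.

```python
from pathlib import Path
from typing import List


def peers_missing_link(
    docs_root: Path, target: str = "embedding-architecture.html"
) -> List[Path]:
    """Return HTML pages under docs_root that never mention `target`.

    Assumption: a sidebar entry shows up as a plain substring of the
    page source, so no HTML parsing is attempted.
    """
    missing = []
    for page in sorted(docs_root.rglob("*.html")):
        if page.name == target:
            continue  # assumption: the doc itself is not checked
        text = page.read_text(encoding="utf-8", errors="replace")
        if target not in text:
            missing.append(page)
    return missing


if __name__ == "__main__":
    for page in peers_missing_link(Path("docs")):
        print(f"missing sidebar link: {page}")
```

Run from the repository root after sync-nav; no output means every peer page references the doc.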
Clarifications needed
None.
Out of scope
None.
Dependencies
Plan
Single HTML file at https://docs.usejosh.com/operations/embedding-architecture/,
matching the style of ingestion-architecture.html. Nine sections:
Three goals, Four moving parts, Schema, Protocol, Worker, Query path,
Model swaps, Configuration, End-to-end verification. bin/sync-nav.py
propagates the new entry into all peer sidebars.
Tasks
4 of 4 done.
- t1 Doc written matching ingestion-architecture's structure
- t2 Added to canonical sidebar in docs/index.html
- t3 Sidebar propagated to all peers via sync-nav
- t4 Cross-referenced from CLAUDE.md's canonical-docs list
Changelog
- 2026-05-10T11:00:00Z planned→verified: Doc published; sync-nav propagated to 80 peer pages.