substrateverifiedp0

EmbeddingProvider protocol + uniform error hierarchy

embedding-provider-protocol · updated — · owner rritz

Use the pencil to edit title, status, priority, and owner. Changing status auto-prepends a changelog entry.

Worker, query-time, and adapter code all need a single shape to talk
about embeddings. The Protocol (in
shared/josh_substrate/embedding/protocol.py) defines three callables
and four required attributes; the error hierarchy (in errors.py)
gives the worker uniform retry/terminate semantics regardless of which
backend failed. Without this layer every adapter is an island.

As an embedding-worker author, I want one protocol every adapter satisfies so that switching providers is a config flag, not a code change.

As an adapter author, I want a typed contract enforced by mypy so that drift is caught at PR time, not in production.

  1. Where an adapter implements `EmbeddingProvider`, it shall expose `model_id`, `model_version`, `dim`, `max_batch` as attributes and `embed_documents`, `embed_query`, `health_check` as async methods.
  2. When a backend call fails, the adapter shall raise a subclass of `EmbeddingError` (`ProviderConfigError`, `ProviderTransientError`, or `ProviderUnavailableError`); raw backend exceptions shall not cross the provider boundary.
  3. When `embed_documents` is called with `[]`, the adapter shall return `EmbeddingResult(vectors=[], ...)` without contacting the backend.
  4. When `embed_documents` returns, the result's `model_id` and `model_version` shall match `provider.model_id`/`model_version`.
kindtest_file

Path

shared/josh_substrate/tests/embedding/test_provider_contract.py

Runner

uv run pytest shared/josh_substrate/tests/embedding -m 'contract and not live'

None.

None.

Pure-typing layer. protocol.py defines EmbeddingProvider (Protocol,
runtime_checkable) and EmbeddingResult (dataclass). errors.py is a
4-class hierarchy with a shared base. Both export from
josh_substrate.embedding. New code in this package opts into strict
mypy via [[tool.mypy.overrides]].

4 of 4 done.

  • t1 Protocol + EmbeddingResult + Embedding type alias
  • t2 EmbeddingError + 3 subclasses with documented retry semantics
  • t3 Strict-mypy override for josh_substrate.embedding.*
  • t4 Public surface re-exported from package __init__
  • 2026-05-10T11:00:00Z plannedverified Protocol + errors written; mypy strict checks pass on the package.

docs/spec/embedding-provider-protocol.html · generated by bin/build-spec.py