substrate · verified · p0

Query-time embedding in josh-core (singleton + /embed endpoint)

embedding-query-integration · updated — · owner rritz


Retrieval queries need to embed the user's query string before the
vec0 lookup, with sub-50ms latency. josh-core keeps a singleton
EmbeddingProvider, built lazily on the first get_query_provider()
call, so every later request skips model-load overhead. The /embed
endpoint is a smoke surface; the production retrieval API uses the
same singleton internally but is specified separately (rest-api-search).
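A minimal sketch of the singleton pattern, assuming an lru_cache wrapper around a builder function as the implementation notes describe. `FakeLocalProvider` and its toy 4-dimensional vector are hypothetical stand-ins, not the real encoder; the point is only that the build happens on the first call and never again.

```python
from functools import lru_cache


class FakeLocalProvider:
    """Hypothetical stand-in for a real EmbeddingProvider."""

    load_count = 0  # how many times the "model" has been loaded

    def __init__(self) -> None:
        # A real provider would pay the expensive model load here.
        FakeLocalProvider.load_count += 1

    def embed(self, text: str) -> list[float]:
        # Toy vector; a real encoder returns `dim` floats.
        return [float(len(text)), 0.0, 0.0, 0.0]


def _build_query_provider() -> FakeLocalProvider:
    return FakeLocalProvider()


@lru_cache(maxsize=1)
def get_query_provider() -> FakeLocalProvider:
    # Nothing is built at import time. The first call builds the provider,
    # and every later call returns the same cached instance.
    return _build_query_provider()
```

Calling `get_query_provider()` twice in this sketch returns the same object and increments `load_count` only once, which is the property the per-request path relies on.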

As an HTTP client, I want to POST a query string and get a vector back so that I can verify the deployment's encoder is loaded and working.

As a future retrieval endpoint, I want a singleton query-time provider so that per-request embedding doesn't pay model-load cost.

  1. When josh-core boots, the system shall NOT load the embedding model — load happens lazily on first `get_query_provider()` call.
  2. When `POST /embed` is called with a non-empty `text` field, the system shall return JSON containing `model_id`, `model_version`, `dim`, and a `vector` of `dim` floats.
  3. Where `text` is empty, the system shall return HTTP 422 (validation error).
  4. When the provider raises any `EmbeddingError`, the endpoint shall return HTTP 503 with the error message in the response body.
  5. Where `JOSH_QUERY_EMBED_PROVIDER` is anything other than `local`, the system shall raise `ProviderConfigError` (remote providers are not supported for the query-time path).
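The status mapping in criteria 2 to 5 can be sketched without a web framework. This is an illustration under assumptions, not the actual route: `handle_embed`, `check_provider_setting`, the `detail` response key, the exception class bodies, and the treatment of an unset variable as `local` are all hypothetical. In the real FastAPI router the 422 would more likely come from request-model validation than from an explicit check.

```python
import os


class EmbeddingError(Exception):
    """Stand-in for the provider error type named in the spec."""


class ProviderConfigError(Exception):
    """Stand-in; the real class hierarchy is not shown in the spec."""


def check_provider_setting(environ=os.environ) -> str:
    # Criterion 5: only "local" is accepted on the query-time path.
    # Defaulting an unset variable to "local" is an assumption.
    name = environ.get("JOSH_QUERY_EMBED_PROVIDER", "local")
    if name != "local":
        raise ProviderConfigError(f"unsupported query-time provider: {name!r}")
    return name


def handle_embed(text: str, provider) -> tuple[int, dict]:
    # Criterion 3: empty text is a validation error.
    if not text:
        return 422, {"detail": "text must be non-empty"}
    try:
        vector = provider.embed(text)
    except EmbeddingError as exc:
        # Criterion 4: provider failures become 503 with the message.
        return 503, {"detail": str(exc)}
    # Criterion 2: metadata plus a vector of `dim` floats.
    return 200, {
        "model_id": provider.model_id,
        "model_version": provider.model_version,
        "dim": len(vector),
        "vector": vector,
    }


class ToyProvider:
    """Hypothetical provider used only to exercise the sketch."""

    model_id = "toy-encoder"
    model_version = "0"

    def embed(self, text: str) -> list[float]:
        return [0.0, 1.0, 2.0]


class BrokenProvider(ToyProvider):
    def embed(self, text: str) -> list[float]:
        raise EmbeddingError("model unavailable")
```

With these stand-ins, `handle_embed("hello", ToyProvider())` yields a 200 with `dim` equal to the vector length, an empty string yields 422, and `BrokenProvider` yields 503 carrying the error message.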
kind: test_file

Path

josh-core/tests/embedding/test_query_embedding.py

Runner

uv run pytest josh-core/tests/embedding -v

None.

None.

josh-core/josh_core/embedding.py exposes get_query_provider()
(an lru_cache singleton over _build_query_provider).
josh_core/routes/embed.py is the FastAPI router, wired into
josh_core/main.py via a two-line include. Tests inject a fake
provider by monkey-patching _build_query_provider before the cache
fires, since the route imports the cached function by name at module
load.
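The injection trick can be sketched in a single file. Everything here is an assumption made for illustration: `route_get_provider` stands in for the name the route module binds at import, `FakeProvider` and `run_with_fake` are hypothetical, and `patch.dict(globals(), ...)` substitutes for patching the builder attribute on the real embedding module (for example with pytest's `monkeypatch.setattr`).

```python
from functools import lru_cache
from unittest.mock import patch


def _build_query_provider():
    raise RuntimeError("real model load; tests should never reach this")


@lru_cache(maxsize=1)
def get_query_provider():
    # The builder is looked up by name at call time, so a patch applied
    # before the first call decides what ends up in the cache.
    return _build_query_provider()


# A route module that imports get_query_provider by name keeps a reference
# to this cached function object, so patching the getter itself would not
# reach it. Patching the builder does.
route_get_provider = get_query_provider


class FakeProvider:
    """Hypothetical fake used in place of the real encoder."""

    def embed(self, text: str) -> list[float]:
        return [0.0, 0.0]


def run_with_fake():
    get_query_provider.cache_clear()  # make sure nothing is cached yet
    with patch.dict(globals(), {"_build_query_provider": FakeProvider}):
        provider = route_get_provider()  # cache fires with the fake builder
    get_query_provider.cache_clear()  # do not leak the fake into other tests
    return provider
```

Clearing the cache on both sides of the patch matters in this sketch: without the first clear an earlier test's provider would be returned, and without the second the fake would stay cached for later tests.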

4 of 4 done.

  • t1 get_query_provider singleton with lru_cache + lazy build
  • t2 POST /embed endpoint with Pydantic request/response models
  • t3 Provider error → 503 mapping
  • t4 Tests covering happy path, empty input, provider failure, existing endpoints survive
  • 2026-05-10T11:00:00Z planned → verified. Singleton + endpoint wired; 6 tests pass.

docs/spec/embedding-query-integration.html · generated by bin/build-spec.py