Query-time embedding in josh-core (singleton + /embed endpoint)
Why
Retrieval queries need to embed the user's query string before the vec0
lookup, with sub-50 ms latency. josh-core keeps a singleton EmbeddingProvider, built lazily on the first get_query_provider() call, so later requests skip
model-load overhead. The /embed endpoint is a smoke surface; the
production retrieval API uses the same singleton internally but is a
separate spec (rest-api-search).
User stories
As an HTTP client, I want to POST a query string and get a vector back so that I can verify the deployment's encoder is loaded and working.
As a future retrieval endpoint, I want a singleton query-time provider so that per-request embedding doesn't pay model-load cost.
Acceptance criteria (EARS)
- When josh-core boots, the system shall NOT load the embedding model — load happens lazily on first `get_query_provider()` call.
- When `POST /embed` is called with a non-empty `text` field, the system shall return JSON containing `model_id`, `model_version`, `dim`, and a `vector` of `dim` floats.
- Where `text` is empty, the system shall return HTTP 422 (validation error).
- When the provider raises any `EmbeddingError`, the endpoint shall return HTTP 503 with the error message in the response body.
- Where `JOSH_QUERY_EMBED_PROVIDER` is anything other than `local`, the system shall raise `ProviderConfigError` (remote providers are not supported for the query-time path).
Success determiner
Path
Runner
Clarifications needed
None.
Out of scope
None.
Dependencies
Plan
josh-core/josh_core/embedding.py exposes get_query_provider()
(lru_cache singleton over _build_query_provider). josh_core/routes/embed.py is the FastAPI router. Wired into josh_core/main.py via two-line include. Tests inject a fake
provider by monkey-patching _build_query_provider before the cache
fires, since the route imports the cached function by name at module
load.
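The singleton and the test-injection pattern described in this plan can be sketched as follows. It is an illustration under assumptions, not the contents of embedding.py: `LocalProvider`, its attributes, the fallback to `local` when the variable is unset, and the `install_fake_builder` helper are all hypothetical.

```python
import os
from functools import lru_cache


class ProviderConfigError(Exception):
    """Raised when the configured query-time provider is unsupported."""


class LocalProvider:
    """Hypothetical stand-in; the real provider would load the encoder here."""

    model_id = "stub-local"
    model_version = "0"
    dim = 4

    def embed(self, text: str) -> list[float]:
        return [0.0] * self.dim


def _build_query_provider():
    # Assumption: an unset variable falls back to "local".
    name = os.environ.get("JOSH_QUERY_EMBED_PROVIDER", "local")
    if name != "local":
        raise ProviderConfigError(f"unsupported query-time provider: {name!r}")
    return LocalProvider()


@lru_cache(maxsize=1)
def get_query_provider():
    # Nothing is built at import time. The first call builds the provider;
    # later calls return the cached instance. The builder is looked up by
    # name on each uncached call, so a replacement installed earlier is used.
    return _build_query_provider()


def install_fake_builder(fake):
    """Hypothetical test helper: swap the builder before the cache fires."""
    global _build_query_provider
    get_query_provider.cache_clear()
    _build_query_provider = lambda: fake
```

In a pytest suite the same swap would more likely be done with `monkeypatch.setattr` on the embedding module. Patching the builder rather than `get_query_provider` matters because the router already holds a reference to the cached function, and rebinding that name on the module afterwards would not change what the router calls. Since `lru_cache` does not cache exceptions, a failed build is retried on the next call.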
Tasks
4 of 4 done.
- t1 get_query_provider singleton with lru_cache + lazy build
- t2 POST /embed endpoint with Pydantic request/response models
- t3 Provider error → 503 mapping
- t4 Tests covering happy path, empty input, provider failure, existing endpoints survive
Changelog
- 2026-05-10T11:00:00Z planned→verified: Singleton + endpoint wired; 6 tests pass.