LocalSTProvider — sentence-transformers in-process adapter

Why

LocalSTProvider is the default provider for two scenarios: (1)
query-time embedding in josh-core, where a Modal cold start (10–30s) is
too slow for interactive use, and (2) bulk embedding when a deployment
has no GPU or remote backend and must fall back to CPU. Without
LocalSTProvider, a Josh deployment needs Modal credentials or a TEI
server to do anything; with it, the substrate is fully usable on a
single VM.

User stories

As a small-deployment operator, I want an embedding provider that works without external services so that I can run Josh on a single $30/mo VM.

As a developer, I want query-time embedding under 50ms on CPU so that interactive search stays snappy without GPUs.

Acceptance criteria (EARS)

When a `LocalSTProvider` is constructed, the system shall NOT load the model; loading happens lazily on the first `_ensure_loaded` call.
When `embed_documents` is called on a fresh provider, the system shall load the model in `asyncio.to_thread` so the event loop is not blocked.
Where the model is in `_MODELS_WITH_QUERY_PROMPT`, `embed_query` shall pass `prompt_name='query'`; otherwise it shall fall back to plain encoding with no prompt.
When `embed_documents` is called with input longer than the model's max sequence length, the system shall truncate (sentence-transformers default) rather than raise.
Where the publisher ships custom HF modeling code requiring `trust_remote_code=True` (e.g., Snowflake/snowflake-arctic-embed-m-v2.0), the system shall accept the flag at construction and forward it when loading the model. The production default, Arctic-L, uses standard XLM-Roberta and does NOT require the flag.
When `health_check` is called, the system shall load the model if not already loaded and return `True`; on load failure, return `False` (do not raise).
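
The criteria above can be read as the following minimal sketch. It is an illustration under stated assumptions, not the shipped `local_st.py`: `ProviderSketch`, the injected `load_model` callable, and the single entry in `_MODELS_WITH_QUERY_PROMPT` are hypothetical, and the one-time load lock described in the Plan is omitted for brevity. The model object is assumed to expose `encode(sentences, prompt_name=...)` the way sentence-transformers does.

```python
import asyncio
from typing import Any, Callable, List, Sequence

# Hypothetical entry; the real contents of this set are not listed in this spec.
_MODELS_WITH_QUERY_PROMPT = frozenset({"example-org/prompt-aware-model"})


class ProviderSketch:
    """Illustrative stand-in for LocalSTProvider, not the shipped class."""

    def __init__(
        self,
        model_name: str,
        load_model: Callable[[str, bool], Any],  # hypothetical injection point
        *,
        trust_remote_code: bool = False,
    ) -> None:
        self.model_name = model_name
        self.trust_remote_code = trust_remote_code
        self._load_model = load_model
        self._model: Any = None  # construction never loads the model

    async def _ensure_loaded(self) -> Any:
        if self._model is None:
            # Load off the event loop; the flag is forwarded to the loader.
            self._model = await asyncio.to_thread(
                self._load_model, self.model_name, self.trust_remote_code
            )
        return self._model

    async def embed_documents(self, texts: Sequence[str]) -> List[List[float]]:
        model = await self._ensure_loaded()
        vectors = await asyncio.to_thread(model.encode, list(texts))
        return [[float(x) for x in v] for v in vectors]

    async def embed_query(self, text: str) -> List[float]:
        model = await self._ensure_loaded()
        # Prompt-aware models get the query prompt; all others use plain encoding.
        kwargs = (
            {"prompt_name": "query"}
            if self.model_name in _MODELS_WITH_QUERY_PROMPT
            else {}
        )
        vectors = await asyncio.to_thread(model.encode, [text], **kwargs)
        return [float(x) for x in vectors[0]]

    async def health_check(self) -> bool:
        # Load failures are reported as False rather than raised.
        try:
            await self._ensure_loaded()
        except Exception:
            return False
        return True
```

Injecting the loader is a choice made only for this sketch so the behavior can be exercised with a fake model and without installing torch.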

Success determiner

Kind

test_file

Path

shared/josh_substrate/tests/embedding/test_provider_contract.py

Runner

uv run pytest shared/josh_substrate/tests/embedding -m 'contract and not live' -k local_st

Clarifications needed

None.

Out of scope

None.

Dependencies

Plan

Single module, local_st.py. The SentenceTransformer import is local to
_ensure_loaded so that the substrate's base import doesn't pull in
torch unconditionally. The josh-substrate[local-st] extra carries the
deps. The lock guarding the one-time model load is created lazily on
first await, to avoid binding to a destroyed event loop in test
contexts.
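
A minimal sketch of the deferred import and lazily created lock described above. `LazyModelHolder`, `load_sentence_transformer`, and the `loader` parameter are hypothetical names introduced for illustration; the real module may structure this differently.

```python
import asyncio
from typing import Any, Callable, Optional


def load_sentence_transformer(model_name: str, trust_remote_code: bool) -> Any:
    # Imported inside the function so that importing this module
    # does not import sentence-transformers or torch.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_name, trust_remote_code=trust_remote_code)


class LazyModelHolder:
    """Illustrative holder for the one-time model load (hypothetical name)."""

    def __init__(
        self,
        model_name: str,
        *,
        trust_remote_code: bool = False,
        loader: Callable[[str, bool], Any] = load_sentence_transformer,
    ) -> None:
        self.model_name = model_name
        self.trust_remote_code = trust_remote_code
        self._loader = loader
        self._model: Any = None
        self._lock: Optional[asyncio.Lock] = None  # created on first await, not here

    async def ensure_loaded(self) -> Any:
        if self._model is not None:
            return self._model
        if self._lock is None:
            # There is no await between the check and the assignment, so
            # coroutines on the same event loop cannot create two locks.
            self._lock = asyncio.Lock()
        async with self._lock:
            if self._model is None:  # re-check after acquiring the lock
                self._model = await asyncio.to_thread(
                    self._loader, self.model_name, self.trust_remote_code
                )
        return self._model
```

Because the lock is created inside the coroutine, a holder constructed at import time or in a test fixture does not touch any event loop until it is first awaited.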

Tasks

5 of 5 done.

t1 LocalSTProvider class implementing the Protocol
t2 Lazy model load + asyncio bridge via to_thread
t3 Asymmetric query/document encoding for known prompt-aware models
t4 trust_remote_code support for Snowflake Arctic v2
t5 Contract suite passes against all-MiniLM-L6-v2 (test factory)

Changelog

2026-05-27T00:00:00Z verified→verified Acceptance criterion 5 clarified: the `trust_remote_code` example now notes that the production default (Arctic-L) does NOT require the flag — only models shipping custom HF modeling code (Arctic-M, etc.) do. No code change; documentation-only clarification after Phase 1 picked Arctic-L over Arctic-M.
2026-05-10T11:00:00Z planned→verified Adapter written; 12 contract tests pass against MiniLM in CI.