substrate · verified · p1

HttpProvider — generic OpenAI-/TEI-compatible HTTP adapter

embedding-provider-http · updated — · owner rritz


Covers OpenAI's /v1/embeddings, HuggingFace TEI, vLLM with
--task embed, and any self-hosted server speaking either format.
This is the escape hatch for deployments that already pay for an
embedding API or run their own embedding service: no need to touch
the substrate, just point JOSH_HTTP_EMBED_URL at it.

As an OpenAI-API customer, I want to use the embedding API I already pay for so that I don't run a parallel Modal account just for Josh.

As a self-hosted-TEI operator, I want to point Josh at my internal embedding server so that my data never leaves my network.

  1. When `JOSH_HTTP_FORMAT=openai`, the adapter shall send `{"input": [...], "model": ...}` and parse `{"data": [{"embedding": [...]}, ...]}`.
  2. When `JOSH_HTTP_FORMAT=tei`, the adapter shall send `{"inputs": [...]}` and parse a bare list of lists.
  3. When the endpoint returns 401/403, the adapter shall raise `ProviderConfigError`; 429/5xx → `ProviderTransientError`; network failure → `ProviderUnavailableError`.
  4. Where `JOSH_HTTP_API_KEY` is set, the adapter shall send it as `Authorization: Bearer <key>`; otherwise no auth header.
  5. When the response shape is unparseable for the configured format, the adapter shall raise `ProviderTransientError` with the format name and parse-failure detail.
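The request-side rules above (payload shape per format, optional bearer auth, status-code mapping) can be sketched roughly as follows. This is an illustrative sketch, not the shipped adapter: the helper names `build_headers`, `build_payload`, and `raise_for_status` are hypothetical, the `EmbeddingError` base class is an assumed hierarchy, and treating an unknown format as a `ProviderConfigError` is an assumption the criteria do not state.

```python
from typing import Dict, List, Optional


class EmbeddingError(Exception):
    """Assumed common base class for the three error flavours."""


class ProviderConfigError(EmbeddingError):
    """Misconfiguration, e.g. rejected credentials (401/403)."""


class ProviderTransientError(EmbeddingError):
    """Likely to succeed on retry, e.g. 429 or 5xx."""


class ProviderUnavailableError(EmbeddingError):
    """Network-level failure; the caller would raise this when the
    HTTP client cannot reach the endpoint at all."""


def build_headers(api_key: Optional[str]) -> Dict[str, str]:
    # Hypothetical helper: send a bearer token only when a key is set.
    # Treating an empty string as "unset" is an assumption.
    if api_key:
        return {"Authorization": f"Bearer {api_key}"}
    return {}


def build_payload(fmt: str, texts: List[str], model: Optional[str] = None) -> dict:
    # Hypothetical helper: one request body shape per JOSH_HTTP_FORMAT value.
    if fmt == "openai":
        return {"input": list(texts), "model": model}
    if fmt == "tei":
        return {"inputs": list(texts)}
    # Assumption: an unrecognised format is a configuration problem.
    raise ProviderConfigError(f"unknown JOSH_HTTP_FORMAT: {fmt!r}")


def raise_for_status(status: int) -> None:
    # Hypothetical helper: map HTTP status codes to the error flavours.
    if status in (401, 403):
        raise ProviderConfigError(f"endpoint rejected credentials (HTTP {status})")
    if status == 429 or 500 <= status <= 599:
        raise ProviderTransientError(f"endpoint temporarily failing (HTTP {status})")
    # Other status codes are not covered by the criteria above,
    # so this sketch leaves them to the caller.
```

Keeping these as pure functions means the mapping can be unit-tested without a live endpoint; only the actual HTTP call needs the gated contract test.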
kind: test_file

Path

shared/josh_substrate/tests/embedding/test_provider_contract.py

Runner

JOSH_HTTP_EMBED_URL=http://localhost:8080/embed uv run pytest shared/josh_substrate/tests/embedding -m 'contract' -k http

Skipped by default. Set JOSH_HTTP_EMBED_URL to a reachable OpenAI-compatible or TEI endpoint to run the test live. When the variable is unset, the default factory calls pytest.skip, which preserves the contract that adapters degrade gracefully in unconfigured environments.

None.

None.

Implemented in shared/josh_substrate/embedding/providers/http_provider.py.
httpx ships in the josh-substrate[http] extra. One of two payload/parse
functions (_parse_openai, _parse_tei) is selected at construction. The
vector shape is verified after each response.
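The construction-time selection and per-response shape check described above might look something like this minimal sketch. Only `_parse_openai`, `_parse_tei`, and `ProviderTransientError` are names from this spec; `ResponseParser`, `_check_vectors`, and the `expected_count` parameter are hypothetical. What "vector shape" covers is also an assumption here: one vector per input, all of the same non-zero dimension.

```python
from typing import Any, Callable, Dict, List

Vectors = List[List[float]]


class ProviderTransientError(Exception):
    """Stand-in for the substrate's error class of the same name."""


def _check_vectors(vectors: Any, expected_count: int) -> Vectors:
    # Assumed meaning of "shape verified": right count, uniform dimension.
    if not isinstance(vectors, list) or len(vectors) != expected_count:
        raise ValueError(f"expected a list of {expected_count} vectors")
    dim = None
    for vec in vectors:
        if not isinstance(vec, list) or not vec:
            raise ValueError("each embedding must be a non-empty list")
        if not all(isinstance(x, (int, float)) and not isinstance(x, bool) for x in vec):
            raise ValueError("embedding contains non-numeric values")
        if dim is None:
            dim = len(vec)
        elif len(vec) != dim:
            raise ValueError("embeddings have inconsistent dimensions")
    return vectors


def _parse_openai(body: Any, expected_count: int) -> Vectors:
    # OpenAI format: {"data": [{"embedding": [...]}, ...]}
    return _check_vectors([item["embedding"] for item in body["data"]], expected_count)


def _parse_tei(body: Any, expected_count: int) -> Vectors:
    # TEI format: a bare list of lists.
    return _check_vectors(body, expected_count)


_PARSERS: Dict[str, Callable[[Any, int], Vectors]] = {
    "openai": _parse_openai,
    "tei": _parse_tei,
}


class ResponseParser:
    """Hypothetical wrapper: picks the parse function once, at construction."""

    def __init__(self, fmt: str) -> None:
        if fmt not in _PARSERS:
            raise ValueError(f"unknown format: {fmt!r}")
        self.fmt = fmt
        self._parse = _PARSERS[fmt]

    def __call__(self, body: Any, expected_count: int) -> Vectors:
        try:
            return self._parse(body, expected_count)
        except (KeyError, TypeError, ValueError) as exc:
            # Include the format name and the parse-failure detail.
            raise ProviderTransientError(
                f"unparseable {self.fmt} response: {exc!r}"
            ) from exc
```

For example, `ResponseParser("tei")([[1.0, 2.0]], 1)` returns the vectors unchanged, while handing the same parser an OpenAI-shaped dict raises `ProviderTransientError` naming the `tei` format.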

4 of 4 done.

  • t1 HttpProvider class implementing the Protocol
  • t2 openai + tei format support, env-driven selection
  • t3 Status-code mapping to the three EmbeddingError flavours
  • t4 health_check via HEAD on the embed URL
  • 2026-05-10T11:00:00Z planned → verified: Adapter shipped; live test gated on the JOSH_HTTP_EMBED_URL env var.

docs/spec/embedding-provider-http.html · generated by bin/build-spec.py