substrateverifiedp1

Substrate determiner verification

substrate-determiner-verification · updated 2026-05-29T12:00:00Z · owner rritz

Use the pencil to edit title, status, priority, and owner. Changing status auto-prepends a changelog entry.

Make a success_determiner's green/red signal trustworthy enough that a
scheduled, self-prompting routine can act on it without a human in the loop.
The unlock is verification, not autonomy: a determiner is only trustworthy if
it provably goes RED when the thing it checks actually breaks. This spec
governs the layer that proves that — a target-agnostic probe: contract, a
single substrate execution shim, adversarial mutation suites, and a CI gate
(poe verify-determiner-suite) that fails the push if any determiner is
gameable.

As a scheduled routine acting on substrate health, I want a determiner whose green means "verified" and whose red means "broken" so that I can act on the signal without a human re-checking it.

As a spec author, I want an adversarial mutation suite that proves my determiner isn't gameable so that marking a spec routine_eligible is earned, not asserted.

As an operator, I want the same probe to run against a local snapshot in CI and the live substrate so that what CI proves is what production enforces.

  1. When a mutation in a spec's `docs/spec/mutations/<id>.yaml` suite is marked `expect: fail`, `poe verify-determiner-suite` shall report it RED; a determiner that stays green on such a mutation is GAMEABLE and shall fail the suite.
  2. Where a determiner is sql-kind, its `query` + `compare` shall be treated as a single implicit probe; where bash-kind, its explicit `probe:` list shall be the machine-checkable contract that `bin/test-determiners.py` exercises.
  3. When `JOSH_SUBSTRATE_TARGET` is unset and no local substrate DB exists, a determiner run shall exit 77 (CNV) — neither green nor red — so a routine does not act on an unverifiable result.
  4. Where a spec is marked `routine_eligible: true`, `spec-lint --strict` shall fail unless its determiner carries a machine assertion (sql `compare:` or bash `probe:`) and a mutation suite exists at `docs/spec/mutations/<id>.yaml`.
  5. When `poe verify-determiner-suite` runs, it shall provision one fresh `alembic upgrade head` snapshot, snapshot-and-restore per mutation, skip `mode: remote-only` / `mode: manual` mutations in local mode, treat a failed `apply` as a hard error, and complete within the local CI time budget.
kindbash

Command

uv run poe verify-determiner-suite

Expect

Exit 0. Every locally-runnable mutation suite's baseline probe passes, every `expect: fail` mutation drives its determiner RED (caught), no GAMEABLE shapes, no apply ERRORs. Currently 4 suites run locally (crs-reports, legislators-and-committees, substrate-cron-scheduler, substrate-vector-pipeline); roll-call / SAP / nightly-backup are skipped pending remote-mode probe extraction.

Running this determiner runs the whole local mutation suite (~3s: one `alembic upgrade head` + cheap snapshot copies). It needs no server — local mode provisions throwaway temp SQLite. Remote mode (`JOSH_SUBSTRATE_TARGET=remote`) is for the loaded substrate and, per CLAUDE.md, runs serially and is never fanned across parallel agents.

None.

  • The routines layer itself (the scheduled self-prompting that consumes these signals) — this spec only makes the signal trustworthy.
  • `bin/spec-pickup.py` and the spec-system tier/roadmap work (substrate-spec-system-living).
  • Remote-mode probe extraction for the bash-kind ingest determiners (roll-call, SAP) and the irreducibly-remote nightly-backup restic ops — deferred until the substrate is loaded on the bare-metal host.

The unifying primitive is a probe: a {query, compare, label} triple —
SQL plus an assertion — that is intrinsically target-agnostic.

1. bin/_substrate.py — the single point where JOSH_SUBSTRATE_TARGET
resolves to an executor: local (candidate files), local:/abs.db (an
explicit snapshot, used per-mutation), or remote (ssh into josh and run
sqlite3 in the josh-core container, output byte-capped). Owns CNV_EXIT,
the moved compare helpers (_compare_op / _evaluate_compare), run_sql
(loads sqlite-vec so vec0 DDL works through the plain CLI), probes_for,
run_probe_list, and provision_template (a fresh alembic upgrade head
usable outside pytest).

2. probe: on the bash determiner branch of _schema.json. sql-kind
determiners auto-promote their query + compare to a single probe (no
re-churn). Probe queries must land their assertion on the first output line
(the runner reads row[0]); multi-line schema text is boolean-ized with
SELECT (sql LIKE '%x%' AND sql LIKE '%y%') FROM sqlite_master WHERE name='t'.

3. bin/test-determiners.py — provisions a known-good baseline (fresh
alembic, plus an optional seed: block for data-count determiners),
confirms the baseline probe passes (a red baseline is a seed/probe bug,
reported distinctly), then applies each mutation's bare SQL to a snapshot
and asserts the result matches expect:. An expect: fail that stays green
is GAMEABLE (hard fail); a failed apply is an ERROR (hard fail).

4. routine_eligible: true — structurally gated by spec-lint --strict
(_routine_eligible_checks: machine assertion + mutation file present) and
behaviorally gated by poe verify-determiner-suite in the ci sequence.

The verification win is concrete: deepening substrate-vector-pipeline's probe
from table-existence to column-shape (float[1024] + bit[1024]) closed two
gaps its suite had documented (wrong-embedding-dim, missing-bq-companion)
— both now caught.

6 of 7 done.

  • t1 bin/_substrate.py shim: target resolution + run_sql (sqlite-vec) + probes + CNV_EXIT + provision_template
  • t2 probe: contract in _schema.json (bash branch); dry-run-spec rewired through the shim + --probe
  • t3 bin/test-determiners.py: baseline-verify + per-mutation snapshot/apply/probe + gameable/error reporting
  • t4 substrate-vector-pipeline 3-probe (proven: 5 caught, 2 gaps closed) + negative-control verified
  • t5 seed-backed suites for crs-reports / legislators-and-committees / substrate-cron-scheduler (18 caught, 8 gaps)
  • t6 routine_eligible: schema field + _routine_eligible_checks (spec-lint) + verify-determiner-suite in ci
  • t7 Remote mode: extract probes for roll-call / SAP ingest determiners; nightly-backup stays remote-only (deferred until substrate loaded)
  • 2026-05-29T12:00:00Z (new)verified Built the verification layer and ran its determiner green: `poe verify-determiner-suite` exits 0 over 4 local suites (18 expect:fail mutations caught, 8 documented gaps tolerated, 0 gameable, 0 errors), and a negative-control mutation confirmed the harness reports GAMEABLE + exits 1 when a probe can't see a break. Remote-mode probe extraction for the bash-ingest determiners (t7) is the remaining work, deferred until the substrate is loaded on the bare-metal host.

docs/spec/substrate-determiner-verification.html · generated by bin/build-spec.py