REST API — resource endpoints
Why
Per-source resource fetches: GET /v1/bills/{bill_id}, GET /v1/legislators/{bioguide_id}, GET /v1/committees/{committee_id},
and friends. This is the deterministic half of the API — given a
known identifier, return the full record + citation, never via FTS5
or vector search. Without this surface, identifier-bearing questions
("Does HR103 mention child protection?", "Who is on Ways and Means?")
fall through into the search path, where dense embeddings reliably
miss exact identifiers (Harvey, 2025). The substrate is already
relational; this spec exposes that relational shape directly.
This endpoint family is also the backend for mcp-server's typed Class A
tools (get_bill, get_legislator, list_committee_members, etc.) and
for its universal fetch(id) dispatcher. The ID grammar from
rest-api-conventions §5 is the routing key: every public ID has a
unique type prefix, so one MCP fetch tool can dispatch any ID to the
right handler here without ambiguity.
User stories
As an agent assembling a citation-grounded answer to "Did HR103 pass the House?", I want to call `GET /v1/bills/hr:119:103` and get back the bill's status, sponsor, last action, and citation so that I answer from structured data, not by guessing from semantic-search snippets.
As an agent asked "Who is on the Ways and Means Committee in the 119th Congress?", I want to call `GET /v1/committees/HSWM/members?congress=119` and receive a list of legislators so that I never run a vector search for a question whose answer is a SQL join.
As an MCP server implementer, I want every public ID issued by the substrate (search hits, ingester output, citations) to be a valid input to a single `GET /v1/<resource>/{id}` route so that `mcp-server`'s `fetch` tool is a one-line prefix-dispatch.
As an OSS self-hoster, I want deterministic curl-able endpoints with the same envelope as search so that I can audit the substrate's record-by-record state without learning a query DSL.
Acceptance criteria (EARS)
- When `GET /v1/bills` is called with any of `?congress=`, `?type=` (hr|s|hjres|sjres|hres|sres|hconres|sconres, multi-value comma-separated), `?status=`, `?sponsor_bioguide=`, `?sponsor_party=` (D|R|I), `?sponsor_state=` (USPS code), `?sponsor_chamber=` (house|senate), `?referred_committee=`, `?since=`, `?until=`, the system shall apply these as structured filters; multi-value filters AND across fields and OR within a field's comma-separated list.
- When `GET /v1/legislators` is called with any of `?party=`, `?chamber=`, `?state=`, `?district=`, `?congress=`, `?is_current=`, the system shall filter accordingly; `congress=` resolves via `legislator_terms` (member served in that congress).
- When `GET /v1/committees` is called with `?chamber=` or `?congress=`, the system shall filter accordingly; `congress=` restricts to committees that existed in that congress.
- When `GET /v1/federal-register` is called with `?agency=` (multi-value comma-separated), `?doc_type=` (notice|rule|proposed_rule|presidential_document), `?since=`, `?until=`, the system shall filter accordingly.
- When `GET /v1/hearings` is called with `?committee=` (committee_id), `?chamber=`, `?congress=`, `?since=`, `?until=`, the system shall filter accordingly.
- When `GET /v1/roll-call-votes` is called with `?chamber=`, `?congress=`, `?question_type=`, `?since=`, `?until=`, the system shall filter accordingly.
- When `GET /v1/lda-filings` is called with `?registrant_name=`, `?client_name=`, `?issue=`, `?year=`, `?quarter=`, `?min_spend=`, `?max_spend=`, the system shall filter accordingly.
- Where a filter param is passed with a value outside its documented allow-list (e.g., `?sponsor_party=Z`), the system shall return HTTP 400 with `error.code='invalid_filter_value'` and `hint.valid_values`.
- Where a filter param name is not in the documented per-resource set, the system shall return HTTP 400 with `error.code='unknown_filter'` and `hint.valid_filters` (full list for that resource).
- When a list endpoint is called without filters, it shall return cursor-paginated results sorted by `-<primary_time_field>` (per `rest-api-conventions` §8).
- When a client requests `GET /v1/bills/{bill_id}`, the system shall return the full bill record (per `rest-api-conventions` `full` fieldset) plus a `citation` block; 404 with `error.code='record_not_found'` if the ID does not exist.
- When a client requests `GET /v1/legislators/{bioguide_id}`, the system shall return the legislator record (terms, leadership roles, social media, district offices) plus citation.
- When a client requests `GET /v1/committees/{committee_id}`, the system shall return the committee record (including parent for subcommittees) plus citation.
- When a client requests `GET /v1/committees/{committee_id}/members?congress=<n>`, the system shall return the list of committee memberships joined with legislator records for that congress; if `congress` is omitted, default to the current congress.
- When a client requests `GET /v1/bills/{bill_id}/cosponsors`, the system shall return the list of legislators who cosponsored the bill (ordered by date_signed ascending).
- When a client requests `GET /v1/bills/{bill_id}/body`, the system shall return the bill's `body_text` (markdown-normalized) plus `body_size_bytes` and `body_sha256`.
- When a client requests `GET /v1/roll-call-votes/{vote_id}`, the system shall return the vote record plus the full roster of per-member positions.
- When a client requests `GET /v1/legislators/{bioguide_id}/votes?congress=<n>&limit=<n>`, the system shall return the cursor-paginated list of votes that legislator cast in the given congress (newest first).
- When a client requests `GET /v1/crs-reports/{id}`, `GET /v1/federal-register/{id}`, `GET /v1/public-laws/{id}`, the system shall return the full record plus citation.
- When a client requests `GET /v1/bills/{bill_id}` with a `bill_id` that does not match the documented grammar (`<type>:<congress>:<number>` per `rest-api-conventions` §5), the response shall be HTTP 400 with `error.code='invalid_id_format'` and a `hint.example`.
- Where two different resource types could share an ID shape, the resource-type segment in the URL path shall disambiguate; the substrate shall never issue overlapping ID prefixes across types (enforced by `rest-api-conventions` §5).
- When any record is returned by `GET /v1/search` (`rest-api-search`), its `id` field shall be accepted as input to the matching `GET /v1/<resource>/{id}` endpoint (round-trip contract — enforced by contract test in `test_search_endpoint.py` and re-asserted here).
- When a singleton endpoint returns a record with a body field, the response shall NOT inline the full body if `body_size_bytes > 50000`; instead it shall return `body: {size_bytes, url, sha256}` pointing at the dedicated `/v1/<resource>/{id}/body` endpoint (per `rest-api-conventions` §9).
- Where this spec defines an endpoint, the response shape, error envelope, datetime serialization, status codes, and rate-limit headers shall conform to `rest-api-conventions` without overrides.
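The ID-format criterion above can be sketched as a small validator that runs before any database lookup. This is a minimal illustration, not the project's implementation: the regex is one assumed reading of the `<type>:<congress>:<number>` grammar (`rest-api-conventions` §5 is the authority), `validate_bill_id` is a hypothetical name, and placing `hint` beside `error` rather than inside it is a guess at the envelope.

```python
import re

# Bill types taken from the `?type=` allow-list in the acceptance criteria.
BILL_TYPES = ("hr", "s", "hjres", "sjres", "hres", "sres", "hconres", "sconres")

# Assumed reading of `<type>:<congress>:<number>`: a lowercase type followed by
# two positive integers. The real grammar lives in rest-api-conventions §5.
BILL_ID_RE = re.compile(r"^(%s):([1-9]\d*):([1-9]\d*)$" % "|".join(BILL_TYPES))


def validate_bill_id(bill_id: str) -> tuple[int, dict]:
    """Return (http_status, payload) for the format check only; no lookup happens here."""
    match = BILL_ID_RE.match(bill_id)
    if match is None:
        # Hypothetical envelope layout: only `error.code` and `hint.example`
        # are named by the spec.
        return 400, {
            "error": {"code": "invalid_id_format"},
            "hint": {"example": "hr:119:103"},
        }
    bill_type, congress, number = match.groups()
    return 200, {"type": bill_type, "congress": int(congress), "number": int(number)}
```

A well-formed ID that does not exist would pass this check and then produce the 404 `record_not_found` from the fetch step, so the two error codes stay distinct.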
Success determiner
Path
Runner
Contract test against the live FastAPI app, exercising every endpoint and ID-routing edge case:
- Happy-path GET for each resource type returns the documented fieldset.
- 404 with `record_not_found` for unknown IDs.
- 400 with `invalid_id_format` + hint for malformed IDs.
- `committees/{id}/members?congress=` returns a non-empty list for a known committee + congress; defaults to current congress when omitted.
- `bills/{id}/cosponsors` returns ordered list.
- Round-trip: take 10 random search hits across sources, assert each `id` is fetchable via the matching resource endpoint and returns `200 OK`.
- Body separation: a bill with `body_size_bytes > 50000` returns `body: {size_bytes, url, sha256}` from the singleton, full markdown only from `/body`.

Determiner currently fails because the resource handlers do not yet exist. Flips to passing as the handlers land alongside the search endpoint.
Clarifications needed
- Should sub-resources be flat singletons (`/v1/bills/hr:119:1/cosponsors` returning `[...]`) or wrapped in `{data, total}`? `rest-api-conventions` says list endpoints use the envelope, so this is settled — but worth confirming the per-bill cosponsor lists count as 'lists' for envelope purposes.
- Naming: `roll-call-votes` (hyphenated) vs `votes` in the URL; the substrate table is `roll_call_votes`. Leaning toward the resource name matching the substrate table name (consistent with citation IDs).
- Versions of bills (`bills/{id}/versions`, `bills/{id}/versions/{version_code}`): include in this spec or defer? Body-norm and version metadata are already in the schema. Leaning toward including the index route here and deferring per-version singletons until a real consumer asks.
Out of scope
- Search semantics (BM25, vector, hybrid) — `rest-api-search` owns those.
- Fuzzy entity resolution (`?q=<noisy_name>` on registry sources) — `rest-api-entity-resolution` owns that. The `?q=` parameter is allow-listed only on registry list endpoints; body-bearing list endpoints reject it with a redirect to `/v1/search`.
- Analytical aggregates (counts, top-N, time-series) — `rest-api-aggregations` owns those (declarative aggregate params on list endpoints).
- Server-side cross-source fan-out — `rest-api-dossiers` owns those.
- MCP transport, tool naming, OAuth — `mcp-server` owns those. The MCP server's typed tools call the resource handlers defined here.
- Write endpoints (POST/PUT/DELETE) — substrate is read-only at v1.
- Cross-resource graph traversal (e.g., `/v1/legislators/{id}/sponsored-bills?cosponsored-by={other_id}`) — deferred; agents can issue two reads and join client-side until a real use case justifies a join endpoint.
- Per-version bill singletons — see clarifications.
Dependencies
Plan
## Endpoint inventory
Singletons (return one record, full fieldset, citation block):
````
GET /v1/bills/{bill_id}
GET /v1/bills/{bill_id}/body
GET /v1/legislators/{bioguide_id}
GET /v1/committees/{committee_id}
GET /v1/roll-call-votes/{vote_id}
GET /v1/crs-reports/{id}
GET /v1/federal-register/{id}
GET /v1/public-laws/{id}
GET /v1/hearings/{id}
GET /v1/hearing-transcripts/{id}
GET /v1/committee-reports/{id}
GET /v1/cbo-cost-estimates/{id}
GET /v1/gao-reports/{id}
GET /v1/statements-of-administration-policy/{id}
````
Sub-resource lists (return {data, next_cursor, has_more}, card fieldset):
````
GET /v1/bills/{bill_id}/cosponsors
GET /v1/bills/{bill_id}/actions
GET /v1/bills/{bill_id}/versions
GET /v1/committees/{committee_id}/members?congress=
GET /v1/committees/{committee_id}/subcommittees
GET /v1/legislators/{bioguide_id}/terms
GET /v1/legislators/{bioguide_id}/votes?congress=
GET /v1/legislators/{bioguide_id}/sponsored-bills?congress=
GET /v1/roll-call-votes/{vote_id}/members
````
## Routing key: type-prefixed IDs
Every ID issued by the substrate is type-prefixed (per rest-api-conventions
§5). That makes the MCP fetch(id) dispatcher a literal table lookup
— no need for a separate "what kind of ID is this?" call:
| ID example            | Type prefix          | Resource endpoint             |
|-----------------------|----------------------|-------------------------------|
| hr:119:103            | hr:, s:, hjres:, …   | /v1/bills/{id}                |
| S000033               | bioguide (regex)     | /v1/legislators/{id}          |
| HSWM                  | committee thomas     | /v1/committees/{id}           |
| house:119:2026:142    | house:, senate:      | /v1/roll-call-votes/{id}      |
| R47892                | R\d{5}               | /v1/crs-reports/{id}          |
| 2026-08558            | FR doc number        | /v1/federal-register/{id}     |
| pl:119:1              | pl:                  | /v1/public-laws/{id}          |
| crec:2026-05-10:S1234 | crec:                | /v1/congressional-record/{id} |
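The table lookup described above might look like the following sketch. The route templates come from the routing table; the regular expressions are illustrative guesses at each ID shape (only `R\d{5}` appears in the table itself), and `DISPATCH` and `route_for` are hypothetical names rather than mcp-server's actual code.

```python
import re
from typing import Optional

# (pattern, route template) pairs mirroring the routing table. Apart from the
# CRS pattern, these regexes are assumptions; rest-api-conventions §5 defines
# the real ID grammar.
DISPATCH = [
    (re.compile(r"^(hr|s|hjres|sjres|hres|sres|hconres|sconres):\d+:\d+$"), "/v1/bills/{id}"),
    (re.compile(r"^(house|senate):\d+:\d{4}:\d+$"), "/v1/roll-call-votes/{id}"),
    (re.compile(r"^pl:\d+:\d+$"), "/v1/public-laws/{id}"),
    (re.compile(r"^crec:"), "/v1/congressional-record/{id}"),
    (re.compile(r"^R\d{5}$"), "/v1/crs-reports/{id}"),
    (re.compile(r"^\d{4}-\d{5}$"), "/v1/federal-register/{id}"),
    (re.compile(r"^[A-Z]\d{6}$"), "/v1/legislators/{id}"),
    (re.compile(r"^[A-Z]{4}(\d{2})?$"), "/v1/committees/{id}"),
]


def route_for(public_id: str) -> Optional[str]:
    """Map a public ID to its resource path, or None when no pattern matches."""
    for pattern, template in DISPATCH:
        if pattern.match(public_id):
            return template.replace("{id}", public_id)
    return None
```

Because the patterns are meant to be mutually exclusive, the order of the list should not matter; a startup check that no sample ID matches two patterns would be one way to guard the no-overlap rule.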
## Body separation
Bills, CRS reports, hearings, and Federal Register documents can have
bodies in the hundreds of KB. Per rest-api-conventions §9:
- Singleton returns metadata + body: {size_bytes, url, sha256} when the body exceeds 50 KB.
- Dedicated /body endpoint streams the markdown-normalized text.
- Search snippets (200-character excerpts with offsets) come from search results, not this endpoint family.
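As a rough sketch of the body-separation rule, the helper below decides whether a singleton inlines the body or returns the pointer object. The 50000-byte threshold and the `{size_bytes, url, sha256}` keys come from the spec; measuring size as UTF-8 bytes, hashing those same bytes, returning a relative URL, and the name `body_field` are all assumptions.

```python
import hashlib

INLINE_BODY_LIMIT_BYTES = 50_000  # threshold from the acceptance criteria


def body_field(resource: str, record_id: str, body_text: str):
    """Return the inline body, or a pointer object when the body is too large."""
    raw = body_text.encode("utf-8")  # assumption: size is counted in UTF-8 bytes
    if len(raw) <= INLINE_BODY_LIMIT_BYTES:
        return body_text
    return {
        "size_bytes": len(raw),
        "url": f"/v1/{resource}/{record_id}/body",  # assumption: relative URL
        "sha256": hashlib.sha256(raw).hexdigest(),
    }
```

In practice the size and hash would more likely be read from the stored `body_size_bytes` and `body_sha256` columns than recomputed per request; the recomputation here only keeps the example self-contained.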
## Implementation surface
````
josh-core/josh_core/
  routers/
    bills.py
    legislators.py
    committees.py
    roll_call_votes.py
    crs_reports.py
    federal_register.py
    public_laws.py
    hearings.py
    committee_reports.py
    cbo_cost_estimates.py
    gao_reports.py
    statements_of_administration_policy.py
    congressional_record.py
  services/
    resource_fetch.py            # shared fetch + 404 + citation construction
  tests/
    test_resource_endpoints.py   # the success determiner
````
Each router is a thin SQL-to-Pydantic mapper. Citation block construction
is delegated to josh_substrate.citations.formatters.<source>.citation_for(record)
(shared with the ingester and with rest-api-search's result cards).
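The shared fetch step (primary-key lookup, 404 on miss, citation on hit) could be sketched without any web framework as below. Everything named here is illustrative: the `bills` table and its columns, the `data` key in the success payload, and the local `citation_for` stand-in, which only mimics the role of the real per-source formatter.

```python
import sqlite3


def citation_for(record: dict) -> dict:
    """Stand-in for the per-source formatter; the real citation fields are not specified here."""
    return {"id": record["id"], "title": record["title"]}


def fetch_record(conn: sqlite3.Connection, table: str, record_id: str) -> tuple[int, dict]:
    """Fetch by primary key; error envelope on miss, record plus citation on hit."""
    # `table` must be a router-side constant, never user input, since it is
    # interpolated into the SQL text. The ID itself is bound as a parameter.
    row = conn.execute(f"SELECT * FROM {table} WHERE id = ?", (record_id,)).fetchone()
    if row is None:
        return 404, {"error": {"code": "record_not_found"}}
    record = dict(row)
    return 200, {"data": record, "citation": citation_for(record)}


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory database with one hypothetical bill row."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # rows convert cleanly to dicts
    conn.execute("CREATE TABLE bills (id TEXT PRIMARY KEY, title TEXT, status TEXT)")
    conn.execute("INSERT INTO bills VALUES ('hr:119:103', 'Example bill', 'introduced')")
    return conn
```

Each router would then reduce to validating the ID, calling a function like `fetch_record` with its own table name, and mapping the result onto its Pydantic model.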
## Filter parameter sets (per-resource)
Each resource declares its filter allow-list in a FilterPlan
(analogous to AggregationPlan in rest-api-aggregations). The
router rejects unknown filters at the request boundary so agents
get loud errors, not silent empty results.
| Resource | Allowed filter params |
|-------------------|--------------------------------------------------------------------------------------|
| bills | congress, type, status, sponsor_bioguide, sponsor_party, sponsor_state, sponsor_chamber, referred_committee, since, until |
| legislators | party, chamber, state, district, congress, is_current |
| committees | chamber, congress |
| federal-register | agency, doc_type, since, until |
| hearings | committee, chamber, congress, since, until |
| roll-call-votes | chamber, congress, question_type, since, until |
| lda-filings | registrant_name, client_name, issue, year, quarter, min_spend, max_spend |
| crs-reports | topic, since, until |
| gao-reports | agency_addressed, recommendation_status, since, until |
| hearing-transcripts | hearing_id, witness_org, since, until |
| committee-reports | committee, congress, related_bill, since, until |
| cbo-cost-estimates | related_bill, bill_stage, since, until |
| public-laws | congress, since, until |
| statements-of-administration-policy | president_bioguide, stance, related_bill, since, until |
| regulations-dot-gov-dockets | agency, since, until |
| staff-directories | committee, legislator_bioguide, chamber |
| topic-taxonomy | broader_label |
Composability: filters AND across fields; multiple values within a
field's comma-separated list OR together. since is inclusive and until is
exclusive on the resource's primary time field (per rest-api-conventions §7).
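One way to picture a FilterPlan and the composition rules is the sketch below. The shape of `FilterPlan`, the function names, and the `introduced_date` time field are hypothetical; the allow-lists for bills are copied from the table above, and the in-memory `matches` stands in for what would be a SQL WHERE clause in a real router.

```python
from dataclasses import dataclass, field


@dataclass
class FilterPlan:
    """Hypothetical shape: allowed param names plus optional per-param value allow-lists."""
    allowed: frozenset
    valid_values: dict = field(default_factory=dict)


BILLS_PLAN = FilterPlan(
    allowed=frozenset({
        "congress", "type", "status", "sponsor_bioguide", "sponsor_party",
        "sponsor_state", "sponsor_chamber", "referred_committee", "since", "until",
    }),
    valid_values={
        "sponsor_party": {"D", "R", "I"},
        "sponsor_chamber": {"house", "senate"},
    },
)


def parse_filters(plan: FilterPlan, params: dict) -> tuple[int, dict]:
    """Validate query params; return (400, error) or (200, {field: [values]})."""
    parsed = {}
    for name, raw in params.items():
        if name not in plan.allowed:
            return 400, {"error": {"code": "unknown_filter"},
                         "hint": {"valid_filters": sorted(plan.allowed)}}
        values = raw.split(",")  # comma-separated multi-value
        allow = plan.valid_values.get(name)
        if allow is not None and any(v not in allow for v in values):
            return 400, {"error": {"code": "invalid_filter_value"},
                         "hint": {"valid_values": sorted(allow)}}
        parsed[name] = values
    return 200, parsed


def matches(record: dict, parsed: dict, time_field: str = "introduced_date") -> bool:
    """AND across fields, OR within a field; since inclusive, until exclusive."""
    for name, values in parsed.items():
        if name == "since":
            if record[time_field] < values[0]:  # ISO dates sort lexicographically
                return False
        elif name == "until":
            if record[time_field] >= values[0]:
                return False
        elif str(record.get(name)) not in values:
            return False
    return True
```

For example, `?type=hr,s&sponsor_party=D` parses to two fields; a Senate bill sponsored by a Democrat matches, while the same bill with a Republican sponsor does not.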
## Adding a new resource
When a new ingester ships, the resource-endpoint spec gets a new router
file, a new FilterPlan entry, and a new entry in this spec's endpoint
inventory. The router is ~50 lines: fetch by primary key, 404 on miss,
citation block on hit, filter param parsing against the plan.
mcp-server picks the new endpoint up automatically through the
prefix-dispatch table in its fetch handler — no MCP changes required
to add a new resource type, only a one-line addition to the dispatch table.
Tasks
0 of 15 done.
- t1 Shared resource_fetch service (404 + citation block construction)
- t2 Bills router: singleton, body, cosponsors, actions, versions
- t3 Legislators router: singleton, terms, votes, sponsored-bills
- t4 Committees router: singleton, members, subcommittees
- t5 Roll-call votes router: singleton + members
- t6 CRS reports, Federal Register, public laws routers (singletons)
- t7 Hearings, hearing-transcripts, committee-reports routers (singletons)
- t8 CBO cost estimates, GAO reports, SAPs routers (singletons)
- t9 Congressional record router (singleton by granule ID)
- t10 ID format validation: malformed IDs → 400 with `invalid_id_format` + hint
- t11 Body separation for records > 50 KB (link out to /body endpoint)
- t12 test_resource_endpoints.py covers every AC including search round-trip
- t13 OpenAPI schema emits one operation per endpoint
- t14 FilterPlan registry + per-resource filter param parsing (bills, legislators, committees, federal-register, hearings, roll-call-votes, lda-filings, etc.) — unknown filter → 400 `unknown_filter`; invalid value → 400 `invalid_filter_value` with hint.valid_values
- t15 Filter composition: AND across fields, OR within comma-separated multi-value, since/until on primary time field; tested for every resource's documented filter set
Changelog
- 2026-05-13T12:00:00Z, planned→planned: Spec filled in from stub. Pinned to be the canonical home of deterministic, identifier-bearing lookups: the half of the API that never enters FTS5 or vector retrieval. Documented the full endpoint inventory (singletons + sub-resource lists), the type-prefix → resource-router routing table (which `mcp-server`'s `fetch` tool reuses verbatim), and the body-separation rule from rest-api-conventions §9. Hardened the search round-trip contract: every ID from rest-api-search must be fetchable here.
- 2026-05-13T15:00:00Z, planned→planned: Per-resource filter parameters added (FilterPlan registry) after the 64-query coverage analysis surfaced 10 questions (Category 4 and several in Category 7) that need richer structured filters than just `id`. Locked allow-list per resource (party, chamber, state, agency, congress, since, until, etc.); unknown filters return 400 with hint.valid_filters. Out-of-scope tightened: the `?q=` parameter on registry sources now routes through the new `rest-api-entity-resolution` spec, not this one (body sources reject `?q=` with a redirect to /v1/search). Three sibling specs added to handle adjacent shapes: aggregations, dossiers, entity-resolution.