Query coverage
64 realistic queries generated from the substrate schema alone, mapped to the API surface. The pressure test that surfaced three gaps and locked three new specs.
Method
A subagent was given the migration files and ingester specs — and explicitly forbidden from reading any of the API/MCP specs. Its instructions: "you are a senior policy analyst evaluating a new federal-policy database; generate 60+ realistic questions you might ask this data, across 8 personas (journalist, lobbyist, Hill staffer, federal contractor, academic, State AG, issue-advocacy nonprofit, constituent). Phrase the questions as users would."
The agent returned 64 queries across 10 intent categories. Each query was then mapped by hand to the right query flow and API surface. The tables below are the result of that pass, which surfaced two real gaps (aggregation, dossiers) and validated one prior design recommendation (entity resolution as a separate tool).
When to re-run this
Any time a new data source is added; any time a search-shape spec changes meaningfully; before flipping any of the surface specs to verified. The exercise should take ~2 hours: spawn the subagent fresh (no API knowledge), let it generate, map by hand, diff against the prior version.
Headline
Per-query rows below carry a single status tag: ✓ supported by the existing surface, + new spec closing the gap, ◐ partial needing agent orchestration, or ⚠ deferred for a future spec. Tool annotations are the MCP class + tool name.
Coverage by category
1 · Specific record lookup
| # | Persona | Question | Sources | Tool | Status |
|---|---|---|---|---|---|
| 1 | Hill staffer | Status and latest action on HR 1 (119th) | bills | get_bill | ✓ |
| 2 | Constituent | Full text of Public Law 118-42 | public-laws | fetch / get_public_law | ✓ |
| 3 | Federal contractor | Open docket EPA-HQ-OAR-2024-0123 with comments metadata | regulations-dot-gov-dockets | fetch | ✓ |
| 4 | Journalist | Pull up CRS report R48481 | crs-reports | fetch / get_crs_report | ✓ |
| 5 | Lobbyist | Most recent LDA filing for Pfizer × Brownstein Hyatt | lda-filings | fetch + filter | ✓ |
| 6 | Academic | Roll-call vote record for House Vote 119-2025-87 | roll-call-votes | fetch / get_roll_call | ✓ |
| 7 | Hill staffer | SAP text for HR 4567 | statements-of-administration-policy | fetch | ✓ |
| 8 | State AG | GAO report GAO-25-106789 in full | gao-reports | fetch / get_gao_report | ✓ |
2 · Keyword scan within a fixed document
| # | Persona | Question | Sources | Tool | Status |
|---|---|---|---|---|---|
| 9 | Hill staffer | Find 'carried interest' inside HR 1 (119th) | bills | lexical_search + bill_id filter | ✓ |
| 10 | Federal contractor | Locate 'Buy American' in IRA public law text | public-laws | lexical_search + id filter | ✓ |
| 11 | Lobbyist | Search 2024-11-14 Senate Finance transcript for 'pharmacy benefit manager' | hearing-transcripts | lexical_search + hearing_id filter | ✓ |
| 12 | Academic | Every 'climate' in CREC 2025-09-12 | congressional-record | lexical_search + date filter | ✓ |
| 13 | Federal contractor | 'autonomous vehicle' in 49 CFR Part 571 | ecfr-and-cfr | lexical_search + title/part filter | ✓ |
| 14 | Journalist | 'whistleblower' inside CRS report R47999 | crs-reports | lexical_search + id filter | ✓ |
| 15 | Nonprofit | Every 'shall' clause in EPA PFAS final rule | federal-register | lexical_search + FR doc filter | ✓ |
| 16 | State AG | 'preemption' inside committee report for HR 2 (119th) | committee-reports | lexical_search + bill_id filter | ✓ |
3 · Conceptual / paraphrased search
| # | Persona | Question | Sources | Tool | Status |
|---|---|---|---|---|---|
| 17 | Lobbyist | Bills that would weaken patent protections for biologics | bills | semantic_search | ✓ |
| 18 | Academic | CREC speeches arguing the filibuster is undemocratic | congressional-record | semantic_search | ✓ |
| 19 | Federal contractor | Regs that effectively ban PFAS in firefighting foam | ecfr-and-cfr, federal-register | search (hybrid, multi-source) | ✓ |
| 20 | Journalist | GAO reports critical of DoD shipbuilding cost overruns | gao-reports | semantic_search | ✓ |
| 21 | Hill staffer | CRS reports on the legal theory behind nationwide injunctions | crs-reports | semantic_search | ✓ |
| 22 | State AG | Hearing testimony where Big Tech execs minimized child-safety concerns | hearing-transcripts | semantic_search | ✓ |
| 23 | Nonprofit | Bills proposing means-tested student debt cancellation | bills | semantic_search | ✓ |
| 24 | Constituent | Statutes allowing exec to freeze foreign assets w/o judicial review | us-code | semantic_search | ✓ |
4 · Filtered / scoped search
| # | Persona | Question | Sources | Tool | Status |
|---|---|---|---|---|---|
| 25 | Journalist | Every NPRM from EPA between 2025-01-20 and 2025-04-30 | federal-register | GET /federal-register?agency=EPA&doc_type=NPRM&since=&until= | + |
| 26 | Hill staffer | Bills introduced by Senate Republicans in 119th on 'border security' | bills, legislators | lexical_search + sponsor_party + sponsor_chamber + congress | + |
| 27 | Lobbyist | LDA filings on 'Section 230' in Q2 2025 above $50K | lda-filings | GET /lda-filings?issue=&year=2025&quarter=2&min_spend=50000 | + |
| 28 | Academic | Dems from swing-state House districts on Ukraine, 2024-01 to 2024-11 | congressional-record, legislators | lexical_search + party + chamber + state + date filter | + |
| 29 | Federal contractor | CBO cost estimates for energy bills in 2025 with 10-year cost > $5B | cbo-cost-estimates, bills | GET /cbo-cost-estimates?since=&until= + filter on cost — needs structured cost field | ◐ |
| 30 | Nonprofit | House hearings on 'voting rights' in committees chaired by Republicans in 118th | hearings, committees, legislators | orchestrate: list committees (chair filter) → hearings per committee → keyword filter | ◐ |
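The list-endpoint rows above (Q25, Q27) leave `since=` and `until=` blank. As a minimal sketch of how those filters might be filled in for Q25, assuming the parameter names shown in the table and ISO-8601 dates (the date format is an assumption, and `build_list_url` is a hypothetical helper, not part of any spec):

```python
from urllib.parse import urlencode


def build_list_url(resource: str, **filters) -> str:
    """Build a list-endpoint path, dropping filters that were left unset."""
    # Hypothetical helper: parameter names are copied from the table rows,
    # but the actual contract lives in rest-api-resource-endpoints.
    params = {k: v for k, v in filters.items() if v is not None}
    query = urlencode(params)
    return f"/{resource}?{query}" if query else f"/{resource}"


# Q25: every NPRM from EPA between 2025-01-20 and 2025-04-30
q25 = build_list_url(
    "federal-register",
    agency="EPA",
    doc_type="NPRM",
    since="2025-01-20",
    until="2025-04-30",
)
print(q25)
```

Dropping unset filters keeps one builder usable across resources whose filter sets differ.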
5 · Find-by-name / entity resolution
| # | Persona | Question | Sources | Tool | Status |
|---|---|---|---|---|---|
| 31 | Constituent | "Sen. Markey" from Massachusetts | legislators | resolve_entity(query='Markey', type='legislator', filters={state:'MA'}) | + |
| 32 | Journalist | Hill staffer "Katie O'Brian" / "Katherine O'Brien" on Energy & Commerce | staff-directories | resolve_entity(query='Katie O Brian', type='staff-directories', filters={committee:'HSIF'}) | + |
| 33 | Lobbyist | Lobbyist "Marc Lampkin" across LDA filings | lda-filings | resolve_entity(query='Marc Lampkin', type='lda-filings') | + |
| 34 | Hill staffer | "House Ag" — appropriations sub or full committee? | committees | resolve_entity(query='House Ag', type='committee') | + |
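To make the "noisy name in, canonical record out" shape concrete, here is a rough sketch for Q31. It uses Python's `difflib.SequenceMatcher` purely as a stand-in scorer; how `resolve_entity` actually scores matches is not described here. The record ids, the `resolve` function, and the 0.6 threshold are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical canonical records; real ones would come from the legislators source.
LEGISLATORS = [
    {"id": "leg-1", "name": "Edward J. Markey", "state": "MA"},
    {"id": "leg-2", "name": "Roger Marshall", "state": "KS"},
    {"id": "leg-3", "name": "Elizabeth Warren", "state": "MA"},
]


def resolve(query, records, filters=None, min_score=0.6):
    """Return (score, record) pairs that pass the filters, best match first."""
    filters = filters or {}
    q = query.lower()
    scored = []
    for rec in records:
        # Exact-match filters narrow the candidate set before fuzzy scoring.
        if any(rec.get(k) != v for k, v in filters.items()):
            continue
        # Score the query against each name token and keep the best one,
        # so a bare surname can still match a full name.
        tokens = rec["name"].lower().replace(".", "").split()
        score = max(SequenceMatcher(None, q, t).ratio() for t in tokens)
        if score >= min_score:
            scored.append((score, rec))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


matches = resolve("Markey", LEGISLATORS, filters={"state": "MA"})
print(matches[0][1]["name"])
```

Returning scored candidates rather than a single winner leaves ambiguous cases such as Q34 ("House Ag") for the caller to disambiguate.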
6 · Structured aggregation / counting
| # | Persona | Question | Sources | Tool | Status |
|---|---|---|---|---|---|
| 35 | Academic | Bill count per freshman House member in 119th, ranked | bills, legislators | count(bills, group_by=sponsor_bioguide_id, filters={congress:119, term_class:'freshman'}) | + |
| 36 | Journalist | % party-line named roll-call votes in 118th Senate | roll-call-votes | count + ratio calc — needs derived field or post-process | ◐ |
| 37 | Lobbyist | Top 20 LDA registrants by 2025 disclosed spending | lda-filings | sum(lda-filings, sum_field=spend_amount, group_by=registrant_name, top=20, filters={year:2025}) | + |
| 38 | Hill staffer | Bills in 119th with a committee markup, count | bills | count(bills, filters={congress:119, has_markup:true}) | + |
| 39 | Nonprofit | Open GAO recommendations from 2020-2024 | gao-reports | count(gao-reports, filters={recommendation_status:'open', since:2020, until:2025}) | + |
| 40 | Federal contractor | Top 10 federal agencies by FR rule count in 2025 | federal-register | count(federal-register, group_by=agency, top=10, filters={year:2025, doc_type:'rule'}) | + |
| 41 | Academic | Biden-admin veto-threat SAPs by congress | statements-of-administration-policy | count(saps, group_by=congress, filters={president_bioguide:'B...', stance:'veto_threat'}) | + |
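The `count(...)` calls in the table share one shape: filter, optionally group, optionally keep the top N. A minimal in-memory sketch of that shape, assuming equality-only filters and using made-up rows; the real tool is defined by rest-api-aggregations, and this only illustrates the semantics implied by the rows above:

```python
from collections import Counter

# Hypothetical rows; field names follow the filters used in the table.
RULES = [
    {"agency": "EPA", "year": 2025, "doc_type": "rule"},
    {"agency": "DOT", "year": 2025, "doc_type": "rule"},
    {"agency": "EPA", "year": 2025, "doc_type": "rule"},
    {"agency": "EPA", "year": 2025, "doc_type": "NPRM"},
    {"agency": "GSA", "year": 2024, "doc_type": "rule"},
]


def count(rows, group_by=None, top=None, filters=None):
    """Count rows matching the filters, optionally grouped and cut to top-N."""
    filters = filters or {}
    kept = [r for r in rows if all(r.get(k) == v for k, v in filters.items())]
    if group_by is None:
        return len(kept)
    # most_common(None) returns every group, ordered by descending count.
    return Counter(r[group_by] for r in kept).most_common(top)


# Q40 shape: top agencies by rule count in 2025
top_agencies = count(
    RULES, group_by="agency", top=10, filters={"year": 2025, "doc_type": "rule"}
)
print(top_agencies)  # [('EPA', 2), ('DOT', 1)]
```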
7 · Cross-source joins
| # | Persona | Question | Sources | Tool | Status |
|---|---|---|---|---|---|
| 42 | Journalist | Bills with veto-threat SAPs in 118th × final roll-call vote | saps, bills, votes | list saps (filter stance) → get_bill_dossier per bill | + |
| 43 | Lobbyist | Bills with CRS reports within 30 days of intro × sponsor + committee | crs, bills, committees, legislators | orchestrate: list CRS with related_bill → get_bill_dossier | ◐ |
| 44 | State AG | GAO recommendations to HHS where a later FR rule cites the report | gao-reports, federal-register | Citation graph not extracted — deferred | ⚠ |
| 45 | Academic | Banking Committee senators in 119th × CRA votes × LDA filings naming them from finsvcs clients | committees, members, votes, bills, lda | get_committee_dossier → list_member_votes per senator → list_lda_filings filtered | ◐ |
| 46 | Hill staffer | HR 1 (119th): bill text + CBO + committee reports + hearings + SAP | bills + 4 more | get_bill_dossier('hr:119:1') | + |
| 47 | Journalist | Lobbyists on SAFE Banking Act who were former staff to a sponsoring member | lda, staff, bills, members | orchestrate: bill_dossier → cosponsors → former staff per member → lda filings cross-ref | ◐ |
| 48 | Nonprofit | For every PL in 118th, CFR sections amended + implementing FR rules | public-laws, us-code, cfr, fr | get_public_law_dossier per PL — CFR/FR cascade covered; precision depends on citation extraction | + |
| 49 | Federal contractor | DoD hypersonics hearings × bills by same committee within 60 days w/ overlap | hearings, transcripts, committees, bills | orchestrate: semantic_search transcripts → list bills by committee + date → similarity check | ◐ |
| 50 | Academic | 'climate change' topic: bills/CRS/GAO/FR counts per quarter since 2020 | topic-taxonomy + 4 sources | time_series per source, filter by topic — needs topic FKs on each source | ◐ |
| 51 | State AG | US Code sections cited in DOJ FR rules where the section was enacted by a PL in last 5 years | us-code, fr, public-laws | Citation graph not extracted — deferred | ⚠ |
| 52 | Hill staffer | Senators yes on NDAA whose CREC speech that week criticized provisions | votes, crec, bills, legislators | orchestrate: list votes (yes) → CREC for each member that week → semantic similarity | ◐ |
8 · Time-series / change-over-time
| # | Persona | Question | Sources | Tool | Status |
|---|---|---|---|---|---|
| 53 | Academic | Monthly bill volume mentioning 'AI' 2018–present | bills | time_series(bills, time_field=introduced_date, interval=month, filter=q:'AI') | + |
| 54 | Lobbyist | Disclosed crypto lobbying spend, QoQ since 2021 | lda-filings | time_series + sum(spend_amount, interval=quarter, filters={issue:crypto}) | + |
| 55 | Federal contractor | FR page count per administration since 2000 | federal-register | time_series + sum(page_count, interval=year) — needs page_count field | ◐ |
| 56 | Nonprofit | GAO cybersecurity reports as % of output since 2015 | gao-reports | two time_series calls + ratio — agent does the math | + |
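For Q56, "agent does the math" means dividing one `time_series` result by another, period by period. A small sketch of that post-processing step, assuming each response can be reduced to a period-to-count mapping (the response shape is an assumption, and the numbers are illustrative only):

```python
def share_by_period(subset, total):
    """Divide subset counts by total counts per period, skipping empty periods."""
    shares = {}
    for period, denom in total.items():
        if denom:  # skip periods with no output to avoid dividing by zero
            shares[period] = subset.get(period, 0) / denom
    return shares


# Illustrative buckets standing in for two time_series responses.
cyber_reports = {"2015": 4, "2016": 6}
all_reports = {"2015": 80, "2016": 100, "2017": 0}

shares = share_by_period(cyber_reports, all_reports)
print(shares)  # {'2015': 0.05, '2016': 0.06}
```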
9 · Discovery / "show me what's new"
| # | Persona | Question | Sources | Tool | Status |
|---|---|---|---|---|---|
| 57 | Hill staffer | New bills this week on rural broadband, opioids, VA care | bills | 3 × search calls with since= and topic — agent merges | ✓ |
| 58 | Federal contractor | New rules / proposed rules today from DOT, EPA, GSA | federal-register | list /federal-register?agency=DOT,EPA,GSA&since=today | ✓ |
| 59 | Lobbyist | New LDA filings this week from competing registrants | lda-filings | list /lda-filings?registrant_name=&since=... | ✓ |
| 60 | Journalist | Hearings scheduled next 14 days w/ witnesses if posted | hearings | list /hearings?since=&until=&sort=date | ✓ |
10 · Provenance / citation verification
| # | Persona | Question | Sources | Tool | Status |
|---|---|---|---|---|---|
| 61 | Journalist | Verify "HR 4321 eliminates Section 174 R&D amortization" claim | bills | get_bill_text + lexical_search('section 174') | ✓ |
| 62 | State AG | Verify advocacy report's claim about GAO-23-105432 recommendations | gao-reports | fetch + lexical_search('family detention') | ✓ |
| 63 | Hill staffer | Source rule + quoted $14B compliance cost from EPA rule | federal-register | search + fetch + scan | ✓ |
| 64 | Academic | Verify "5 USC §552a(b)(7)" Privacy Act exception text | us-code | fetch / get_us_code_section | ✓ |
Gap → spec map
The 22 queries that don't map cleanly to the existing surface cluster into three coherent gaps, each mapping to one new spec, plus a fourth group of filter additions that maps to an update of an existing spec:
| Gap | Queries affected | New spec | Tool class added |
|---|---|---|---|
| Fuzzy entity resolution: "Find X by noisy name, return canonical record" | 4 (Q31–34) · Category 5 | rest-api-entity-resolution | MCP Class D · resolve_entity |
| Aggregation / counting: "Top-N, group-by, time-series, total" | 11 (Q35–41, Q53–56) · Categories 6 + 8 | rest-api-aggregations | MCP Class E · count, sum, time_series |
| Cross-source fan-out: "One ID, everything related" | 4 closed (Q42, Q46, Q48, partial Q45) · the other 7 in Category 7 still need orchestration (5) or are deferred (2) | rest-api-dossiers | MCP Class F · get_bill_dossier, get_legislator_dossier, get_committee_dossier, get_public_law_dossier |
| Richer per-resource filters: "Party, state, agency, committee chair, etc." | 4 (Q25–28) · Category 4 | Update to rest-api-resource-endpoints (FilterPlan registry) | Class A typed tools accept the new filters |
What's deferred
Three queries (Q44, Q48, Q51) depend on citation-graph traversal: extracting and storing inter-document citations such as "this FR rule cites GAO-23-105432" or "this US Code section was enacted by Public Law 118-42." Q44 and Q51 are deferred outright (⚠). Q48 is served by get_public_law_dossier, but its precision depends on the same citation extraction. Today the substrate carries citations as bibliographic metadata per record (source_url, citation_string), not as a graph between records.
The citation graph would be its own ingester problem: parse each document's body for citation patterns, then write the results into a citations join table linking (citing_record, citing_section, cited_record, cited_section). It is worth a dedicated spec at v1.x. It does not block v1, but these three queries are exactly the kind of investigative work this substrate exists to support, so it is first on the v1.x list.
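The parse-then-write step can be sketched as follows. This is a rough illustration, not the planned ingester: the two regex patterns, the `fr:example-rule` record id, and the SQLite column types are assumptions, and only the four column names come from the text above.

```python
import re
import sqlite3

# Assumed patterns: GAO report numbers like GAO-23-105432 and cites like
# "Public Law 118-42". Real documents use many more citation forms.
PATTERNS = {
    "gao-reports": re.compile(r"\bGAO-\d{2}-\d{6}\b"),
    "public-laws": re.compile(r"\bPublic Law \d{2,3}-\d+\b"),
}


def extract_citations(citing_record, sections):
    """Yield (citing_record, citing_section, cited_record, cited_section) rows."""
    for section_id, text in sections.items():
        for pattern in PATTERNS.values():
            for match in pattern.finditer(text):
                # cited_section stays NULL: this sketch resolves records only.
                yield (citing_record, section_id, match.group(0), None)


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE citations (
           citing_record  TEXT NOT NULL,
           citing_section TEXT,
           cited_record   TEXT NOT NULL,
           cited_section  TEXT
       )"""
)

# Hypothetical document body, keyed by section id.
doc = {"preamble": "This rule responds to GAO-23-105432 and implements Public Law 118-42."}
conn.executemany(
    "INSERT INTO citations VALUES (?, ?, ?, ?)",
    extract_citations("fr:example-rule", doc),
)
rows = conn.execute(
    "SELECT cited_record FROM citations ORDER BY cited_record"
).fetchall()
print(rows)  # [('GAO-23-105432',), ('Public Law 118-42',)]
```

With rows in this shape, a query like Q44 reduces to a join from gao-reports to citations on `cited_record`.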
Six queries in Category 7 (Q43, Q45, Q47, Q49, Q50, Q52, the rows tagged ◐) are workable via agent orchestration but cross 4+ tools each. These are the cases where the LLM is most likely to drop a step mid-chain. They are worth tracking against the dossier surface: if a recurring 5-tool pattern shows up in the routing eval, that's a candidate for a new dossier shape or a new join helper.
How to reproduce this exercise
Use a single subagent prompt with two rules: (1) read only the schema (migration files + ingester specs), and (2) do not read any API/MCP spec. Ask for 60+ queries across 8 personas and 10 intent categories, each tagged with the sources it touches. Then map by hand: there's no automated mapper at v1, and the manual pass is what surfaces the gaps.
The exercise repeats well: re-run when adding a new source, when changing the search-shape spec, or before flipping any surface spec to verified. Diff against the prior 64 queries — new categories that emerge are signal, missing coverage is signal, and queries that move from "supported" to "partial" are red flags.
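The diff step can be sketched as below, assuming each run is reduced to a mapping from a stable query label to its status tag. The labels, the `diff_runs` helper, and the ordering of tags from best to worst are assumptions for illustration; the text above only names the move from "supported" to "partial" as a red flag.

```python
# Assumed ordering of status tags, best to worst.
RANK = {"✓": 3, "+": 2, "◐": 1, "⚠": 0}


def diff_runs(prior, current):
    """Compare two {query_label: status} mappings from successive runs."""
    shared = set(prior) & set(current)
    return {
        "added": sorted(set(current) - set(prior)),
        "dropped": sorted(set(prior) - set(current)),
        # A query regresses when its status ranks lower than in the prior run.
        "regressed": sorted(q for q in shared if RANK[current[q]] < RANK[prior[q]]),
    }


# Hypothetical excerpts from two runs.
prior_run = {"Q1": "✓", "Q19": "✓", "Q44": "⚠"}
current_run = {"Q1": "✓", "Q19": "◐", "Q65": "+"}

report = diff_runs(prior_run, current_run)
print(report)
```

Because a fresh subagent rephrases questions on every run, matching queries to stable labels remains part of the manual mapping pass.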
Related
- Query flows — how the four flows work, plus the source eligibility matrix.
- rest-api-conventions — pagination, envelopes, errors that all four surfaces share.
- rest-api-search · rest-api-resource-endpoints · rest-api-entity-resolution · rest-api-aggregations · rest-api-dossiers — the five REST surfaces.
- mcp-server — the MCP wrapper exposing all six tool classes (A typed lookups, B body search, C universal fetch, D entity resolution, E aggregations, F dossiers).