Roll Call Votes
Every recorded floor vote in the House and Senate — passage votes, motions to recommit, amendment votes, cloture, motion to suspend the rules, motion to proceed, election of Speaker, motions to table, and so on. Each roll call records every member's vote (not just the totals), which makes this the most direct expression of legislator behavior in the substrate.
This source has FK relationships in two directions: bills (a roll call is usually attached to a legis-num like H R 498 or an amendment to a bill) and legislators (every recorded vote names the member). Load bills and legislators first; this source plugs into both.
The good news: both chambers publish XML feeds on their official websites going back to the 101st Congress (1989), and as of 2026-05-29 both fetch over plain HTTP (the Senate's former Akamai bot wall is gone — vote_menu_119_1.xml and vote_119_1_00001.xml return HTTP/1.1 200 from Server: Apache to a bare curl). The bad news: each chamber's XML schema is completely different, and there is no unified ID space — the House uses bioguide IDs directly, the Senate uses lis_member_id which requires a crosswalk to bioguide.
Quick reference
| Source name | Roll Call Votes |
| Publishers | House: Office of the Clerk (House Tally Clerks). Senate: Secretary of the Senate Bill Clerk via LIS. |
| License | Public domain |
| Auth | None. No formal rate limit observed; be polite (1–2 req/sec sustained). |
| Stable ID format | vote:{congress}-{chamber}-{session}-{number} e.g. vote:119-h-1-362, vote:119-s-1-77 |
| Status | exploring — schema drafted, ingestion not built |
Upstream feed homes: House Clerk Votes (clerk.house.gov/evs/{year}/) and Senate roll-call votes (senate.gov/legislative/LIS/roll_call_votes/). See Volume below for sizing.
Source priority decision
Primary: chamber-native XML. House Clerk's evs/{year}/roll{NNN}.xml and Senate's roll_call_votes/vote{c}{s}/vote_{c}_{s}_{NNNNN}.xml are the authoritative source for each chamber's votes. Same data the chambers themselves use to publish. No transformation layer.
Secondary (House only): Congress.gov v3 /house-vote endpoints. Beta as of 2025 — covers 118th Congress onward. Provides bioguide IDs already resolved (House XML uses name-id which is bioguide-format, but the API confirms; Congress.gov also de-dupes the per-party totals in a tidy structure). Useful as enrichment for House votes we already have via Clerk XML, especially for cross-checking. Costs API quota; not worth being primary. See the Congress.gov API docs for the /v3/house-vote endpoint surface.
No equivalent for Senate yet. Congress.gov API has no /senate-vote endpoint as of the v3 surface (2026-05). Senate XML is the only structured option.
Skip: scraping senate.gov/.../vote_C_S_NNNNN.htm (HTML). The XML adjacent at .xml has all the same data and is structured. Scraping HTML adds parsing fragility for nothing.
Skip: GovTrack vote dumps. GovTrack itself ingests these same Clerk + Senate XMLs. Reading their post-processed JSON adds nothing and creates a dependency on a downstream community service.
Access notes
House Clerk. Wide open. No CAPTCHA, no auth, no formal rate limit. Clean Last-Modified headers on individual XML files (text/xml, ~80–150 KB per roll call). Real-browser User-Agent harmless but not required. The yearly index is HTML (no JSON or XML index endpoint), keyed by year; older roll calls within a year are paged via ROLL_NNN.asp. Endpoint patterns are documented at House Clerk Votes.
Senate. Plain HTTP; the former Akamai bot wall is gone (confirmed 2026-05-29). A live source-drift sweep on 2026-05-29 found that both vote_menu_119_1.xml and vote_119_1_00001.xml return HTTP/1.1 200 with Server: Apache to a bare curl (verified with both the default and curl/8 User-Agent). No 403 Forbidden, no AkamaiGHost edge. The Senate fetcher uses plain requests/httpx, same as the House: no headed browser, no real-browser fingerprint, no warm browser session. At 1–2 req/sec, the ~600 votes of a session fetch sequentially in roughly 5–10 minutes. Endpoint patterns are documented at Senate roll-call votes.
Congress.gov API (House only, beta). Same key shape as bills — an api.data.gov key (register and set the env var; DEMO_KEY is rate-limited). The /house-vote endpoints are stable as of 2025. See the api.data.gov rate-limit docs.
Vocabulary / enums
Each chamber's raw XML carries its own (unpublished) vocabularies. For Josh's purposes we normalize vote_cast to a single 5-value enum: yea, nay, present, not_voting, unknown. The raw value is preserved in vote_cast_raw.
Non-obvious raw values the parser must handle:
- Aye / No: affirmative/negative on a Committee of the Whole vote (vs Yea / Nay on a standard recorded vote). Normalize Aye → yea, No → nay.
- Present: refused to take a position; counts toward quorum.
- Not Voting (House) / Absent (Senate): was absent. Note: the Senate XML <count> reports an <absent> total, but the per-member <vote_cast> value is unset (no member entry rendered) for absent senators on most votes; verify.
- Speaker-name casts: on an Election-of-the-Speaker roll call, the raw cast is a candidate name (e.g. JOHNSON, JEFFRIES, PRESENT), not yea/nay. Set vote_cast to unknown when the raw value does not map to yea, nay, present, or not_voting; vote_cast_raw preserves the candidate name.
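As a minimal sketch of the normalization described above, the raw values can be folded into the enum with a lookup table. The function name normalize_vote_cast and the exact set of raw spellings in the table are assumptions for illustration; real feeds may carry variants not listed here.

```python
# Hypothetical helper: the lookup table is assembled from the raw values
# listed in this section, not from a published upstream vocabulary.
_CAST_MAP = {
    "yea": "yea",
    "aye": "yea",          # Committee of the Whole affirmative
    "nay": "nay",
    "no": "nay",           # Committee of the Whole negative
    "present": "present",
    "not voting": "not_voting",  # House spelling
    "absent": "not_voting",      # Senate spelling
}


def normalize_vote_cast(raw: str) -> tuple[str, str]:
    """Return (vote_cast, vote_cast_raw) for one member's raw cast."""
    # Collapse internal whitespace and ignore case before the lookup.
    key = " ".join(raw.split()).lower()
    # Anything unrecognized (e.g. a Speaker candidate's name) is 'unknown'.
    return _CAST_MAP.get(key, "unknown"), raw
```

Anything outside the table, including candidate names on a Speaker election, falls through to unknown, while the raw string is returned untouched for vote_cast_raw.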
House vote-type values like YEA-AND-NAY, 2/3 YEA-AND-NAY (suspension of the rules, needs 2/3), RECORDED VOTE (Committee of the Whole), and QUORUM, plus the free-text vote-question / vote-result and the Senate question / vote_result / majority_requirement (1/2, 3/5 cloture, 2/3 treaty/override) vocabularies, are stored as free text and indexed for filter convenience. These chamber enums are not formally published upstream — sample values are documented inline in the chamber feeds linked above.
Stable ID format
Recommended: vote:{congress}-{chamber}-{session}-{number}, lowercased.
Examples:
- vote:119-h-1-362 (House, 119th Congress, 1st session, roll call 362)
- vote:119-s-1-77 (Senate, 119th Congress, 1st session, roll call 77)
- vote:118-h-2-15 (House, 118th Congress, 2nd session, roll call 15)
Rationale:
- All four dimensions (congress, chamber, session, number) are needed to disambiguate. House and Senate roll calls in the same congress have overlapping numbers; sessions reset numbering.
- Chamber as h/s matches everywhere else in Josh.
- This is round-trippable to both URL forms: evs/{year}/roll{number}.xml (where year ↔ congress+session) and roll_call_votes/vote{c}{s}/vote_{c}_{s}_{number}.xml.
- Per-member-vote rows do not need a stable ID: (vote_id, bioguide_id) is the natural composite key.
Year ↔ congress mapping: A congress's 1st session is the odd-numbered year; 2nd session the even-numbered year. House Clerk URLs use year; Senate URLs use (congress, session). Map at parse time.
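The mapping and the ID format can be sketched as below. The function names are hypothetical, and the arithmetic assumes every session falls entirely within one calendar year (1st session in the odd year, 2nd in the even year, with the 1st Congress starting in 1789); a session that spills into the following January would need special handling.

```python
def congress_session_for_year(year: int) -> tuple[int, int]:
    """Map a House Clerk URL year to (congress, session)."""
    congress = (year - 1789) // 2 + 1
    session = 1 if year % 2 == 1 else 2
    return congress, session


def year_for(congress: int, session: int) -> int:
    """Inverse mapping: (congress, session) to calendar year."""
    return 1789 + 2 * (congress - 1) + (session - 1)


def vote_id(congress: int, chamber: str, session: int, number: int) -> str:
    """Build the stable ID, e.g. 'vote:119-h-1-362'."""
    code = {"house": "h", "senate": "s"}[chamber]
    return f"vote:{congress}-{code}-{session}-{number}"
```

For example, a House roll call fetched from the 2025 index resolves with `vote_id(*congress_session_for_year(2025)[:1], "house", 1, 362)` or, more readably, by unpacking the tuple first and passing congress and session explicitly.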
Source-specific quirks
These parser-relevant quirks are observed in the chamber XML (the full annotated sample trees live in the House Clerk and Senate feeds):
- No unified ID space (the load-bearing integration fact). House XML's name-id is a confirmed bioguide ID (A000370, B001281), the same bioguideId used in BILLSTATUS XML, so it is a direct join. The Senate's <lis_member_id> is not a bioguide ID: its format is S### (S275, S428, S354). A crosswalk via legislators.lis_id from the unitedstates/congress-legislators registry is mandatory; the legislator YAML's id.lis field carries this exact ID. See the legislators companion source.
- Senate <en_bloc> multi-matter votes. When multiple matters are voted in a single roll call, the <roll_call_vote> has an <en_bloc> element with multiple <matter> children, each with its own <issue>, <question>, <result>. Common for nominations grouped together (vote_119_1_00655 voted on 4+ nominations en bloc). Stored in the en_bloc_matters jsonb column.
- VP <tie_breaker>. When the VP breaks a tie, <by_whom> and <tie_breaker_vote> populate (Senate only).
- <amendment> / <document>. Populated when the vote is on an amendment (<amendment_number>, <amendment_purpose>, <amendment_to_document_number>) or attached to a bill/resolution (<document_type>, <document_number>, <document_short_title>).
- Speaker-election casts. vote-question is Election of the Speaker; the cast is a candidate name, not yea/nay; counts may not balance (some members abstain by walking out). Plan storage for free-text cast values (Josh's schema uses text for vote_cast_raw).
- Quorum calls. House vote-type=QUORUM records every present member, but the result is just attendance.
- Free-text vote_date with a double space. Senate <vote_date> is a free-text timestamp like February 21, 2025,  12:54 AM (note the double space after the year's comma). Parse to timestamp at load time; the value is local to ET, though the XML does not formally document that. The Senate menu (vote_menu_{C}_{S}.xml) carries a truncated <vote_date> (e.g. 18-Dec, no year): map the year via the menu's <congress_year> or fetch the per-vote XML for the canonical date.
- House time-etz / DST. House XML's <action-time time-etz="14:59"> carries an "ETZ" (Eastern Time Zone) attribute, but Daylight Saving handling is not formally documented; verify that vote_date renders in UTC correctly across DST transitions.
- legis-num normalizer. legis-num is 'H R 498', not 'hr498'. Needs a normalizer ('H R 498' → bill:{congress}-hr-498; 'S Con Res 7' → bill:{congress}-sconres-7). Tabulate the variants and write the parser carefully.
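To illustrate the Senate date quirk, here is a minimal sketch that collapses the stray whitespace, parses the free-text timestamp, and converts it to UTC. It assumes the value is wall-clock Eastern Time (as noted above, the XML does not formally say so) and uses the standard-library zoneinfo database for DST; the function name and format constant are hypothetical.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumed layout of the Senate <vote_date> text after whitespace is collapsed.
SENATE_DATE_FORMAT = "%B %d, %Y, %I:%M %p"


def parse_senate_vote_date(raw: str) -> datetime:
    """Parse e.g. 'February 21, 2025,  12:54 AM' into an aware UTC datetime."""
    # Collapse runs of whitespace so the double space does not break strptime.
    collapsed = " ".join(raw.split())
    local = datetime.strptime(collapsed, SENATE_DATE_FORMAT)
    # Assumption: the timestamp is Eastern Time; zoneinfo applies EST or EDT.
    eastern = local.replace(tzinfo=ZoneInfo("America/New_York"))
    return eastern.astimezone(timezone.utc)
```

The same attach-zone-then-convert step would plausibly apply to the House time-etz value once its date is known, which is one way to test DST behavior against fixtures on either side of a transition.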
Schema (Postgres DDL)
-- ============================================================
-- Roll call votes (canonical, both chambers)
-- ============================================================

CREATE TABLE roll_call_votes (
  id                      text PRIMARY KEY,       -- 'vote:119-h-1-362'

  -- Composite natural key
  congress                smallint NOT NULL,
  chamber                 text NOT NULL CHECK (chamber IN ('house', 'senate')),
  session                 smallint NOT NULL CHECK (session IN (1, 2)),
  rollcall_number         int NOT NULL,
  UNIQUE (congress, chamber, session, rollcall_number),

  -- When
  vote_date               timestamptz NOT NULL,   -- parsed from chamber-specific format
  modify_date             timestamptz,            -- Senate only; latest correction time

  -- What
  vote_question           text NOT NULL,          -- 'On Passage', 'On the Amendment', etc.
  vote_question_text      text,                   -- Senate-specific full description
  vote_type               text,                   -- House: 'YEA-AND-NAY', '2/3 YEA-AND-NAY', etc.
  vote_title              text,                   -- Senate: 'Paul Amdt No. 999'
  vote_description        text,                   -- House: <vote-desc>; bill / amendment short
  majority_requirement    text,                   -- Senate: '1/2', '3/5', '2/3'

  -- Outcome
  vote_result             text NOT NULL,          -- 'Passed', 'Failed', 'Agreed to', 'Confirmed', etc.
  vote_result_text        text,                   -- Senate: 'Amendment Rejected (24-76)'
  chamber_majority_party  text,                   -- House <majority>: 'R' or 'D'

  -- Cross-references, denormalized for filter speed
  -- The bill / amendment / nomination this vote is attached to
  legis_num               text,                   -- House: 'H R 498', 'S 1234'; Senate: derived
  bill_id                 text,                   -- 'bill:119-hr-498' if resolvable; soft FK
  amendment_id            text,                   -- 'amdt:119-samdt-999' if resolvable
  amendment_to_document   text,                   -- 'S.Con.Res. 7'
  amendment_purpose       text,                   -- Senate amendment purpose
  nomination_pn           text,                   -- Senate nomination 'PN373', 'PN499-12'

  -- Tally totals
  yea_total               int,
  nay_total               int,
  present_total           int,
  not_voting_total        int,

  -- Tally by party (jsonb because schemas differ between chambers)
  party_totals            jsonb,                  -- [{party, yea, nay, present, not_voting}, ...]

  -- En bloc Senate votes that combine multiple matters
  en_bloc_matters         jsonb,                  -- list of {issue, question, result}

  -- Tie breaker (Senate only: VP)
  tie_breaker_by          text,                   -- VP's name when used
  tie_breaker_vote        text,                   -- 'Yea' or 'Nay'

  -- Lifecycle / forensics
  raw_xml                 bytea NOT NULL,         -- gzipped chamber XML for re-parse
  source_url              text NOT NULL,
  source_last_modified    timestamptz,
  fetched_at              timestamptz NOT NULL,
  parsed_at               timestamptz,
  inserted_at             timestamptz NOT NULL DEFAULT now(),
  updated_at              timestamptz NOT NULL DEFAULT now()
);

CREATE INDEX roll_call_votes_date ON roll_call_votes (vote_date DESC);
CREATE INDEX roll_call_votes_congress_chamber ON roll_call_votes (congress, chamber, session, rollcall_number DESC);
CREATE INDEX roll_call_votes_bill ON roll_call_votes (bill_id) WHERE bill_id IS NOT NULL;
CREATE INDEX roll_call_votes_amendment ON roll_call_votes (amendment_id) WHERE amendment_id IS NOT NULL;
CREATE INDEX roll_call_votes_question ON roll_call_votes (vote_question, vote_date DESC);

-- Search by description (denormalized fields)
ALTER TABLE roll_call_votes ADD COLUMN search_tsv tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(vote_title, '') || ' ' || coalesce(vote_description, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(vote_question_text, '') || ' ' || coalesce(vote_question, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(amendment_purpose, '')), 'C') ||
    setweight(to_tsvector('english', coalesce(vote_result_text, '')), 'D')
  ) STORED;
CREATE INDEX roll_call_votes_search ON roll_call_votes USING gin (search_tsv);

-- ============================================================
-- Per-member votes (the big table: ~9M rows all-time)
-- ============================================================

CREATE TABLE roll_call_member_votes (
  vote_id        text NOT NULL REFERENCES roll_call_votes(id) ON DELETE CASCADE,
  bioguide_id    text NOT NULL REFERENCES legislators(bioguide_id),

  -- The actual vote cast, normalized enum
  vote_cast      text NOT NULL CHECK (vote_cast IN ('yea', 'nay', 'present', 'not_voting', 'unknown')),
  vote_cast_raw  text NOT NULL,                   -- 'Yea', 'Aye', 'No', 'Speaker name', ...

  -- Snapshot of party/state at vote time (denormalized; party can change post-vote)
  party          text,                            -- 'R', 'D', 'I'
  state          text,                            -- 2-letter postal

  PRIMARY KEY (vote_id, bioguide_id)
);

CREATE INDEX roll_call_member_votes_member ON roll_call_member_votes (bioguide_id, vote_id);
CREATE INDEX roll_call_member_votes_cast ON roll_call_member_votes (vote_id, vote_cast);

-- ============================================================
-- Senate LIS member ID crosswalk (set at parse time)
-- ============================================================

-- This view materializes the lis_member_id -> bioguide mapping from the legislators table
-- (id.lis is preserved per legislator). Used by the Senate vote loader.
CREATE MATERIALIZED VIEW lis_member_to_bioguide AS
SELECT lis_id, bioguide_id
FROM legislators
WHERE lis_id IS NOT NULL;

CREATE UNIQUE INDEX lis_member_to_bioguide_lis ON lis_member_to_bioguide (lis_id);

Schema decisions worth flagging:
- One unified roll_call_votes table for both chambers, with chamber-specific columns nullable. The shapes are different enough that two tables would be tempting, but most queries are chamber-agnostic ("how did Schumer vote on X?"). Use a check constraint on chamber and let nulls represent "doesn't apply."
- raw_xml always preserved as gzipped bytea. Vote XMLs are 50–150 KB each, small enough to keep raw across all 30K+ roll calls (~2–5 GB compressed).
- vote_cast_raw and vote_cast both stored. The normalized enum lets us filter cleanly (WHERE vote_cast = 'yea'); the raw value preserves Election-of-Speaker votes (where the cast is a candidate name, not yea/nay), the Aye/No vs Yea/Nay distinction, etc. Set vote_cast to unknown when the raw value does not map to yea, nay, present, or not_voting.
- bill_id and amendment_id as soft FKs, not enforced. A vote may be loaded before its target bill is loaded (or for a bill we never ingest). Resolve at parse time when possible; leave NULL otherwise.
- party_totals and en_bloc_matters as jsonb. They're small, list-shaped, and rarely queried for filtering, which makes them good jsonb candidates. The denormalized yea_total / nay_total / etc. on the parent are for the common-case query.
- No chamber on roll_call_member_votes: it is recoverable via the vote_id join. The bioguide_id already implies chamber via legislators.terms.
- lis_member_to_bioguide as a materialized view rather than a table, derived from legislators.lis_id. Refreshed nightly with the legislators sync. Used at Senate vote parse time to translate <lis_member_id> → bioguide_id.
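The parse-time translation from <lis_member_id> to bioguide_id might look like the sketch below. It assumes the crosswalk has already been loaded into a plain dict (for instance from the lis_member_to_bioguide view); the function name, logger name, and input shape are hypothetical.

```python
import logging

logger = logging.getLogger("roll_call_votes.senate")  # hypothetical logger name


def resolve_senate_members(
    casts: list[tuple[str, str]],
    lis_to_bioguide: dict[str, str],
) -> tuple[list[tuple[str, str]], list[str]]:
    """Translate (lis_member_id, raw_cast) pairs to (bioguide_id, raw_cast).

    Returns the resolved pairs plus the LIS IDs that had no crosswalk entry,
    so the caller can log them and queue a retry.
    """
    resolved: list[tuple[str, str]] = []
    unresolved: list[str] = []
    for lis_id, raw_cast in casts:
        bioguide_id = lis_to_bioguide.get(lis_id)
        if bioguide_id is None:
            # Typically a newly sworn-in senator whose id.lis is not yet published.
            logger.warning("no bioguide crosswalk for lis_member_id=%s", lis_id)
            unresolved.append(lis_id)
            continue
        resolved.append((bioguide_id, raw_cast))
    return resolved, unresolved
```

Returning the unresolved IDs instead of raising keeps one missing senator from blocking the other rows of the same roll call.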
Volume
- House: ~400–700 roll calls per session, ~110K per-member-vote rows per session; ~26,000 roll calls and ~7M per-member rows all-time (since 1989, 101st Congress). XML ~80–150 KB per roll call.
- Senate: ~300–700 roll calls per session, ~30K–60K per-member rows per session; ~20,000+ roll calls and ~2M per-member rows all-time. XML ~50–100 KB per roll call.
- Combined: ~5–7 GB raw XML before parse; Postgres footprint ~3–5 GB after parse + tsvector index.
Download / update strategy
Backfill (one-time, 101st Congress → present). For the House, loop each year 1989..current: fetch the yearly HTML index (and paged ROLL_NNN.asp pages), parse the table for (rollnumber, date, issue, question, result, title), fetch each roll{NNN}.xml, save raw XML to /data/corpus/roll_call_votes/house/{year}/roll{NNN}.xml, parse and load. ~26,000 mostly-serial fetches: ~7h at 1 req/sec, ~2h with 4 workers. For the Senate, loop each (congress, session) from (101, 1): fetch vote_menu_{C}_{S}.xml over plain requests/httpx (Akamai bot wall gone as of 2026-05-29), parse the <vote_number> list, fetch each vote_{c}_{s}_{NNNNN}.xml over plain HTTP, save and load. ~20,000 plain-HTTP fetches: ~5–6h at 1 req/sec, ~1.5h with 4 workers. (Previously the Senate backfill was bottlenecked at ~24–48h by a headed browser; that constraint is gone now that plain fetch works.)
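A small sketch of the per-vote URL construction and the fetch-time arithmetic used in the backfill estimates. The https scheme and the www. host prefix are assumptions; the path shapes and zero-padding widths follow the roll{NNN}.xml and vote_{c}_{s}_{NNNNN}.xml patterns quoted in this document. All function names are hypothetical.

```python
HOUSE_BASE = "https://clerk.house.gov/evs"  # scheme is an assumption
SENATE_BASE = "https://www.senate.gov/legislative/LIS/roll_call_votes"  # host prefix assumed


def house_roll_url(year: int, number: int) -> str:
    """House per-vote XML, zero-padded to at least three digits."""
    return f"{HOUSE_BASE}/{year}/roll{number:03d}.xml"


def senate_vote_url(congress: int, session: int, number: int) -> str:
    """Senate per-vote XML, zero-padded to five digits."""
    return (
        f"{SENATE_BASE}/vote{congress}{session}/"
        f"vote_{congress}_{session}_{number:05d}.xml"
    )


def estimated_hours(fetches: int, req_per_sec: float, workers: int = 1) -> float:
    """Wall-clock hours if each worker sustains req_per_sec."""
    return fetches / (req_per_sec * workers) / 3600
```

With these numbers, 26,000 House fetches at 1 req/sec come to about 7.2 hours, and 20,000 Senate fetches across 4 workers to about 1.4 hours, in line with the estimates above.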
Discovery accelerator (bootstrap / fallback only): the unitedstates/congress scrapers have already done a 10-year backfill of both chambers' votes, and their parsed JSON output could serve as a one-time bootstrap. We don't want to depend on it at runtime, but it remains a fallback source if the Senate bot wall ever returns.
Daily incremental. Two cron ticks: House and Senate each poll their discovery feed (House index.asp; Senate vote_menu_{C}_{S}.xml over plain requests/httpx, no headed browser as of 2026-05-29) every 30 minutes during session days, hourly otherwise — diff against last seen, fetch new roll XML. House votes are published within minutes of the gavel. A weekly drift check compares our row counts against menu/index totals and alerts on drift >0.
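The diff-against-last-seen step and the weekly drift check reduce to simple set arithmetic. The sketch below assumes roll call numbers have already been extracted from the House index or the Senate menu as integers; the function names are hypothetical.

```python
from typing import Iterable


def poll_interval_minutes(is_session_day: bool) -> int:
    """30 minutes on session days, hourly otherwise."""
    return 30 if is_session_day else 60


def new_roll_numbers(discovered: Iterable[int], last_seen: int) -> list[int]:
    """Roll call numbers in the discovery feed that are newer than last_seen."""
    return sorted({n for n in discovered if n > last_seen})


def missing_roll_numbers(discovered: Iterable[int], loaded: Iterable[int]) -> list[int]:
    """Drift check: numbers the feed lists that we have not loaded."""
    return sorted(set(discovered) - set(loaded))
```

A non-empty result from missing_roll_numbers corresponds to the "drift >0" alert condition.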
Vote corrections. Both chambers occasionally correct a roll call after publication. House: usually a Last-Modified bump on the roll XML. Senate: <modify_date> exceeds <vote_date>. Detect and re-process with vote_id unchanged; preserve the prior raw_xml in a side table for audit trail.
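Correction detection can be sketched as a comparison between the stored and upstream change markers: the Last-Modified header for the House, <modify_date> for the Senate. The helper names are hypothetical, and the sketch assumes both markers have been parsed to timezone-aware datetimes; email.utils.parsedate_to_datetime from the standard library handles the HTTP header format.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_last_modified(header_value: str) -> datetime:
    """Parse an HTTP Last-Modified header into an aware datetime."""
    return parsedate_to_datetime(header_value)


def needs_reprocess(stored: Optional[datetime], upstream: Optional[datetime]) -> bool:
    """True when upstream carries a newer change marker than our stored row."""
    if upstream is None:
        return False  # nothing to compare against; keep what we have
    if stored is None:
        return True   # never recorded a marker; reprocess to capture one
    return upstream > stored
```

When needs_reprocess returns True, the row is re-parsed under the same vote_id and the previous raw_xml is copied aside before being overwritten.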
State tracking. Per architecture, source keys roll_call_votes_house and roll_call_votes_senate are tracked separately in ingestion_runs. State stores the last-seen rollnumber per (year, chamber, session).
Failure modes.
- Senate 403 reappearance. Plain HTTP works as of 2026-05-29, but the Akamai bot wall could return. If Senate fetches start returning 403, fall back to the headed-browser fetcher pattern (headed Chrome via browser-use); if a fingerprinted browser session is itself blocked, restart it, rotate the user-data directory, and fall back to unitedstates/congress-derived JSON during the outage.
- Roll call number reused / renumbered after correction. Rare but possible. The chamber issues a "correction" with the same number, plus an erratum. Compare new XML's Last-Modified (House) or <modify_date> (Senate) against ours; refetch and re-load with vote_id unchanged.
- Member voting under a lis_member_id we don't have a bioguide crosswalk for. Happens briefly when a new senator is sworn in mid-session: the legislators YAML has them but the id.lis field hasn't been added yet. Log to ingestion_logs, queue retry.
- Speaker-election votes. vote-question is Election of the Speaker; the raw cast is a candidate name (JOHNSON, JEFFRIES, PRESENT). The vote_cast enum becomes unknown; vote_cast_raw preserves the candidate name. Counts may not balance (some members may abstain by walking out). Document in vote_description if known.
- House XML occasionally 404s briefly between vote close and Clerk publish. Retry every 60s for up to 5 minutes; otherwise log and skip until next pass.
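The 404 retry policy in the last item (every 60s for up to 5 minutes, then skip) can be sketched with an injectable fetch callable and sleep function, which keeps it testable without real network calls or waiting. The names and the convention that fetch returns None on a 404 are assumptions for illustration.

```python
import time
from typing import Callable, Optional


def fetch_with_retry(
    fetch: Callable[[], Optional[bytes]],
    *,
    interval_s: float = 60.0,
    max_wait_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Call fetch() until it returns a body or max_wait_s has elapsed.

    fetch() is assumed to return None for a 404 and the XML bytes otherwise.
    Returns None when the retry budget is exhausted, so the caller can log
    the miss and pick the roll call up on the next pass.
    """
    waited = 0.0
    while True:
        body = fetch()
        if body is not None:
            return body
        if waited >= max_wait_s:
            return None
        sleep(interval_s)
        waited += interval_s
```

With the defaults this makes at most six attempts: one immediately and one after each of five 60-second waits.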
Open questions
These don't block ingestion but should be resolved before this source is "shipped":
- Headed-browser fetcher (fallback only). As of 2026-05-29 the Senate fetches over plain HTTP, so v1 does not need a real Chrome instance. If the Akamai bot wall returns, the fallback choices are (a) Playwright/Puppeteer in a sidecar container, (b) browser-use-style CDP-driven Chromium inside the ingester container, (c) a separate puppet-host (could be a Mac mini or another Hetzner box) that the ingester polls. (a) is the most ops-friendly. Keep this path documented but unbuilt until 403s reappear.
- Pre-1989 votes. unitedstates/congress scrapers reach back to the 95th Congress (1977) for some chambers via PDF + OCR; full XML coverage starts 1989. Decide if pre-1989 is in scope for v1 (probably not; Congress.gov API doesn't have them either).
- Vote correction history. When a vote is re-issued, we currently overwrite the parsed row and append the new raw XML. Should we keep prior raw_xml versions in a roll_call_vote_revisions side table for audit trail? Lean yes; defer until we measure how often it happens.
- time-etz time zone. House XML's <action-time time-etz="14:59"> carries an "ETZ" (Eastern Time Zone) attribute, but Daylight Saving handling is not formally documented. Verify we render vote_date in UTC correctly across DST transitions.
- Bioguide for territorial delegates and resident commissioner. They vote in Committee of the Whole but not on final passage. House XML's role attribute is legislator even for delegates; verify our schema correctly captures their votes (they should appear as recorded-votes when they participate).
- Quorum calls. House vote-type=QUORUM produces XML where every present member is recorded but the result is just attendance. Decide whether to ingest these (lean yes: no extra cost; useful for "where was member X on date Y" queries) or skip.
- Senate cloture vote tally edge case. When cloture is invoked on a motion to proceed vs the underlying matter, the same rollnumber may have nested <question> and <vote_document_text> distinctions that take careful reading to resolve. Build a small set of test fixtures from real Senate XML before claiming the parser is robust.
- Bill / amendment ID resolution. legis-num is 'H R 498' not 'hr498'. We need a normalizer ('H R 498' → bill:{congress}-hr-498). Same for 'S Con Res 7' → bill:{congress}-sconres-7. Tabulate the variants and write the parser carefully.
- House vs Congress.gov API double-coverage. Congress.gov's /house-vote API (118+ only) is redundant with Clerk XML but cleaner JSON. Decide: do we ingest both for verification, or trust Clerk XML alone? Lean toward Clerk-only with periodic Congress.gov spot-checks.
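For the bill ID resolution question, a first-pass legis-num normalizer could be table-driven, as sketched below. Only the 'H R 498' and 'S Con Res 7' mappings come from this document; the remaining entries in the variant table and the function name are assumptions that still need to be checked against real feed values. Unrecognized inputs return None so bill_id stays NULL.

```python
import re
from typing import Optional

# Assumed variant table. Only 'H R' -> hr and 'S CON RES' -> sconres are
# confirmed by the examples in this document; the rest are unverified guesses.
_TYPE_MAP = {
    "H R": "hr",
    "S": "s",
    "H RES": "hres",
    "S RES": "sres",
    "H J RES": "hjres",
    "S J RES": "sjres",
    "H CON RES": "hconres",
    "S CON RES": "sconres",
}

# Letters, dots, and spaces for the type prefix, then a trailing number.
_LEGIS_RE = re.compile(r"^([A-Za-z. ]+?)\s*(\d+)$")


def normalize_legis_num(legis_num: str, congress: int) -> Optional[str]:
    """Map e.g. 'H R 498' to 'bill:119-hr-498'; None if unrecognized."""
    match = _LEGIS_RE.match(legis_num.strip())
    if match is None:
        return None
    # Treat dots as spaces and collapse whitespace: 'S.Con.Res.' -> 'S CON RES'.
    prefix = " ".join(match.group(1).replace(".", " ").split()).upper()
    bill_type = _TYPE_MAP.get(prefix)
    if bill_type is None:
        return None
    return f"bill:{congress}-{bill_type}-{int(match.group(2))}"
```

Keeping the variants in one dictionary makes the "tabulate the variants" step a data change, not a parser change, as new spellings turn up in the feeds.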
docs/sources/roll-call-votes.html · Last updated 2026-05-29