
Roll Call Votes

Every recorded floor vote in the House and Senate — passage votes, motions to recommit, amendment votes, cloture, motion to suspend the rules, motion to proceed, election of Speaker, motions to table, and so on. Each roll call records every member's vote (not just the totals), which makes this the most direct expression of legislator behavior in the substrate.

This source has FK relationships in two directions: bills (a roll call is usually attached to a legis-num like H R 498 or an amendment to a bill) and legislators (every recorded vote names the member). Load bills and legislators first; this source plugs into both.

The good news: both chambers publish XML feeds on their official websites going back to the 101st Congress (1989), and as of 2026-05-29 both fetch over plain HTTP (the Senate's former Akamai bot wall is gone — vote_menu_119_1.xml and vote_119_1_00001.xml return HTTP/1.1 200 from Server: Apache to a bare curl). The bad news: each chamber's XML schema is completely different, and there is no unified ID space — the House uses bioguide IDs directly, the Senate uses lis_member_id which requires a crosswalk to bioguide.

Source name: Roll Call Votes
Publishers: House: Office of the Clerk (House Tally Clerks). Senate: Secretary of the Senate Bill Clerk via LIS.
License: Public domain
Auth: None. No formal rate limit observed; be polite (1–2 req/sec sustained).
Stable ID format: vote:{congress}-{chamber}-{session}-{number}, e.g. vote:119-h-1-362, vote:119-s-1-77
Status: exploring (schema drafted, ingestion not built)

Upstream feed homes: House Clerk Votes (clerk.house.gov/evs/{year}/) and Senate roll-call votes (senate.gov/legislative/LIS/roll_call_votes/). See Volume below for sizing.

Primary: chamber-native XML. House Clerk's evs/{year}/roll{NNN}.xml and Senate's roll_call_votes/vote{c}{s}/vote_{c}_{s}_{NNNNN}.xml are the authoritative source for each chamber's votes. Same data the chambers themselves use to publish. No transformation layer.

Secondary (House only): Congress.gov v3 /house-vote endpoints. Beta as of 2025; covers the 118th Congress onward. Provides bioguide IDs already resolved (House XML's name-id is already bioguide-format, so the API serves as confirmation) and de-dupes the per-party totals into a tidy structure. Useful as enrichment for House votes we already have via Clerk XML, especially for cross-checking. Costs API quota; not worth being primary. See the Congress.gov API docs for the /v3/house-vote endpoint surface.

No equivalent for Senate yet. Congress.gov API has no /senate-vote endpoint as of the v3 surface (2026-05). Senate XML is the only structured option.

Skip: scraping senate.gov/.../vote_C_S_NNNNN.htm (HTML). The adjacent .xml file carries the same data in structured form. Scraping HTML adds parsing fragility for nothing.

Skip: GovTrack vote dumps. GovTrack itself ingests these same Clerk + Senate XMLs. Reading their post-processed JSON adds nothing and creates a dependency on a downstream community service.

House Clerk. Wide open. No CAPTCHA, no auth, no formal rate limit. Clean Last-Modified headers on individual XML files (text/xml, ~80–150 KB per roll call). Real-browser User-Agent harmless but not required. The yearly index is HTML (no JSON or XML index endpoint), keyed by year; older roll calls within a year are paged via ROLL_NNN.asp. Endpoint patterns are documented at House Clerk Votes.

Senate. Plain HTTP; the former Akamai bot wall is gone (confirmed 2026-05-29). A live source-drift sweep on 2026-05-29 found that both vote_menu_119_1.xml and vote_119_1_00001.xml return HTTP/1.1 200 with Server: Apache to a bare curl (verified with both the default and curl/8 User-Agent). No 403 Forbidden, no AkamaiGHost edge. The Senate fetcher uses plain requests/httpx, same as the House: no headed browser, no real-browser fingerprint, no warm browser session. A session's ~600 votes fetch sequentially in roughly 5–10 minutes at 1–2 req/sec. Endpoint patterns are documented at Senate roll-call votes.

Congress.gov API (House only, beta). Same key shape as bills — an api.data.gov key (register and set the env var; DEMO_KEY is rate-limited). The /house-vote endpoints are stable as of 2025. See the api.data.gov rate-limit docs.

Each chamber's raw XML carries its own (unpublished) vocabularies. For Josh's purposes we normalize vote_cast to a single 5-value enum: yea, nay, present, not_voting, unknown. The raw value is preserved in vote_cast_raw.

Non-obvious raw values the parser must handle:

  • Aye / No: affirmative/negative on a Committee of the Whole vote (vs Yea / Nay on a standard recorded vote). Normalize Aye → yea, No → nay.
  • Present: refused to take a position; counts toward quorum.
  • Not Voting (House) / Absent (Senate): was absent. Note: the Senate XML <count> reports an <absent> total, but the per-member <vote_cast> value is unset (no member entry rendered) for absent senators on most votes; verify.
  • Speaker-name casts: on an Election-of-the-Speaker roll call, the raw cast is a candidate name (e.g. JOHNSON, JEFFRIES) or PRESENT, not yea/nay. Set vote_cast to unknown when the raw value does not map to yea, nay, present, or not_voting; vote_cast_raw preserves the candidate name.
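
A minimal sketch of the normalization above. The helper name normalize_vote_cast and the lookup table are hypothetical; the keys are only the raw values listed in this section, so real feeds may need more spellings added.

```python
# Hypothetical lookup table: raw chamber value (lowercased) -> normalized enum.
# Keys are limited to the raw values named in this document.
_CAST_MAP = {
    "yea": "yea",
    "aye": "yea",          # Committee of the Whole affirmative
    "nay": "nay",
    "no": "nay",           # Committee of the Whole negative
    "present": "present",
    "not voting": "not_voting",  # House wording
    "absent": "not_voting",      # Senate wording
}


def normalize_vote_cast(raw: str) -> tuple[str, str]:
    """Return (vote_cast, vote_cast_raw) for one member's raw cast.

    Anything outside the table (a Speaker candidate name, for example)
    maps to 'unknown'; the raw string is returned untouched.
    """
    key = " ".join(raw.split()).lower()  # collapse whitespace, ignore case
    return _CAST_MAP.get(key, "unknown"), raw
```

The raw value is returned unchanged so that vote_cast_raw can store exactly what the chamber published.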

House vote-type values like YEA-AND-NAY, 2/3 YEA-AND-NAY (suspension of the rules, needs 2/3), RECORDED VOTE (Committee of the Whole), and QUORUM, plus the free-text vote-question / vote-result and the Senate question / vote_result / majority_requirement (1/2, 3/5 cloture, 2/3 treaty/override) vocabularies, are stored as free text and indexed for filter convenience. These chamber enums are not formally published upstream — sample values are documented inline in the chamber feeds linked above.

Recommended: vote:{congress}-{chamber}-{session}-{number}, lowercased.

Examples:

  • vote:119-h-1-362 (House, 119th Congress, 1st session, roll call 362)
  • vote:119-s-1-77 (Senate, 119th Congress, 1st session, roll call 77)
  • vote:118-h-2-15 (House, 118th Congress, 2nd session, roll call 15)

Rationale:

  • All four dimensions (congress, chamber, session, number) are needed to disambiguate. House and Senate roll calls in the same congress have overlapping numbers; sessions reset numbering.
  • Chamber as h/s matches everywhere else in Josh.
  • This is round-trippable to both URL forms: evs/{year}/roll{number}.xml (where year ↔ congress+session) and roll_call_votes/vote{c}{s}/vote_{c}_{s}_{number}.xml.
  • Per-member-vote rows do not need a stable ID — (vote_id, bioguide_id) is the natural composite key.
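
As an illustration of the ID format, here is a small sketch of building and parsing vote IDs. The names VoteKey, make_vote_id, and parse_vote_id are hypothetical, not existing project code.

```python
import re
from typing import NamedTuple


class VoteKey(NamedTuple):
    congress: int
    chamber: str  # 'h' or 's'
    session: int  # 1 or 2
    number: int


_VOTE_ID_RE = re.compile(r"^vote:(\d+)-([hs])-([12])-(\d+)$")


def make_vote_id(congress: int, chamber: str, session: int, number: int) -> str:
    """Build the lowercased stable ID from its four dimensions."""
    chamber = chamber.lower()
    if chamber not in ("h", "s"):
        raise ValueError(f"chamber must be 'h' or 's', got {chamber!r}")
    if session not in (1, 2):
        raise ValueError(f"session must be 1 or 2, got {session!r}")
    return f"vote:{congress}-{chamber}-{session}-{number}"


def parse_vote_id(vote_id: str) -> VoteKey:
    """Split a stable ID back into its four dimensions."""
    match = _VOTE_ID_RE.match(vote_id)
    if match is None:
        raise ValueError(f"not a vote ID: {vote_id!r}")
    congress, chamber, session, number = match.groups()
    return VoteKey(int(congress), chamber, int(session), int(number))
```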

Year ↔ congress mapping: A congress's 1st session is the odd-numbered year; 2nd session the even-numbered year. House Clerk URLs use year; Senate URLs use (congress, session). Map at parse time.
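
The mapping can be sketched as below, anchored on the 1st Congress convening in 1789. The function names are hypothetical, and the https scheme and www host in the URL builders are assumptions; only the path patterns come from this document. The sketch also assumes a session never spills into the following calendar year, which is worth checking against early-January votes.

```python
def year_to_congress_session(year: int) -> tuple[int, int]:
    """Map a calendar year to (congress, session). Odd year = 1st session."""
    congress = (year - 1789) // 2 + 1
    session = 1 if year % 2 == 1 else 2
    return congress, session


def congress_session_to_year(congress: int, session: int) -> int:
    """Inverse mapping: (congress, session) back to the calendar year."""
    return 1789 + 2 * (congress - 1) + (session - 1)


def house_roll_url(year: int, number: int) -> str:
    # Path pattern from this document: evs/{year}/roll{NNN}.xml
    return f"https://clerk.house.gov/evs/{year}/roll{number:03d}.xml"


def senate_vote_url(congress: int, session: int, number: int) -> str:
    # Path pattern from this document:
    # roll_call_votes/vote{c}{s}/vote_{c}_{s}_{NNNNN}.xml
    return (
        "https://www.senate.gov/legislative/LIS/roll_call_votes/"
        f"vote{congress}{session}/vote_{congress}_{session}_{number:05d}.xml"
    )
```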

These parser-relevant quirks are observed in the chamber XML (the full annotated sample trees live in the House Clerk and Senate feeds):

  • No unified ID space — the load-bearing integration fact. House XML's name-id is a confirmed bioguide ID (A000370, B001281) — the same bioguideId used in BILLSTATUS XML, a direct join. But the Senate's <lis_member_id> is not a bioguide ID: format S### (S275, S428, S354). Mandatory crosswalk via legislators.lis_id from the unitedstates/congress-legislators registry — the legislator YAML's id.lis field carries this exact ID. See the legislators companion source.
  • Senate <en_bloc> multi-matter votes. When multiple matters are voted in a single roll call, the <roll_call_vote> has an <en_bloc> element with multiple <matter> children, each with its own <issue>, <question>, <result>. Common for nominations grouped together (vote_119_1_00655 voted on 4+ nominations en bloc). Stored in the en_bloc_matters jsonb column.
  • VP <tie_breaker>. When the VP breaks a tie, <by_whom> and <tie_breaker_vote> populate (Senate only).
  • <amendment> / <document>. Populated when the vote is on an amendment (<amendment_number>, <amendment_purpose>, <amendment_to_document_number>) or attached to a bill/resolution (<document_type>, <document_number>, <document_short_title>).
  • Speaker-election casts. vote-question is Election of the Speaker; the cast is a candidate name, not yea/nay; counts may not balance (some members abstain by walking out). Plan storage for free-text vote_cast values (Josh's schema uses text for vote_cast_raw).
  • Quorum calls. House vote-type=QUORUM records every present member but the result is just attendance.
  • Free-text vote_date with a double space. Senate <vote_date> is a free-text timestamp like February 21, 2025,  12:54 AM (note the double space before the time). Parse to timestamp at load time; the value is local to ET, though the XML does not formally document that. The Senate menu (vote_menu_{C}_{S}.xml) carries a truncated <vote_date> (e.g. 18-Dec, no year); map the year via the menu's <congress_year> or fetch the per-vote XML for the canonical date.
  • House time-etz / DST. House XML's <action-time time-etz="14:59"> carries an "ETZ" (Eastern Time Zone) attribute, but Daylight Saving handling is not formally documented — verify vote_date renders in UTC correctly across DST transitions.
  • legis-num normalizer. legis-num is 'H R 498', not 'hr498'. Needs a normalizer ('H R 498' → bill:{congress}-hr-498; 'S Con Res 7' → bill:{congress}-sconres-7). Tabulate the variants and write the parser carefully.
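
For the Senate vote_date quirk listed above, a hedged sketch of parsing to UTC. It assumes the value is Eastern Time (which the XML does not formally state) and that the format is always month name, day, year, 12-hour time; parse_senate_vote_date is a hypothetical name.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # stdlib; needs system tz data to be installed

# Format after whitespace has been collapsed, e.g. 'February 21, 2025, 12:54 AM'.
_SENATE_FMT = "%B %d, %Y, %I:%M %p"


def parse_senate_vote_date(raw: str) -> datetime:
    """Parse a Senate <vote_date> string into an aware UTC datetime.

    Assumption: the timestamp is local Eastern Time. Times that fall in the
    repeated hour at the end of DST resolve to the first occurrence.
    """
    cleaned = " ".join(raw.split())  # removes the double space
    local = datetime.strptime(cleaned, _SENATE_FMT)
    eastern = local.replace(tzinfo=ZoneInfo("America/New_York"))
    return eastern.astimezone(timezone.utc)
```

Collapsing whitespace before parsing keeps the format string independent of how many spaces the feed emits.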
-- ============================================================
-- Roll call votes (canonical, both chambers)
-- ============================================================
CREATE TABLE roll_call_votes (
    id text PRIMARY KEY, -- 'vote:119-h-1-362'
    -- Composite natural key
    congress smallint NOT NULL,
    chamber text NOT NULL CHECK (chamber IN ('house', 'senate')),
    session smallint NOT NULL CHECK (session IN (1, 2)),
    rollcall_number int NOT NULL,
    UNIQUE (congress, chamber, session, rollcall_number),
    -- When
    vote_date timestamptz NOT NULL, -- parsed from chamber-specific format
    modify_date timestamptz, -- Senate only; latest correction time
    -- What
    vote_question text NOT NULL, -- 'On Passage', 'On the Amendment', etc.
    vote_question_text text, -- Senate-specific full description
    vote_type text, -- House: 'YEA-AND-NAY', '2/3 YEA-AND-NAY', etc.
    vote_title text, -- Senate: 'Paul Amdt No. 999'
    vote_description text, -- House: <vote-desc>; bill / amendment short
    majority_requirement text, -- Senate: '1/2', '3/5', '2/3'
    -- Outcome
    vote_result text NOT NULL, -- 'Passed', 'Failed', 'Agreed to', 'Confirmed', etc.
    vote_result_text text, -- Senate: 'Amendment Rejected (24-76)'
    chamber_majority_party text, -- House <majority>: 'R' or 'D'
    -- Cross-references, denormalized for filter speed
    -- The bill / amendment / nomination this vote is attached to
    legis_num text, -- House: 'H R 498', 'S 1234'; Senate: derived
    bill_id text, -- 'bill:119-hr-498' if resolvable; soft FK
    amendment_id text, -- 'amdt:119-samdt-999' if resolvable
    amendment_to_document text, -- 'S.Con.Res. 7'
    amendment_purpose text, -- Senate amendment purpose
    nomination_pn text, -- Senate nomination 'PN373', 'PN499-12'
    -- Tally totals
    yea_total int,
    nay_total int,
    present_total int,
    not_voting_total int,
    -- Tally by party (jsonb because schemas differ between chambers)
    party_totals jsonb, -- [{party, yea, nay, present, not_voting}, ...]
    -- En bloc Senate votes that combine multiple matters
    en_bloc_matters jsonb, -- list of {issue, question, result}
    -- Tie breaker (Senate only: VP)
    tie_breaker_by text, -- VP's name when used
    tie_breaker_vote text, -- 'Yea' or 'Nay'
    -- Lifecycle / forensics
    raw_xml bytea NOT NULL, -- gzipped chamber XML for re-parse
    source_url text NOT NULL,
    source_last_modified timestamptz,
    fetched_at timestamptz NOT NULL,
    parsed_at timestamptz,
    inserted_at timestamptz NOT NULL DEFAULT now(),
    updated_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX roll_call_votes_date ON roll_call_votes (vote_date DESC);
CREATE INDEX roll_call_votes_congress_chamber ON roll_call_votes (congress, chamber, session, rollcall_number DESC);
CREATE INDEX roll_call_votes_bill ON roll_call_votes (bill_id) WHERE bill_id IS NOT NULL;
CREATE INDEX roll_call_votes_amendment ON roll_call_votes (amendment_id) WHERE amendment_id IS NOT NULL;
CREATE INDEX roll_call_votes_question
    ON roll_call_votes (vote_question, vote_date DESC);
-- Search by description (denormalized fields)
ALTER TABLE roll_call_votes ADD COLUMN search_tsv tsvector
    GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(vote_title, '') || ' ' || coalesce(vote_description, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(vote_question_text, '') || ' ' || coalesce(vote_question, '')), 'B') ||
        setweight(to_tsvector('english', coalesce(amendment_purpose, '')), 'C') ||
        setweight(to_tsvector('english', coalesce(vote_result_text, '')), 'D')
    ) STORED;
CREATE INDEX roll_call_votes_search ON roll_call_votes USING gin (search_tsv);
-- ============================================================
-- Per-member votes (the big table — ~9M rows all-time)
-- ============================================================
CREATE TABLE roll_call_member_votes (
    vote_id text NOT NULL REFERENCES roll_call_votes(id) ON DELETE CASCADE,
    bioguide_id text NOT NULL REFERENCES legislators(bioguide_id),
    -- The actual vote cast (normalized enum)
    vote_cast text NOT NULL CHECK (vote_cast IN ('yea', 'nay', 'present', 'not_voting', 'unknown')),
    vote_cast_raw text NOT NULL, -- 'Yea', 'Aye', 'No', 'Speaker name', ...
    -- Snapshot of party/state at vote time (denormalized; party can change post-vote)
    party text, -- 'R', 'D', 'I'
    state text, -- 2-letter postal
    PRIMARY KEY (vote_id, bioguide_id)
);
CREATE INDEX roll_call_member_votes_member ON roll_call_member_votes (bioguide_id, vote_id);
CREATE INDEX roll_call_member_votes_cast ON roll_call_member_votes (vote_id, vote_cast);
-- ============================================================
-- Senate LIS member ID crosswalk (set at parse time)
-- ============================================================
-- This view materializes the lis_member_id -> bioguide mapping from the legislators table
-- (id.lis is preserved per legislator). Used by the Senate vote loader.
CREATE MATERIALIZED VIEW lis_member_to_bioguide AS
    SELECT lis_id, bioguide_id
    FROM legislators
    WHERE lis_id IS NOT NULL;
CREATE UNIQUE INDEX lis_member_to_bioguide_lis ON lis_member_to_bioguide (lis_id);
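
On the loader side, the crosswalk lookup might look like the sketch below. It assumes the view has been read into a plain dict keyed by lis_id; resolve_lis_ids is a hypothetical name, and the IDs in the usage note are made-up placeholders rather than real members.

```python
from typing import Iterable, Mapping


def resolve_lis_ids(
    lis_ids: Iterable[str], crosswalk: Mapping[str, str]
) -> tuple[dict[str, str], list[str]]:
    """Translate Senate lis_member_id values to bioguide IDs.

    Returns (resolved, unresolved). Unresolved IDs are reported rather than
    dropped so the caller can log them and queue a retry.
    """
    resolved: dict[str, str] = {}
    unresolved: list[str] = []
    for lis_id in lis_ids:
        bioguide_id = crosswalk.get(lis_id)
        if bioguide_id is None:
            unresolved.append(lis_id)
        else:
            resolved[lis_id] = bioguide_id
    return resolved, unresolved
```

With a placeholder crosswalk such as {"S001": "X000001"}, calling resolve_lis_ids(["S001", "S999"], crosswalk) resolves the first ID and reports "S999" as unresolved.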

Schema decisions worth flagging:

  • One unified roll_call_votes table for both chambers, with chamber-specific columns nullable. The shapes are different enough that two tables would be tempting, but most queries are chamber-agnostic ("how did Schumer vote on X?"). Use a check constraint on chamber and let nulls represent "doesn't apply."
  • raw_xml always preserved as gzipped bytea. Vote XMLs are 50–150 KB each — small enough to keep raw across all 30K+ roll calls (~2–5 GB compressed).
  • vote_cast_raw and vote_cast both stored. The normalized enum lets us filter cleanly (WHERE vote_cast = 'yea'); the raw value preserves Election-of-Speaker votes (where the cast is a candidate name, not yea/nay), Aye/No vs Yea/Nay distinction, etc. Cast vote_cast to unknown when the raw value is anything other than the standard 5.
  • bill_id and amendment_id as soft FKs, not enforced. A vote may be loaded before its target bill is loaded (or for a bill we never ingest). Resolve at parse time when possible; leave NULL otherwise.
  • party_totals and en_bloc_matters as jsonb. They're small, list-shaped, and rarely queried for filtering — perfect jsonb candidates. The denormalized yea_total/nay_total/etc. on the parent are for the common-case query.
  • No chamber on roll_call_member_votes — it's denormalized via the vote_id join. The bioguide_id already implies chamber via legislators.terms.
  • lis_member_to_bioguide as a materialized view rather than a table, derived from legislators.lis_id. Refreshed nightly with the legislators sync. Used at Senate vote parse time to translate <lis_member_id> → bioguide_id.

Volume:

  • House: ~400–700 roll calls per session, ~110K per-member-vote rows per session; ~26,000 roll calls and ~7M per-member rows all-time (since 1989, 101st Congress). XML ~80–150 KB per roll call.
  • Senate: ~300–700 roll calls per session, ~30K–60K per-member rows per session; ~20,000+ roll calls and ~2M per-member rows all-time. XML ~50–100 KB per roll call.
  • Combined: ~5–7 GB raw XML before parse; Postgres footprint ~3–5 GB after parse + tsvector index.

Backfill (one-time, 101st Congress → present). For the House, loop each year 1989..current: fetch the yearly HTML index (and paged ROLL_NNN.asp pages), parse the table for (rollnumber, date, issue, question, result, title), fetch each roll{NNN}.xml, save raw XML to /data/corpus/roll_call_votes/house/{year}/roll{NNN}.xml, parse and load. ~26,000 mostly-serial fetches: ~7h at 1 req/sec, ~2h with 4 workers. For the Senate, loop each (congress, session) from (101, 1): fetch vote_menu_{C}_{S}.xml over plain requests/httpx (Akamai bot wall gone as of 2026-05-29), parse the <vote_number> list, fetch each vote_{c}_{s}_{NNNNN}.xml over plain HTTP, save and load. ~20,000 plain-HTTP fetches: ~5–6h at 1 req/sec, ~1.5h with 4 workers. (Previously the Senate backfill was bottlenecked at ~24–48h by a headed browser; that constraint is gone now that plain fetch works.)
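
The time estimates above follow from simple arithmetic, sketched here. backfill_hours is a hypothetical helper, and it assumes every worker sustains the stated request rate independently with no retries or parse time.

```python
def backfill_hours(fetches: int, req_per_sec: float = 1.0, workers: int = 1) -> float:
    """Wall-clock hours to issue `fetches` requests.

    Assumes each worker independently sustains `req_per_sec`.
    """
    return fetches / (req_per_sec * workers) / 3600


house_serial = backfill_hours(26_000)               # about 7.2 hours
house_parallel = backfill_hours(26_000, workers=4)  # about 1.8 hours
senate_serial = backfill_hours(20_000)              # about 5.6 hours
senate_parallel = backfill_hours(20_000, workers=4) # about 1.4 hours
```

Note that four workers at 1 req/sec each put 4 req/sec on the upstream host in aggregate, which is above the 1–2 req/sec politeness target.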

Discovery accelerator (bootstrap / fallback only): the unitedstates/congress scrapers have already done a 10-year backfill of both chambers' votes, and their parsed JSON output could serve as a one-time bootstrap. We don't want to depend on it at runtime, but it remains a fallback source if the Senate bot wall ever returns.

Daily incremental. Two cron ticks: House and Senate each poll their discovery feed (House index.asp; Senate vote_menu_{C}_{S}.xml over plain requests/httpx, no headed browser as of 2026-05-29) every 30 minutes during session days, hourly otherwise — diff against last seen, fetch new roll XML. House votes are published within minutes of the gavel. A weekly drift check compares our row counts against menu/index totals and alerts on drift >0.
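
The diff and drift steps can be sketched as follows. Both function names are hypothetical, and the sketch assumes the discovery feed has already been parsed into a list of roll call numbers.

```python
from typing import Iterable


def new_roll_numbers(feed_numbers: Iterable[int], last_seen: int) -> list[int]:
    """Roll call numbers in the discovery feed that are newer than last_seen."""
    return sorted(n for n in set(feed_numbers) if n > last_seen)


def count_drift(feed_total: int, loaded_rows: int) -> int:
    """Weekly check: difference between upstream total and our row count.

    Any non-zero value should alert.
    """
    return feed_total - loaded_rows
```

Diffing against a single last-seen number is cheap but cannot detect gaps below that number; the weekly drift check is what catches those.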

Vote corrections. Both chambers occasionally correct a roll call after publication. House: usually a Last-Modified bump on the roll XML. Senate: <modify_date> exceeds <vote_date>. Detect and re-process with vote_id unchanged; preserve the prior raw_xml in a side table for audit trail.
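
A hedged sketch of the two correction checks. The function names are hypothetical; the stored timestamp is assumed to be timezone-aware, and the Last-Modified header is assumed to be in the standard HTTP date format.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional


def house_needs_reprocess(last_modified_header: str, stored: Optional[datetime]) -> bool:
    """True when the upstream Last-Modified is newer than what we stored."""
    upstream = parsedate_to_datetime(last_modified_header)
    return stored is None or upstream > stored


def senate_was_corrected(vote_date: datetime, modify_date: Optional[datetime]) -> bool:
    """True when <modify_date> is present and later than <vote_date>."""
    return modify_date is not None and modify_date > vote_date
```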

State tracking. Per architecture, source keys roll_call_votes_house and roll_call_votes_senate are tracked separately in ingestion_runs. State stores the last-seen rollnumber per (year, chamber, session).

Failure modes.

  • Senate 403 reappearance. Plain HTTP works as of 2026-05-29, but the Akamai bot wall could return. If Senate fetches start returning 403, fall back to the headed-browser fetcher pattern (headed Chrome via browser-use); if a fingerprinted browser session is itself blocked, restart it, rotate the user-data directory, and fall back to unitedstates/congress-derived JSON during the outage.
  • Roll call number reused / renumbered after correction. Rare but possible. The chamber issues a "correction" with the same number, plus an erratum. Compare new XML's Last-Modified (House) or <modify_date> (Senate) against ours; refetch and re-load with vote_id unchanged.
  • Member voting under a lis_member_id we don't have a bioguide crosswalk for. Happens briefly when a new senator is sworn in mid-session — the legislators YAML has them but the id.lis field hasn't been added yet. Log to ingestion_logs, queue retry.
  • Speaker-election votes. vote-question is Election of the Speaker; vote_cast is a candidate name (JOHNSON, JEFFRIES, PRESENT). The vote_cast enum becomes unknown; vote_cast_raw preserves the candidate name. Counts may not balance (some members may abstain by walking out). Document in vote_description if known.
  • House XML occasionally 404s briefly between vote close and Clerk publish. Retry every 60s for up to 5 minutes; otherwise log and skip until next pass.
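
The 404 retry policy in the last item can be sketched as below. fetch_with_retry is a hypothetical helper; the fetch callable is assumed to return None on a 404 and the body otherwise, and sleep is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Optional


def fetch_with_retry(
    fetch: Callable[[], Optional[bytes]],
    interval_s: float = 60.0,
    max_wait_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Call fetch() every interval_s until it returns a body or max_wait_s elapses.

    Returns None when the time budget is exhausted, so the caller can log
    the miss and skip until the next pass.
    """
    waited = 0.0
    while True:
        body = fetch()
        if body is not None:
            return body
        if waited >= max_wait_s:
            return None
        sleep(interval_s)
        waited += interval_s
```

With the defaults this makes at most six attempts: one immediately, then one per minute for five minutes.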

These don't block ingestion but should be resolved before this source is "shipped":

  • Headed-browser fetcher (fallback only). As of 2026-05-29 the Senate fetches over plain HTTP, so v1 does not need a real Chrome instance. If the Akamai bot wall returns, the fallback choices are (a) Playwright/Puppeteer in a sidecar container, (b) browser-use-style CDP-driven Chromium inside the ingester container, (c) a separate puppet-host (could be a Mac mini or another Hetzner box) that the ingester polls. (a) is the most ops-friendly. Keep this path documented but unbuilt until 403s reappear.
  • Pre-1989 votes. unitedstates/congress scrapers reach back to the 95th Congress (1977) for some chambers via PDF + OCR; full XML coverage starts 1989. Decide if pre-1989 is in scope for v1 (probably not — Congress.gov API doesn't have them either).
  • Vote correction history. When a vote is re-issued, we currently overwrite the parsed row and append the new raw XML. Should we keep prior raw_xml versions in a roll_call_vote_revisions side table for audit trail? Lean yes; defer until we measure how often it happens.
  • time-etz time zone. House XML's <action-time time-etz="14:59"> carries an "ETZ" (Eastern Time Zone) attribute, but Daylight Saving handling is not formally documented. Verify we render vote_date in UTC correctly across DST transitions.
  • Bioguide for territorial delegates and resident commissioner. They vote in Committee of the Whole but not on final passage. House XML's role attribute is legislator even for delegates — verify our schema correctly captures their votes (they should appear as recorded-votes when they participate).
  • Quorum calls. House vote-type=QUORUM produces XML where every present member is recorded but the result is just attendance. Decide whether to ingest these (lean yes — no extra cost; useful for "where was member X on date Y" queries) or skip.
  • Senate cloture vote tally edge case. When cloture is invoked on a motion to proceed vs the underlying matter, the same rollnumber may have nested <question> and <vote_document_text> distinctions that take careful reading to resolve. Build a small set of test fixtures from real Senate XML before claiming the parser is robust.
  • Bill / amendment ID resolution. legis-num is 'H R 498', not 'hr498'. We need a normalizer ('H R 498' → bill:{congress}-hr-498). Same for 'S Con Res 7' → bill:{congress}-sconres-7. Tabulate the variants and write the parser carefully.
  • House vs Congress.gov API double-coverage. Congress.gov's /house-vote API (118+ only) is redundant with Clerk XML but cleaner JSON. Decide: do we ingest both for verification, or trust Clerk XML alone? Lean toward Clerk-only with periodic Congress.gov spot-checks.
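
For the bill ID resolution item above, a starting-point sketch of the legis-num normalizer. Only the hr and sconres slugs are confirmed by this document; the other entries in the table are assumptions that follow the same pattern, and normalize_legis_num is a hypothetical name. The variants still need to be tabulated from real XML.

```python
from typing import Optional

# Prefix (uppercased, dots removed) -> bill type slug.
# Only 'H R' and 'S CON RES' are confirmed here; the rest are assumed.
_BILL_TYPES = {
    "H R": "hr",
    "S": "s",
    "H RES": "hres",
    "S RES": "sres",
    "H J RES": "hjres",
    "S J RES": "sjres",
    "H CON RES": "hconres",
    "S CON RES": "sconres",
}


def normalize_legis_num(legis_num: str, congress: int) -> Optional[str]:
    """Turn 'H R 498' into 'bill:{congress}-hr-498'; None if unrecognized."""
    tokens = legis_num.replace(".", " ").upper().split()
    if len(tokens) < 2 or not tokens[-1].isdigit():
        return None  # e.g. 'QUORUM' or other non-bill values
    bill_type = _BILL_TYPES.get(" ".join(tokens[:-1]))
    if bill_type is None:
        return None
    return f"bill:{congress}-{bill_type}-{int(tokens[-1])}"
```

Returning None for unrecognized input matches the soft-FK design: bill_id stays NULL when the reference cannot be resolved.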

docs/sources/roll-call-votes.html · Last updated 2026-05-29