Legislators and Committees
Canonical registries of every Member of Congress (current and historical), every committee and subcommittee (active and dissolved), and the membership rosters that link them. This is a companion source — its records don't change much in absolute volume, but almost every other Josh source has a foreign key into it: bills join sponsors and cosponsors, roll-call votes join voters, hearings join witnesses, lobbying disclosures resolve to lobbied members, citation graph rows reference statute authors. Load this first. Nothing downstream is fully ingestible without it.
The good news: there is a single, community-maintained, public-domain canonical registry, unitedstates/congress-legislators, keyed by the bioguide IDs that official sources such as Congress.gov, GovInfo, and the House Clerk already use. Joins against those sources are exact matches on bioguide ID: no transformation, no fuzzy joins. Sources that key on their own identifiers (for example OpenSecrets, or the LIS IDs in Senate roll-call XML) resolve through the registry's id crosswalk.
Quick reference
| Field | Value |
| --- | --- |
| Source name | Legislators and Committees |
| Publishers | Community-maintained (@unitedstates/congress-legislators — Eric Mill, Sunlight Foundation alumni, GovTrack maintainers, etc.), drawing on official House/Senate/Bioguide/LoC data. |
| License | CC0-1.0 (public domain) |
| Coverage | Legislators: 1789–present (every Member ever — ~12,800 historical + ~540 current). Committees: current Congress only for committees-current + committee-membership-current. Historical committees exist upstream but are out of scope at v1. |
| Storage estimate | <50 MB raw across the 7 fetched files; <500 MB in Postgres including denormalization |
| Access path | GitHub Pages mirror https://unitedstates.github.io/congress-legislators/{file}.yaml — preferred (see Access notes) |
| Auth | None |
| Stable ID format | Member: member:{bioguide_id} e.g. member:C000127. Committee: committee:{thomas_id} e.g. committee:HSAG. Subcommittee: committee:{parent_thomas}-{sub_thomas_id} e.g. committee:HSAG-15. |
| Status | exploring — schema drafted, ingestion not built |
For the upstream file inventory (sizes, record counts), HTTP/CDN behavior, and rate limits, see the unitedstates/congress-legislators repo.
Source priority decision
Several sources overlap for the same data. Recommendation:
Primary: unitedstates/congress-legislators GitHub Pages YAML. Single repo, public-domain, hand-curated since 2013, updated continuously. Same bioguide ID space as every official US government source. No translation layer needed.
Secondary: Congress.gov v3 API /member and /committee. Useful as a verification / spot-check source but redundant with the YAML — the YAML maintainers themselves cross-reference Congress.gov when curating. Costs api.data.gov quota for nothing we can't already get for free.
Skip: scraping bioguide.congress.gov directly. The YAML registry already does this work upstream. Scraping ourselves is reinventing the registry.
Skip: GovTrack member/committee dumps. The congress-legislators registry includes id.govtrack crosswalks; we'd just be ingesting downstream copies of upstream truth.
The YAML files are also the only canonical source for some fields that no official feed exposes well — leadership roles, district office addresses, social media handles, family relationships (e.g. dynasty tracking), and historical thomas/lis/icpsr/govtrack ID crosswalks. Building these from primary sources would take months.
Access notes
The repo lives on GitHub but the GitHub Pages mirror is the right access path, not raw.githubusercontent.com. The Pages mirror returns valid Content-Type: text/yaml (raw GitHub returns text/plain), serves clean Last-Modified and ETag headers, is CDN-cached at Fastly, and has CORS open for browser use.
https://unitedstates.github.io/congress-legislators/{file}.yaml
Why this matters for Josh: raw.githubusercontent.com serves no Last-Modified and ignores If-Modified-Since, which would break the conditional-GET state watermark the ingester relies on (see Caching / incremental sync). A real-browser User-Agent isn't required (no anti-bot wall observed). For programmatic discovery of what's changed, hit the GitHub commits API instead of the Pages mirror.
Files inventoried
Eight files exist upstream at https://unitedstates.github.io/congress-legislators/{name}.yaml; the ingester fetches 7 of them:
- legislators-current.yaml: sitting Members of Congress (House + Senate).
- legislators-historical.yaml: every former Member, 1789–present.
- committees-current.yaml: active standing, select, special, and joint committees with subcommittees inline.
- committee-membership-current.yaml: who serves on what committee/subcommittee right now, with rank and title.
- legislators-social-media.yaml: Twitter, Facebook, YouTube, Instagram per current member. Official accounts only (campaign accounts excluded).
- legislators-district-offices.yaml: district/state office addresses, lat/long, phone, fax.
- executive.yaml: Presidents and Vice Presidents (1789–present).
committees-historical.yaml exists upstream but is intentionally out of scope at v1 (dissolved-committee lineages aren't needed for the current-Congress FK graph) and is not fetched. There is no JSON or CSV mirror committed in the repo — conversion is on the consumer.
Stable ID formats
Member: member:{bioguide_id}, lowercased prefix.
Examples:
- member:C000127 (Maria Cantwell)
- member:K000367 (Amy Klobuchar)
- member:W000178 (George Washington; yes, Washington has a bioguide ID)
Rationale: bioguide IDs are the de facto canonical IDs across official US government data sources: Congress.gov API, GovInfo BILLSTATUS XML, House Clerk roll-call votes, Senate.gov XML, Bioguide proper. Outside datasets such as OpenSecrets carry their own IDs and join through the crosswalk. Bioguide IDs never collide and never get reused, even across centuries, so there is no reason to invent a new ID.
Committee: committee:{thomas_id}, uppercase Thomas ID preserved.
- Top-level: committee:HSAG (House Agriculture), committee:SSAF (Senate Ag), committee:JSPR (Joint Printing).
- Subcommittee: committee:HSAG-15 (House Ag Forestry/Horticulture subcommittee; 15 is the subcommittee's thomas_id).
The Thomas ID is what unitedstates/congress-legislators uses, what BILLSTATUS XML's <systemCode> derives from (hsbu00 = lowercase(HSBU) + "00"), and what every committee membership file keys on. The system_code (hsbu00, hsag15) is a separate field — see the schema. The full system_code is derived at parse time: top-level committees use lowercase(thomas_id) + "00" (e.g. hsag00); a subcommittee uses lowercase(parent.thomas_id) + subcommittee.thomas_id (e.g. hsag15 for HSAG subcommittee 15).
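The two derivations described above can be sketched as a pair of small helpers. This is a minimal illustration, not the ingester's actual code: the function names committee_id and system_code are hypothetical, and the rules are taken directly from the examples in this section.

```python
from typing import Optional


def committee_id(thomas_id: str, sub_thomas_id: Optional[str] = None) -> str:
    """Stable Josh ID: 'committee:HSAG' or 'committee:HSAG-15' (hypothetical helper)."""
    if sub_thomas_id is None:
        return f"committee:{thomas_id}"
    return f"committee:{thomas_id}-{sub_thomas_id}"


def system_code(thomas_id: str, sub_thomas_id: Optional[str] = None) -> str:
    """BILLSTATUS-style code: lowercase parent Thomas ID plus '00' or the subcommittee ID."""
    suffix = sub_thomas_id if sub_thomas_id is not None else "00"
    return thomas_id.lower() + suffix


print(committee_id("HSAG"), system_code("HSAG"))              # committee:HSAG hsag00
print(committee_id("HSAG", "15"), system_code("HSAG", "15"))  # committee:HSAG-15 hsag15
```

Keeping the subcommittee ID as a string avoids losing leading zeros if upstream ever uses them.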
Crosswalk: the id block on each legislator carries IDs to every other system (govtrack, opensecrets, fec, icpsr, etc.). Preserve them all so we can join to outside datasets later.
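A sketch of how the id block might be mapped onto the crosswalk columns so that missing IDs become NULL rather than errors. The YAML key names on the left of ID_COLUMNS are assumptions inferred from the schema's column names (check them against the upstream README), and crosswalk_columns is a hypothetical helper.

```python
from typing import Any, Dict

# Assumed YAML key -> Postgres column. Scalars default to None when absent.
ID_COLUMNS = {
    "thomas": "thomas_id", "lis": "lis_id", "govtrack": "govtrack_id",
    "opensecrets": "opensecrets_id", "votesmart": "votesmart_id",
    "cspan": "cspan_id", "icpsr": "icpsr_id",
    "house_history": "house_history_id",
    "house_history_alternate": "house_history_alternate_id",
    "wikidata": "wikidata_id", "wikipedia": "wikipedia",
    "ballotpedia": "ballotpedia", "maplight": "maplight_id",
    "google_entity_id": "google_entity_id", "pictorial": "pictorial_id",
}
# Assumed list-valued keys. These default to an empty list when absent.
LIST_COLUMNS = {"fec": "fec_ids", "bioguide_previous": "bioguide_previous"}


def crosswalk_columns(record: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one legislator's id block into column values."""
    ids = record.get("id") or {}
    bioguide = ids["bioguide"]  # required: every record is keyed on it
    row: Dict[str, Any] = {"id": f"member:{bioguide}", "bioguide_id": bioguide}
    for key, column in ID_COLUMNS.items():
        row[column] = ids.get(key)
    for key, column in LIST_COLUMNS.items():
        row[column] = list(ids.get(key) or [])
    return row
```

Early presidents and other sparse records then load with NULL crosswalks and no special casing.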
Upstream record shapes
The full field-by-field shape of each YAML file is documented upstream in the congress-legislators README: the legislator id/name/bio/terms[]/leadership_roles[]/family[]/other_names[] blocks, the committee and subcommittee fields, the committee-membership map, the social-media and district-office records, and all enum/vocabulary values (terms[].type, end-type, how, Senate class, state_rank, committee type, membership party/title). Josh lands these into the Postgres schema below; the Josh-specific data-modeling consequences are called out here:
- family[] is a soft pointer. The relative name is free-text, not a bioguide_id. Resolving it to another member is on us if we want a true graph (legislator_family.relative_bioguide is a soft FK, NULL when unresolved).
- Subcommittee thomas_id is unique only within its parent committee. Numeric strings ('15', '22') collide across parents, so Josh composes parent_thomas + sub_thomas for committees.id and the system_code.
- committee-membership-current.yaml dual-keys at one flat level. It's a map keyed by Thomas ID where SSAF (full committee) and SSAF13 (a subcommittee) are separate top-level keys; there's no nesting at this layer (the nesting lives in committees-current.yaml). Member entries carry party: majority/minority (chamber-relative, not Democrat/Republican), a within-party seniority rank (1 = most senior), an optional title, and chamber on joint committees.
- Social media is current members only, official accounts only. Per the file's editorial policy, only taxpayer-funded "official" legislative accounts are accepted (campaign/personal accounts excluded). Numeric IDs (twitter_id, instagram_id) are kept because handles can be renamed but numeric IDs are stable.
- District office id is {bioguide}-{slug} (e.g. A000055-cullman) and is stable across pulls; Josh uses it as the primary key.
- executive.yaml follows the same shape as legislators-historical.yaml but with terms[].type in {prez, viceprez} and a few fewer ID crosswalks (no thomas/govtrack/opensecrets for early presidents). 80 records, Washington (1789) through the current administration. Useful for resolving Federal Register president slugs (donald-trump, joe-biden) to a canonical record with bio.
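The flat dual-keyed membership map can be turned into committee_memberships rows roughly as follows. This is a sketch under two assumptions: that every top-level Thomas ID is exactly four characters (true of the examples in this doc: HSAG, SSAF, JSPR, HSBU), and that member entries use the keys bioguide, party, rank, title, and chamber. Both function names are hypothetical.

```python
from typing import Any, Dict, List


def membership_key_to_committee_id(key: str) -> str:
    """'SSAF' -> 'committee:SSAF'; 'SSAF13' -> 'committee:SSAF-13'.

    Assumes the parent Thomas ID is the first four characters of the key.
    """
    parent, sub = key[:4], key[4:]
    return f"committee:{parent}-{sub}" if sub else f"committee:{parent}"


def membership_rows(membership: Dict[str, List[Dict[str, Any]]],
                    congress: int) -> List[Dict[str, Any]]:
    """Flatten the map into one row per (committee, member), stamped with congress."""
    rows = []
    for key, members in membership.items():
        cid = membership_key_to_committee_id(key)
        for entry in members:
            rows.append({
                "committee_id": cid,
                "bioguide_id": entry["bioguide"],
                "party_alignment": entry["party"],  # 'majority' / 'minority'
                "rank": entry["rank"],
                "title": entry.get("title"),        # optional
                "chamber": entry.get("chamber"),    # joint committees only
                "congress": congress,
            })
    return rows
```

If the four-character assumption ever fails, a safer approach is to match keys against the IDs already flattened from committees-current.yaml.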
Schema (Postgres DDL)
-- ============================================================
-- Members of Congress (current + historical, unified table)
-- ============================================================
CREATE TABLE legislators (
  id                          text PRIMARY KEY,        -- 'member:C000127'
  bioguide_id                 text NOT NULL UNIQUE,    -- 'C000127'

  -- All cross-system IDs for joining to other datasets
  thomas_id                   text,
  lis_id                      text,
  govtrack_id                 int,
  opensecrets_id              text,
  votesmart_id                int,
  fec_ids                     text[],                  -- list per career
  cspan_id                    int,
  icpsr_id                    int,
  house_history_id            int,
  house_history_alternate_id  int,
  bioguide_previous           text[],                  -- when bioguide was renumbered
  wikidata_id                 text,
  wikipedia                   text,                    -- article title
  ballotpedia                 text,
  maplight_id                 int,
  google_entity_id            text,
  pictorial_id                int,

  -- Names
  first_name                  text,
  last_name                   text,
  middle_name                 text,
  suffix                      text,
  nickname                    text,
  official_full_name          text,

  -- Bio
  birthday                    date,
  gender                      text CHECK (gender IN ('M', 'F') OR gender IS NULL),

  -- Status flags
  is_current                  boolean NOT NULL DEFAULT false,  -- in legislators-current.yaml right now
  is_executive                boolean NOT NULL DEFAULT false,  -- in executive.yaml (presidents/VPs)

  -- Lifecycle / forensics
  raw_yaml                    jsonb NOT NULL,          -- full record for re-parsing
  fetched_at                  timestamptz NOT NULL,
  inserted_at                 timestamptz NOT NULL DEFAULT now(),
  updated_at                  timestamptz NOT NULL DEFAULT now()
);

CREATE INDEX legislators_current ON legislators (last_name, first_name) WHERE is_current;
CREATE INDEX legislators_govtrack ON legislators (govtrack_id) WHERE govtrack_id IS NOT NULL;
CREATE INDEX legislators_icpsr ON legislators (icpsr_id) WHERE icpsr_id IS NOT NULL;
CREATE INDEX legislators_opensecrets ON legislators (opensecrets_id) WHERE opensecrets_id IS NOT NULL;

-- Search by name
ALTER TABLE legislators ADD COLUMN search_tsv tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(official_full_name, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(last_name, '') || ' ' || coalesce(first_name, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(nickname, '')), 'C')
  ) STORED;
CREATE INDEX legislators_search ON legislators USING gin (search_tsv);
-- ============================================================
-- Terms (one per office period; many per legislator)
-- ============================================================
CREATE TABLE legislator_terms (
  id                  bigserial PRIMARY KEY,
  bioguide_id         text NOT NULL REFERENCES legislators(bioguide_id) ON DELETE CASCADE,

  term_type           text NOT NULL CHECK (term_type IN ('rep', 'sen', 'prez', 'viceprez')),
  term_start          date NOT NULL,
  term_end            date NOT NULL,

  state               text,       -- 2-letter, NULL for prez/viceprez
  district            int,        -- rep only; 0 = at-large
  senate_class        smallint CHECK (senate_class IN (1, 2, 3) OR senate_class IS NULL),
  state_rank          text CHECK (state_rank IN ('junior', 'senior') OR state_rank IS NULL),

  party               text,       -- display name; party_affiliations is canonical when set
  party_affiliations  jsonb,      -- list of {start, end, party} when party changed mid-term
  caucus              text,

  how                 text,       -- 'special-election', 'appointment', or NULL
  end_type            text,       -- end-type renamed to be SQL-friendly

  -- Contact info (for sen/rep: the DC office)
  url                 text,
  address             text,
  office              text,
  phone               text,
  fax                 text,
  contact_form        text,
  rss_url             text,

  UNIQUE (bioguide_id, term_start, term_type)
);

CREATE INDEX legislator_terms_bioguide ON legislator_terms (bioguide_id);
-- No partial predicate here: Postgres only allows IMMUTABLE functions in an
-- index predicate, and CURRENT_DATE is STABLE, so "WHERE term_end >= CURRENT_DATE"
-- is rejected. Queries filter on term_end at read time instead.
CREATE INDEX legislator_terms_active ON legislator_terms (term_end DESC, term_start);
CREATE INDEX legislator_terms_state_district ON legislator_terms (state, district);
-- ============================================================
-- Leadership roles (sparse; most legislators have none)
-- ============================================================
CREATE TABLE legislator_leadership_roles (
  id           bigserial PRIMARY KEY,
  bioguide_id  text NOT NULL REFERENCES legislators(bioguide_id) ON DELETE CASCADE,
  title        text NOT NULL,
  chamber      text CHECK (chamber IN ('house', 'senate') OR chamber IS NULL),
  role_start   date NOT NULL,
  role_end     date NOT NULL,
  UNIQUE (bioguide_id, title, role_start)
);

-- Plain index: a "WHERE role_end >= CURRENT_DATE" predicate is not allowed
-- because CURRENT_DATE is not IMMUTABLE.
CREATE INDEX legislator_leadership_active ON legislator_leadership_roles (bioguide_id, role_end DESC);

-- ============================================================
-- Family relations (sparse, free-text; soft pointer to other members)
-- ============================================================
CREATE TABLE legislator_family (
  id                 bigserial PRIMARY KEY,
  bioguide_id        text NOT NULL REFERENCES legislators(bioguide_id) ON DELETE CASCADE,
  relative_name      text NOT NULL,   -- free-text; resolution to bioguide is a separate task
  relative_bioguide  text,            -- populated where we can match (soft FK)
  relation           text NOT NULL,   -- 'son', 'father', 'brother', 'husband', etc.
  UNIQUE (bioguide_id, relative_name, relation)
);

-- ============================================================
-- Other names (married/maiden names, naturalization name changes; historical)
-- ============================================================
CREATE TABLE legislator_other_names (
  id           bigserial PRIMARY KEY,
  bioguide_id  text NOT NULL REFERENCES legislators(bioguide_id) ON DELETE CASCADE,
  last_name    text,
  middle_name  text,
  used_until   date,                  -- 'end' in YAML
  UNIQUE (bioguide_id, last_name, middle_name)
);

-- ============================================================
-- Social media (current members only)
-- ============================================================
CREATE TABLE legislator_social_media (
  bioguide_id   text PRIMARY KEY REFERENCES legislators(bioguide_id) ON DELETE CASCADE,
  twitter       text,
  twitter_id    text,
  facebook      text,
  youtube       text,
  youtube_id    text,
  instagram     text,
  instagram_id  text,
  fetched_at    timestamptz NOT NULL,
  updated_at    timestamptz NOT NULL DEFAULT now()
);

-- ============================================================
-- District offices (current members only)
-- ============================================================
CREATE TABLE legislator_district_offices (
  id           text PRIMARY KEY,      -- 'A000055-cullman' from YAML
  bioguide_id  text NOT NULL REFERENCES legislators(bioguide_id) ON DELETE CASCADE,
  address      text,
  suite        text,
  building     text,
  city         text,
  state        text,
  zip          text,
  latitude     numeric(9, 6),
  longitude    numeric(9, 6),
  phone        text,
  fax          text,
  fetched_at   timestamptz NOT NULL,
  updated_at   timestamptz NOT NULL DEFAULT now()
);

CREATE INDEX legislator_district_offices_bioguide ON legislator_district_offices (bioguide_id);
CREATE INDEX legislator_district_offices_state ON legislator_district_offices (state);
-- ============================================================
-- Committees + subcommittees (current; historical in same table with is_current flag)
-- ============================================================
CREATE TABLE committees (
  id                   text PRIMARY KEY,               -- 'committee:HSAG' or 'committee:HSAG-15'
  thomas_id            text NOT NULL,                  -- 'HSAG' or '15' (subcommittee local id)
  parent_id            text REFERENCES committees(id), -- subcommittees only

  -- system_code is what BILLSTATUS XML uses; derived: hsag00 (full) or hsag15 (sub)
  system_code          text UNIQUE,

  chamber              text NOT NULL CHECK (chamber IN ('house', 'senate', 'joint')),
  name                 text NOT NULL,

  -- Only set on top-level committees
  house_committee_id   text,
  senate_committee_id  text,

  url                  text,
  minority_url         text,
  rss_url              text,
  minority_rss_url     text,
  address              text,
  phone                text,
  jurisdiction         text,
  jurisdiction_source  text,
  wikipedia            text,                           -- article title
  youtube_id           text,

  is_current           boolean NOT NULL DEFAULT true,

  raw_yaml             jsonb,
  fetched_at           timestamptz NOT NULL,
  updated_at           timestamptz NOT NULL DEFAULT now()
);

CREATE INDEX committees_parent ON committees (parent_id) WHERE parent_id IS NOT NULL;
CREATE INDEX committees_chamber ON committees (chamber) WHERE is_current;
CREATE INDEX committees_system_code ON committees (system_code) WHERE system_code IS NOT NULL;

-- ============================================================
-- Committee membership (current; historical handled via congress-specific
-- replays of committees-historical when we get to it)
-- ============================================================
CREATE TABLE committee_memberships (
  id               bigserial PRIMARY KEY,
  committee_id     text NOT NULL REFERENCES committees(id) ON DELETE CASCADE,
  bioguide_id      text NOT NULL REFERENCES legislators(bioguide_id) ON DELETE CASCADE,
  party_alignment  text NOT NULL CHECK (party_alignment IN ('majority', 'minority')),
  rank             int NOT NULL,     -- seniority rank within party
  title            text,             -- 'Chairman', 'Ranking Member', etc.
  chamber          text CHECK (chamber IN ('house', 'senate') OR chamber IS NULL),  -- joint committees only
  -- Period of membership (current snapshot, but congress-stamped for time-series)
  congress         smallint NOT NULL,
  UNIQUE (committee_id, bioguide_id, congress)
);

CREATE INDEX committee_memberships_member ON committee_memberships (bioguide_id);
CREATE INDEX committee_memberships_committee ON committee_memberships (committee_id);
CREATE INDEX committee_memberships_chair ON committee_memberships (committee_id) WHERE title IN ('Chairman', 'Chair');

Schema decisions worth flagging:
- Single legislators table for current + historical + executive with is_current and is_executive flags. The shape is identical (modulo a few mostly-null fields for early presidents). Splitting into three tables would force every join site to UNION.
- raw_yaml preserved per record. The YAML registry adds new ID crosswalks every few years (google_entity_id, pictorial, wikidata weren't all there a decade ago). Keep raw to re-parse.
- legislator_terms is the canonical source for state/district/party at any given date. The denormalized current-state on bills.sponsor_* (and similar) is a snapshot at time of sponsorship. To answer "who represents NY-12 today" join to legislator_terms with term_start <= now < term_end.
- Committees + subcommittees in one table with parent_id self-FK. Subcommittees inherit chamber from parent (denormalized to make the chamber filter cheap). The system_code column is the BILLSTATUS XML join key, derived at parse time (hsag00 for the top-level HSAG; hsag15 for its subcommittee with thomas_id: '15').
- committee_memberships is congress-stamped even though the source file is committee-membership-current, so when the next congress's data arrives we don't lose history. Re-running ingestion in a new congress inserts new rows rather than overwriting.
- legislator_family.relative_bioguide is a soft FK (NULL when we couldn't resolve the free-text name). Resolution is a future task, possibly aided by the wikidata_id crosswalk.
- party_affiliations as a jsonb column on legislator_terms rather than a separate table. It's a sparse, list-of-3-fields-each structure. When set (rare), we'd rather query it as JSON than join.
- No executive table separately. executive.yaml records load into legislators with is_executive=true and terms in legislator_terms with term_type IN ('prez', 'viceprez').
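The half-open interval in the legislator_terms bullet (term_start <= now < term_end) matters on handover days, when one member's term ends on the same date the successor's begins. A minimal in-memory sketch of that filter, using hypothetical rows and a hypothetical active_terms helper:

```python
from datetime import date
from typing import Any, Dict, List


def active_terms(terms: List[Dict[str, Any]], on: date) -> List[Dict[str, Any]]:
    """Return terms in force on `on`: term_start inclusive, term_end exclusive."""
    return [t for t in terms if t["term_start"] <= on < t["term_end"]]


# Hypothetical rows: the second term starts the day the first one ends.
terms = [
    {"bioguide_id": "X000001", "term_start": date(2023, 1, 3), "term_end": date(2025, 1, 3)},
    {"bioguide_id": "X000002", "term_start": date(2025, 1, 3), "term_end": date(2027, 1, 3)},
]
print([t["bioguide_id"] for t in active_terms(terms, date(2025, 1, 3))])  # ['X000002']
```

With an inclusive end date, both rows would match on the handover day and the "who represents this seat" query would return two members.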
Volume
Steady-state. Roughly 540 current legislators, 12,230 historical legislators (effectively static; grows by ~30-50 per congress as members retire), 49 committees + 181 subcommittees in current congress, 230 committee-roster entries. Storage in Postgres (after denormalizing terms, party affiliations, leadership, family, other names) is well under 500 MB.
Update cadence (from upstream git log): staffer-driven changes (committee assignments, leadership, contact info) every few days; real-time member entry/exit per resignation/swearing-in (committed within hours); daily-ish automated upstream runs; major churn per election cycle (terms file rewrites for all sitting members). A daily pull is sufficient for v1. Consider a faster (hourly) pull only during the first weeks of a new Congress when the committee membership churn is rapid.
Caching / incremental sync
Each YAML file on the GitHub Pages mirror exposes ETag, Last-Modified, and Cache-Control: max-age=600 headers. The incremental strategy:
- Per-file If-None-Match (primary) / If-Modified-Since (secondary): a daily cron sends a conditional GET keyed on the stored ETag first, falling back to Last-Modified. 304 responses are free; 200 responses mean the file changed since last pull. (Only the GitHub Pages host honors both forms; raw.githubusercontent.com serves no Last-Modified and ignores If-Modified-Since.)
- GitHub commits API as a coarse change feed: returns recent commits with commit.author.date and commit.message, useful for human-readable change reports ("Rep. Smith resigned on 2026-03-15") and for when a Last-Modified is bumped by an unrelated commit. See the GitHub list-commits API.
- Granular diff at parse time: even when a file changed, most of its content didn't. Compute a row-level diff (by primary key) when reloading. Insert/update only changed rows.
Because the data is small, even a no-cache full reload daily is cheap (~10 MB total transfer). The conditional-GET dance is more about detecting a change so we know whether to re-parse and re-emit per-row diffs.
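The conditional-GET step might look like the sketch below, using only the standard library. The state keys last_etag and last_modified mirror the per-file state described under State tracking; the function names and the choice of urllib are assumptions, not the ingester's actual implementation.

```python
import urllib.error
import urllib.request
from typing import Dict, Optional, Tuple

BASE_URL = "https://unitedstates.github.io/congress-legislators/{name}.yaml"


def conditional_headers(state: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Build validators from stored state. When both are sent, servers
    evaluate If-None-Match first, so the ETag is effectively primary."""
    headers: Dict[str, str] = {}
    if state.get("last_etag"):
        headers["If-None-Match"] = state["last_etag"]
    if state.get("last_modified"):
        headers["If-Modified-Since"] = state["last_modified"]
    return headers


def fetch_if_changed(name: str, state: Dict[str, Optional[str]],
                     timeout: float = 30.0) -> Tuple[Optional[bytes], Dict[str, Optional[str]]]:
    """Return (body, new_state) on 200, or (None, state) on 304 Not Modified."""
    request = urllib.request.Request(BASE_URL.format(name=name),
                                     headers=conditional_headers(state))
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            new_state = {
                "last_etag": response.headers.get("ETag"),
                "last_modified": response.headers.get("Last-Modified"),
            }
            return response.read(), new_state
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib surfaces 304 as an HTTPError
            return None, state
        raise
```

A caller would persist new_state only after the parse and merge succeed, so a failed parse is retried on the next run.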
Download / update strategy
Backfill (one-time, then daily)
This source is small enough that "backfill" and "incremental" are the same operation. There is no historical-only fetch; we re-pull the same URLs daily.
- Fetch the 7 in-scope YAML files in parallel. 7 concurrent HTTPS requests (committees-historical.yaml is skipped at v1). Total transfer ~11 MB.
- Parse YAML → records. PyYAML is fine at this scale, since every file fits comfortably in memory; ruamel.yaml is an option for cycle safety if needed.
- Upsert per-record. For each table, compute the row-level set difference between fresh and stored rows; insert new, update changed, and soft-delete rows that are missing from the fresh pull but should still exist in legislators (a member becoming ex-current should keep a row with is_current=false).
Specifically:
- legislators-current.yaml → set is_current=true, upsert.
- legislators-historical.yaml → set is_current=false, upsert.
- A member who moves from current to historical (e.g., resigned mid-session): the historical YAML adds them, the current YAML removes them. The merge must catch that: for any row in legislators whose bioguide was in last run's current set but isn't in this run's current set, flip is_current to false.
- executive.yaml → set is_executive=true, upsert.
- committees-current.yaml → flatten committees + subcommittees, set is_current=true, upsert. Compute system_code at parse time.
- committees-historical.yaml → not fetched at v1 (out of scope; would set is_current=false and upsert when added later).
- committee-membership-current.yaml → reset all current-congress rows, re-insert.
- legislators-social-media.yaml, legislators-district-offices.yaml → upsert.
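The row-level diff and the current-to-historical flip can be sketched as pure functions over dictionaries keyed by primary key. The function names are hypothetical and the rows are plain dicts here; a real merge would run the equivalent comparison against Postgres.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple

Row = Dict[str, Any]


def diff_rows(stored: Dict[str, Row],
              fresh: Dict[str, Row]) -> Tuple[Dict[str, Row], Dict[str, Row], List[str]]:
    """Compare stored vs freshly parsed rows by primary key.

    Returns (inserts, updates, missing): rows new in fresh, rows whose
    content changed, and keys present in stored but absent from fresh.
    """
    inserts = {k: v for k, v in fresh.items() if k not in stored}
    updates = {k: v for k, v in fresh.items() if k in stored and stored[k] != v}
    missing = sorted(k for k in stored if k not in fresh)
    return inserts, updates, missing


def no_longer_current(previous_current: Iterable[str],
                      fresh_current: Iterable[str]) -> Set[str]:
    """Bioguide IDs that were current last run but are absent from this run's
    current file; these rows get is_current flipped to false, not deleted."""
    return set(previous_current) - set(fresh_current)
```

Unchanged rows appear in none of the three outputs, so a no-op pull results in no writes.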
Daily incremental
A 02:00 UTC daily cron (after the upstream nightly automation runs) is sufficient.
Conditional-GET pattern — for each file, hold the previous run's Last-Modified and ETag and send If-None-Match/If-Modified-Since. On 304, skip parse. On 200, full parse + diff-merge.
The ingester also pulls the GitHub commits API (unauthenticated) to surface a human-readable change log: each commit's commit.message (e.g. "Rep. Cherfilus-McCormick resigned, Rep. David Scott died") is recorded in ingestion_logs for the run, so the admin UI can show why the file changed without reading diffs.
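A sketch of the change-log pull, assuming the GitHub REST list-commits endpoint with its path, since, and per_page query parameters, and assuming the YAML files sit at the repository root. The helper names are hypothetical; the response fields used (commit.message, commit.author.date) are the ones named above.

```python
from typing import Any, Dict, List
from urllib.parse import urlencode

COMMITS_API = "https://api.github.com/repos/unitedstates/congress-legislators/commits"


def commits_url(path: str, since: str, per_page: int = 20) -> str:
    """URL listing commits that touched `path` after `since` (ISO 8601 timestamp)."""
    query = urlencode({"path": path, "since": since, "per_page": per_page})
    return f"{COMMITS_API}?{query}"


def summarize_commits(payload: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Reduce the API response to (date, first message line) pairs for ingestion_logs."""
    summary = []
    for item in payload:
        commit = item.get("commit", {})
        message = commit.get("message", "")
        first_line = message.splitlines()[0] if message else ""
        summary.append({
            "date": commit.get("author", {}).get("date"),
            "message": first_line,
        })
    return summary
```

Unauthenticated calls to this API are rate-limited per IP, so one request per file per daily run is a comfortable budget.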
State tracking
Per ingestion architecture: ingestion state in ingestion_runs, ingestion_logs, ingestion_source_state. Source key: legislators_committees. Per-file state stores last_etag, last_modified, last_parsed_at.
Failure modes
- HTTP 503 from GitHub Pages CDN: exponential backoff and retry against the same Pages host. raw.githubusercontent.com is not used as a fallback: it serves no Last-Modified and ignores If-Modified-Since, so switching hosts would break the conditional-GET state watermark.
- YAML parse error: log full file path, halt for this file, alert. Don't update partial state. The maintainers occasionally introduce a YAML error during a manual edit, usually fixed within hours.
- New top-level field on a record: raw_yaml preserves it; alert via ingestion_logs so we know to extend the schema. Don't fail.
- Bioguide collision with existing record but different name: extremely rare (bioguide IDs are never reused), but possible if a member's bioguide is renumbered. Update via bioguide_previous array; alert.
- Subcommittee thomas_id collision across parents: by design, subcommittee thomas_ids are only unique within parent. Compose parent_thomas + sub_thomas for committees.id and the system_code.
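The 503 handling in the first bullet could be sketched as below. The base delay, the cap, the RetryableError class, and both function names are illustrative assumptions; the doc only specifies "exponential backoff against the same host".

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker raised by the fetch function on transient errors (e.g. HTTP 503)."""


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 60.0) -> List[float]:
    """Delays of base * 2**i seconds, capped: 1, 2, 4, 8, ... up to `cap`."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]


def with_retries(fn: Callable[[], T], delays: List[float],
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, sleeping through each delay after a retryable failure.

    After the delays are exhausted, one final attempt is made and any
    error from it propagates to the caller.
    """
    for delay in delays:
        try:
            return fn()
        except RetryableError:
            sleep(delay)
    return fn()
```

Passing sleep as a parameter keeps the retry loop testable without real waits; adding random jitter to each delay is a common refinement.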
Open questions
These don't block ingestion but should be resolved before this source is "shipped":
- Historical committee membership. committee-membership-current.yaml covers only the current congress. Reconstructing committee rosters for past congresses requires either (a) git history of the file across previous-congress sessions, (b) the committees-historical.yaml membership-by-congress section (if it exists; needs check), or (c) the Congress.gov API /committee/{c}/{chamber}/{committeeCode}/members. Decide before promising "show me who chaired Senate Finance in the 115th." Likely v1.x.
- Caucus memberships beyond chamber alignment. The terms[].caucus field captures formal chamber alignment for independents (e.g., Sanders caucuses with Democrats). It does not capture issue-based caucuses (Freedom Caucus, Problem Solvers, Black Caucus, etc.). Those live in source #29 (caucus memberships, separate doc).
- family[] resolution to bioguide. The current free-text relative name doesn't link to another bioguide. A link could be valuable for political dynasty queries ("show me all Bushes in Congress"). One-time enrichment task, possibly via wikidata kinship properties.
- Executive.yaml coverage cliff. Bioguide IDs exist for every president, but thomas and lis are absent for early presidents (they predate those systems). Make sure code handles missing crosswalks gracefully.
- Pre-1789 territorial delegates. The historical file appears to start at the first congress (1789). Earlier Continental Congress representatives, if they have bioguide IDs at all, may be absent. Out of scope for federal substrate, but check for completeness.
- District office geocoding accuracy. The latitude/longitude values in legislators-district-offices.yaml are committed manually and have observed ~50 m precision. Good enough for "is this office near the requester" use cases; not good enough for in-building queries.
- House delegates and resident commissioners. DC, Puerto Rico, Guam, Virgin Islands, American Samoa, Northern Marianas: they have terms[].type: rep with non-state values in state (e.g. DC, PR, GU). Verify that any validation on the state column accepts these (the column is plain text in the schema above, with no enum constraint).
- Update lag for ingestion. When a member dies or resigns mid-session, BILLSTATUS XML may continue listing them as a current sponsor for hours-to-days. The legislators YAML usually updates within hours of public reporting, but downstream sources may lag. The bills loader needs to handle "sponsor bioguide doesn't yet exist as is_current=true" gracefully.