Public Laws
When the President signs a bill (or Congress overrides a veto), it becomes a Public Law: an enacted statute. The Public Law text is the canonical "as-passed" version, the exact words enacted into law before any subsequent amendment. The Office of the Federal Register assigns a sequential public-law number within each Congress (e.g. P.L. 119-21); GPO publishes the law as part of the Statutes at Large; OLRC integrates the law into the US Code.
Public Laws are the bridge between bills (legislative process) and the US Code (codified statute). For Josh:
- A bill becomes a Public Law: `bill:119-hr-1` → `law:119-21`.
- A Public Law amends USC sections: `law:119-21` → multiple `usc:42-...` edits.
- A Public Law has a Statutes at Large citation: `139 Stat. 100`.
- The text is structured as USLM XML (same schema as USC), with `<sourceCredit>`-style provenance back to bill versions.
Quick reference
| Source name | Public Laws |
| Publishers | Office of the Federal Register (assigns law numbers); Government Publishing Office (publishes); National Archives (preserves). |
| License | Public domain |
| Coverage | 1995 – present (online systematically). Earlier coverage spotty in PLAW collection; older laws via Statutes at Large bound volumes. |
| Volume | ~200-400 laws per congress. ~3,000-5,000 total online. |
| Storage estimate | ~2 GB raw USLM XML + PDF; ~500 MB-1 GB extracted text |
| Auth | None. Public domain, no key required; ~1-2 req/sec polite. See GovInfo for developers. |
| Incremental sync hints | Yearly sitemap diff + per-file Last-Modified; new laws appear within days of presidential signature |
| Stable ID format | law:{congress}-{number} e.g. law:119-21 (matches bills.bill_laws.public_law_number) |
| Status | exploring — schema drafted, ingestion not built |
Endpoint patterns, the per-year PLAW_{YYYY}_sitemap.xml layout, per-package URL templates (USLM XML / PDF / HTML / MODS), the wssearch getContentDetail package-detail JSON, and rate/auth boilerplate are upstream GPO reference — see GovInfo for developers, the GovInfo bulk-data PLAW collection, and the GovInfo API docs (the wssearch endpoint is undocumented/internal — treat that link as best-effort).
Source priority decision
Primary: GovInfo PLAW collection. Same shape as CHRG, CRPT, CREC, USCODE. Per-year sitemap, per-package USLM XML (the enacted version of the bill in modern markup), HTML, PDF, MODS metadata.
Note: Public Laws use USLM 2.0 XML. Live bill text uses the older "billres" DTD-based XML; only when a bill becomes a Public Law does the text get re-rendered in USLM. This is why we strongly prefer PLAW for enacted-text retrieval over BILLS.
Secondary: Bills source's <laws> element + Public Law action codes. Already captured in bills as bill_laws. Cross-references work: when bills.bill_laws.public_law_number = '119-21', we link to law:119-21 in this source.
Skip: scraping congress.gov/public-laws listings. Redundant with GovInfo.
Skip: pre-1995 PLAW backfill for v1. Statutes at Large bound volumes have older laws but require different parsing. v2 if useful.
Access notes
GovInfo: open. Same patterns. USLM XML at /content/pkg/PLAW-{c}publ{n}/uslm/PLAW-{c}publ{n}.xml is directly fetchable (unlike USCODE's USLM, which lives only inside ZIP).
packageId pattern
PLAW-{congress}publ{number} for Public Laws (e.g. PLAW-119publ21).
PLAW-{congress}pvtl{number} for Private Laws (rare; ~5-10 per congress; usually individual immigration relief or specific land conveyance).
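As a minimal sketch of the two patterns above, the helpers below build a packageId and the direct USLM XML URL. The function names and the `https://www.govinfo.gov` host prefix are assumptions for illustration; the path template is the one quoted under Access notes.

```python
GOVINFO_BASE = "https://www.govinfo.gov"  # assumed host; the notes above give only the path

# Law-type codes as they appear inside a packageId.
LAW_TYPE_CODES = {"public": "publ", "private": "pvtl"}


def package_id(congress: int, number: int, law_type: str = "public") -> str:
    """Build a PLAW packageId such as 'PLAW-119publ21'."""
    if congress < 1 or number < 1:
        raise ValueError("congress and number must be positive")
    code = LAW_TYPE_CODES[law_type]  # KeyError on an unknown law type
    return f"PLAW-{congress}{code}{number}"


def uslm_url(pkg: str) -> str:
    """Direct USLM XML URL for a package, following the path template above."""
    return f"{GOVINFO_BASE}/content/pkg/{pkg}/uslm/{pkg}.xml"
```

For example, `uslm_url(package_id(119, 21))` yields the URL for `PLAW-119publ21`.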
Vocabulary / enums
Law type
| Code | Type |
|---|---|
| publ | Public Law (the dominant volume) |
| pvtl | Private Law |
Approval mechanism (extractable from MODS or text)
| Mechanism | Notes |
|---|---|
| presidential_signature | Signed by the President; the default. |
| pocket_signed | Held unsigned past the 10-day window (Sundays excepted) while Congress is in session; becomes law without signature. Same condition as passed_without_signature, so the two values overlap. |
| veto_override | Two-thirds of both chambers overrode a veto. |
| passed_without_signature | Held unsigned past the 10-day window while Congress is in session; becomes law without signature. If Congress has adjourned, the bill is pocket-vetoed and no Public Law results. |
The MODS doesn't always carry this distinction; extract from action history on the bill (bill_actions.action_code) or Approval Date / Signed By cues in PLAW MODS extension.
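One text cue is the signature line in the PLAW body, which for a signed law reads like "Approved July 4, 2025." A minimal sketch of pulling the date from that line follows. The function name is hypothetical, and treating every "Approved" line as `presidential_signature` is an assumption that should be cross-checked against the bill's action history.

```python
import re
from datetime import date, datetime
from typing import Optional, Tuple

# Matches e.g. 'Approved July 4, 2025.' and captures the date portion.
_APPROVED_RE = re.compile(r"Approved\s+([A-Z][a-z]+ \d{1,2}, \d{4})")


def parse_signature_line(text: str) -> Tuple[Optional[str], Optional[date]]:
    """Return (approval_mechanism, approval_date) from signature-line text.

    Returns (None, None) when no 'Approved <date>' cue is present, so the
    caller can fall back to bill actions instead of guessing.
    """
    m = _APPROVED_RE.search(text)
    if m is None:
        return None, None
    # %B parses English month names under the default C/English locale.
    approved = datetime.strptime(m.group(1), "%B %d, %Y").date()
    return "presidential_signature", approved
```

Returning `None` rather than a default keeps the nullable `approval_mechanism` column honest when the cue is missing.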
Stable ID format
law:{congress}-{number}; matches bills.bill_laws.public_law_number.
Examples:
- `law:119-21` (Public Law 119-21, the OBBBA reconciliation)
- `law:119-7` (Continuing Resolution etc.)
- `law:117-159` (Bipartisan Safer Communities Act)
- `law:118-priv-3` for Private Law 118-3 (using a `priv-` infix to disambiguate)
The numbering resets per congress, so the congress prefix is essential.
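A small sketch of parsing and formatting these IDs, including the `priv-` infix for private laws. The `LawId` type and both function names are hypothetical helpers, not part of any existing module.

```python
import re
from typing import NamedTuple


class LawId(NamedTuple):
    congress: int
    number: int
    private: bool


# 'law:119-21' for public laws, 'law:118-priv-3' for private laws.
_LAW_ID_RE = re.compile(r"law:(\d+)-(priv-)?(\d+)")


def parse_law_id(law_id: str) -> LawId:
    """Split a stable ID into congress, number, and a private-law flag."""
    m = _LAW_ID_RE.fullmatch(law_id)
    if m is None:
        raise ValueError(f"not a law ID: {law_id!r}")
    return LawId(int(m.group(1)), int(m.group(3)), m.group(2) is not None)


def format_law_id(congress: int, number: int, private: bool = False) -> str:
    """Inverse of parse_law_id."""
    infix = "priv-" if private else ""
    return f"law:{congress}-{infix}{number}"
```

Because numbering resets each Congress, the parser rejects an ID without a congress prefix instead of guessing one.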
Response shapes
The MODS and USLM 2.0 element grammar is generic GovInfo reference; see the GPO MODS schema, the USLM schema, and the Statutes at Large citation help. The two Josh-load-bearing takeaways:
- The MODS `<statuteAtLargeAmended>` list is the citation graph for free. Each PLAW MODS carries its own canonical Stat. citation (e.g. `139 Stat. 100`) plus dozens of `<statuteAtLargeAmended>` references: every prior Public Law this law amends, identified by its Stat. citation. We unpack that list into `public_law_stat_references` (below).
- The USLM body is the bill text as enacted, section by section, with a top-level `<bill>` element and a `<signature>` line ("Approved July 4, 2025."). The amendments to existing law are spelled out in `<amendingClause>` markup (e.g. "Section 1396 of the Social Security Act (42 U.S.C. 1396) is amended—"), which is what we parse into `public_law_usc_amendments`.
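Unpacking the Stat. citations into volume and page could look like the sketch below. It assumes the citation arrives as a plain string; the function name and the specific variants tolerated (missing period, odd spacing, case) are illustrative choices, not observed GovInfo behavior.

```python
import re
from typing import Optional, Tuple

# Tolerates a missing period, extra whitespace, and case differences:
# '139 Stat. 100', '50 Stat 664', '50  STAT.664'.
_STAT_RE = re.compile(r"(\d+)\s*Stat\.?\s*(\d+)", re.IGNORECASE)


def parse_stat_citation(raw: str) -> Optional[Tuple[str, int, int]]:
    """Return (normalized citation, volume, page), or None if unparseable."""
    m = _STAT_RE.search(raw)
    if m is None:
        return None
    volume, page = int(m.group(1)), int(m.group(2))
    return f"{volume} Stat. {page}", volume, page
```

The three return values line up with the `stat_citation`, `stat_volume`, and `stat_page` columns; a `None` result is worth logging so malformed citations can be added to a fixture set.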
Volume
| Metric | Value |
|---|---|
| Per congress | ~200-400 public laws + ~5-10 private laws |
| Per-year sitemap | ~100-200 entries (public laws are signed across two years per congress) |
| All-time online (1995+) | ~3,000-5,000 |
| Per-law USLM XML | 50 KB - 5 MB (most under 200 KB; major bills like reconciliations 1-5 MB) |
Postgres footprint: ~500 MB - 1 GB.
Caching / incremental sync
- Daily 04:00 UTC: poll current-year + previous-year sitemaps. New `<url>` → fetch USLM, MODS, parse, insert.
- Per-package `Last-Modified` for conditional GET.
- Cross-trigger from bills: when a `bill_laws` row is added (a bill became law), queue a fetch for `PLAW-{c}publ{n}`. The PLAW package may not yet exist (GPO takes a few days to publish); retry every few hours.
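The conditional GET in the list above can be sketched with the standard library alone. Both function names are hypothetical, and the sketch assumes the server honors `If-Modified-Since` and answers 304 when the package is unchanged.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def conditional_request(url: str, last_modified: Optional[str]) -> urllib.request.Request:
    """Build a GET that lets the server skip the body if nothing changed."""
    headers = {}
    if last_modified:
        # Echo back the Last-Modified value stored from the previous fetch.
        headers["If-Modified-Since"] = last_modified
    return urllib.request.Request(url, headers=headers)


def fetch_if_changed(url: str, last_modified: Optional[str]) -> Optional[Tuple[bytes, Optional[str]]]:
    """Return (body, new Last-Modified), or None on 304 Not Modified."""
    try:
        with urllib.request.urlopen(conditional_request(url, last_modified), timeout=30) as resp:
            return resp.read(), resp.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib surfaces 304 as an HTTPError
            return None
        raise  # e.g. 404 while the package is not yet published
```

Letting a 404 propagate gives the retry queue a clear signal that the package does not exist yet.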
Schema (Postgres DDL)
Status: exploring. This DDL is indicative (drafted, not shipped); ingestion for this source is not yet built. See public-laws-ingester (status: planned) and cross-check data status. The real implementation is SQLite/FTS5/vec0; migrations under `shared/josh_substrate/.../migrations/versions/` are the schema source of truth once this source is built, and migrations win over docs.
```sql
-- ============================================================
-- Public Laws (and Private Laws)
-- ============================================================

CREATE TABLE public_laws (
  id                          text PRIMARY KEY,            -- 'law:119-21'
  package_id                  text NOT NULL UNIQUE,        -- 'PLAW-119publ21'

  congress                    smallint NOT NULL,
  law_type                    text NOT NULL CHECK (law_type IN ('public', 'private')),
  law_number                  int NOT NULL,
  UNIQUE (congress, law_type, law_number),

  -- Citation
  citation_text               text NOT NULL,               -- 'P.L. 119-21'
  statutes_at_large_citation  text,                        -- '139 Stat. 100'; the canonical Stat. cite for THIS law

  -- Title
  title                       text NOT NULL,               -- 'An act to ...'
  short_title                 text,                        -- when the act has one

  -- Enacted from
  bill_id                     text REFERENCES bills(id),   -- 'bill:119-hr-1'; soft FK
  bill_type                   text,
  bill_number                 int,
  originating_chamber         text CHECK (originating_chamber IN ('HOUSE', 'SENATE') OR originating_chamber IS NULL),

  -- When + how
  approval_date               date NOT NULL,               -- date signed / became law
  approval_mechanism          text CHECK (approval_mechanism IN (
                                'presidential_signature', 'pocket_signed', 'veto_override', 'passed_without_signature'
                              ) OR approval_mechanism IS NULL),

  -- Text
  body_text                   text NOT NULL,
  body_uslm_xml               bytea,                       -- gzipped USLM XML

  -- Source URLs
  govinfo_uslm_url            text,
  govinfo_html_url            text,
  govinfo_pdf_url             text,
  mods_url                    text,
  sudoc_class_number          text,                        -- 'AE 2.110:119-21'

  -- Lifecycle
  raw_mods_xml                bytea,
  sitemap_lastmod             timestamptz,
  fetched_at                  timestamptz NOT NULL,
  parsed_at                   timestamptz,
  inserted_at                 timestamptz NOT NULL DEFAULT now(),
  updated_at                  timestamptz NOT NULL DEFAULT now()
);

CREATE INDEX public_laws_approval_date ON public_laws (approval_date DESC);
CREATE INDEX public_laws_congress ON public_laws (congress, law_number);
CREATE INDEX public_laws_bill ON public_laws (bill_id) WHERE bill_id IS NOT NULL;

ALTER TABLE public_laws ADD COLUMN search_tsv tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(short_title, '') || ' ' || coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(citation_text, '') || ' ' || coalesce(statutes_at_large_citation, '')), 'C') ||
    setweight(to_tsvector('english', coalesce(body_text, '')), 'D')
  ) STORED;
CREATE INDEX public_laws_search ON public_laws USING gin (search_tsv);

-- ============================================================
-- Statutes at Large citations referenced by a Public Law
-- (the law amends prior laws; these are the references)
-- ============================================================

CREATE TABLE public_law_stat_references (
  id               bigserial PRIMARY KEY,
  law_id           text NOT NULL REFERENCES public_laws(id) ON DELETE CASCADE,
  stat_citation    text NOT NULL,   -- '50 Stat. 664'
  stat_volume      int NOT NULL,    -- 50
  stat_page        int NOT NULL,    -- 664
  relationship     text NOT NULL CHECK (relationship IN ('amends', 'supersedes', 'cited')),
  -- Resolved law_id when the Stat citation maps to a Public Law we have
  resolved_law_id  text REFERENCES public_laws(id),
  UNIQUE (law_id, stat_citation, relationship)
);

CREATE INDEX public_law_stat_refs_volume ON public_law_stat_references (stat_volume, stat_page);

-- ============================================================
-- USC sections amended by this Public Law
-- (extracted from <amendingClause> + body XML at parse time)
-- ============================================================

CREATE TABLE public_law_usc_amendments (
  id              bigserial PRIMARY KEY,
  law_id          text NOT NULL REFERENCES public_laws(id) ON DELETE CASCADE,
  usc_section_id  text REFERENCES usc_sections(id),   -- soft FK
  usc_title       int NOT NULL,
  usc_section     text NOT NULL,
  amendment_type  text CHECK (amendment_type IN (
                    'add', 'strike', 'amend', 'redesignate', 'repeal'
                  )),
  amendment_text  text                                -- the amending instruction text
);

CREATE INDEX public_law_usc_amendments_usc ON public_law_usc_amendments (usc_section_id) WHERE usc_section_id IS NOT NULL;
CREATE INDEX public_law_usc_amendments_law ON public_law_usc_amendments (law_id);

-- ============================================================
-- Vector chunks
-- ============================================================

CREATE TABLE public_law_chunks (
  id           bigserial PRIMARY KEY,
  law_id       text NOT NULL REFERENCES public_laws(id) ON DELETE CASCADE,
  chunk_index  int NOT NULL,
  chunk_text   text NOT NULL,
  embedding    vector(1024),
  UNIQUE (law_id, chunk_index)
);
CREATE INDEX public_law_chunks_embedding ON public_law_chunks USING hnsw (embedding vector_cosine_ops);
```

Schema decisions worth flagging:
- `bill_id` is a soft FK to `bills`: when we have the bill, we link; otherwise NULL. The MODS `<bill>` element gives congress + type + number.
- `statutes_at_large_citation` is denormalized to the parent table for the law's own Stat citation; `public_law_stat_references` carries the full list of Stat citations referenced (amended/superseded). Useful for citation graph traversal.
- `public_law_usc_amendments` is the explicit amends-graph edge. When `law:119-21` amends `usc:42-1396`, this is the row. Parse from `<amendingClause>` and structural USLM analysis. This is the highest-value edge in the citation graph.
- Body USLM XML is inlined as gzipped bytea; file sizes are manageable (median ~100 KB). Re-parsing supported.
- `public_law_stat_references.resolved_law_id`: when a Stat citation can be resolved to one of our `public_laws` rows, populate. Lets the agent answer "what laws does P.L. 119-21 amend?" with names, not just Stat numbers.
- `amendment_type` enum captures common amending verbs (`add`, `strike`, `amend`, etc.). Default to `amend` when ambiguous.
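One way to populate `resolved_law_id` is to index each law's own Stat. citation and look referenced citations up in that index. The sketch below assumes a reference cites the first page of the law it points at, which may not hold for pinpoint citations. The function names and the tuple shape are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

StatKey = Tuple[int, int]  # (stat_volume, stat_page)


def build_stat_index(laws: Iterable[Tuple[str, int, int]]) -> Dict[StatKey, Optional[str]]:
    """Map (volume, page) -> law_id from each law's own Stat. citation.

    Several short laws can begin on the same page; such keys map to None
    so they stay unresolved rather than being guessed.
    """
    index: Dict[StatKey, Optional[str]] = {}
    for law_id, volume, page in laws:
        key = (volume, page)
        index[key] = None if key in index else law_id
    return index


def resolve_stat_reference(index: Dict[StatKey, Optional[str]], volume: int, page: int) -> Optional[str]:
    """Return the law_id for a referenced citation, or None if unknown or ambiguous."""
    return index.get((volume, page))
```

Leaving ambiguous keys as `None` matches the nullable `resolved_law_id` column: an unresolved reference still carries its Stat. citation text.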
Chunker: public_law_chunks uses vector(1024) with an HNSW index. Per data status, the chunker family is usc_uslm_section_v1 (USLM, same family as U.S. Code) — Phase 1 only; Phase 2 reuses the USC result.
Download / update strategy
GovInfo bulk-data discovery (the per-year PLAW_{YYYY}_sitemap.xml listing structure) is upstream; see the GovInfo bulk-data PLAW collection.
Backfill (1995-present)
- For each year 1995..current_year:
  - Fetch `PLAW_{year}_sitemap.xml`.
  - For each `<loc>`, derive `packageId`.
  - Fetch USLM XML, MODS, save.
  - Parse: title, short title, citation, congress + law_number, bill cross-reference, Statutes at Large citations, USC amendments.
- ~3-5K packages × 2 fetches each ≈ ~6-10K calls. With 4 workers @ 1 req/s ≈ 1-2 hours.
- Embed chunks.
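The call-count arithmetic above can be checked with a few lines. "4 workers @ 1 req/s" can be read as a shared 1 req/s budget or as 1 req/s per worker, so the sketch computes both readings. The function name is hypothetical, and the result is a lower bound that ignores latency, parsing, and retries.

```python
def backfill_hours(packages: int, fetches_per_package: int, aggregate_req_per_s: float) -> float:
    """Lower bound on fetch time in hours at a given aggregate request rate."""
    return packages * fetches_per_package / aggregate_req_per_s / 3600


for packages in (3_000, 5_000):
    shared = backfill_hours(packages, 2, 1.0)      # 1 req/s shared across all workers
    per_worker = backfill_hours(packages, 2, 4.0)  # 4 workers at 1 req/s each
    print(f"{packages} packages: {shared:.1f} h shared, {per_worker:.1f} h per-worker")
```

The shared reading gives roughly 1.7 to 2.8 hours and the per-worker reading roughly 0.4 to 0.7 hours, so the 1-2 hour figure sits between the two bounds.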
Daily incremental
- Daily 04:00 UTC: poll current + previous year sitemap. Diff `<lastmod>`. Fetch new packages.
- Cross-trigger from BILLSTATUS: when `bills.bill_laws.public_law_number` is populated, schedule PLAW fetch retry every few hours until success.
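The `<lastmod>` diff can be sketched as below. It assumes the sitemap follows the standard sitemaps.org schema and that each `<loc>` URL contains the packageId somewhere in its path; the exact URL shape is not specified here, so the packageId is pulled out by pattern. The function name and the `seen` mapping (packageId to last stored `<lastmod>`) are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
_PACKAGE_RE = re.compile(r"PLAW-\d+(?:publ|pvtl)\d+")


def changed_packages(sitemap_xml: str, seen: Dict[str, str]) -> List[str]:
    """Return packageIds that are new or whose <lastmod> differs from `seen`."""
    changed = []
    for url in ET.fromstring(sitemap_xml).findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS)
        m = _PACKAGE_RE.search(loc)
        # A missing entry in `seen` never equals a lastmod string, so new packages count as changed.
        if m and seen.get(m.group(0)) != lastmod:
            changed.append(m.group(0))
    return changed
```

Comparing `<lastmod>` as an opaque string avoids timestamp parsing; any change in the value, including a re-issued package, triggers a refetch.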
State tracking
Source key: public_laws. State stores per-year sitemap last_modified.
Failure modes
- PLAW lag. GPO takes 1-7 days post-signature to publish PLAW. The bill-side BILLSTATUS reflects "Became Public Law" before the PLAW package exists. Retry queue handles.
- Stat citation parsing. OCR-style errors in the MODS list (e.g. "Stat 664" vs "664 Stat"); use a defensive regex.
- USC amendment extraction. USLM `<amendingClause>` is structured, but the actual amending instruction text varies ("amended by inserting...", "by striking...", "by redesignating..."). Extract type + target as best we can; fall back to text.
- Re-issued PLAW (rare correction). Same packageId, new `Last-Modified`. Re-parse.
- Private laws. Different packageId pattern (`pvtl`) but same shape. Schema handles via `law_type`.
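For the amendment-extraction case above, a first-pass classifier over the instruction text might look like the sketch below. The verb patterns are illustrative, not exhaustive, and the function name is hypothetical. An instruction that matches more than one verb (such as strike-and-insert) or none falls back to `amend`.

```python
import re

# One pattern per amendment_type value other than the 'amend' fallback.
_VERB_PATTERNS = {
    "add": re.compile(r"\b(?:inserting|adding)\b", re.IGNORECASE),
    "strike": re.compile(r"\bstriking\b", re.IGNORECASE),
    "redesignate": re.compile(r"\bredesignat(?:ing|ed)\b", re.IGNORECASE),
    "repeal": re.compile(r"\brepeal(?:ing|ed)\b", re.IGNORECASE),
}


def classify_amendment(instruction: str) -> str:
    """Map amending-instruction text to an amendment_type value."""
    matches = {name for name, pattern in _VERB_PATTERNS.items() if pattern.search(instruction)}
    # Exactly one verb: use it. Zero or several: ambiguous, so default to 'amend'.
    return matches.pop() if len(matches) == 1 else "amend"
```

Keeping the raw instruction in `amendment_text` alongside the classified type means a misclassification can be corrected later without refetching.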
Open questions
These don't block ingestion but should be resolved before this source is "shipped":
- USC amendment extraction quality. This is the single highest-value extraction in this source: it builds the law-amends-USC edge. USLM has structured `<amendingClause>` and amendments-by-section markup, but the "instructions" are still natural-language ("by striking '50' and inserting '60'"). Build a careful parser; track precision/recall on a fixture set.
- Statutes at Large to law_id resolution. The Stat. volume + page citation can be mapped to a Public Law when we know which year corresponds to which volume (modern Stat. volumes are roughly one per session, i.e. one per calendar year). Build a small lookup table.
- Approval mechanism extraction. MODS doesn't always carry `approvalMechanism`. The PLAW body's `<signature>` line says "Approved [date]" or "Vetoed by the President but became law on [date] (overriding veto)". Body-text parsing.
- Pre-1995 historical PLAW backfill. Statutes at Large bound volumes have OCR + searchable text on HathiTrust. Not in PLAW collection. Out of scope for v1.
- Conference-report relationship. When a bill is enacted via conference report, the PLAW text reflects the conference-resolved version. The bill ID points to the originating bill (e.g., HR 1 in 119th); the report ID points to the conference report. Both linked, which is correct.
- Multi-bill enactment. Some Public Laws enact multiple bills (rare; usually omnibus packages). MODS `<bill>` elements may list multiple. Schema's `bill_id` is single-valued; consider a join table for multi-bill laws.
- Reconciliation laws like P.L. 119-21 (OBBBA) reference dozens of Statutes at Large citations. The MODS `statuteAtLarge*` lists are extensive. Volume estimates assume median; budget for outliers.