
Public Laws

When the President signs a bill (or Congress overrides a veto), it becomes a Public Law — an enacted statute. The Public Law text is the canonical "as-passed" version: the exact words enacted into law, before any subsequent amendment. The Office of the Federal Register assigns a sequential public-law number per congress (P.L. 119-21); GPO publishes the law as part of the Statutes at Large; OLRC integrates the law into the US Code.

Public Laws are the bridge between bills (legislative process) and the US Code (codified statute). For Josh:

  • A bill becomes a Public Law: bill:119-hr-1 → law:119-21.
  • A Public Law amends USC sections: law:119-21 → multiple usc:42-... edits.
  • A Public Law has a Statutes at Large citation: 139 Stat. 100.
  • The text is structured as USLM XML — same schema as USC, with <sourceCredit>-style provenance back to bill versions.
Source name: Public Laws
Publishers: Office of the Federal Register (assigns law numbers); Government Publishing Office (publishes); National Archives (preserves).
License: Public domain
Coverage: 1995 – present (online systematically). Earlier coverage spotty in PLAW collection; older laws via Statutes at Large bound volumes.
Volume: ~200-400 laws per congress. ~3,000-5,000 total online.
Storage estimate: ~2 GB raw USLM XML + PDF; ~500 MB-1 GB extracted text
Auth: None. Public domain, no key required; ~1-2 req/sec polite. See GovInfo for developers.
Incremental sync hints: Yearly sitemap diff + per-file Last-Modified; new laws appear within days of presidential signature
Stable ID format: law:{congress}-{number}, e.g. law:119-21 (matches bills.bill_laws.public_law_number)
Status: exploring (schema drafted, ingestion not built)

Endpoint patterns, the per-year PLAW_{YYYY}_sitemap.xml layout, per-package URL templates (USLM XML / PDF / HTML / MODS), the wssearch getContentDetail package-detail JSON, and rate/auth boilerplate are upstream GPO reference — see GovInfo for developers, the GovInfo bulk-data PLAW collection, and the GovInfo API docs (the wssearch endpoint is undocumented/internal — treat that link as best-effort).

Primary: GovInfo PLAW collection. Same shape as CHRG, CRPT, CREC, USCODE. Per-year sitemap, per-package USLM XML (the enacted version of the bill in modern markup), HTML, PDF, MODS metadata.

Note: Public Laws use USLM 2.0 XML. Live bill text uses the older "billres" DTD-based XML; only when a bill becomes a Public Law does the text get re-rendered in USLM. This is why we strongly prefer PLAW for enacted-text retrieval over BILLS.

Secondary: Bills source's <laws> element + Public Law action codes. Already captured in bills as bill_laws. Cross-references work: when bills.bill_laws.public_law_number = '119-21', we link to law:119-21 in this source.

Skip: scraping congress.gov/public-laws listings. Redundant with GovInfo.

Skip: pre-1995 PLAW backfill for v1. Statutes at Large bound volumes have older laws but require different parsing. v2 if useful.

GovInfo: open. Same patterns. USLM XML at /content/pkg/PLAW-{c}publ{n}/uslm/PLAW-{c}publ{n}.xml is directly fetchable (unlike USCODE's USLM, which lives only inside ZIP).

PLAW-{congress}publ{number} for Public Laws (e.g. PLAW-119publ21).

PLAW-{congress}pvtl{number} for Private Laws (rare; ~5-10 per congress; usually individual immigration relief or specific land conveyance).

| Code | Type |
| --- | --- |
| publ | Public Law (the dominant volume) |
| pvtl | Private Law |
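
The package-ID patterns and the direct USLM path above can be sketched as two small helpers. This is a minimal illustration, not the ingester's code: the function names are hypothetical, and the `https://www.govinfo.gov` host is an assumption (the text here gives only the path template).

```python
GOVINFO_BASE = "https://www.govinfo.gov"  # assumed host; only the path is given above

# law_type -> package-ID code, per the Code/Type table
TYPE_CODES = {"public": "publ", "private": "pvtl"}


def package_id(congress: int, number: int, law_type: str = "public") -> str:
    """Build a GovInfo package ID such as 'PLAW-119publ21'."""
    return f"PLAW-{congress}{TYPE_CODES[law_type]}{number}"


def uslm_url(pkg: str) -> str:
    """Direct USLM XML URL, following /content/pkg/{pkg}/uslm/{pkg}.xml."""
    return f"{GOVINFO_BASE}/content/pkg/{pkg}/uslm/{pkg}.xml"


print(uslm_url(package_id(119, 21)))
```

An unknown `law_type` raises `KeyError`, which is probably the right failure mode for a value that the schema constrains to two options.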

Approval mechanism (extractable from MODS or text)

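| Mechanism | Notes |
| --- | --- |
| presidential_signature | Signed by the President; the default. |
| pocket_signed | Held without signing past the 10-day window during a session; becomes law without signature. As defined here this overlaps with passed_without_signature (a true pocket veto, where Congress has adjourned, prevents enactment). |
| veto_override | Two-thirds of both chambers overrode a veto. |
| passed_without_signature | Held without signing past the 10-day window; becomes law without signature. |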

The MODS doesn't always carry this distinction; extract it from the bill's action history (bill_actions.action_code) or from the Approval Date / Signed By cues in the PLAW MODS extension.
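
When MODS is silent, a text-based fallback could look roughly like the sketch below. It keys off the signature-line wording quoted elsewhere on this page ("Approved [date]", "Vetoed by the President but became law on [date]"). The "without ... signature" phrasing is an assumption about how such laws are annotated, and `classify_approval` is a hypothetical name.

```python
from typing import Optional


def classify_approval(signature_text: str) -> Optional[str]:
    """Best-effort approval mechanism from a signature line or editorial note."""
    text = " ".join(signature_text.split()).lower()
    if not text:
        return None
    # Check the veto wording first: it also contains "became law".
    if "veto" in text and "became law" in text:
        return "veto_override"
    # Assumed wording for laws that took effect unsigned.
    if "without" in text and "signature" in text:
        return "passed_without_signature"
    if text.startswith("approved"):
        return "presidential_signature"
    return None  # leave approval_mechanism NULL rather than guess


print(classify_approval("Approved July 4, 2025."))
```

Returning `None` for unrecognized text keeps the column nullable in practice, so the bill-action history can fill it in later.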

law:{congress}-{number}, matching bills.bill_laws.public_law_number.

Examples:

  • law:119-21 (Public Law 119-21 — the OBBBA reconciliation)
  • law:119-7 (Continuing Resolution etc.)
  • law:117-159 (Bipartisan Safer Communities Act)
  • law:118-priv-3 for Private Law 118-3 (using priv- infix to disambiguate)

The numbering resets per congress, so the congress prefix is essential.
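
The mapping from GovInfo package IDs to these stable IDs is mechanical. A minimal sketch, assuming the `priv-` infix shown in the examples above and a hypothetical helper name:

```python
import re

# PLAW-{congress}{publ|pvtl}{number}
_PKG_RE = re.compile(r"^PLAW-(\d+)(publ|pvtl)(\d+)$")


def law_id_from_package(pkg: str) -> str:
    """Convert 'PLAW-119publ21' to 'law:119-21' (private laws get 'priv-')."""
    m = _PKG_RE.match(pkg)
    if m is None:
        raise ValueError(f"not a PLAW package ID: {pkg!r}")
    congress, code, number = m.groups()
    infix = "priv-" if code == "pvtl" else ""
    # int() drops any leading zeros so IDs stay canonical.
    return f"law:{congress}-{infix}{int(number)}"


print(law_id_from_package("PLAW-119publ21"))
```

Rejecting anything that does not match the pattern keeps malformed sitemap entries out of the primary key.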

The MODS and USLM 2.0 element grammar is generic GovInfo reference — see the GPO MODS schema, the USLM schema, and the Statutes at Large citation help. The two Josh-load-bearing takeaways:

  • The MODS <statuteAtLargeAmended> list is the citation graph for free. Each PLAW MODS carries its own canonical Stat. citation (e.g. 139 Stat. 100) plus dozens of <statuteAtLargeAmended> references — every prior Public Law this law amends, identified by its Stat. citation. We unpack that list into public_law_stat_references (below).
  • The USLM body is the bill text as enacted — section-by-section, with a top-level <bill> element and a <signature> line ("Approved July 4, 2025."). The amendments to existing law are spelled out in <amendingClause> markup (e.g. "Section 1396 of the Social Security Act (42 U.S.C. 1396) is amended—"), which is what we parse into public_law_usc_amendments.
| Metric | Value |
| --- | --- |
| Per congress | ~200-400 public laws + ~5-10 private laws |
| Per-year sitemap | ~100-200 entries (public laws are signed across two years per congress) |
| All-time online (1995+) | ~3,000-5,000 |
| Per-law USLM XML | 50 KB - 5 MB (most under 200 KB; major bills such as reconciliations run 1-5 MB) |

Postgres footprint: ~500 MB - 1 GB.

  1. Daily 04:00 UTC: poll current-year + previous-year sitemaps. New <url> → fetch USLM, MODS, parse, insert.
  2. Per-package Last-Modified for conditional GET.
  3. Cross-trigger from bills: when a bill_laws row is added (a bill became law), queue a fetch for PLAW-{c}publ{n}. The PLAW package may not yet exist (GPO takes a few days to publish); retry every few hours.
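
Step 1 boils down to a diff between the sitemap and what was seen on the previous poll. A minimal sketch, assuming the sitemap uses the standard sitemaps.org namespace and that previously seen `<lastmod>` values are kept in a plain dict (the real state store is not specified here); `changed_entries` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Assumption: standard sitemap namespace
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def changed_entries(sitemap_xml: str, seen: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (loc, lastmod) pairs that are new or whose lastmod changed."""
    root = ET.fromstring(sitemap_xml)
    changed = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS).strip()
        # New URL (not in seen) or re-issued package (different lastmod).
        if loc and seen.get(loc) != lastmod:
            changed.append((loc, lastmod))
    return changed
```

Comparing `<lastmod>` as an opaque string avoids timestamp parsing; any change at all, including a reformatted value, triggers a refetch, which errs on the safe side.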

Status: exploring. The DDL below is indicative: drafted, not shipped. Ingestion for this source is not yet built; see public-laws-ingester (status: planned) and cross-check data status. The DDL is written in Postgres dialect, but the real implementation is SQLite/FTS5/vec0. Once this source is built, the migrations under shared/josh_substrate/.../migrations/versions/ are the schema source of truth, and migrations win over docs.

-- ============================================================
-- Public Laws (and Private Laws)
-- ============================================================
CREATE TABLE public_laws (
    id text PRIMARY KEY, -- 'law:119-21'
    package_id text NOT NULL UNIQUE, -- 'PLAW-119publ21'
    congress smallint NOT NULL,
    law_type text NOT NULL CHECK (law_type IN ('public', 'private')),
    law_number int NOT NULL,
    UNIQUE (congress, law_type, law_number),

    -- Citation
    citation_text text NOT NULL, -- 'P.L. 119-21'
    statutes_at_large_citation text, -- '139 Stat. 100'; the canonical Stat. cite for THIS law

    -- Title
    title text NOT NULL, -- 'An act to ...'
    short_title text, -- when the act has one

    -- Enacted from
    bill_id text REFERENCES bills(id), -- 'bill:119-hr-1'; soft FK
    bill_type text,
    bill_number int,
    originating_chamber text CHECK (originating_chamber IN ('HOUSE', 'SENATE') OR originating_chamber IS NULL),

    -- When + how
    approval_date date NOT NULL, -- date signed / became law
    approval_mechanism text CHECK (approval_mechanism IN (
        'presidential_signature', 'pocket_signed',
        'veto_override', 'passed_without_signature'
    ) OR approval_mechanism IS NULL),

    -- Text
    body_text text NOT NULL,
    body_uslm_xml bytea, -- gzipped USLM XML

    -- Source URLs
    govinfo_uslm_url text,
    govinfo_html_url text,
    govinfo_pdf_url text,
    mods_url text,
    sudoc_class_number text, -- 'AE 2.110:119-21'

    -- Lifecycle
    raw_mods_xml bytea,
    sitemap_lastmod timestamptz,
    fetched_at timestamptz NOT NULL,
    parsed_at timestamptz,
    inserted_at timestamptz NOT NULL DEFAULT now(),
    updated_at timestamptz NOT NULL DEFAULT now()
);

CREATE INDEX public_laws_approval_date ON public_laws (approval_date DESC);
CREATE INDEX public_laws_congress ON public_laws (congress, law_number);
CREATE INDEX public_laws_bill ON public_laws (bill_id) WHERE bill_id IS NOT NULL;

ALTER TABLE public_laws ADD COLUMN search_tsv tsvector
    GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(short_title, '') || ' ' || coalesce(title, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(citation_text, '') || ' ' || coalesce(statutes_at_large_citation, '')), 'C') ||
        setweight(to_tsvector('english', coalesce(body_text, '')), 'D')
    ) STORED;

CREATE INDEX public_laws_search ON public_laws USING gin (search_tsv);

-- ============================================================
-- Statutes at Large citations referenced by a Public Law
-- (the law amends prior laws; these are the references)
-- ============================================================
CREATE TABLE public_law_stat_references (
    id bigserial PRIMARY KEY,
    law_id text NOT NULL REFERENCES public_laws(id) ON DELETE CASCADE,
    stat_citation text NOT NULL, -- '50 Stat. 664'
    stat_volume int NOT NULL, -- 50
    stat_page int NOT NULL, -- 664
    relationship text NOT NULL CHECK (relationship IN ('amends', 'supersedes', 'cited')),
    -- Resolved law_id when the Stat citation maps to a Public Law we have
    resolved_law_id text REFERENCES public_laws(id),
    UNIQUE (law_id, stat_citation, relationship)
);

CREATE INDEX public_law_stat_refs_volume ON public_law_stat_references (stat_volume, stat_page);

-- ============================================================
-- USC sections amended by this Public Law
-- (extracted from <amendingClause> + body XML at parse time)
-- ============================================================
CREATE TABLE public_law_usc_amendments (
    id bigserial PRIMARY KEY,
    law_id text NOT NULL REFERENCES public_laws(id) ON DELETE CASCADE,
    usc_section_id text REFERENCES usc_sections(id), -- soft FK
    usc_title int NOT NULL,
    usc_section text NOT NULL,
    amendment_type text CHECK (amendment_type IN (
        'add', 'strike', 'amend', 'redesignate', 'repeal'
    )),
    amendment_text text -- the amending instruction text
);

CREATE INDEX public_law_usc_amendments_usc ON public_law_usc_amendments (usc_section_id) WHERE usc_section_id IS NOT NULL;
CREATE INDEX public_law_usc_amendments_law ON public_law_usc_amendments (law_id);

-- ============================================================
-- Vector chunks
-- ============================================================
CREATE TABLE public_law_chunks (
    id bigserial PRIMARY KEY,
    law_id text NOT NULL REFERENCES public_laws(id) ON DELETE CASCADE,
    chunk_index int NOT NULL,
    chunk_text text NOT NULL,
    embedding vector(1024),
    UNIQUE (law_id, chunk_index)
);

CREATE INDEX public_law_chunks_embedding
    ON public_law_chunks USING hnsw (embedding vector_cosine_ops);
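
The body_uslm_xml column stores gzipped XML so that laws can be re-parsed later without refetching. A minimal sketch of that round trip using only the standard library; `pack_xml` and `unpack_xml` are hypothetical helper names, and the sample document is a placeholder, not real USLM.

```python
import gzip
import xml.etree.ElementTree as ET


def pack_xml(xml_text: str) -> bytes:
    """Compress XML text into the bytes stored in body_uslm_xml."""
    return gzip.compress(xml_text.encode("utf-8"))


def unpack_xml(blob: bytes) -> ET.Element:
    """Decompress a stored blob and parse it back into an element tree."""
    return ET.fromstring(gzip.decompress(blob))


# Placeholder document, for illustration only.
blob = pack_xml("<doc><section>Sec. 1. Short title.</section></doc>")
root = unpack_xml(blob)
print(root.find("section").text)
```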

Schema decisions worth flagging:

  • bill_id is a soft FK to bills — when we have the bill, we link; otherwise NULL. The MODS <bill> element gives congress + type + number.
  • statutes_at_large_citation denormalized to the parent table for the law's own Stat citation; public_law_stat_references carries the full list of Stat citations referenced (amended/superseded). Useful for citation graph traversal.
  • public_law_usc_amendments is the explicit amends-graph edge. When law:119-21 amends usc:42-1396, this is the row. Parse from <amendingClause> and structural USLM analysis. This is the highest-value edge in the citation graph.
  • Body USLM XML inlined as gzipped bytea — file sizes are manageable (median ~100 KB). Re-parsing supported.
  • public_law_stat_references.resolved_law_id — when a Stat citation can be resolved to one of our public_laws rows, populate. Lets the agent answer "what laws does P.L. 119-21 amend?" with names, not just Stat numbers.
  • amendment_type enum captures common amending verbs (add, strike, amend, etc.). Default to amend when ambiguous.
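
The "default to amend when ambiguous" rule can be sketched as an ordered list of verb patterns with a fallback. The patterns below are illustrative guesses at common amending phrases, not a vetted grammar, and `classify_amendment` is a hypothetical name.

```python
import re

# Ordered: the first matching rule wins; 'amend' is the fallback.
_RULES = [
    ("repeal", re.compile(r"\b(?:is|are) (?:hereby )?repealed\b|\bby repealing\b")),
    ("redesignate", re.compile(r"\bby redesignating\b|\bredesignated\b")),
    ("strike", re.compile(r"\bby striking\b")),
    ("add", re.compile(r"\bby (?:inserting|adding)\b")),
]


def classify_amendment(instruction: str) -> str:
    """Map an amending instruction to an amendment_type value."""
    text = instruction.lower()
    for label, pattern in _RULES:
        if pattern.search(text):
            return label
    return "amend"  # ambiguous or unrecognized instruction


print(classify_amendment("is amended by striking '50' and inserting '60'"))
```

A strike-and-insert instruction is labeled `strike` here because that rule comes first. Whether such instructions deserve their own handling is a design question this sketch does not settle.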

Chunker: public_law_chunks uses vector(1024) with an HNSW index. Per data status, the chunker family is usc_uslm_section_v1 (USLM, same family as U.S. Code) — Phase 1 only; Phase 2 reuses the USC result.

GovInfo bulk-data discovery (the per-year PLAW_{YYYY}_sitemap.xml listing structure) is upstream — see the GovInfo bulk-data PLAW collection.

  1. For each year 1995..current_year:
    • Fetch PLAW_{year}_sitemap.xml.
    • For each <loc>, derive packageId.
    • Fetch USLM XML, MODS, save.
    • Parse: title, short title, citation, congress + law_number, bill cross-reference, Statutes at Large citations, USC amendments.
  2. ~3-5K packages × 2 fetches each ≈ 6-10K calls. With 4 workers @ 1 req/s each (4 req/s aggregate) ≈ 25-45 minutes; at the polite 1-2 req/s aggregate rate, roughly 1-3 hours.
  3. Embed chunks.
  1. Daily 04:00 UTC: poll the current-year and previous-year sitemaps. Diff <lastmod>. Fetch new or changed packages.
  2. Cross-trigger from BILLSTATUS: when bills.bill_laws.public_law_number is populated, schedule a PLAW fetch and retry every few hours until it succeeds.
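
The backfill budget is simple arithmetic, and it is sensitive to whether the request rate is per worker or aggregate. A small sketch for checking either reading; the function name is hypothetical.

```python
def backfill_minutes(packages: int, fetches_per_package: int,
                     aggregate_req_per_sec: float) -> float:
    """Wall-clock minutes for a backfill at a given total request rate."""
    calls = packages * fetches_per_package
    return calls / aggregate_req_per_sec / 60


# 3-5K packages, 2 fetches each (USLM + MODS)
for rate in (1.0, 2.0, 4.0):
    low = backfill_minutes(3000, 2, rate)
    high = backfill_minutes(5000, 2, rate)
    print(f"{rate:.0f} req/s aggregate: {low:.0f}-{high:.0f} min")
```

This ignores retries, parse time, and embedding, so treat the result as a lower bound on the fetch phase only.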

Source key: public_laws. State stores per-year sitemap last_modified.

  • PLAW lag. GPO takes 1-7 days post-signature to publish PLAW. The bill-side BILLSTATUS reflects "Became Public Law" before the PLAW package exists. The retry queue handles this.
  • Stat citation parsing. OCR-style errors in the MODS list (e.g. "Stat 664" vs "664 Stat") — defensive regex.
  • USC amendment extraction. USLM <amendingClause> is structured but the actual amending instruction text varies ("amended by inserting...", "by striking...", "by redesignating..."). Extract type + target as best we can; fall back to text.
  • Re-issued PLAW (rare correction). Same packageId, new Last-Modified. Re-parse.
  • Private laws. Different packageId pattern (pvtl) but same shape. Schema handles via law_type.
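
For the Stat. citation gotcha, a defensive regex might look like the sketch below. It tolerates a missing period, odd spacing, and case differences, and it returns nothing rather than guessing when the volume is missing. `parse_stat_citation` is a hypothetical name, and the tolerated variants are assumptions about what the MODS lists contain.

```python
import re
from typing import Optional, Tuple

# volume, "Stat" with optional period, page; whitespace is flexible
_STAT_RE = re.compile(r"(\d+)\s*Stat\.?\s*(\d+)", re.IGNORECASE)


def parse_stat_citation(raw: str) -> Optional[Tuple[int, int]]:
    """Return (volume, page) from a Stat. citation, or None if unparseable."""
    m = _STAT_RE.search(raw)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


print(parse_stat_citation("139 Stat. 100"))
```

The two integers map directly onto stat_volume and stat_page; unparseable strings are better logged and skipped than stored with a guessed volume.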

These don't block ingestion but should be resolved before this source is "shipped":

  • USC amendment extraction quality. This is the single highest-value extraction in this source — it builds the law-amends-USC edge. USLM has structured <amendingClause> and amendments-by-section markup, but the "instructions" are still natural-language ("by striking '50' and inserting '60'"). Build a careful parser; track precision/recall on a fixture set.
  • Statutes at Large to law_id resolution. A Stat. volume + page citation can be mapped to a Public Law once we know which volume corresponds to which year (modern Stat. volumes are roughly one per session, i.e. one per calendar year, not one per congress) and the page on which each law starts. Build a small lookup table.
  • Approval mechanism extraction. MODS doesn't always carry approvalMechanism. The PLAW body's <signature> line says "Approved [date]" or "Vetoed by the President but became law on [date] (overriding veto)". Body-text parsing.
  • Pre-1995 historical PLAW backfill. Statutes at Large bound volumes have OCR + searchable text on HathiTrust. They are not in the PLAW collection. Out of scope for v1.
  • Conference-report relationship. When a bill is enacted via conference report, the PLAW text reflects the conference-resolved version. The bill ID points to the originating bill (e.g., HR 1 in 119th); the report ID points to the conference report. Both linked, which is correct.
  • Multi-bill enactment. Some Public Laws enact multiple bills (rare; usually omnibus packages). MODS <bill> elements may list multiple. Schema's bill_id is single-valued; consider a join table for multi-bill laws.
  • Reconciliation laws like P.L. 119-21 (OBBBA) reference dozens of Statutes at Large citations, and the MODS statuteAtLarge* lists are extensive. Volume estimates assume the median; budget for outliers.
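
For the Statutes at Large resolution question, one possible shape for the lookup is a per-volume list of start pages searched with bisect, so that a pinpoint cite inside a law still resolves to it. This is a sketch under stated assumptions: `StatIndex` is a hypothetical class, the volume-to-year offset should be verified before use, and the pages in the usage lines are made up for illustration.

```python
import bisect
from typing import Dict, List, Optional, Tuple


def stat_volume_to_year(volume: int) -> int:
    """Approximate session year for a modern volume (assumed offset: 139 -> 2025)."""
    return volume + 1886


class StatIndex:
    """Maps (volume, page) to the law whose start page is the nearest at or below it."""

    def __init__(self, laws: List[Tuple[int, int, str]]):
        # laws: (stat_volume, start_page, law_id)
        self._by_volume: Dict[int, List[Tuple[int, str]]] = {}
        for volume, start_page, law_id in laws:
            self._by_volume.setdefault(volume, []).append((start_page, law_id))
        for entries in self._by_volume.values():
            entries.sort()

    def resolve(self, volume: int, page: int) -> Optional[str]:
        entries = self._by_volume.get(volume, [])
        starts = [start for start, _ in entries]
        i = bisect.bisect_right(starts, page) - 1
        return entries[i][1] if i >= 0 else None


# Hypothetical start pages, for illustration only.
index = StatIndex([(139, 100, "law:B"), (139, 50, "law:A")])
print(index.resolve(139, 75))
```

Two caveats: several short laws can start on the same page, and a page past the end of the last indexed law still resolves to that law. A production lookup would likely need end pages or an exact-match mode.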