Regulations.gov dockets
When a federal agency proposes or finalizes a regulation, it opens a docket on regulations.gov for public comment. Each docket contains the regulatory document(s), supporting analyses, and (after the comment period) the public comments submitted. The docket is the lifecycle container for a single rulemaking.
For Josh, regulations.gov dockets give us:
- The comment count and deadline per active rulemaking — high-signal for "what's open for comment now."
- The mapping from a Federal Register document to its docket_id and any related supporting documents.
- The list of public comments (metadata only at v1; full comment text deferred to v2).
This source is adjacent to Federal Register: every FR rule cites a docket_id, and many dockets house multiple FR documents over their lifecycle. The relationship is one-to-many (1:N): a docket may contain multiple FR documents (proposed rule + final rule + correction), while an FR document is in exactly one docket.
Quick reference
| Field | Value |
|---|---|
| Source name | Regulations.gov dockets (metadata + document list) |
| Publisher | General Services Administration (regulations.gov is operated by GSA on behalf of agencies) |
| License | Public domain |
| Coverage | All open and historical federal rulemakings, ~2003 – present |
| Volume | ~1.5M+ dockets all-time; ~10K-30K active at any time. ~50K-100K new documents per year. |
| Storage estimate | ~1-2 GB metadata; full comment text would be 100s of GB to TB (deferred to v2). |
| Access | regulations.gov v4 REST API, JSON, base https://api.regulations.gov/v4/. Requires X-Api-Key. See Access notes. |
| Stable ID format | Docket: docket:{docketId} e.g. docket:EPA-HQ-OAR-2025-0001. Document: regdoc:{documentId}. |
| Status | exploring — schema drafted, ingestion not built |
Source priority decision
Primary: regulations.gov v4 REST API. The canonical source. Direct from GSA. JSON. Filterable by date and agency. See the GSA API reference for endpoint routes, query params, and ?include=attachments.
Secondary: Federal Register API embedded regulations_dot_gov_info field (already documented in Federal Register). Carries comments_count, comment_url, docket_id, document_id for FR docs that exist in regulations.gov. This is redundant with the regulations.gov API but lower-cost (no key needed). Use as enrichment for FR docs we already have.
Skip: scraping regulations.gov website. API is the right path; scraping is fragile and adds nothing.
Skip: comment text in v1. Per the v2 deferral list in the data sources index, full comment text runs from hundreds of GB to TB. We ingest comment counts and deadlines per docket; per-comment text is fetched on demand only.
Access notes
An X-Api-Key header is required for every request. Register for a production key at api.data.gov/signup and set it as the API key env var. DEMO_KEY works for testing but rate-limits aggressively (observed at ~30/hour in practice), which won't sustain ingestion.
The API is fronted by api.data.gov's api-umbrella gateway, not Cloudflare — live response headers show via: api-umbrella (ApacheTrafficServer), x-api-umbrella-request-id, x-vcap-request-id, and x-ratelimit-* (no Cloudflare markers). Throttling surfaces as HTTP 429 with x-ratelimit-* headers and Retry-After. Plan defensive retry: respect Retry-After, then exponential backoff with jitter keyed on 429. See the upstream api.data.gov rate limits for current default and per-endpoint caps.
The commenting subsystem (/comments/{commentId}) is rate-limited more strictly than dockets/documents. We don't ingest comment text in v1, so this matters less.
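The retry policy described above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the function names, the `send` callable (standing in for whatever HTTP client is used), and the tuning constants are all hypothetical, and only the numeric form of Retry-After is handled.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

# Hypothetical tuning knobs; the source does not prescribe these values.
BASE_DELAY_S = 5.0
MAX_DELAY_S = 300.0
MAX_RETRIES = 5


def parse_retry_after(headers: Mapping[str, str]) -> Optional[float]:
    """Return Retry-After in seconds, or None if absent or not a plain number."""
    for name, value in headers.items():
        if name.lower() == "retry-after":
            try:
                return max(0.0, float(value))
            except ValueError:
                return None  # the HTTP-date form is not handled in this sketch
    return None


def backoff_delay(attempt: int, rng: random.Random) -> float:
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    ceiling = min(MAX_DELAY_S, BASE_DELAY_S * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


def call_with_retry(
    send: Callable[[], Tuple[int, Mapping[str, str], bytes]],
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> bytes:
    """Call send() until it returns a non-429 status or retries run out."""
    rng = rng or random.Random()
    for attempt in range(MAX_RETRIES + 1):
        status, headers, body = send()
        if status != 429:
            return body  # other error statuses are left to the caller here
        if attempt == MAX_RETRIES:
            break
        retry_after = parse_retry_after(headers)
        # Never wait less than the server asked; otherwise use jittered backoff.
        sleep(max(retry_after or 0.0, backoff_delay(attempt, rng)))
    raise RuntimeError("rate limited: retries exhausted")
```

Injecting `sleep` and `rng` keeps the policy testable without real waiting; a production version would also want to treat 5xx responses and network errors explicitly.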
Stable ID formats
Docket: docket:{docketId}, where docketId is the agency-assigned identifier verbatim.
Examples:
- docket:EPA-HQ-OAR-2025-0001
- docket:FAA-2024-1234
- docket:NHTSA-2025-0050
The format is {AGENCY}-{ORG}-{TYPE}-{YEAR}-{SEQUENCE} but varies by agency. Treat as opaque string.
Document: regdoc:{documentId}, where documentId is regulations.gov's identifier (often {docketId}-NNNN form).
The Federal Register document number is separate from regulations.gov's documentId. fr_documents.document_number (e.g. 2026-08558) is FR-side; regdoc:{documentId} (e.g. EPA-HQ-OAR-2025-0001-0123) is regulations.gov-side. They're linked via cross-reference fields.
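As a small illustration of the two stable ID formats, the helpers below build and unpack them. The function names are hypothetical; the only behavior taken from the text is the prefix plus the upstream identifier, kept verbatim and treated as an opaque string.

```python
DOCKET_PREFIX = "docket:"
DOCUMENT_PREFIX = "regdoc:"


def docket_stable_id(docket_id: str) -> str:
    """Prefix the agency-assigned docketId without altering it."""
    if not docket_id:
        raise ValueError("empty docketId")
    return f"{DOCKET_PREFIX}{docket_id}"


def document_stable_id(document_id: str) -> str:
    """Prefix the regulations.gov documentId without altering it."""
    if not document_id:
        raise ValueError("empty documentId")
    return f"{DOCUMENT_PREFIX}{document_id}"


def upstream_id(stable_id: str) -> str:
    """Strip a known prefix and return the upstream identifier."""
    for prefix in (DOCKET_PREFIX, DOCUMENT_PREFIX):
        if stable_id.startswith(prefix):
            return stable_id[len(prefix):]
    raise ValueError(f"unrecognized stable ID: {stable_id!r}")
```

No attempt is made to parse the agency, year, or sequence out of the identifier, since the format varies by agency.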
Volume
| Metric | Value |
|---|---|
| Dockets total (all-time) | ~1.5M+ |
| Active dockets (open for comment) | ~10K-30K at any time |
| Documents per year | ~50K-100K |
| Comments per year | tens of millions (v2 deferral) |
| Per-docket metadata | ~5-10 KB JSON |
| Per-document metadata | ~5-10 KB JSON |
Postgres footprint: ~1-2 GB metadata only.
Caching / incremental sync
- Hourly: /dockets?filter[lastModifiedDate][ge]={last_run}&page[size]=250&sort=-lastModifiedDate (paginate, fetch new dockets).
- Hourly: /documents?filter[lastModifiedDate][ge]={last_run}&page[size]=250&sort=-lastModifiedDate (paginate, fetch new documents).
- Per docket with openForComment=true: check commentEndDate against today; mark closing_soon for ones within 7 days.
- Cross-trigger from Federal Register loader: when an FR doc's regulations_dot_gov_info.docket_id is populated, schedule a docket fetch if we don't already have it.
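The hourly queries and the closing_soon check above might look roughly like this. It is a sketch under assumptions: the function names are hypothetical, the page[number] parameter is not named in this document, and the timestamp format and UTC conversion used for the filter value should be verified against the GSA API reference before use.

```python
from datetime import date, datetime, timezone
from urllib.parse import urlencode

BASE_URL = "https://api.regulations.gov/v4"
CLOSING_SOON_DAYS = 7


def incremental_url(resource: str, last_run: datetime, page_number: int = 1) -> str:
    """Build the list URL for dockets or documents modified since last_run."""
    if resource not in ("dockets", "documents"):
        raise ValueError(f"unsupported resource: {resource!r}")
    if last_run.tzinfo is None:
        raise ValueError("last_run must be timezone-aware")
    params = {
        # Assumed format; check the upstream reference for the exact form.
        "filter[lastModifiedDate][ge]": last_run.astimezone(timezone.utc).strftime(
            "%Y-%m-%d %H:%M:%S"
        ),
        "page[size]": 250,
        "page[number]": page_number,
        "sort": "-lastModifiedDate",
    }
    return f"{BASE_URL}/{resource}?{urlencode(params)}"


def is_closing_soon(open_for_comment: bool, comment_end: date, today: date) -> bool:
    """True when an open comment period ends within CLOSING_SOON_DAYS of today."""
    if not open_for_comment:
        return False
    days_left = (comment_end - today).days
    return 0 <= days_left <= CLOSING_SOON_DAYS
```

`urlencode` percent-encodes the square brackets in the parameter names, which servers decode back to the bracketed form.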
The API rate limit is the binding constraint. With 1,000/hour and ~50K-100K new docs/year, the daily volume is manageable but burst-sensitive.
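To make "manageable but burst-sensitive" concrete, here is a rough calculation from the figures above. The fetch pattern (list pages of 250 plus one detail request per new document) is an assumption for illustration, not something the design has settled.

```python
import math

REQUESTS_PER_HOUR = 1_000   # rate limit quoted above
PAGE_SIZE = 250             # page[size] used in the list queries


def requests_per_day(docs_per_year: int) -> int:
    """List pages plus one detail call per document (an assumed fetch pattern)."""
    docs_per_day = math.ceil(docs_per_year / 365)
    list_pages = math.ceil(docs_per_day / PAGE_SIZE)
    return list_pages + docs_per_day


def hours_to_drain(burst_requests: int) -> float:
    """Hours needed to work through a burst at the full hourly limit."""
    return burst_requests / REQUESTS_PER_HOUR
```

At the upper estimate of 100K documents per year this comes to about 276 requests on an average day against a budget of 24,000, while a burst of 5,000 changed records would occupy the full limit for five hours.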
Schema (Postgres DDL)
    -- ============================================================
    -- Dockets
    -- ============================================================
    CREATE TABLE reg_dockets (
      id                  text PRIMARY KEY,        -- 'docket:EPA-HQ-OAR-2025-0001'
      docket_id           text NOT NULL UNIQUE,    -- 'EPA-HQ-OAR-2025-0001'

      agency_id           text,                    -- 'EPA' (regs.gov code; differs from FR slug)
      docket_type         text,                    -- 'Rulemaking' | 'Nonrulemaking' | 'Other'
      sub_type            text,
      sub_type_2          text,
      category            text,
      rin                 text,                    -- '2060-AV12'

      title               text,
      short_title         text,
      docket_abstract     text,

      last_modified_date  timestamptz,

      -- Lifecycle
      raw_json            jsonb NOT NULL,
      fetched_at          timestamptz NOT NULL,
      parsed_at           timestamptz,
      inserted_at         timestamptz NOT NULL DEFAULT now(),
      updated_at          timestamptz NOT NULL DEFAULT now()
    );

    CREATE INDEX reg_dockets_last_modified ON reg_dockets (last_modified_date DESC);
    CREATE INDEX reg_dockets_agency ON reg_dockets (agency_id, last_modified_date DESC);
    CREATE INDEX reg_dockets_rin ON reg_dockets (rin) WHERE rin IS NOT NULL;

    ALTER TABLE reg_dockets ADD COLUMN search_tsv tsvector
      GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(title, '') || ' ' || coalesce(short_title, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(docket_abstract, '')), 'B') ||
        setweight(to_tsvector('english', coalesce(category, '')), 'C')
      ) STORED;
    CREATE INDEX reg_dockets_search ON reg_dockets USING gin (search_tsv);
    -- ============================================================
    -- Documents (within dockets)
    -- ============================================================
    CREATE TABLE reg_documents (
      id                        text PRIMARY KEY,       -- 'regdoc:EPA-HQ-OAR-2025-0001-0001'
      document_id               text NOT NULL UNIQUE,   -- 'EPA-HQ-OAR-2025-0001-0001'
      docket_id                 text NOT NULL,          -- 'EPA-HQ-OAR-2025-0001'
      docket_row_id             text REFERENCES reg_dockets(id) ON DELETE CASCADE,

      agency_id                 text,
      document_type             text NOT NULL,          -- 'Rule' | 'Proposed Rule' | 'Supporting & Related Material' | 'Other'
      subtype                   text,
      title                     text,
      doc_abstract              text,

      -- Federal Register cross-reference
      fr_doc_num                text,                   -- '2025-25434'; joins to fr_documents.document_number
      fr_document_id            text REFERENCES fr_documents(id),  -- soft FK, NULL until resolved ('fr:2025-25434')

      -- Comment window
      posted_date               timestamptz,
      comment_start_date        timestamptz,
      comment_end_date          timestamptz,
      open_for_comment          boolean NOT NULL DEFAULT false,

      -- Status
      withdrawn                 boolean NOT NULL DEFAULT false,
      last_modified_date        timestamptz,

      -- Comment counts (denormalized; refresh periodically)
      comment_count             int,
      comment_count_fetched_at  timestamptz,

      -- Lifecycle
      raw_json                  jsonb NOT NULL,
      fetched_at                timestamptz NOT NULL,
      parsed_at                 timestamptz,
      inserted_at               timestamptz NOT NULL DEFAULT now(),
      updated_at                timestamptz NOT NULL DEFAULT now()
    );

    CREATE INDEX reg_documents_docket ON reg_documents (docket_id);
    CREATE INDEX reg_documents_fr_doc ON reg_documents (fr_doc_num) WHERE fr_doc_num IS NOT NULL;
    -- Index predicates may only use immutable expressions, so CURRENT_DATE cannot
    -- appear here; apply the date cutoff in the query instead.
    CREATE INDEX reg_documents_comment_open ON reg_documents (comment_end_date) WHERE open_for_comment;
    CREATE INDEX reg_documents_last_modified ON reg_documents (last_modified_date DESC);

    ALTER TABLE reg_documents ADD COLUMN search_tsv tsvector
      GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(doc_abstract, '')), 'B') ||
        setweight(to_tsvector('english', coalesce(document_type, '')), 'D')
      ) STORED;
    CREATE INDEX reg_documents_search ON reg_documents USING gin (search_tsv);
    -- ============================================================
    -- Document attachments (file format index)
    -- ============================================================
    CREATE TABLE reg_document_attachments (
      id               bigserial PRIMARY KEY,
      reg_document_id  text NOT NULL REFERENCES reg_documents(id) ON DELETE CASCADE,
      file_format      text NOT NULL,     -- 'pdf' | 'docx' | 'xlsx' etc.
      file_url         text NOT NULL,
      file_size_bytes  int,
      fetched_at       timestamptz,
      UNIQUE (reg_document_id, file_url)
    );
    -- ============================================================
    -- (v2) Comments: placeholder; not in v1 ingestion
    -- ============================================================
    -- Schema deferred to v2; comment text is too large to ingest eagerly.
    -- We populate `reg_documents.comment_count` for "how many comments are on this docket"
    -- queries; per-comment text is fetched on demand if/when needed.

Schema decisions worth flagging:
- reg_documents.fr_doc_num is the cross-source bridge to Federal Register. The upstream frDocNum field on a document (e.g. frDocNum: "2025-25434") is the clean cross-reference; it matches fr_documents.document_number. We soft-resolve it to fr_document_id. This lets the agent answer "what's the Reg.gov docket for FR 2025-25434" cleanly.
- comment_count denormalized with comment_count_fetched_at to indicate freshness: refresh asynchronously, don't block on it.
- raw_json always preserved: it keeps any field we don't model yet.
- withdrawn flag retained: when a document is withdrawn from regulations.gov (rare), we keep the row with withdrawn=true rather than deleting it.
- Comments deferred: schema shows the intent (reg_comments table not yet created in v1). Full comment text is v2.
- Attachments separate: multiple file formats per document (PDF + DOCX + XLSX); attachment lookups are common ("download the PDF version").
The full upstream JSON-API response shapes for dockets, documents, and attachments are documented in the GSA API reference; the docketType / documentType enum vocabularies (Rulemaking / Nonrulemaking / Other; Rule / Proposed Rule / Supporting & Related Material / Other) are upstream-defined there too.
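The soft resolution of fr_doc_num to fr_document_id described in the schema decisions could be sketched as below. The function names are hypothetical, and `known_fr_ids` is a stand-in for a lookup against fr_documents.id (in practice a database query); the `fr:` prefix follows the example in the DDL comment.

```python
from typing import Optional, Set


def fr_stable_id(fr_doc_num: str) -> str:
    """Build the Federal Register row ID from an FR document number."""
    return f"fr:{fr_doc_num}"


def resolve_fr_document_id(
    fr_doc_num: Optional[str], known_fr_ids: Set[str]
) -> Optional[str]:
    """Return the fr_documents ID if that row exists yet, else None.

    None means "leave fr_document_id NULL and retry on a later pass",
    which is what keeps the foreign key soft.
    """
    if not fr_doc_num:
        return None
    candidate = fr_stable_id(fr_doc_num.strip())
    return candidate if candidate in known_fr_ids else None
```

Leaving the column NULL until the Federal Register loader catches up avoids rejecting a regulations.gov document just because its FR counterpart has not been ingested yet.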
Download / update strategy
Backfill
The 1.5M+ docket history is too much to backfill entirely on first launch. Strategy:
- Backfill last 3 years of dockets: ?filter[lastModifiedDate][ge]=2023-01-01&page[size]=250&sort=lastModifiedDate. Paginate. ~100K-500K dockets.
- For each docket, fetch detail. ~500K calls at 1,000/hr = ~500 hours. Lean: parallelize with multiple keys, or accept ~21 days backfill and run during off-hours.
- Cross-reference with Federal Register: any FR doc with regulations_dot_gov_info.docket_id populated triggers a docket fetch if we don't have it. This catches important historical dockets even if outside our 3-year window.
- Document-level fetch piggybacks on docket fetch by listing all documents in the docket.
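The backfill duration quoted above follows from simple division, shown here so the trade-off with extra keys is easy to re-run. The function names are hypothetical, and whether running several keys in parallel is permitted is a policy question this sketch does not settle.

```python
def backfill_hours(detail_calls: int, hourly_limit_per_key: int = 1_000, keys: int = 1) -> float:
    """Wall-clock hours if every key runs continuously at its hourly limit."""
    if keys < 1 or hourly_limit_per_key < 1:
        raise ValueError("keys and hourly_limit_per_key must be positive")
    return detail_calls / (hourly_limit_per_key * keys)


def backfill_days(detail_calls: int, hourly_limit_per_key: int = 1_000, keys: int = 1) -> float:
    """Same estimate expressed in days."""
    return backfill_hours(detail_calls, hourly_limit_per_key, keys) / 24
```

For 500K detail calls on one key this gives 500 hours, or about 20.8 days; four keys would bring it to 125 hours. Restricting the run to off-hours stretches the calendar time accordingly.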
Daily incremental
- Hourly: /dockets?filter[lastModifiedDate][ge]={last_run}&sort=-lastModifiedDate. Paginate.
- Hourly: /documents?filter[lastModifiedDate][ge]={last_run}&sort=-lastModifiedDate. Paginate.
- Daily: comment-count refresh for dockets/documents with open_for_comment=true. Hit /documents/{id} to refresh commentEndDate and check the comment count tally.
State tracking
Source keys: reg_dockets, reg_documents. State stores a per-source last_modified_date watermark.
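A per-source watermark can be kept with very little logic. The sketch below assumes a plain dict as a stand-in for wherever state is actually stored; the function names are hypothetical.

```python
from datetime import datetime
from typing import Dict, Iterable, Optional

# Keyed by source key ('reg_dockets', 'reg_documents'); a stand-in for a state table.
Watermarks = Dict[str, Optional[datetime]]


def advance_watermark(
    current: Optional[datetime], seen: Iterable[datetime]
) -> Optional[datetime]:
    """Return the newest timestamp among current and seen; never move backwards."""
    candidates = list(seen)
    if current is not None:
        candidates.append(current)
    return max(candidates) if candidates else None


def record_batch(
    state: Watermarks, source_key: str, batch_timestamps: Iterable[datetime]
) -> None:
    """Update one source's watermark after its batch has been stored."""
    state[source_key] = advance_watermark(state.get(source_key), batch_timestamps)
```

Calling `record_batch` only after the batch has been written means a crash mid-run re-fetches some records on the next pass rather than skipping them.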
Failure modes
- Rate limit (429) from api.data.gov's api-umbrella gateway: throttling surfaces as HTTP 429 with x-ratelimit-* headers and Retry-After (the gateway is api-umbrella / ApacheTrafficServer, not Cloudflare). Respect Retry-After, then back off exponentially with jitter (e.g. 5s → 30s → 5min → halt after ~5 retries). DEMO_KEY hits the cap fast; production needs a real api.data.gov key.
- Docket withdrawn / merged: rare but possible. The id may persist while attributes change. Update in place via lastModifiedDate.
- Document withdrawn: withdrawn=true flag. Preserve row.
- fr_doc_num resolution fails: fr_documents may not yet have the row (FR loader may be behind). Soft FK; resolve later.
- Comment count rapidly changing during open period: refresh hourly during the last week before close; once-daily during the rest.
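The comment-count refresh cadence in the last item can be expressed as a single function. This is a sketch with a hypothetical name; treating "the last week" as seven days inclusive is an assumption.

```python
from datetime import date, timedelta
from typing import Optional

FINAL_WINDOW_DAYS = 7  # "last week before close", read as 7 days inclusive


def comment_count_refresh_interval(comment_end: date, today: date) -> Optional[timedelta]:
    """Hourly in the final week, daily before that, None once the period has closed."""
    days_left = (comment_end - today).days
    if days_left < 0:
        return None
    if days_left <= FINAL_WINDOW_DAYS:
        return timedelta(hours=1)
    return timedelta(days=1)
```

A scheduler could compare this interval with comment_count_fetched_at to decide whether a document is due for a refresh.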
Open questions
These don't block ingestion but should be resolved before this source is "shipped":
- Real api.data.gov key vs DEMO_KEY. DEMO_KEY won't sustain ingestion. Provision a real key early.
- Backfill scope. 3 years is a guess. Decide based on actual user need and the 21-day backfill cost.
- Comment text v2 path. When we add comments in v2: per-comment metadata is small but text + attachments are huge. Plan: index per-comment metadata, store text only on-demand. Some agencies see >1M comments per high-profile docket.
- Agency ID normalization. Regulations.gov uses agency codes (EPA, NHTSA) that may or may not match Federal Register's slugs (environmental-protection-agency). Build a small crosswalk.
- API stability. The api.data.gov api-umbrella gateway throttles bursty callers with HTTP 429 (x-ratelimit-* + Retry-After). Build robust retry + circuit-breaker logic keyed on 429 that honors Retry-After before backing off.
- Closing-soon notifications. A high-value workflow ("alert me when this comment period closes in 3 days") relies on this source. Schema supports it via the index on comment_end_date WHERE open_for_comment.
- Withdrawn documents in citation graph. When a document is withdrawn, downstream FR docs / CFR sections that cited it are still pointing at a valid (withdrawn) record. Schema preserves; consumers must filter.
- Comment counts vs. mass-comment documents. Some dockets carry comments in batches as Mass Comment documents: a single docket entry that aggregates thousands of identical or near-identical comments. Surface this as a separate mass_comment_count field if useful.
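For the agency ID normalization question above, the crosswalk could start as a plain lookup table. The sketch below is illustrative only: the names are hypothetical and the single entry is the one pairing mentioned in this document; every other code would need to be filled in from the two sources.

```python
from typing import Dict, Optional

# regulations.gov agency code -> Federal Register slug.
# Only the pairing named in the text is listed; the rest is to be populated.
AGENCY_CODE_TO_FR_SLUG: Dict[str, str] = {
    "EPA": "environmental-protection-agency",
}


def fr_slug_for_agency(agency_code: str) -> Optional[str]:
    """Return the FR slug for a regulations.gov code, or None if unmapped."""
    return AGENCY_CODE_TO_FR_SLUG.get(agency_code.strip().upper())
```

Returning None for unmapped codes lets the loader log gaps instead of guessing a slug.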