Regulations.gov dockets

When a federal agency proposes or finalizes a regulation, it opens a docket on regulations.gov for public comment. Each docket contains the regulatory document(s), supporting analyses, and (after the comment period) the public comments submitted. The docket is the lifecycle container for a single rulemaking.

For Josh, regulations.gov dockets give us:

  • The comment count and deadline per active rulemaking — high-signal for "what's open for comment now."
  • The mapping from a Federal Register document to its docket_id and any related supporting documents.
  • The list of public comments (metadata only at v1; full comment text deferred to v2).

This source is adjacent to Federal Register: every FR rule cites a docket_id, and many dockets house multiple FR documents over their lifecycle. The relationship is one-to-many (1:N): a docket may contain multiple FR documents (proposed rule + final rule + correction), while an FR document is in exactly one docket.

Source name: Regulations.gov dockets (metadata + document list)
Publisher: General Services Administration (regulations.gov is operated by GSA on behalf of agencies)
License: Public domain
Coverage: All open and historical federal rulemakings, ~2003 – present
Volume: ~1.5M+ dockets all-time; ~10K-30K active at any time. ~50K-100K new documents per year.
Storage estimate: ~1-2 GB metadata; full comment text would be 100s of GB to TB (deferred to v2).
Access: regulations.gov v4 REST API, JSON, base https://api.regulations.gov/v4/. Requires X-Api-Key. See Access notes.
Stable ID format: Docket: docket:{docketId}, e.g. docket:EPA-HQ-OAR-2025-0001. Document: regdoc:{documentId}.
Status: exploring (schema drafted, ingestion not built)

Primary: regulations.gov v4 REST API. The canonical source. Direct from GSA. JSON. Filterable by date and agency. See the GSA API reference for endpoint routes, query params, and ?include=attachments.

Secondary: Federal Register API embedded regulations_dot_gov_info field (already documented in Federal Register). Carries comments_count, comment_url, docket_id, document_id for FR docs that exist in regulations.gov. This is redundant with the regulations.gov API but lower-cost (no key needed). Use as enrichment for FR docs we already have.
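
As a rough illustration of the enrichment path, the sketch below pulls the four fields named above out of an already-fetched FR document. It assumes the FR document is held as a plain dict with a regulations_dot_gov_info key; the function name extract_regs_info is hypothetical, not part of any existing loader.

```python
from typing import Any, Optional


def extract_regs_info(fr_doc: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return regulations.gov cross-reference fields from an FR doc, or None.

    Assumes `fr_doc` is the parsed FR API JSON for one document. Returns None
    when the embedded block is missing or carries no docket_id.
    """
    info = fr_doc.get("regulations_dot_gov_info") or {}
    docket_id = info.get("docket_id")
    if not docket_id:
        return None
    return {
        "docket_id": docket_id,
        "document_id": info.get("document_id"),
        "comments_count": info.get("comments_count"),
        "comment_url": info.get("comment_url"),
    }
```

A None result simply means there is nothing to enrich; the caller can skip the docket fetch for that FR document.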

Skip: scraping regulations.gov website. API is the right path; scraping is fragile and adds nothing.

Skip: comment text in v1. Per the data sources index v2 deferral list — full comment text is hundreds of GB to TB. We ingest comment counts and deadlines per docket; per-comment-text fetch is on-demand only.

An X-Api-Key header is required for every request. Register for a production key at api.data.gov/signup and set it as the API key env var. DEMO_KEY works for testing but rate-limits aggressively — observed at ~30/hour in practice, which won't sustain ingestion.

The API is fronted by api.data.gov's api-umbrella gateway, not Cloudflare — live response headers show via: api-umbrella (ApacheTrafficServer), x-api-umbrella-request-id, x-vcap-request-id, and x-ratelimit-* (no Cloudflare markers). Throttling surfaces as HTTP 429 with x-ratelimit-* headers and Retry-After. Plan defensive retry: respect Retry-After, then exponential backoff with jitter keyed on 429. See the upstream api.data.gov rate limits for current default and per-endpoint caps.
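
One way to express the retry policy described above is a pure function that decides how long to sleep after a 429. This is a minimal sketch, not the project's implementation: the name retry_delay, the 5-second base, and the 300-second cap are assumptions, and it only handles the numeric (delta-seconds) form of Retry-After.

```python
import random
from typing import Callable, Mapping


def retry_delay(
    attempt: int,
    headers: Mapping[str, str],
    base: float = 5.0,      # assumed starting delay in seconds
    cap: float = 300.0,     # assumed maximum delay in seconds
    rng: Callable[[], float] = random.random,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based) after a 429."""
    lowered = {k.lower(): v for k, v in headers.items()}
    retry_after = lowered.get("retry-after")
    if retry_after is not None:
        try:
            # Honor the server's hint first when it is a number of seconds.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form is not parsed here; fall back to backoff
    # Exponential backoff with full jitter: uniform in [0, min(cap, base * 2^attempt)).
    ceiling = min(cap, base * (2 ** attempt))
    return ceiling * rng()
```

Keeping the delay calculation separate from the HTTP call makes it easy to unit-test; the caller would sleep for the returned value and give up after a fixed number of attempts.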

The commenting subsystem (/comments/{commentId}) is rate-limited more strictly than dockets/documents. We don't ingest comment text in v1, so this matters less.

Docket: docket:{docketId}, where docketId is the agency-assigned identifier verbatim.

Examples:

  • docket:EPA-HQ-OAR-2025-0001
  • docket:FAA-2024-1234
  • docket:NHTSA-2025-0050

The format is often {AGENCY}-{ORG}-{TYPE}-{YEAR}-{SEQUENCE}, but it varies by agency (FAA-2024-1234, for instance, has only agency, year, and sequence). Treat it as an opaque string.

Document: regdoc:{documentId}, where documentId is regulations.gov's identifier (often {docketId}-NNNN form).

The Federal Register document number is separate from regulations.gov's documentId. fr_documents.document_number (e.g. 2026-08558) is FR-side; regdoc:{documentId} (e.g. EPA-HQ-OAR-2025-0001-0123) is regulations.gov-side. They're linked via cross-reference fields.
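
The stable ID scheme above can be sketched as a pair of builders plus a splitter. The function names are hypothetical; the only behavior taken from this document is the docket: and regdoc: prefixes and the rule that the upstream identifier is carried verbatim and never parsed.

```python
def docket_stable_id(docket_id: str) -> str:
    """Build the stable ID for a docket; the upstream ID is kept verbatim."""
    return f"docket:{docket_id}"


def regdoc_stable_id(document_id: str) -> str:
    """Build the stable ID for a regulations.gov document."""
    return f"regdoc:{document_id}"


def split_stable_id(stable_id: str) -> tuple[str, str]:
    """Split 'docket:X' or 'regdoc:X' into (prefix, upstream_id).

    Only the first colon is treated as the separator, so the upstream ID
    stays opaque even if it ever contains a colon itself.
    """
    prefix, sep, upstream = stable_id.partition(":")
    if not sep or prefix not in {"docket", "regdoc"} or not upstream:
        raise ValueError(f"not a regulations.gov stable ID: {stable_id!r}")
    return prefix, upstream
```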

MetricValue
Dockets total (all-time)~1.5M+
Active dockets (open for comment)~10K-30K at any time
Documents per year~50K-100K
Comments per yeartens of millions (v2 deferral)
Per-docket metadata~5-10 KB JSON
Per-document metadata~5-10 KB JSON

Postgres footprint: ~1-2 GB metadata only.

  1. Hourly: /dockets?filter[lastModifiedDate][ge]={last_run}&page[size]=250&sort=-lastModifiedDate — paginate, fetch new dockets.
  2. Hourly: /documents?filter[lastModifiedDate][ge]={last_run}&page[size]=250&sort=-lastModifiedDate — paginate, fetch new documents.
  3. Per document with openForComment=true: check commentEndDate against today; mark closing_soon for ones within 7 days.
  4. Cross-trigger from Federal Register loader: when an FR doc's regulations_dot_gov_info.docket_id is populated, schedule a docket fetch if we don't already have it.
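
The hourly queries in steps 1 and 2 can be sketched as a small URL builder. The filter, page[size], and sort parameters come from the steps above; the page[number] parameter and the function name incremental_url are assumptions, and the exact timestamp format the API expects for last_run should be checked against the GSA API reference (the value is passed through untouched here).

```python
from urllib.parse import urlencode

BASE_URL = "https://api.regulations.gov/v4/"


def incremental_url(
    resource: str,
    last_run: str,
    page_number: int = 1,   # assumed pagination parameter
    page_size: int = 250,
) -> str:
    """Build the URL for one page of dockets/documents modified since last_run."""
    if resource not in ("dockets", "documents"):
        raise ValueError(f"unsupported resource: {resource!r}")
    params = {
        "filter[lastModifiedDate][ge]": last_run,
        "page[size]": page_size,
        "page[number]": page_number,
        "sort": "-lastModifiedDate",
    }
    # urlencode percent-encodes the square brackets and the timestamp.
    return f"{BASE_URL}{resource}?{urlencode(params)}"
```

The request itself still needs the X-Api-Key header; this helper only assembles the query string.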

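The API rate limit is the binding constraint. At 1,000 requests/hour (24,000/day) and ~50K-100K new documents per year (roughly 140-275 per day), steady-state volume is manageable, but bursts such as the initial backfill can exhaust the hourly budget.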
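
To make the budget arithmetic concrete, the sketch below estimates what fraction of the daily request budget steady-state document ingestion would use. It assumes one API call per new document and the 1,000/hour figure quoted above; both the helper name and the one-call-per-document simplification are illustrative only.

```python
HOURLY_LIMIT = 1_000  # requests per hour, as quoted in this document


def daily_budget_share(
    new_docs_per_year: int,
    hourly_limit: int = HOURLY_LIMIT,
    calls_per_doc: int = 1,  # assumption: one detail fetch per new document
) -> float:
    """Fraction of the daily request budget consumed by new documents."""
    docs_per_day = new_docs_per_year / 365
    daily_budget = hourly_limit * 24
    return docs_per_day * calls_per_doc / daily_budget


low = daily_budget_share(50_000)    # about 0.6% of the daily budget
high = daily_budget_share(100_000)  # about 1.1% of the daily budget
```

Under these assumptions the average load is around 1% of the budget, which is why the concern is bursts rather than sustained volume.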

-- ============================================================
-- Dockets
-- ============================================================
CREATE TABLE reg_dockets (
id text PRIMARY KEY, -- 'docket:EPA-HQ-OAR-2025-0001'
docket_id text NOT NULL UNIQUE, -- 'EPA-HQ-OAR-2025-0001'
agency_id text, -- 'EPA' (regs.gov code; differs from FR slug)
docket_type text, -- 'Rulemaking' | 'Nonrulemaking' | 'Other'
sub_type text,
sub_type_2 text,
category text,
rin text, -- '2060-AV12'
title text,
short_title text,
docket_abstract text,
last_modified_date timestamptz,
-- Lifecycle
raw_json jsonb NOT NULL,
fetched_at timestamptz NOT NULL,
parsed_at timestamptz,
inserted_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX reg_dockets_last_modified ON reg_dockets (last_modified_date DESC);
CREATE INDEX reg_dockets_agency ON reg_dockets (agency_id, last_modified_date DESC);
CREATE INDEX reg_dockets_rin ON reg_dockets (rin) WHERE rin IS NOT NULL;
ALTER TABLE reg_dockets ADD COLUMN search_tsv tsvector
GENERATED ALWAYS AS (
setweight(to_tsvector('english', coalesce(title, '') || ' ' || coalesce(short_title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(docket_abstract, '')), 'B') ||
setweight(to_tsvector('english', coalesce(category, '')), 'C')
) STORED;
CREATE INDEX reg_dockets_search ON reg_dockets USING gin (search_tsv);
-- ============================================================
-- Documents (within dockets)
-- ============================================================
CREATE TABLE reg_documents (
id text PRIMARY KEY, -- 'regdoc:EPA-HQ-OAR-2025-0001-0001'
document_id text NOT NULL UNIQUE, -- 'EPA-HQ-OAR-2025-0001-0001'
docket_id text NOT NULL, -- 'EPA-HQ-OAR-2025-0001'
docket_row_id text REFERENCES reg_dockets(id) ON DELETE CASCADE,
agency_id text,
document_type text NOT NULL, -- 'Rule' | 'Proposed Rule' | 'Supporting & Related Material' | 'Other'
subtype text,
title text,
doc_abstract text,
-- Federal Register cross-reference
fr_doc_num text, -- '2025-25434' — joins to fr_documents.document_number
fr_document_id text REFERENCES fr_documents(id), -- nullable FK ('fr:2025-25434'); stays NULL until the FR row exists
-- Comment window
posted_date timestamptz,
comment_start_date timestamptz,
comment_end_date timestamptz,
open_for_comment boolean NOT NULL DEFAULT false,
-- Status
withdrawn boolean NOT NULL DEFAULT false,
last_modified_date timestamptz,
-- Comment counts (denormalized; refresh periodically)
comment_count int,
comment_count_fetched_at timestamptz,
-- Lifecycle
raw_json jsonb NOT NULL,
fetched_at timestamptz NOT NULL,
parsed_at timestamptz,
inserted_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX reg_documents_docket ON reg_documents (docket_id);
CREATE INDEX reg_documents_fr_doc ON reg_documents (fr_doc_num) WHERE fr_doc_num IS NOT NULL;
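-- Index predicates must be immutable, so CURRENT_DATE cannot appear here;
-- apply the date filter (comment_end_date >= CURRENT_DATE) at query time.
CREATE INDEX reg_documents_comment_open ON reg_documents (comment_end_date)
WHERE open_for_comment;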
CREATE INDEX reg_documents_last_modified ON reg_documents (last_modified_date DESC);
ALTER TABLE reg_documents ADD COLUMN search_tsv tsvector
GENERATED ALWAYS AS (
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(doc_abstract, '')), 'B') ||
setweight(to_tsvector('english', coalesce(document_type, '')), 'D')
) STORED;
CREATE INDEX reg_documents_search ON reg_documents USING gin (search_tsv);
-- ============================================================
-- Document attachments (file format index)
-- ============================================================
CREATE TABLE reg_document_attachments (
id bigserial PRIMARY KEY,
reg_document_id text NOT NULL REFERENCES reg_documents(id) ON DELETE CASCADE,
file_format text NOT NULL, -- 'pdf' | 'docx' | 'xlsx' etc.
file_url text NOT NULL,
file_size_bytes bigint, -- bigint: int overflows for files over ~2 GB
fetched_at timestamptz,
UNIQUE (reg_document_id, file_url)
);
-- ============================================================
-- (v2) Comments — placeholder; not in v1 ingestion
-- ============================================================
-- Schema deferred to v2 — comment text is too large to ingest eagerly.
-- We populate `reg_documents.comment_count` for "how many comments are on this docket"
-- queries; per-comment text is fetched on demand if/when needed.

Schema decisions worth flagging:

  • reg_documents.fr_doc_num is the cross-source bridge to Federal Register. The upstream frDocNum field on a document (e.g. frDocNum: "2025-25434") is the clean cross-reference — it matches fr_documents.document_number. We soft-resolve it to fr_document_id. This lets the agent answer "what's the Reg.gov docket for FR 2025-25434" cleanly.
  • comment_count denormalized with comment_count_fetched_at to indicate freshness — refresh asynchronously, don't block on it.
  • raw_json is always preserved, so any upstream field we don't model yet is still recoverable.
  • withdrawn flag retained — when a document is withdrawn from regulations.gov (rare), we keep the row with withdrawn=true rather than deleting it.
  • Comments deferred — schema shows the intent (reg_comments table not yet created in v1). Full comment text is v2.
  • Attachments separate — multiple file formats per document (PDF + DOCX + XLSX); attachment lookups are common ("download the PDF version").

The full upstream JSON-API response shapes for dockets, documents, and attachments are documented in the GSA API reference; the docketType / documentType enum vocabularies (Rulemaking / Nonrulemaking / Other; Rule / Proposed Rule / Supporting & Related Material / Other) are upstream-defined there too.
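
As an illustration of how an upstream document resource might map onto reg_documents, here is a minimal sketch. It assumes the JSON:API convention of an id plus an attributes object; frDocNum, commentEndDate, openForComment, and lastModifiedDate are named in this document, while docketId, documentType, title, and withdrawn are assumed attribute names to be confirmed against the GSA API reference. The function name document_row is hypothetical.

```python
from typing import Any


def document_row(resource: dict[str, Any]) -> dict[str, Any]:
    """Map one JSON:API document resource to a reg_documents row (as a dict)."""
    attrs = resource.get("attributes") or {}
    document_id = resource["id"]
    docket_id = attrs.get("docketId")  # assumed attribute name
    return {
        "id": f"regdoc:{document_id}",
        "document_id": document_id,
        "docket_id": docket_id,
        "docket_row_id": f"docket:{docket_id}" if docket_id else None,
        "document_type": attrs.get("documentType"),  # assumed attribute name
        "title": attrs.get("title"),                 # assumed attribute name
        "fr_doc_num": attrs.get("frDocNum"),
        "comment_end_date": attrs.get("commentEndDate"),
        "open_for_comment": bool(attrs.get("openForComment", False)),
        "withdrawn": bool(attrs.get("withdrawn", False)),  # assumed attribute name
        "last_modified_date": attrs.get("lastModifiedDate"),
        # Keep the untouched payload so unmodeled fields are not lost.
        "raw_json": resource,
    }
```

Timestamps are passed through as strings here; parsing them into timestamptz values is left to the database driver or a later step.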

The 1.5M+ docket history is too much to backfill entirely on first launch. Strategy:

  1. Backfill last 3 years of dockets: ?filter[lastModifiedDate][ge]=2023-01-01&page[size]=250&sort=lastModifiedDate. Paginate. ~100K-500K dockets.
  2. For each docket, fetch detail. ~500K calls at 1,000/hr = ~500 hours, or ~21 days of continuous running. Lean: parallelize with multiple keys, or accept the ~21-day backfill (longer in calendar time if it only runs during off-hours).
  3. Cross-reference with Federal Register: any FR doc with regulations_dot_gov_info.docket_id populated triggers a docket fetch if we don't have it. This catches important historical dockets even if outside our 3-year window.
  4. Document-level fetch piggybacks on docket fetch by listing all documents in the docket.

Once the backfill is done, incremental sync runs on this schedule:

  1. Hourly: /dockets?filter[lastModifiedDate][ge]={last_run}&sort=-lastModifiedDate. Paginate.
  2. Hourly: /documents?filter[lastModifiedDate][ge]={last_run}&sort=-lastModifiedDate. Paginate.
  3. Daily: comment-count refresh for dockets/documents with open_for_comment=true. Hit /documents/{id} to refresh commentEndDate and check the comment count tally.

Source key: reg_dockets, reg_documents. State stores per-source last_modified_date watermark.
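
A watermark like the one described above could be advanced as follows. This is a sketch under assumptions: it treats lastModifiedDate values as ISO 8601 strings with a trailing Z, and the helper names parse_ts and next_watermark are hypothetical.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def next_watermark(
    current: Optional[datetime], seen: Iterable[str]
) -> Optional[datetime]:
    """Return the newest timestamp among the current watermark and this run.

    The watermark never moves backwards, and an empty run leaves it unchanged.
    `current` is expected to be timezone-aware.
    """
    candidates = [parse_ts(v) for v in seen]
    if current is not None:
        candidates.append(current)
    return max(candidates) if candidates else None
```

Because the queries use a greater-or-equal filter, rows at exactly the watermark timestamp are fetched again on the next run, so the write path needs to be an idempotent upsert.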

  • Rate limit (429) from api.data.gov's api-umbrella gateway — throttling surfaces as HTTP 429 with x-ratelimit-* headers and Retry-After (the gateway is api-umbrella / ApacheTrafficServer, not Cloudflare). Respect Retry-After, then back off exponentially with jitter (e.g. 5s → 30s → 5min → halt after ~5 retries). DEMO_KEY hits the cap fast; production needs a real api.data.gov key.
  • Docket withdrawn / merged — rare but possible. The id may persist while attributes change. Update in place via lastModifiedDate.
  • Document withdrawn: set the withdrawn=true flag and preserve the row.
  • fr_doc_num resolution fails — fr_documents may not yet have the row (FR loader may be behind). Soft FK; resolve later.
  • Comment count rapidly changing during open period — refresh hourly during the last week before close; once-daily during the rest.
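
The refresh cadence in the last bullet can be sketched as a function from the comment deadline to a polling interval. The hourly and daily intervals and the one-week threshold come from that bullet; the function name and the choice to return None once the period has closed are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def refresh_interval(comment_end: datetime, now: datetime) -> Optional[timedelta]:
    """How often to refresh comment_count for an open comment period.

    Returns None when the deadline has already passed (no further refresh).
    Both arguments are expected to be timezone-aware.
    """
    if comment_end < now:
        return None
    if comment_end - now <= timedelta(days=7):
        return timedelta(hours=1)  # final week: counts move quickly
    return timedelta(days=1)       # earlier in the window: daily is enough
```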

These don't block ingestion but should be resolved before this source is "shipped":

  • Real api.data.gov key vs DEMO_KEY. DEMO_KEY won't sustain ingestion. Provision a real key early.
  • Backfill scope. 3 years is a guess. Decide based on actual user need and the 21-day backfill cost.
  • Comment text v2 path. When we add comments in v2: per-comment metadata is small but text + attachments are huge. Plan: index per-comment metadata, store text only on-demand. Some agencies see >1M comments per high-profile docket.
  • Agency ID normalization. Regulations.gov uses agency codes (EPA, NHTSA) that may or may not match Federal Register's slugs (environmental-protection-agency). Build a small crosswalk.
  • API stability. The api.data.gov api-umbrella gateway throttles bursty callers with HTTP 429 (x-ratelimit-* + Retry-After). Build robust retry + circuit-breaker logic keyed on 429 that honors Retry-After before backing off.
  • Closing-soon notifications. A high-value workflow ("alert me when this comment period closes in 3 days") relies on this source. Schema supports it via the index on comment_end_date WHERE open_for_comment.
  • Withdrawn documents in citation graph. When a document is withdrawn, downstream FR docs / CFR sections that cited it are still pointing at a valid (withdrawn) record. Schema preserves; consumers must filter.
  • Comment counts vs mass-comment batches. Some dockets carry comments in batches as Mass Comment documents: a single docket entry that aggregates thousands of identical or near-identical comments. Surface this as a separate mass_comment_count field if useful.
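
For the closing-soon workflow mentioned in the list above, the core check is a date-window test. The sketch below is one possible shape for it: the 7-day default mirrors the closing_soon marking in the ingestion steps, the 3-day value mirrors the alert example, and the function name and signature are hypothetical.

```python
from datetime import date, timedelta


def closing_soon(
    comment_end: date,
    today: date,
    open_for_comment: bool,
    window_days: int = 7,
) -> bool:
    """True when an open comment period ends within `window_days` of today."""
    if not open_for_comment:
        return False
    return today <= comment_end <= today + timedelta(days=window_days)


# Example: a period ending 2025-06-04 is inside a 3-day alert window on 2025-06-01.
alert = closing_soon(date(2025, 6, 4), date(2025, 6, 1), True, window_days=3)
```

In the database the same test would be a range filter on comment_end_date for rows where open_for_comment is true.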