LDA filings

Lobbying Disclosure Act filings: who is lobbying whom, on what bills, for which clients, and (for political contributions) where the lobbyists' money is going. Federal lobbying activity must be reported quarterly to both chambers of Congress; political contributions semi-annually. The Lobbying Disclosure Act (1995) and Honest Leadership and Open Government Act (2007) define the disclosure requirements.

For Josh, LDA filings are the primary signal of "who is influencing this legislation." A user looking at a bill should be able to see the disclosed lobbyists who reported lobbying activity on it; a user looking at a Member should be able to see contributions reported to/from PACs the lobbyists represent.

The Office of Public Records (Senate) and the Clerk of the House are migrating to a single unified API at lda.gov, which consolidates what used to be split across lda.senate.gov/api/v1/ and disclosurespreview.house.gov/lobbyingdisclosure/. The lda.gov API is already live (probed during this work) and uses the same JSON schema as the legacy Senate API. The old API at lda.senate.gov is scheduled for sunset on 2026-06-30; its deprecation headers (Deprecation: @1768003199, Sunset: Tue, 30 Jun 2026 23:59:59 GMT, Link: <https://lda.gov/api/v1/>; rel="successor-version") name lda.gov as the successor. The disclosurespreview.house.gov mirror is superseded by the same unified system.
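
The deprecation headers quoted above can be decoded with the standard library. This is a minimal sketch, assuming the Deprecation value is an `@`-prefixed Unix timestamp and the Sunset value is an HTTP-date, as the quoted values suggest; the helper names are illustrative, not part of any upstream client.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_deprecation(value: str) -> datetime:
    """Parse a Deprecation header of the form '@<unix-seconds>' into UTC."""
    return datetime.fromtimestamp(int(value.lstrip("@")), tz=timezone.utc)


def parse_sunset(value: str) -> datetime:
    """Parse a Sunset header, which carries an HTTP-date."""
    return parsedate_to_datetime(value)


def days_until(sunset: datetime, now: datetime) -> int:
    """Whole days remaining before the sunset instant."""
    return (sunset - now).days


deprecated_at = parse_deprecation("@1768003199")
sunset_at = parse_sunset("Tue, 30 Jun 2026 23:59:59 GMT")
```

A monitoring job could call `days_until(sunset_at, datetime.now(timezone.utc))` and alert if anything still points at the legacy host as the date approaches.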

Upstream API reference (endpoints, filters, enum vocabularies, and full response shapes): lda.gov ReDoc v1 (version 1.0.0).

Source name: LDA filings (Senate + House unified at lda.gov)
Publishers: Office of Public Records, U.S. Senate + Clerk of the House (operating a unified system)
License: Public domain
Coverage: 1999 – present
Volume: ~1.94M filings (1999-present); ~50K-80K new filings per year
Storage estimate: ~3-5 GB API JSON + linked filing PDFs (PDFs lazily fetched)
API base (canonical): https://lda.gov/api/v1/ (the new unified endpoint). Reference: ReDoc v1. Register for a key at https://lda.gov/api/register/.
Auth: Anonymous read OK; API key strongly recommended for ingestion (higher throttle)
Stable ID format: Filing: lda:{filing_uuid}, e.g. lda:455edc06-55d1-41ed-878e-70a4040f953c
Status: exploring (schema drafted, ingestion not built). lda.gov endpoint live and verified.

Primary: lda.gov API at https://lda.gov/api/v1/. The unified API published by the Senate Office of Public Records / House Clerk. Same JSON schema as the legacy lda.senate.gov/api/v1/. URLs in responses now use lda.gov (not lda.senate.gov). Build directly against this from day one.

Skip: lda.senate.gov/api/v1/ — deprecated, sunset 2026-06-30. The deprecation headers point at lda.gov as the successor-version. No reason to build against the legacy URL.

Skip: disclosurespreview.house.gov — superseded by the unified lda.gov system.

Skip: ProPublica's bulk LDA mirror — derived from the same Senate API. Direct is better.

Skip: OpenSecrets lobbying data — derived + enriched (employer matching, etc.). v1 ingests raw LDA; v2 may layer OpenSecrets crosswalks.

No bot wall — a plain HTTP client gets 200 JSON. A live source-drift sweep (2026-05-29) confirms the former Akamai bot wall is gone:

  • Plain curl with a default, blank, or python-requests User-Agent → 200 OK with normal JSON from gunicorn. Confirmed against https://lda.gov/api/v1/ (returns endpoint inventory) and https://lda.gov/api/v1/filings/?filing_year=2025&page_size=5 (returns full filing JSON, no auth).
  • No JA3/TLS-fingerprint trick is required; UA spoofing is unnecessary.

Plan: use a plain Python HTTP client (requests / httpx) for this source. The 120/min api-keyed throttle is the load-bearing constraint, not bot detection. Fallback note: if 403s from Akamai reappear, fall back to an HTTP/2 client with a browser-like TLS fingerprint (e.g. curl_cffi, tls-client, niquests) or a headed Chromium fetcher — but neither is needed today.

Get an API key. Anonymous access is limited to 15 req/min (~900/hr) per IP; an API key raises that to 120 req/min (~7,200/hr) per user. At the default page size of 25, the ~1.94M-filing backfill is ~77,600 list requests: about 3.6 days of saturated requests anonymously versus about 11 hours with a key. Registration at https://lda.gov/api/register/ is free for research / developer use. Auth is a DRF token sent in the Authorization: Token <key> header; share the key via .kamal/secrets. The canonical limits live in the upstream ReDoc reference and may drift.
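
A client-side pacer keeps the ingester under whichever limit applies. The sketch below is one possible shape, not a settled design: `Pacer` and `auth_headers` are hypothetical names, and the two rate constants are copied from the limits quoted above, which may drift.

```python
import time
from typing import Optional

ANON_PER_MIN = 15    # anonymous limit quoted above
KEYED_PER_MIN = 120  # API-key limit quoted above


def auth_headers(api_key: Optional[str]) -> dict:
    """DRF token auth header, or no header for anonymous access."""
    return {"Authorization": f"Token {api_key}"} if api_key else {}


class Pacer:
    """Spaces calls evenly so that at most `per_minute` are issued per minute."""

    def __init__(self, per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_at = None  # earliest time the next call may start

    def wait(self) -> float:
        """Block until the next slot; return the delay that was applied."""
        now = self._clock()
        start = now if self._next_at is None else max(now, self._next_at)
        delay = start - now
        if delay > 0:
            self._sleep(delay)
        self._next_at = start + self.interval
        return delay
```

Calling `pacer.wait()` before each request gives a steady 0.5 s spacing at 120/min. The clock and sleep functions are injectable so the pacing logic can be tested without real delays.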

Constants endpoints don't count against the rate limit — fetch the enum endpoints freely (filing types, lobbying activity issues, government entities, countries, states, prefixes, suffixes, contribution item types). Same for raw PDF/HTML at https://lda.gov/filings/public/filing/{uuid}/print/.

The filings list endpoint no longer requires a filter. The 2026-05-29 sweep confirms unfiltered GET /api/v1/filings/ returns 200 (count ~1.94M) and ?page=2 alone paginates without error. We still filter by filing_year for watermarked incremental runs, but it is no longer a hard requirement for pagination.

Caveat: government-entity granularity changes on 2021-02-14. Filings posted before that date list government entities at the filing level (an aggregate list); filings posted after it break entities down per lobbying_activities[] item. The schema accommodates both via the lda_activity_government_entities table, plus an optional fallback lda_filings.government_entities_filing_level field (not yet in the DDL) if we want both views.
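
One way to handle both shapes at parse time is to prefer per-activity entities and fall back to the filing-level list. This is a sketch under assumptions: the key name `government_entities` at both levels and the `id` / `name` fields inside each entity are guesses at the JSON shape and should be checked against the ReDoc reference before use.

```python
def split_government_entities(filing: dict) -> dict:
    """Return per-activity and filing-level entity rows plus a granularity flag.

    Key names are assumed, not confirmed against the upstream schema.
    """
    per_activity = []
    activities = filing.get("lobbying_activities") or []
    for sequence, activity in enumerate(activities, start=1):
        for entity in activity.get("government_entities") or []:
            per_activity.append((sequence, entity["id"], entity["name"]))

    filing_level = [
        (entity["id"], entity["name"])
        for entity in filing.get("government_entities") or []
    ]

    if per_activity:
        granularity = "activity"
    elif filing_level:
        granularity = "filing"
    else:
        granularity = "none"
    return {
        "per_activity": per_activity,
        "filing_level": filing_level,
        "granularity": granularity,
    }
```

Storing the `granularity` value alongside the filing is one way to flag older filings for downstream consumers.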

Endpoints, vocabulary, and response shapes

The endpoint inventory (/filings/, /contributions/, /registrants/, /clients/, /lobbyists/, and the /constants/* enum endpoints), the common filter parameters (filing_year, filing_period, filing_type, ordering, page, limit), the enum vocabularies (filing-type codes such as RR/Q1/MM/YE; filing periods; the 3-letter issue codes like BUD/HCR/TAX; filer types), and the full JSON response shapes are all upstream-maintained — served live at the /constants/* endpoints and documented in the lda.gov ReDoc reference. We do not duplicate them here.

Josh-specific interpretive notes on those shapes:

  • income (firms charging clients) and expenses (in-house lobbying) on an LD-2 filing are mutually exclusive — a firm filing reports income; a corporate in-house team reports expenses.
  • On an LD-203 contribution filing, filer_type is lobbyist or registrant; lobbyist is set when filer_type = 'lobbyist'; no_contributions is true on a no-activity report; pacs[] lists PACs the filer contributed to; each contribution_items[] row carries honoree_name (the recipient member or candidate).
  • The embedded registrant carries house_registrant_id, a cross-reference to the House system, and ppb_country (principal place of business).

Filing: lda:{filing_uuid} — UUID-based, opaque.

Examples:

  • lda:455edc06-55d1-41ed-878e-70a4040f953c
  • lda:86260004-84e7-46e3-9cfa-76edae508768

Registrant: lda-reg:{id} (e.g. lda-reg:9181)
Client: lda-client:{id} (e.g. lda-client:113256)
Lobbyist: lda-lob:{id} (e.g. lda-lob:43217)

These IDs are stable across queries within v1; v2 migration may need an ID-mapping table.
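
The ID formats above are simple enough to centralize in two small helpers. The function names here are illustrative only; the prefixes come straight from the formats listed above.

```python
import uuid

_ENTITY_PREFIXES = {
    "registrant": "lda-reg",
    "client": "lda-client",
    "lobbyist": "lda-lob",
}


def filing_id(filing_uuid: str) -> str:
    """Build 'lda:{uuid}', normalizing the UUID to lowercase hyphenated form."""
    return f"lda:{uuid.UUID(filing_uuid)}"


def entity_id(kind: str, upstream_id: int) -> str:
    """Build 'lda-reg:{id}', 'lda-client:{id}', or 'lda-lob:{id}'."""
    if kind not in _ENTITY_PREFIXES:
        raise ValueError(f"unknown entity kind: {kind!r}")
    return f"{_ENTITY_PREFIXES[kind]}:{int(upstream_id)}"
```

Passing the UUID through `uuid.UUID` rejects malformed values early and keeps casing consistent, which matters because the ID is used as a text primary key.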

| Metric | Value |
| --- | --- |
| Total filings (1999+) | 1,946,003 (current API count) |
| Filings per year | ~50,000-80,000 |
| Active registrants | ~3,000-5,000 |
| Active clients | ~10,000-20,000 |
| Active lobbyists | ~10,000-15,000 |

Postgres footprint: ~3-5 GB.

The API supports ?ordering=-dt_posted for newest-first ordering; combined with a dt_posted filter, this drives incremental sync:

  1. Hourly: /filings/?ordering=-dt_posted&page_size=100. Fetch the first page, then paginate until we hit a row already in our table.
  2. Daily: /contributions/?ordering=-dt_posted, same pattern.
  3. Weekly: full reconciliation for the current quarter. Fetch all rows for the current filing_year + filing_period to catch updates we may have missed.

-- ============================================================
-- Registrants (lobbying firms or in-house orgs)
-- ============================================================
CREATE TABLE lda_registrants (
    id text PRIMARY KEY, -- 'lda-reg:9181'
    registrant_id int NOT NULL UNIQUE,
    house_registrant_id int, -- House's parallel ID
    name text NOT NULL,
    description text,
    address_1 text,
    address_2 text,
    city text,
    state text,
    zip text,
    country text,
    ppb_country text, -- principal place of business
    contact_name text,
    contact_telephone text,
    dt_updated timestamptz,
    raw_json jsonb,
    fetched_at timestamptz NOT NULL,
    inserted_at timestamptz NOT NULL DEFAULT now(),
    updated_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX lda_registrants_name ON lda_registrants USING gin (to_tsvector('english', name));

-- ============================================================
-- Clients (entities being lobbied for)
-- ============================================================
CREATE TABLE lda_clients (
    id text PRIMARY KEY, -- 'lda-client:113256'
    client_id int NOT NULL UNIQUE,
    house_client_id int,
    name text NOT NULL,
    general_description text,
    state text,
    country text,
    ppb_country text,
    is_government_entity boolean,
    is_self_select boolean,
    dt_updated timestamptz,
    raw_json jsonb,
    fetched_at timestamptz NOT NULL,
    inserted_at timestamptz NOT NULL DEFAULT now(),
    updated_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX lda_clients_name ON lda_clients USING gin (to_tsvector('english', name));

-- ============================================================
-- Lobbyists (individuals)
-- ============================================================
CREATE TABLE lda_lobbyists (
    id text PRIMARY KEY, -- 'lda-lob:43217'
    lobbyist_id int NOT NULL UNIQUE,
    prefix text,
    first_name text,
    middle_name text,
    last_name text NOT NULL,
    nickname text,
    suffix text,
    raw_json jsonb,
    fetched_at timestamptz NOT NULL,
    inserted_at timestamptz NOT NULL DEFAULT now(),
    updated_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX lda_lobbyists_name ON lda_lobbyists (last_name, first_name);

-- ============================================================
-- LD-1 / LD-2 filings (registration + quarterly activity)
-- ============================================================
CREATE TABLE lda_filings (
    id text PRIMARY KEY, -- 'lda:{uuid}'
    filing_uuid uuid NOT NULL UNIQUE,
    filing_type text NOT NULL, -- 'RR', 'Q1', '1A', etc.
    filing_type_display text,
    filing_year smallint NOT NULL,
    filing_period text NOT NULL,
    is_amendment boolean NOT NULL DEFAULT false,
    is_termination boolean NOT NULL DEFAULT false,
    is_no_activity boolean NOT NULL DEFAULT false,
    -- Money
    income numeric(14,2),
    expenses numeric(14,2),
    expenses_method text,
    -- Parties
    registrant_id int REFERENCES lda_registrants(registrant_id),
    client_id int REFERENCES lda_clients(client_id),
    -- Dates
    dt_posted timestamptz NOT NULL,
    termination_date date,
    -- Source URLs
    filing_document_url text,
    filing_document_content_type text,
    -- Lifecycle
    raw_json jsonb NOT NULL,
    fetched_at timestamptz NOT NULL,
    parsed_at timestamptz,
    inserted_at timestamptz NOT NULL DEFAULT now(),
    updated_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX lda_filings_year_period ON lda_filings (filing_year, filing_period);
CREATE INDEX lda_filings_registrant ON lda_filings (registrant_id);
CREATE INDEX lda_filings_client ON lda_filings (client_id);
CREATE INDEX lda_filings_dt_posted ON lda_filings (dt_posted DESC);

-- ============================================================
-- Lobbying activities (LD-2 detail rows)
-- ============================================================
CREATE TABLE lda_lobbying_activities (
    id bigserial PRIMARY KEY,
    filing_id text NOT NULL REFERENCES lda_filings(id) ON DELETE CASCADE,
    sequence int NOT NULL,
    general_issue_code text NOT NULL, -- 'TAX', 'HCR', etc.
    general_issue_display text,
    description text,
    foreign_entity_issues text,
    UNIQUE (filing_id, sequence)
);
CREATE INDEX lda_activities_issue ON lda_lobbying_activities (general_issue_code);
ALTER TABLE lda_lobbying_activities ADD COLUMN search_tsv tsvector
    GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(general_issue_display, '') || ' ' || coalesce(general_issue_code, '')), 'B') ||
        setweight(to_tsvector('english', coalesce(description, '')), 'D')
    ) STORED;
CREATE INDEX lda_activities_search ON lda_lobbying_activities USING gin (search_tsv);

-- Lobbyists per activity (M:N)
CREATE TABLE lda_activity_lobbyists (
    activity_id bigint NOT NULL REFERENCES lda_lobbying_activities(id) ON DELETE CASCADE,
    lobbyist_id int NOT NULL REFERENCES lda_lobbyists(lobbyist_id),
    PRIMARY KEY (activity_id, lobbyist_id)
);

-- Government entities lobbied per activity (M:N)
CREATE TABLE lda_activity_government_entities (
    activity_id bigint NOT NULL REFERENCES lda_lobbying_activities(id) ON DELETE CASCADE,
    entity_id int NOT NULL, -- references the LDA enum
    entity_name text NOT NULL,
    PRIMARY KEY (activity_id, entity_id)
);

-- ============================================================
-- LD-203 contribution filings
-- ============================================================
CREATE TABLE lda_contribution_filings (
    id text PRIMARY KEY, -- 'lda-c:{uuid}'
    filing_uuid uuid NOT NULL UNIQUE,
    filing_type text NOT NULL, -- 'MM', 'YE', 'MMA', 'YEA'
    filing_year smallint NOT NULL,
    filing_period text NOT NULL, -- 'mid_year', 'year_end'
    filer_type text NOT NULL CHECK (filer_type IN ('lobbyist', 'registrant')),
    registrant_id int REFERENCES lda_registrants(registrant_id),
    lobbyist_id int REFERENCES lda_lobbyists(lobbyist_id),
    no_contributions boolean NOT NULL DEFAULT false,
    dt_posted timestamptz NOT NULL,
    pacs text[],
    raw_json jsonb NOT NULL,
    fetched_at timestamptz NOT NULL,
    inserted_at timestamptz NOT NULL DEFAULT now(),
    updated_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX lda_contributions_year ON lda_contribution_filings (filing_year, filing_period);

-- ============================================================
-- Itemized contributions (LD-203 line items)
-- ============================================================
CREATE TABLE lda_contribution_items (
    id bigserial PRIMARY KEY,
    contribution_filing_id text NOT NULL REFERENCES lda_contribution_filings(id) ON DELETE CASCADE,
    sequence int NOT NULL,
    contribution_type text, -- 'FECA', 'PRESIDENTIAL_LIBRARY', etc.
    contributor_name text,
    honoree_name text, -- Member / candidate / official receiving the contribution
    honoree_bioguide text REFERENCES legislators(bioguide_id), -- soft FK when resolvable
    payee_name text,
    contribution_date date,
    amount numeric(12,2),
    UNIQUE (contribution_filing_id, sequence)
);
CREATE INDEX lda_contribution_items_honoree ON lda_contribution_items (honoree_bioguide) WHERE honoree_bioguide IS NOT NULL;
CREATE INDEX lda_contribution_items_date ON lda_contribution_items (contribution_date DESC);

Schema decisions worth flagging:

  • One filing → many lda_lobbying_activities rows, each capturing one issue area with its lobbyist + government-entity edges. Lets us answer "what was lobbied on TAX in 2025-Q1" cleanly.
  • lda_contribution_items.honoree_bioguide as soft FK — most contributions go to sitting Members; resolution is best-effort (free-text honoree_name → bioguide via fuzzy match against legislators.last_name + first_name, see legislators & committees). Critical for the "PAC contributions to my representative" workflow.
  • raw_json preserved per filing — the v1 → v2 migration may add fields; preserving raw lets us re-parse.
  • No body text on filings — the LD-2 form has structured activity rows; the PDF is the human-readable rendering. The activity rows are the structured signal we ingest.
  • bills cross-reference deferred. LD-2 activity descriptions sometimes mention specific bill numbers ("H.R. 1", "S. 5"). Extraction at parse time is feasible but error-prone (free text). Defer to v1.x; ship v1 with description-text searchable. (See bills for the BILLSTATUS target.)
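
The best-effort honoree resolution mentioned in the schema decisions could start as a plain string-similarity match that refuses to guess. This sketch swaps in `difflib` from the standard library as the similarity measure; the threshold, the margin, the `(bioguide_id, first_name, last_name)` tuple shape, and the placeholder IDs in the usage note are all assumptions for illustration, not tuned or real values.

```python
from difflib import SequenceMatcher


def _normalize(name: str) -> str:
    """Lowercase, drop commas/periods, collapse whitespace."""
    return " ".join(name.lower().replace(",", " ").replace(".", " ").split())


def resolve_honoree(honoree_name, legislators, threshold=0.85, margin=0.05):
    """Return a bioguide ID for a free-text honoree, or None when unsure.

    `legislators` is an iterable of (bioguide_id, first_name, last_name).
    None is returned when the best score is below `threshold` or when the
    runner-up is within `margin` of the best score (ambiguous match).
    """
    target = _normalize(honoree_name)
    scored = sorted(
        (
            (SequenceMatcher(None, target, _normalize(f"{first} {last}")).ratio(), bioguide)
            for bioguide, first, last in legislators
        ),
        reverse=True,
    )
    if not scored or scored[0][0] < threshold:
        return None
    if len(scored) > 1 and scored[0][0] - scored[1][0] < margin:
        return None
    return scored[0][1]
```

For example, with the made-up roster `[("X000001", "Jane", "Doe"), ("X000002", "John", "Smith")]`, the honoree "Jane Doe" resolves to `X000001`, while a roster containing two Jane Does resolves to None. Titles such as "Rep." lower the score, so a real resolver would likely strip them first.
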

  1. Iterate by year + period:
    • For year in 1999..current year:
      • For period in [first_quarter, second_quarter, third_quarter, fourth_quarter, mid_year, year_end]:
        • Page through /filings/?filing_year={Y}&filing_period={P}&ordering=dt_posted until exhausted.
        • Same for /contributions/.
  2. Per-row detail fetches are usually unnecessary for the ~1.94M filings: the list endpoint returns the full filing shape, and a single API call returns a page of 25-100 rows.
  3. Reference data first: /registrants/, /clients/, /lobbyists/ are small enough to backfill in their entirety.
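
The year-by-period iteration above can be expressed as a generator of starting URLs, one per (year, period, endpoint) combination. This is a sketch only: `backfill_urls` is a hypothetical helper, and it emits every period for both endpoints exactly as the plan lists them, without pruning combinations that may return no rows.

```python
from urllib.parse import urlencode

BASE = "https://lda.gov/api/v1"
PERIODS = [
    "first_quarter", "second_quarter", "third_quarter", "fourth_quarter",
    "mid_year", "year_end",
]


def backfill_urls(first_year, last_year, endpoints=("filings", "contributions")):
    """Yield the first-page URL for each year/period/endpoint, oldest first."""
    for year in range(first_year, last_year + 1):
        for period in PERIODS:
            for endpoint in endpoints:
                query = urlencode({
                    "filing_year": year,
                    "filing_period": period,
                    "ordering": "dt_posted",
                })
                yield f"{BASE}/{endpoint}/?{query}"
```

Each yielded URL is only the starting point; the caller still follows pagination until the result set is exhausted.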

Time estimate (lda.gov, with API key, 120/min sustained):

  • ~1.94M filings ÷ 25 rows per page (default) ≈ 77,600 requests; ÷ 120 req/min ≈ 650 minutes of saturated requests ≈ 11 hours. At 100 rows per page, if the API honors it, that drops to ~19,400 requests ≈ 2.7 hours.
  • Contributions are assumed to be of the same magnitude, so roughly the same again.
  • Filings + contributions combined at the default page size: ~12-24 hours.
  • Reference data (registrants, clients, lobbyists) is negligible: minutes.

Without an API key (15/min anonymous): 8x slower, so roughly 4-8 days. Registering and using a key is strongly preferred.
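
The backfill arithmetic is easy to get wrong by a factor of 60, so it helps to keep it as a small function. This sketch assumes one list request per page and a fully saturated throttle; `backfill_hours` is an illustrative name, and the inputs in the usage note are the figures quoted in this document.

```python
import math


def backfill_hours(rows: int, page_size: int, requests_per_min: int) -> float:
    """Hours of saturated requests needed to page through `rows` rows."""
    requests = math.ceil(rows / page_size)
    return requests / requests_per_min / 60
```

For ~1.94M filings this gives about 10.8 hours at 25 rows per page with a key (120/min), about 2.7 hours at 100 rows per page, and about 3.6 days anonymously (15/min) at 25 rows per page.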

  1. Hourly: /filings/?ordering=-dt_posted&filing_year={current}. Stop at last-seen UUID. The filing_year filter scopes the watermarked incremental run (no longer required for pagination — unfiltered pagination now returns 200).
  2. Hourly: /contributions/?ordering=-dt_posted&filing_year={current}.
  3. Weekly: per-quarter reconciliation. Compare row counts against our table.

Source keys: lda_filings, lda_contributions, lda_registrants, lda_clients, lda_lobbyists. State stores last-seen dt_posted watermark per source plus the most recent filing_uuid for tie-breaking.
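
The watermark check with UUID tie-breaking might look like the following. It is a sketch under assumptions: rows arrive newest-first, each row is a dict with `filing_uuid` and an already-parsed `dt_posted` datetime, and `take_new_rows` is a hypothetical helper rather than existing code.

```python
from datetime import datetime


def take_new_rows(pages, last_dt_posted: datetime, last_uuid: str) -> list:
    """Collect rows newer than the stored watermark, then stop.

    `pages` is an iterable of row lists ordered newest-first. Rows that share
    the watermark timestamp but carry a different UUID are still treated as
    new; iteration stops at the stored UUID or at any strictly older row.
    """
    fresh = []
    for page in pages:
        for row in page:
            if row["filing_uuid"] == last_uuid or row["dt_posted"] < last_dt_posted:
                return fresh
            fresh.append(row)
    return fresh
```

After a successful run the new watermark is the `dt_posted` and `filing_uuid` of `fresh[0]`, if any rows were collected. Rows that tie on timestamp but sort after the stored UUID would be missed here, which is one reason the weekly reconciliation exists.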

  • HTTP 429 throttle. API returns Retry-After in seconds — honor it. Don't retry tighter than the recommended wait.
  • HTTP 403 from Akamai (fallback only). Not observed as of the 2026-05-29 sweep — plain HTTP clients get 200 JSON. If 403s reappear, fall back to a TLS-fingerprint-spoofing client (curl_cffi, tls-client, niquests) or a headless-with-stealth Chromium.
  • Pagination without filter. No longer an error — unfiltered /filings/?page=N returns 200 as of the 2026-05-29 sweep. We still pass filing_year for watermarked incremental runs.
  • Government entity granularity for pre-2021-02-14 filings. Fall back to filing-level government entities; flag at ingest so downstream consumers know.
  • Honoree resolution to bioguide — many contributions go to candidates not yet sworn in. Fuzzy match; leave NULL when ambiguous.
  • Filing amendments — new filing_uuid (amendments get distinct UUIDs); they reference the parent filing via filing_type. Capture amendments as separate rows; downstream consumers walk the amendment chain.
  • No-activity report. The is_no_activity flag is set; the activities array is empty.
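
Honoring Retry-After on a 429 can be kept independent of the HTTP library. In this sketch `send` is a hypothetical callable returning `(status, headers, body)`, and the 5-second default for a missing or malformed header is an assumption, not an upstream recommendation.

```python
import time


def retry_after_seconds(headers: dict, default: float = 5.0) -> float:
    """Read Retry-After as a number of seconds, falling back to `default`."""
    try:
        return max(0.0, float(headers.get("Retry-After")))
    except (TypeError, ValueError):
        return default


def fetch_with_backoff(send, max_attempts: int = 5, sleep=time.sleep):
    """Call `send()` until it stops returning 429, waiting as instructed."""
    for _ in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        sleep(retry_after_seconds(headers))
    raise RuntimeError("still throttled after retries")
```

With requests or httpx, `send` would wrap the actual GET call and return the response's status code, headers, and parsed body.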

These don't block ingestion but should be resolved before this source is "shipped":

  • HTTP client (resolved). No bot wall as of the 2026-05-29 sweep — a plain Python requests / httpx client gets 200 JSON. No TLS-fingerprint trick or headed browser needed. Fallback only if 403s reappear: curl_cffi / tls-client / niquests, or a headed Chromium fetcher.
  • API key acquisition. Free registration at lda.gov/api/register/. One-time setup; share via .kamal/secrets.
  • Bill-number extraction from activity descriptions. A high-value cross-reference: when a filing's lobbying_activities[].description mentions "H.R. 1, the One Big Beautiful Bill Act," we should populate a lda_activity_bills join table linking the activity to the bill_id. The free-text bill-citation field is description, not general_issue_specific, which is null on lda.gov. Lean: ship v1 with description-text search; build the extractor in v1.x.
  • House LDA divergence. HLOGA requires identical filings in both chambers, but historically some filings appear in House but not Senate due to filing system glitches. v1 trusts Senate; v1.x can spot-check House.
  • PAC contribution graph. LD-203 contributions are itemized by PAC. Building a "which PACs are funded by lobbyists for which clients" graph is an analytic layer, not an ingest concern.
  • Foreign Agents (FARA). The LDA covers domestic lobbying. Foreign agent filings under FARA are a separate registration system — listed in v2 deferral table. Distinct schema; document later.
  • Contribution amount ambiguity. Some filings report ranges or "see attached PDF." Schema's numeric(12,2) requires a number; when unparseable, leave NULL and log.
  • Relationship to bills cross-reference (BILLSTATUS). Bills have a policy_area and subjects; LDA filings have issue codes. Build a small crosswalk so "lobbying on HCR (Health Care)" can be cross-referenced with bills tagged "Health" subject.
  • Data quality. LDA filings are notoriously self-reported with limited verification. The description field is free-text and varies wildly. We ingest as-is; downstream consumers handle the noise.
  • Mass amendments. When the OPR requires registrants to re-file (e.g., format change), thousands of amendments hit at once. Plan for burst.
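
As a starting point for the deferred bill-number extractor, a deliberately narrow regex can pull "H.R. N" and "S. N" citations out of description text. This is a sketch, not the planned extractor: it ignores resolutions, amendments, and the Congress number, and the `(kind, number)` output shape is an assumption.

```python
import re

# Matches 'H.R. 1', 'H.R.1', 'HR 1234', 'S. 5'. The lookbehind rejects
# tokens glued to a preceding letter or period, such as the 'S.' in 'U.S. 50'.
_BILL_RE = re.compile(r"(?<![A-Za-z.])(H\.\s?R\.|HR|S\.)\s?(\d{1,5})\b")


def extract_bill_refs(description: str) -> list:
    """Return unique (kind, number) pairs in order of first appearance."""
    seen = []
    for token, number in _BILL_RE.findall(description or ""):
        ref = ("hr" if token.startswith("H") else "s", int(number))
        if ref not in seen:
            seen.append(ref)
    return seen
```

Because descriptions do not reliably name the Congress, mapping a pair such as `("hr", 1)` to a bill_id would still need the filing year as context.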