Public docs site on Astro Starlight at docs.usejosh.com
Why
The current public docs (~50 hand-authored HTML files under docs/)
carry a custom design system, hand-maintained sidebar, and a hand-
authored breadcrumb pattern. That's fine for an internal-only working
set but it's a contributor tax once the repo is public — every doc
edit means hand-touching the HTML, the nav, the breadcrumbs, and
the design tokens. The OSS launch needs a docs surface where a
contributor can edit a single .mdx file, push a PR, and have the
navigation/styling regenerate.
Astro Starlight (MIT-licensed, free, fully self-hostable as static
files) is the recommended target: it ships a fast static site with
built-in search, dark mode, sidebar/breadcrumb generation, and a
visual identity that feels at home among OSS docs and doesn't fight
Josh's own design tokens. Output is
pure HTML/CSS/JS, deployable on Cloudflare Pages / GitHub Pages /
any static host. No vendor lock-in, no proprietary editor, no
hosted-only dependency.
Scope deliberately excludes the spec system itself — docs/spec/
stays as a YAML→HTML generator (the agent-readable structure is
load-bearing for the substrate's tooling). Starlight serves the
generated spec HTML as a static section under /spec/; the YAML
workflow doesn't change.
Per-source pages get shrunk in the same pass: today's pages
duplicate upstream API documentation. Post-migration, each per-
source page is a Josh-specific description (schema, chunker choice,
citation IDs, status) that links out to upstream sources for
endpoint shape and field definitions. Reduces drift; reduces
maintenance load.
User stories
As an OSS contributor making my first docs PR, I want to edit a single MDX file and have the navigation, breadcrumb, and styling regenerate so that I'm not blocked on understanding the design system to fix a typo.
As a buyer evaluating Josh, I want docs.usejosh.com to load fast, search well, and look professional so that I take the OSS project seriously instead of bouncing.
As a future maintainer of the docs, I want per-source pages that link out to upstream API docs rather than reproducing them so that upstream schema changes don't silently drift our docs out of date.
Acceptance criteria (EARS)
- When a contributor edits a single `.mdx` file under `docs-site/src/content/docs/` and runs `npm run build`, the system shall regenerate the static site with the contributor's change visible at the matching URL — no separate nav file or breadcrumb edit required.
- When the static build runs, the system shall produce only static HTML/CSS/JS — no server-side runtime, no Node process required at serve time.
- When `docs.usejosh.com` loads in a cold browser, the system shall serve the homepage in under 1 second on a typical broadband connection (Lighthouse performance ≥ 90).
- While the spec system regenerates from YAML via `bin/build-spec.py`, the system shall continue to write HTML under `docs/spec/` (or its post-migration equivalent) and Starlight shall serve those files as a static section without re-rendering them through MDX.
- When a per-source page is migrated, the system shall retain Josh-specific content (schema, chunker, citation IDs, status, history) and replace upstream-duplicated content (endpoint shapes, field definitions, raw URL patterns) with outbound links to the upstream API documentation.
- Where the existing docs reference a relative file path that has moved during migration, the system shall either (a) preserve the old URL via a `_redirects` rule, or (b) update the reference to the new URL — broken internal links shall not ship to production.
- When the OSS public launch ships, the docs site shall be live at docs.usejosh.com over HTTPS, with the existing top-level sections (overview, data sources index, data status, operations, spec, sources) reachable from the sidebar.
Success determiner
Checklist
- docs.usejosh.com loads over HTTPS and serves a Starlight-styled homepage
- Sidebar matches the existing nav structure (Get started / Spec / Operations / Data sources)
- At least 3 per-source pages migrated and shrunk to link-out style
- Spec catalog reachable under /spec/ and renders the YAML-generated HTML correctly
- Lighthouse performance ≥ 90 on a cold-cache desktop run against the homepage
- No broken internal links on any migrated page (linkcheck pass)
Determiner becomes a `bash` kind once a `bin/docs-linkcheck.sh` or similar lands. Keeping it `manual` while the migration is in flight so the determiner doesn't false-fail on a partial migration.
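A fragment-aware link check of the kind this determiner would eventually automate might look roughly like the sketch below. It is an illustration only: `check_links` and the in-memory `pages` mapping are hypothetical names, and it does not claim to mirror the project's own linkcheck script. It treats a link as broken if the target page is missing or if the `#fragment` has no matching `id` on the target page.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class _Collector(HTMLParser):
    """Collects every element id and every <a href> on one page."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if attr.get("id"):
            self.ids.add(attr["id"])
        if tag == "a" and attr.get("href"):
            self.hrefs.append(attr["href"])


def check_links(pages: dict[str, str]) -> list[tuple[str, str]]:
    """pages maps a site-absolute URL path to its HTML (hypothetical input shape).

    Returns (page, href) pairs for internal links whose target page or
    target fragment does not exist.
    """
    parsed = {}
    for path, html in pages.items():
        collector = _Collector()
        collector.feed(html)
        parsed[path] = collector

    broken = []
    for path, collector in parsed.items():
        for href in collector.hrefs:
            if href.startswith(("http://", "https://", "mailto:")):
                continue  # outbound links are out of scope for this sketch
            target, fragment = urldefrag(urljoin(path, href))
            if target not in parsed:
                broken.append((path, href))
            elif fragment and fragment not in parsed[target].ids:
                broken.append((path, href))
    return broken
```

Checking fragments as well as pages is the point: a page-level 200 would still pass a link whose anchor silently disappeared.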
Clarifications needed
- RESOLVED (2026-05-29): Project lives at `docs-site/` as a sibling to `docs/`; `docs/` stays editable through the migration and is removed in the final cutover PR.
- RESOLVED (2026-05-29): Design system is bridged without forking the palette — import `design/josh-design-system/project/colors_and_type.css` verbatim, then add one bridge layer remapping Starlight `--sl-*` vars to `--josh-*` (registered via `customCss`). Accent stays monochrome `--josh-ink` (not a hue); ship light-only (Josh has no dark tokens) and hide the theme switcher.
- RESOLVED (2026-05-29): Deploy target is Cloudflare Pages — free static host, fast CDN, and native `_redirects` support (load-bearing for the legacy `.html` -> clean-route map). Not the OVHcloud box (single 32 GB substrate host; public docs traffic shouldn't share it).
- RESOLVED (2026-05-29): Search is Pagefind (Starlight default) PLUS a post-build Pagefind CLI pass over the built output, so the 72 verbatim `/spec/` pages get indexed too — otherwise spec search silently breaks.
- RESOLVED (2026-05-29): Per-source shrink is PER-PAGE judgment, not a uniform %. The audit measured ~33% avg, bimodal: API-backed pages shrink 35-52%; pages with no upstream to defer to (staff-directories, topic-taxonomy) are demote-not-delete exceptions at ~12%. The per-page KEEP/CUT contract is the migration map's source-shrink table (see plan).
- STILL OPEN: Fate of the in-browser spec editor (`spec.js`). Default is keep-degraded via the `public/` passthrough (preserves read + download-edit at zero cost); a full port to an Astro island with File System Access write-back is a later, separate decision.
- STILL OPEN (low): Font delivery — self-host via `@fontsource` (removes a render-blocking third-party `fonts.googleapis.com` call) once Source Serif 4's `opsz 8..60` axis + exact weight lists are confirmed reproducible; otherwise keep the Google Fonts `@import`.
Out of scope
- Migrating the spec YAML→HTML generator. The YAML format and `bin/build-spec.py` stay; Starlight serves the generated HTML as static files.
- Migrating private docs (`private/`). The private tree stays hand-authored — different audience, different workflow, scrubbed at OSS launch.
- Building a CMS or in-browser editor for the docs. Contributors edit MDX in their editor; that's the whole flow.
- Translating docs to other languages. English-only for launch.
- Adding interactive embeds (live API queries, etc.) in the docs. Static-first; interactive demos belong in their own spec.
Dependencies
Plan
## Repo layout during migration
- docs/ — existing hand-authored HTML. Stays editable through the
migration so the working substrate isn't blocked. Removed at the
end of the migration in a single PR.
- docs-site/ — new Astro Starlight project. Owned by the migration
spec; gradually fills as pages are converted.
- docs/spec/ — generated from docs/spec/data/*.yaml by
bin/build-spec.py. Unchanged. Starlight reads the generated
HTML as a static section.
## Migration order (per-source pages are the bulk of work)
1. Bootstrap the Starlight project under docs-site/ (init, theme,
sidebar structure mapping the existing nav).
2. Migrate top-level pages: index, josh-data-sources,
data-status. These are the most-trafficked.
3. Migrate operations/ (~13 pages). These are agent-readable and
edit-heavy — biggest contributor-experience win.
4. Migrate sources/ (~19 pages) WITH the shrink-and-link-out pass.
This is the bulk of the work and where the per-source pages get
their content reshaped, not just reformatted.
5. Wire docs/spec/ (generated) into the Starlight build as a
static section under /spec/. Verify bin/build-spec.py output
still works without modification.
6. Set up the deploy pipeline (Cloudflare Pages / chosen target),
point docs.usejosh.com at it.
7. Linkcheck pass; redirect rules for any URLs that moved.
8. Remove docs/ in a final cleanup PR (after a week of running
docs.usejosh.com against the new site to catch any issues).
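As a rough illustration of step 7, the redirect rules for moved URLs could be generated rather than hand-written. The sketch below assumes the Cloudflare Pages `_redirects` line format (`source destination status`) and assumes clean routes are trailing-slash paths with the `/docs/` prefix and `.html` suffix dropped; `clean_route` and `redirect_lines` are hypothetical helper names, not the project's generator.

```python
from pathlib import PurePosixPath


def clean_route(legacy_path: str) -> str:
    """Map a legacy .html path to an assumed clean, trailing-slash route."""
    parts = [p for p in PurePosixPath(legacy_path).parts if p != "/"]
    if parts and parts[0] == "docs":  # assumption: old tree was served under /docs/
        parts = parts[1:]
    if not parts:
        return "/"
    stem = PurePosixPath(parts[-1]).stem
    # index.html collapses into its directory route
    parts = parts[:-1] + ([] if stem == "index" else [stem])
    return "/" + "".join(part + "/" for part in parts)


def redirect_lines(legacy_paths: list[str]) -> list[str]:
    """Build _redirects lines; spec pages are covered by one splat rule."""
    lines = [f"{src} {clean_route(src)} 301" for src in legacy_paths]
    lines.append("/docs/spec/* /spec/:splat 301")
    return lines


if __name__ == "__main__":
    for line in redirect_lines(
        ["/docs/operations/query-flows.html", "/data-status.html"]
    ):
        print(line)
```

Generating the file from the list of old paths keeps the redirect map and the linkcheck input in sync, which is harder to guarantee with a hand-edited file.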
## Shrink-and-link-out rules for per-source pages
KEEP in the Josh page:
- Schema (the columns we land + their meaning in Josh's namespace)
- Chunker choice + eval status
- Citation ID format
- Status (shipped / verified / planned)
- Probe findings (anti-bot walls, auth requirements, rate limits we
hit in practice)
- Source-specific quirks (e.g., FR XML HD-driven sections, USLM 2.0)
REPLACE with outbound links to upstream docs:
- Raw endpoint URLs (e.g., https://api.congress.gov/v3/bill/...)
- Field-by-field schema documentation that's already in the
upstream API reference
- Rate-limit tables that the upstream changes without notice
- Authentication flows beyond "register here, set this env var"
## On staying compatible during the migration
- The current bin/sync-nav.py workflow stays valid for docs/
during the migration; it becomes a no-op once docs/ is removed.
- The CI nav check (uv run poe ci) still runs against the old tree
until cutover.
- Internal references from tracked code (CLAUDE.md, spec YAMLs,
etc.) need to be updated in the same PR that removes docs/.
## /spec/ integration is a public/ passthrough (audit finding)
"build-spec.py unchanged + Starlight serves /spec/ statically" is
achievable ONLY as a verbatim public/ passthrough. The 72 generated
pages hardcode ../_assets/... and ../../design/... relative paths
and flat <id>.html inter-spec links. So: a prebuild step runs
bin/build-spec.py, then mirrors docs/spec/ -> docs-site/public/spec/,
docs/_assets/* -> public/_assets/*, and the design CSS ->
public/design/.../colors_and_type.css to preserve the exact relative
depth. Astro copies public/ byte-for-byte (no MDX re-render).
Consequence: a BIFURCATED URL scheme: clean /operations/<page>/ and
/sources/<page>/ (Starlight) vs flat /spec/<id>.html (verbatim).
Accepted. Giving spec pages Starlight chrome (clean URLs, owned
<head>) would require rewriting build-spec.py's HTML emitter AND
re-platforming the spec.js editor — a separate project, out of scope.
The spec.js editor degrades to its download-YAML fallback on a static
host, which is fine for read-only public docs.
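To see why the mirrored directories have to sit where they do, it can help to resolve the hardcoded relative links the way a browser would. The sketch below uses Python's standard `urljoin`; the spec page name and the asset and CSS file names are placeholders chosen for illustration, not real file names from the generated output.

```python
from urllib.parse import urljoin

SITE = "https://docs.usejosh.com"


def resolve_from_spec_page(relative_href: str,
                           page: str = "/spec/example-id.html") -> str:
    """Resolve a relative href as seen from a flat /spec/<id>.html page.

    The default page name is a placeholder, not a real spec id.
    """
    return urljoin(SITE + page, relative_href)


if __name__ == "__main__":
    # ../_assets/... climbs out of /spec/ and lands at the site root.
    print(resolve_from_spec_page("../_assets/example.js"))
    # Sibling spec links stay inside /spec/.
    print(resolve_from_spec_page("other-id.html"))
    # ../../design/... asks for one level above the root; URL resolution
    # discards the extra "..", so it also lands at the site root.
    print(resolve_from_spec_page("../../design/example.css"))
```

Under these assumptions all three link shapes resolve to paths that exist only if `spec/`, `_assets/`, and `design/` are mirrored directly under `public/`.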
Build ordering becomes the new CI gate: astro build (+ astro check)
replaces nav-check; bin/sync-nav.py and docs/_assets/docs-nav.js
are deleted (Starlight owns the sidebar, breadcrumb, current-page
highlight, and mobile drawer from one sidebar config block).
## Risk register (from the 2026-05-29 source audit)
HIGH:
- Anchor-slug drift. Starlight auto-slugs headings; hand-set ids
(double-dash forms like endpoints--url-patterns, #storage-stack,
#model-choice) won't reproduce, silently breaking inbound deep-links
from CLAUDE.md + ~70 spec YAMLs + the data-status -> spec-ingester
links. Mitigation: author MDX headings to slug identically OR set
explicit anchor ids on every KEEP heading; linkcheck (t8) MUST
validate URL FRAGMENTS, not just page-level 200s.
- MDX build breakage. Unfenced <RULE>/<mods>/<uscDoc> XML and
{congress}/{YYYY}/vector(1024) braces parse as JSX and break the
build on nearly every source page. Mitigation: fence every
code/XML/SQL/JSON block; decode &lt;/&gt;/&amp; entities to literal
</>/& inside fences; add astro check to CI.
- Buried probe findings. Bot-walls, dead feeds, count-caps, and
zero-row-mirror findings sit INSIDE otherwise-cuttable upstream
tables. A mechanical "delete table -> link upstream" pass destroys
Josh's hardest-won, unrecoverable knowledge. Mitigation: lift every
probe finding/quirk into a KEEP callout or schema prose BEFORE
deleting the surrounding table; keep dated observations with dates.
- Cutover atomicity. Removing docs/ must update CLAUDE.md + all ~70
spec-YAML path references in the SAME PR, or the public-bound
schema-of-record points at deleted pages. Mitigation: do removal +
ref-rewrites in one PR (t11); grep every old path before merge; emit
_redirects as a safety net.
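The anchor-slug drift risk above can be checked mechanically before cutover. The sketch below uses a simplified GitHub-style slugger as a stand-in for auto-slugging; that is an assumption about the general behaviour, not Starlight's actual implementation, and the heading texts in the example are invented for illustration. `auto_slug` and `find_drift` are hypothetical names.

```python
import re


def auto_slug(heading: str) -> str:
    """Simplified GitHub-style slug: lowercase, drop punctuation, spaces -> '-'.

    This approximates auto-slugging; it is not the real slugger.
    """
    text = heading.strip().lower()
    text = re.sub(r"[^\w\- ]", "", text)  # removing "&" leaves two adjacent spaces
    return text.replace(" ", "-")


def find_drift(headings: dict[str, str]) -> dict[str, str]:
    """headings maps hand-set id -> heading text.

    Returns the ids whose auto-generated slug would differ, with that slug.
    """
    drift = {}
    for hand_id, text in headings.items():
        slug = auto_slug(text)
        if slug != hand_id:
            drift[hand_id] = slug
    return drift


if __name__ == "__main__":
    # Heading texts here are hypothetical examples.
    print(find_drift({
        "endpoints--url-patterns": "Endpoints & URL patterns",
        "storage-stack": "The storage stack",
    }))
```

Any id reported by `find_drift` needs either a reworded heading or an explicit anchor id, which matches the two mitigations listed for this risk.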
MED:
- Pagefind omits the public/ /spec/ pages unless a post-build CLI
pass indexes them (resolved: run the pass).
- The private/index.html sidebar must NEVER be encoded into tracked
astro.config.mjs (would leak private page titles; keep private/ on
the legacy mechanism).
- DDL on several "exploring" source pages is hand-authored doc, while
migrations win over docs. Reconcile against
shared/josh_substrate/.../migrations/versions/ rather than trusting
the page.
- Fix the PREEXISTING broken anchor
eval-architecture.html -> repo-structure.html#launch-repo-split
(target is #public-repos) during conversion.
## Per-source shrink contract
The per-page KEEP/CUT table (shrink %, primary keep, primary cut,
watch-outs for all 19 source pages) is the migration contract for
task t5. KEEP: Postgres DDL + column meaning in Josh's namespace,
chunker/eval status, citation ID format, status, probe findings,
source-specific quirks. CUT -> outbound link: raw endpoint URLs,
field-by-field upstream schema, rate-limit tables, auth flows beyond
"register here, set this env var". staff-directories and topic-taxonomy
are demote-not-delete EXCEPTIONS (no canonical upstream to link to).
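The KEEP/CUT contract, combined with the "lift probe findings first" mitigation from the risk register, can be read as a small decision rule per page section. The sketch below is one possible encoding: the section-kind labels and the `plan_section` helper are hypothetical names invented for illustration, not part of the migration tooling.

```python
# Section kinds are hypothetical labels for the categories in the contract.
KEEP = {"schema", "chunker", "citation_id", "status", "probe_finding", "quirk"}
CUT = {"endpoint_urls", "upstream_field_schema", "rate_limit_table", "auth_flow"}


def plan_section(kind: str, has_probe_finding: bool = False) -> str:
    """Decide what happens to one section of a per-source page.

    - KEEP kinds stay in the Josh page.
    - CUT kinds become an outbound link, but any probe finding buried
      inside them is lifted into a KEEP callout first.
    - Anything unrecognised is left for per-page human judgment.
    """
    if kind in KEEP:
        return "keep"
    if kind in CUT:
        return "lift-then-link-out" if has_probe_finding else "link-out"
    return "review"


if __name__ == "__main__":
    print(plan_section("schema"))
    print(plan_section("rate_limit_table", has_probe_finding=True))
    print(plan_section("changelog"))
```

The "review" fallback reflects that the shrink is a per-page judgment rather than a uniform rule, so unknown sections are never cut automatically.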
Tasks
10 of 12 done.
- t1 Bootstrap docs-site/ as an Astro Starlight project with the existing nav structure (Astro 6 + Starlight 0.39; sidebar config replaces sync-nav.py; remaining nav groups fill as pages land). Needs Node >=22.12 — pinned via docs-site/.node-version
- t2 Map Josh design tokens (colors, type) into Starlight's theme layer; verify visual consistency. Done: src/styles/josh-tokens.css (verbatim import, no fork) + josh-bridge.css (--sl-* -> --josh-*, monochrome-ink accent, light-only, serif H1/H2). Visual confirmed against architecture page + spec catalog
- t3 Migrate top-level pages to Starlight: index.mdx (Josh hub with CardGrid), josh-data-sources.md, data-status.md, plus sources/index.md landing and a contributing.md 'editing the docs' page
- t4 Migrated all 14 docs/operations/* pages to .md — hand-set anchors preserved via remark-heading-id {#id}; code fenced; cross-links rewritten. query-flows + query-coverage are faithful HTML ports keeping their page-local <style> blocks (diagram CSS is page-scoped)
- t5 Migrated all 19 docs/sources/* pages WITH the shrink-and-link-out pass (avg ~33%) — probe findings/quirks lifted out of cuttable tables first; upstream-duplicated reference replaced with outbound links; staff-directories + topic-taxonomy kept as low-shrink exceptions
- t6 Wire docs/spec/ as a VERBATIM public/ passthrough: prebuild (docs-site/scripts/sync-static.mjs) runs bin/build-spec.py then mirrors docs/spec/ + docs/_assets/* + the design CSS into docs-site/public/ (preserving relative depth so ../../design + ../_assets clamp correctly); /spec/<id>.html URLs kept; build-spec.py output UNCHANGED (CI still green). Pagefind indexes all 72 /spec/ pages (75 HTML total). Verified: spec catalog renders fully styled
- t7 Deploy pipeline: Cloudflare Pages building from main (+ a _redirects file), docs.usejosh.com DNS pointed at it
- t8 Linkcheck (docs-site/scripts/linkcheck.mjs) with FRAGMENT validation — GREEN across 111 pages, 0 broken links/anchors. Fixed the preexisting eval-architecture -> repo-structure#launch-repo-split anchor (now #public-repos). Caught + fixed a SmartyPants bug mangling `--` anchor ids (smartypants now off). Outbound upstream-link live-verification (bot-walled hosts as expected-403 w/ browser UA) deferred to pre-launch.
- t9 Generated docs-site/public/_redirects (88 rules): legacy /docs/<path>.html -> clean routes, bare /<path>.html -> clean routes (fixes the /spec passthrough pages' embedded old-nav links), /docs/spec/* -> /spec/:splat. /spec/*.html kept verbatim
- t10 Lighthouse performance ≥ 90 on the homepage on a cold-cache desktop run
- t11 Cutover (cleanup) done in the working tree: removed the migrated hand-authored HTML (docs/index.html, josh-data-sources.html, data-status.html, operations/, sources/ — 36 files); KEPT docs/spec/ (generator) + docs/_assets/ (incl docs-nav.js — the /spec passthrough needs them). sync-nav.py now manages only the private/ tree (public is Starlight). Rewrote 143 spec-YAML refs + 13 CLAUDE.md refs to docs.usejosh.com URLs; repointed build-spec's data-status drift lint to data-status.md. repo `poe ci` GREEN. NOTE: the new docs CI gate is the Node `npm run build` + `astro check` + `npm run linkcheck` (separate from the Python `poe ci`); wiring them together is a follow-up. The 'docs.usejosh.com is the canonical surface' clause completes when t7 deploy lands.
- t12 Contributor workflow documented at /contributing/ ('Editing the docs') — edit one Markdown file, push a PR; the spec-section exception; local dev commands; the {#id} anchor + link-out conventions
Changelog
- 2026-05-29T00:00:00Z draft→planned: Promoted to `planned` after a read-only source audit (25-agent sweep) produced the migration map. Resolved the open clarifications: Cloudflare Pages deploy, Pagefind + post-build /spec index pass, light-only theme, monochrome-ink accent, per-page shrink judgment. Recorded the audit's architectural finding (the `/spec/` section is a verbatim `public/` passthrough -> bifurcated URL scheme, build-spec.py untouched) and the risk register (anchor-slug drift, MDX-fence breakage, buried probe findings, cutover atomicity). `bin/sync-nav.py` + `docs-nav.js` are now slated for deletion at cutover (Starlight owns nav). Tracer bullet (bootstrap + one ops page + /spec passthrough) underway.
- 2026-05-29T00:00:00Z planned→in_progress: Full content migration complete + verified. All 14 operations, 19 shrunk source pages, 3 top-level pages, sources index, and a contributing page are live in docs-site/ (Astro Starlight). Build green, `astro check` 0 errors, internal linkcheck (fragment-level) GREEN across 111 pages. Caught two real issues in the build loop: the `{#id}` anchor syntax breaks in MDX (so ops/source pages are `.md` + a remark-heading-id plugin), and SmartyPants was mangling `--` anchor ids (now disabled). `_redirects` map generated. Remaining: t7 deploy (Cloudflare Pages, awaiting auth), t10 lighthouse, t11 cutover (delete docs/ + retire sync-nav.py; gated on a verified live deploy).