substrate · draft · p2

Litestream WAL streaming (path not taken)

substrate-litestream-backup · updated 2026-05-10T20:30:00Z

Continuous WAL replication of /data/josh.db to S3-compatible storage
with point-in-time recovery to any second within the retention window.
Evaluated and indefinitely deferred for the substrate. Preserved as a
spec to document the path not taken; not on any roadmap.

Decision: substrate-nightly-backup (restic + DO Spaces cold tier +
whole-/data backup) is the canonical substrate backup approach. No
planned re-evaluation, no "switch when X" trigger.

## Why Litestream loses for the substrate, at any size

Litestream's two structural advantages are:

1. Sub-second RPO via continuous WAL streaming.
2. Point-in-time recovery to any moment within the retention window.
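To make PITR concrete: a WAL-streaming backup keeps periodic base snapshots plus a time-ordered stream of committed changes. "Restore to time T" means starting from the newest snapshot taken at or before T and replaying only the changes after that snapshot, up to T. The toy model below is a conceptual sketch, not Litestream's on-disk format or restore code. `Snapshot`, `WalFrame`, and `restore_to` are hypothetical names, and database state is simplified to a key/value dict.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    taken_at: float          # hypothetical: seconds since some epoch
    state: dict[str, str]    # full key/value image of the database at taken_at


@dataclass(frozen=True)
class WalFrame:
    written_at: float        # hypothetical: commit time of this change
    key: str
    value: str


def restore_to(target: float, snapshots: list[Snapshot], wal: list[WalFrame]) -> dict[str, str]:
    """Rebuild state as of `target` (toy model, not Litestream's implementation).

    1. Pick the newest snapshot taken at or before `target`.
    2. Replay, in commit order, every frame written after that snapshot
       and no later than `target`.
    """
    ordered = sorted(snapshots, key=lambda s: s.taken_at)
    times = [s.taken_at for s in ordered]
    idx = bisect_right(times, target) - 1
    if idx < 0:
        # Older than anything retained: outside the retention window.
        raise ValueError("target is older than the oldest retained snapshot")
    base = ordered[idx]
    state = dict(base.state)  # copy so the stored snapshot is never mutated
    for frame in sorted(wal, key=lambda f: f.written_at):
        if base.taken_at < frame.written_at <= target:
            state[frame.key] = frame.value
    return state


if __name__ == "__main__":
    snaps = [Snapshot(0.0, {"a": "1"}), Snapshot(100.0, {"a": "2", "b": "x"})]
    wal = [WalFrame(50.0, "b", "x"), WalFrame(80.0, "a", "2"),
           WalFrame(120.0, "a", "3"), WalFrame(130.5, "c", "y")]
    print(restore_to(125.0, snaps, wal))  # {'a': '3', 'b': 'x'}
```

Restoring to t=125 starts from the t=100 image and replays only the frame at t=120; a target older than the oldest retained snapshot raises, which is where the retention window bites in this model. The per-second granularity is what the restic approach gives up in exchange for a much simpler restore path.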

Both advantages only earn their operational complexity (a streaming
daemon co-located with the writer, replica-target lifecycle, restore
paths that replay WAL on top of a base snapshot) when the substrate
carries irreplaceable state — rows that cannot be reconstructed
from any other source. Examples:

- User accounts, passwords, sessions
- Stripe billing rows, payment history, subscription state
- Hand-curated content, manual annotations, comment threads
- Anything a user wrote that doesn't exist anywhere else

Josh's substrate has none of that. It's a cache of public federal
data + AI-derived artifacts:

- Bills, FR docs, USC sections, hearings, votes — all re-fetchable
from upstream APIs.
- Body normalization output, chunks, embeddings — all re-derivable
from raw payloads via the same code that built them the first time.
- Source state watermarks — re-establish from the data itself.
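One way to read "re-establish from the data itself": after a restore, derive each source's watermark from the newest row actually present, rather than trusting a separately stored cursor that may be newer than the restored data. A minimal sketch using Python's built-in `sqlite3`, assuming a hypothetical `documents(source, fetched_at)` table with ISO-8601 UTC timestamps; the substrate's real schema is not described in this spec.

```python
import sqlite3


def rebuild_watermarks(conn: sqlite3.Connection) -> dict[str, str]:
    """Return {source: newest fetched_at} computed from the restored rows.

    Assumes a hypothetical table documents(source TEXT, fetched_at TEXT)
    whose timestamps share one fixed-width ISO-8601 UTC format, so MAX()
    on the text column also picks the chronologically newest value.
    """
    rows = conn.execute(
        "SELECT source, MAX(fetched_at) FROM documents GROUP BY source"
    ).fetchall()
    return {source: newest for source, newest in rows}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (source TEXT, fetched_at TEXT)")
    conn.executemany(
        "INSERT INTO documents VALUES (?, ?)",
        [
            ("bills", "2026-05-09T03:00:00Z"),
            ("bills", "2026-05-10T03:00:00Z"),
            ("fr_docs", "2026-05-10T02:15:00Z"),
        ],
    )
    print(rebuild_watermarks(conn))
    conn.close()
```

Because the watermark is recomputed from whatever the snapshot actually contains, an ingester resuming from it re-fetches exactly the gap between the snapshot and now, which is the "day's ingester delta" below.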

At any substrate size, recovery from a 24h-old snapshot is "restore last
night + re-run the day's ingester delta." Annoying, not catastrophic.
Restic's snapshot semantics + restore speed are sufficient.
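"Restore last night" assumes last night's copy of the live database is internally consistent. The 2026-05-09 changelog entry names `sqlite3 .backup` as the nightly snapshot mechanism; Python's standard-library counterpart is `sqlite3.Connection.backup`, which uses SQLite's online backup API. A minimal sketch, assuming the snapshot is written to a file that the restic whole-/data backup then picks up; `snapshot_db` and the paths are hypothetical, and the canonical mechanics live in `substrate-nightly-backup`.

```python
import sqlite3


def snapshot_db(live_path: str, snapshot_path: str) -> None:
    """Copy a live SQLite database into a standalone, consistent file.

    Connection.backup goes through SQLite's online backup API, so the
    copy reflects one committed state of the source even if another
    connection is writing. A file-level tool such as restic can then
    back up snapshot_path instead of the live file and its -wal sidecar.
    """
    src = sqlite3.connect(live_path)
    dst = sqlite3.connect(snapshot_path)
    try:
        with dst:
            src.backup(dst)  # default pages=-1: copy everything in one step
    finally:
        dst.close()
        src.close()
```

Backing up the snapshot rather than the live database avoids copying the main file and its `-wal` sidecar at different moments, which is how a file-level copy of an active SQLite database can end up torn.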

## What WOULD trigger a future user-state backup spec

Not a re-evaluation of this spec. Step 2 introduces the agent UI +
project/session history (irreplaceable), and Cloud surfaces eventually
add user accounts + Stripe state (irreplaceable). When that lands,
the right answer is a separate spec for "user-state backup with
PITR" scoped to the customer DB (likely a separate SQLite file from
the substrate, or a Postgres alongside it). That spec might use
Litestream, or it might use logical replication, or something else
appropriate for that shape. It's a different problem with a different
blast radius, not a substrate concern.

This spec stays as-is, marked draft, as the documented evaluation
result.

No user stories yet.

  1. Path not taken: this spec does not ship and has no roadmap. See `substrate-nightly-backup` (restic + DO Spaces cold tier + whole-/data backup) for the canonical substrate backup approach.
kind: manual

Checklist

  • No determiner — this spec documents an evaluation result, not work to ship. The decision: substrate backup is restic-based, indefinitely. If irreplaceable user state ever lands, that's a separate spec, not a re-activation of this one.

Intentionally `manual` and intentionally an evaluation note rather than an actionable spec.

None.

  • All substrate backup work — handled by `substrate-nightly-backup`.
  • Future user-state backup (cloud admin, Stripe billing, agent project history) — that's a separate spec when those features land, not a re-evaluation of this one.

Not planning to ship. Evaluation summary in the why section.

Reference left here as design-archaeology for any future contributor
asking "did we consider Litestream?" The answer is yes, deliberately,
and the canonical alternative (restic + cold-tier object storage +
whole-/data backup with 30+ day retention) is documented in
substrate-nightly-backup.

No tasks defined.

  • 2026-05-10T20:30:00Z draft → draft: Reframed from "deferred, revisit when X" to "indefinitely deferred, not a substrate concern." Decision crystallized: restic + cold-tier DO Spaces + whole-/data backup is the canonical substrate backup approach at any size. Litestream's value (sub-second RPO, PITR) only earns its keep with irreplaceable user state, which the substrate doesn't have and isn't planned to have. If irreplaceable state ever lands (Step 2 agent UI, Cloud admin, Stripe billing), that's a separate "user-state backup" spec, not this one.
  • 2026-05-09T12:00:00Z planned → draft: Superseded by `substrate-nightly-backup`. Picked a nightly `sqlite3 .backup` snapshot over Litestream WAL streaming for the v1 substrate: the substrate is mostly regenerable from public APIs, so a 24h RPO is acceptable, and the operational simplicity wins. Revisit when irreplaceable user/billing state lands.

docs/spec/substrate-litestream-backup.html · generated by bin/build-spec.py