Litestream WAL streaming (path not taken)
Why
Continuous WAL replication of /data/josh.db to S3-compatible storage
with point-in-time recovery to any second within the retention window.
Evaluated and indefinitely deferred for the substrate. This spec is
preserved to document the path not taken; it is not on any roadmap.
Decision: substrate-nightly-backup (restic + DO Spaces cold tier +
whole-/data backup) is the canonical substrate backup approach. No
planned re-evaluation, no "switch when X" trigger.
## Why Litestream loses for the substrate, at any size
Litestream's two structural advantages are:
1. Sub-second RPO via continuous WAL streaming.
2. Point-in-time recovery to any moment within the retention window.
Both advantages only earn their operational complexity (a streaming
daemon co-located with the writer, replica-target lifecycle, restore
paths that replay WAL on top of a base snapshot) when the substrate
carries irreplaceable state — rows that cannot be reconstructed
from any other source. Examples:
- User accounts, passwords, sessions
- Stripe billing rows, payment history, subscription state
- Hand-curated content, manual annotations, comment threads
- Anything a user wrote that doesn't exist anywhere else
Josh's substrate has none of that. It's a cache of public federal
data + AI-derived artifacts:
- Bills, FR docs, USC sections, hearings, votes — all re-fetchable
from upstream APIs.
- Body normalization output, chunks, embeddings — all re-derivable
from raw payloads via the same code that built them the first time.
- Source state watermarks — re-establish from the data itself.
At any substrate size, recovery from a 24h-old snapshot is "restore last
night + re-run the day's ingester delta." Annoying, not catastrophic.
Restic's snapshot semantics + restore speed are sufficient.
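
A minimal sketch of the recovery shape described above, assuming a nightly snapshot timestamp and a failure timestamp are known. The function names, the repository string, and the restore target are hypothetical; only the `restic -r <repo> restore latest --target <dir>` invocation form comes from restic itself, and the command is built here rather than executed.

```python
from datetime import datetime, timedelta, timezone


def reingest_window(snapshot_at: datetime, failed_at: datetime) -> tuple[datetime, datetime]:
    """Return the (start, end) span the ingesters must re-fetch after a restore."""
    if failed_at < snapshot_at:
        raise ValueError("snapshot is newer than the failure time")
    return snapshot_at, failed_at


def restore_argv(repo: str, target: str) -> list[str]:
    """Build (but do not run) a restic command that restores the latest snapshot."""
    return ["restic", "-r", repo, "restore", "latest", "--target", target]


# Hypothetical timestamps: a 03:00 UTC nightly snapshot and a failure later that day.
snapshot_at = datetime(2026, 5, 10, 3, 0, tzinfo=timezone.utc)
failed_at = datetime(2026, 5, 10, 21, 30, tzinfo=timezone.utc)

start, end = reingest_window(snapshot_at, failed_at)
gap = end - start  # the data that has to be re-ingested, at most about 24h with nightly snapshots

# Hypothetical repository and target; substitute the real values from substrate-nightly-backup.
print(restore_argv("s3:example-endpoint/example-bucket", "/restore"))
print(f"re-ingest {start.isoformat()} .. {end.isoformat()} ({gap})")
```

The point of the sketch is that the lost window is bounded by the snapshot interval and is filled by re-running ingesters, not by replaying WAL.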
## What WOULD trigger a future user-state backup spec
Not a re-evaluation of this spec. Step 2 introduces the agent UI +
project/session history (irreplaceable), and Cloud surfaces eventually
add user accounts + Stripe state (irreplaceable). When that lands,
the right answer is a separate spec for "user-state backup with
PITR" scoped to the customer DB (likely a separate SQLite file from
the substrate, or a Postgres alongside it). That spec might use
Litestream, or it might use logical replication, or something else
appropriate for that shape. It's a different problem with different
blast radius — not a substrate concern.
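
The rule this section applies can be sketched as a small classifier, assuming each data store can be labeled as regenerable or not. The `Store` type, the store names, and the strategy labels are hypothetical illustrations, not identifiers from the codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Store:
    name: str
    regenerable: bool  # True if every row can be re-fetched or re-derived


def backup_strategy(store: Store) -> str:
    """Nightly snapshots suffice for regenerable data; irreplaceable data is a PITR candidate."""
    return "nightly-snapshot" if store.regenerable else "pitr-candidate"


substrate = Store("substrate", regenerable=True)
customer_db = Store("customer-db", regenerable=False)  # hypothetical future store

for store in (substrate, customer_db):
    print(store.name, "->", backup_strategy(store))
```

Under this rule the substrate never becomes a PITR candidate, whatever its size; only a store holding irreplaceable rows does, and that store gets its own spec.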
This spec stays as-is, marked draft, as the documented evaluation
result.
User stories
No user stories yet.
Acceptance criteria (EARS)
- Path not taken: this spec does not ship and has no roadmap. See `substrate-nightly-backup` (restic + DO Spaces cold tier + whole-/data backup) for the canonical substrate backup approach.
Success determiner
Checklist
- No determiner — this spec documents an evaluation result, not work to ship. The decision: substrate backup is restic-based, indefinitely. If irreplaceable user state ever lands, that's a separate spec, not a re-activation of this one.
Intentionally `manual` and intentionally an evaluation note rather than an actionable spec.
Clarifications needed
None.
Out of scope
- All substrate backup work — handled by `substrate-nightly-backup`.
- Future user-state backup (cloud admin, Stripe billing, agent project history) — that's a separate spec when those features land, not a re-evaluation of this one.
Dependencies
Plan
Not planning to ship. Evaluation summary in the why section.
Reference left here as design-archaeology for any future contributor
asking "did we consider Litestream?" The answer is yes, deliberately,
and the canonical alternative (restic + cold-tier object storage +
whole-/data backup with 30+ day retention) is documented in `substrate-nightly-backup`.
Tasks
No tasks defined.
Changelog
- 2026-05-10T20:30:00Z, draft→draft: Reframed from "deferred, revisit when X" to "indefinitely deferred, not a substrate concern." Decision crystallized: restic + cold-tier DO Spaces + whole-/data backup is the canonical substrate backup approach at any size. Litestream's value (sub-second RPO, PITR) only earns its keep with irreplaceable user state, which the substrate doesn't have and isn't planned to have. If irreplaceable state ever lands (Step 2 agent UI, Cloud admin, Stripe billing), that's a separate "user-state backup" spec, not this one.
- 2026-05-09T12:00:00Z, planned→draft: Superseded by `substrate-nightly-backup`. Picked nightly `sqlite3 .backup` snapshot over Litestream WAL streaming for v1 substrate: the substrate is mostly regenerable from public APIs, so 24h RPO is acceptable, and the operational simplicity wins. Revisit when irreplaceable user/billing state lands.