Substrate nightly backup (restic to DO Spaces)
Why
A nightly backup of the entire /data directory shipped to S3-compatible
object storage via restic (block-level content-addressed dedupe),
with retention managed by restic forget + restic prune. The backup
covers everything durable: the SQLite substrate (/data/josh.db, taken
via sqlite3 .backup into a staging file for consistency), the corpus
directory (/data/corpus/ — raw payloads + normalized Markdown), and
any other state that lands under /data over time. Excludes are
limited to active-write transient files (-wal, -shm, lock files).
Built on SQLite's online-backup API for the database half (so the
snapshot source is internally consistent without quiescing the writer)
and restic's content-addressed dedupe for the upload half. After the
first nightly run, subsequent backups transfer only the changed blocks
— for our pattern (mostly-append corpus + small SQLite block deltas),
daily delta is ~100 MB to ~2 GB even at TB-scale substrate.
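The bandwidth claim can be sanity-checked with a few lines of arithmetic. This is a minimal sketch: `savings_pct` is a hypothetical helper, not part of the backup script, and the inputs are the rough figures quoted above (a ~1 TB substrate and a nightly delta of ~100 MB to ~2 GB).

```shell
# savings_pct FULL_GB DELTA_GB
# Prints the percentage of upload bandwidth saved when only DELTA_GB is
# transferred instead of a full FULL_GB snapshot.
savings_pct() {
  awk -v full="$1" -v delta="$2" 'BEGIN { printf "%.2f\n", (1 - delta / full) * 100 }'
}

# Figures from the text: ~1 TB substrate, nightly delta of ~100 MB to ~2 GB.
savings_pct 1000 2     # upper end of the delta range
savings_pct 1000 0.1   # lower end of the delta range
```

The two calls print 99.80 and 99.99, so even the upper end of the delta range is a reduction of more than 99% against a full-snapshot upload.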
This is THE canonical substrate backup approach. No planned
re-evaluation, no "switch to Litestream when X" trigger. The Litestream
spec exists as documentation of the path-not-taken — see its why
for the full reasoning. Short version: Litestream's continuous-WAL
approach only earns its complexity when you have irreplaceable rows
in the substrate (user accounts, billing state, hand-curated content).
Josh's substrate is a cache of public federal data + AI-derived
artifacts; nothing here is irreplaceable, and a 24-hour RPO is fine
because recovery is "restore last night's snapshot + re-run the
ingester for the day's delta."
Storage destination is the DigitalOcean Spaces bucket josh-bucket
(NYC3) on standard-tier storage ($0.02/GiB/mo, no minimum object
retention, fast retrieval, no per-GB retrieval fees). Standard tier
was chosen over cold tier deliberately: at the substrate's projected
size, the ~$10-15/mo cost difference is dwarfed by the operational
simplicity of "just run integrity checks + restores whenever you want
without pricing math." No retrieval-cost caveats threaded through the
runbook, no 30-day-minimum interaction with restic prune, no "do
this rarely because it's expensive" footnotes.
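For a rough feel of the standard-tier bill, the quoted $0.02/GiB/mo rate can be multiplied out. This is a sketch only: `monthly_cost` is a hypothetical helper, it treats GB and GiB as interchangeable, and it ignores bandwidth and any base subscription fee.

```shell
# monthly_cost SIZE_GIB RATE_PER_GIB
# Prints the monthly storage cost in dollars for a repository of SIZE_GIB.
monthly_cost() {
  awk -v gib="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", gib * rate }'
}

monthly_cost 14 0.02     # roughly the current ~14 GB substrate
monthly_cost 1024 0.02   # the projected ~1 TB repository
```

This prints 0.28 and 20.48, which is where the "~$20/mo at ~1 TB" figure used elsewhere in this spec comes from.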
This pipeline previously ran on the DigitalOcean droplet (destroyed
2026-05-10) shipping plain sqlite3 .backup → gzip → aws s3 cp
full-snapshot uploads to a now-deleted bucket (usejosh). The new
bucket josh-bucket is a fresh standard-tier bucket; restic
initializes against it as a clean repo.
User stories
As an OSS self-hoster, I want a sane default backup of my Josh substrate without operating a streaming-replication daemon so that I can recover from disaster without engineering it myself.
As the operator paying the storage + bandwidth bill, I want nightly bandwidth to drop ~99% after the first backup via block-level dedupe, AND the corpus to be backed up alongside the DB so that re-ingestion isn't the only recovery path (which at TB scale is multi-day work + API-quota risk).
As the operator of josh.dev, I want a known-good copy of the entire substrate from at most 24 hours ago, sitting in object storage so that when I (or an ingester bug) corrupt the substrate, I can restore in minutes, including raw bodies and markdown.
As an agent debugging a regression, I want to restore last night's substrate to a local box and reproduce so that I can investigate without disturbing production state.
Acceptance criteria (EARS)
- When the host's nightly systemd timer fires, the system shall run `sqlite3 /data/josh.db ".backup /data/backups/josh-snap.db"` to produce an internally consistent snapshot file without stopping the running containers.
- When the snapshot is staged, the system shall run `restic backup /data` against the configured S3-compatible repository (DO Spaces standard tier `josh-bucket`, prefix `josh-db-restic/`), excluding the live SQLite files (`/data/josh.db`, `-wal`, `-shm`) and the locks directory (`/data/locks/`) — but INCLUDING the consistent staged snapshot at `/data/backups/josh-snap.db` and the entire corpus tree under `/data/corpus/`.
- After every successful backup, the system shall run `restic forget --keep-daily 30 --keep-weekly 8 --keep-monthly 12 --prune` to manage retention internally — no bucket lifecycle rule shall be applied to the repository prefix (lifecycle deletion would orphan pack files referenced by recent snapshots).
- When the upload fails, the system shall exit non-zero so systemd records the failure in the journal and the next run isn't fooled into thinking it succeeded.
- When the runbook restore procedure is run against the latest restic snapshot, the resulting restored SQLite file shall pass `PRAGMA integrity_check;` returning `ok` AND the restored corpus directory shall match the file listing from the live host (modulo files written after the snapshot).
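The last criterion (restored corpus matches the live file listing, modulo files written after the snapshot) could be checked along these lines. This is a sketch under assumptions: `list_files` and `compare_trees` are hypothetical helpers, not part of the runbook, and they compare paths only, not file contents. It also assumes no file names contain newlines.

```shell
# list_files DIR: sorted relative paths of every regular file under DIR.
list_files() {
  (cd "$1" && find . -type f | LC_ALL=C sort)
}

# compare_trees LIVE_DIR RESTORED_DIR
# Prints paths found in only one of the two trees, labelled by side.
compare_trees() {
  live_list=$(mktemp)
  restored_list=$(mktemp)
  list_files "$1" > "$live_list"
  list_files "$2" > "$restored_list"
  # comm -23 keeps lines unique to the first file, comm -13 to the second.
  LC_ALL=C comm -23 "$live_list" "$restored_list" | sed 's/^/live-only: /'
  LC_ALL=C comm -13 "$live_list" "$restored_list" | sed 's/^/restored-only: /'
  rm -f "$live_list" "$restored_list"
}

# Demo on throwaway directories: one file was written after the "snapshot".
demo=$(mktemp -d)
mkdir -p "$demo/live/src-a" "$demo/restored/src-a"
touch "$demo/live/src-a/1.md" "$demo/live/src-a/2.md" "$demo/restored/src-a/1.md"
compare_trees "$demo/live" "$demo/restored"
rm -rf "$demo"
```

The demo prints `live-only: ./src-a/2.md`. In a real check, `live-only` entries are expected only for files newer than the snapshot, and any `restored-only` entry deserves a closer look.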
Success determiner
Command
set -euo pipefail
# === CNV preflight: substrate host reachable ===
# Three-state exit: 0 PASS, 1 FAIL, 77 CNV. Backup-pipeline failure
# must be distinguishable from "I couldn't even reach the host" so
# a routine doesn't sound a backup-broken alarm on a transient SSH
# outage.
if ! ssh -o ConnectTimeout=5 -o BatchMode=yes josh 'true' 2>/dev/null; then
  echo "CNV: ssh josh unreachable; cannot verify backup" >&2
  exit 77
fi
# === Pipeline-health check (all 5 sub-checks asserted on host) ===
# Proves: fresh restic snapshot exists, repo integrity passes a
# structural check, the most-recent snapshot restores cleanly, the
# restored SQLite has non-trivial row counts for shipped sources,
# and the restored corpus tree contains source data.
ssh josh '
set -euo pipefail
source /etc/josh-backup.env
# 1. Snapshot listing succeeds + most-recent snapshot is < 25h old
latest_ts=$(restic snapshots --json | jq -r ".[-1].time")
[ -n "$latest_ts" ]
age_hours=$(( ( $(date -u +%s) - $(date -u -d "$latest_ts" +%s) ) / 3600 ))
test "$age_hours" -lt 25
# 2. Structural repo check (fast — does NOT re-download data)
restic check --no-cache
# 3. Restore-to-tempdir round trip on the most-recent snapshot
tmp=$(mktemp -d) && trap "rm -rf $tmp" EXIT
restic restore latest --target "$tmp"
# 4. Restored SQLite has non-trivial row counts for shipped sources.
# Was: SELECT COUNT(*) without `test` (gameable — empty DB still
# exits 0). Now: floor-asserted against the shipped-source floors
# used in crs-reports-ingester and legislators-and-committees-ingester
# determiners. If those floors change, this assertion follows.
snap=$(find "$tmp" -name josh-snap.db | head -1)
[ -n "$snap" ]
crs_count=$(sqlite3 "$snap" "SELECT COUNT(*) FROM crs_reports;")
legis_count=$(sqlite3 "$snap" "SELECT COUNT(*) FROM legislators;")
test "$crs_count" -ge 22000
test "$legis_count" -ge 12000
# 5. Restored corpus contains at least one source directory
corpus_root=$(find "$tmp" -type d -name corpus | head -1)
[ -n "$corpus_root" ]
test "$(ls -1 "$corpus_root" | wc -l)" -ge 1
echo "PASS: age=${age_hours}h crs_count=${crs_count} legis_count=${legis_count}"
'
Expect
Runs in ~5-30 minutes depending on substrate size (mostly the restore — restic's metadata-only check is fast). Re-runnable as often as wanted to confirm pipeline health. Adversarial mutation suite lives at `docs/spec/mutations/substrate-nightly-backup.yaml`.
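A caller of the determiner needs to turn its exit status into one of the three states described in the command's comments. A minimal sketch, assuming a hypothetical `classify_exit` helper in whatever routine wraps the determiner:

```shell
# classify_exit STATUS
# Maps the determiner's exit status to the three states named in its comments:
# 0 is PASS, 77 is CNV (could not verify), anything else is FAIL.
classify_exit() {
  case "$1" in
    0)  echo PASS ;;
    77) echo CNV ;;
    *)  echo FAIL ;;
  esac
}

# Typical use (determiner.sh is a placeholder name for the command above):
#   sh determiner.sh; classify_exit "$?"
classify_exit 77
```

Treating every status other than 0 and 77 as FAIL is a deliberately conservative choice: an unexpected status from any step in the remote script still raises the backup-broken alarm rather than being silently ignored.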
Clarifications needed
None.
Out of scope
- Point-in-time recovery (Litestream / WAL streaming). Evaluated and indefinitely deferred — see `substrate-litestream-backup` for the full reasoning. Short version: PITR's complexity only earns its keep with irreplaceable user/billing state, which Josh's substrate doesn't have. 24h RPO with re-ingest recovery is the right shape.
- Multi-region backup. Single-region NYC3 is fine for v1.
- Read replicas / live replication. Not needed — separate concern from backup.
- Bucket-level lifecycle rules on the restic repo prefix — these would orphan pack files referenced by recent snapshots and corrupt the repository. Retention is restic's job.
- Cold-tier storage. Considered and rejected: ~$10-15/mo savings at TB scale wasn't worth the operational caveats (retrieval-cost math, 30-day-minimum interaction with prune, 'run check rarely because it's expensive' caveats throughout the runbook).
- Backing up host-level state outside `/data` (system config, secrets, container state). Config + secrets live in the repo (age-encrypted); everything stateful lives under `/data` per the volume-as-data-host contract. The whole-/data backup is sufficient.
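The lifecycle-rule exclusion above can be illustrated with a toy model. This is not how restic stores its index; the pack names, ages, and reference counts below are made up, and `packs_broken_by_lifecycle` is a hypothetical helper that only shows why an age-based delete rule and content-addressed storage do not mix.

```shell
# packs_broken_by_lifecycle MAX_AGE_DAYS
# Reads "PACK_ID AGE_DAYS REF_COUNT" lines on stdin and prints every pack that
# an age-based bucket rule would delete while snapshots still reference it.
packs_broken_by_lifecycle() {
  awk -v max="$1" '$2 > max && $3 > 0 { print $1 }'
}

# Made-up inventory: pack-a is old but still referenced by three snapshots,
# pack-b is old and unreferenced, pack-c is recent.
printf '%s\n' 'pack-a 90 3' 'pack-b 45 0' 'pack-c 2 1' |
  packs_broken_by_lifecycle 30
```

The example prints `pack-a`: a "delete after 30 days" rule would remove a pack that current snapshots still need. Only `pack-b`, which nothing references, is safe to delete, and that is the decision `restic forget --prune` makes from inside the repository.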
Dependencies
Plan
## Architecture (host-side restic against DO Spaces standard tier)
- Where it runs: on the OVHcloud bare-metal host, not inside any
container. The host already has /data available, has cron/systemd,
and is the simplest place to run a "snapshot files and ship them"
job. No image bloat, no kamal app exec gymnastics.
- Trigger: systemd timer (josh-backup.timer) firing daily at
03:30 UTC (off-peak). systemd over crond because it gives us
journal logs, OnFailure= handling, and missed-trigger semantics
for free.
- Script: /usr/local/bin/josh-backup.sh. Outline:
```bash
set -euo pipefail
source /etc/josh-backup.env
# 1. Stage a consistent SQLite snapshot under /data/backups/
mkdir -p /data/backups
sqlite3 /data/josh.db ".backup /data/backups/josh-snap.db"
# 2. Single restic backup of /data; first run also runs restic init
if ! restic snapshots >/dev/null 2>&1; then
  restic init
fi
restic backup /data \
  --tag nightly \
  --exclude /data/josh.db \
  --exclude /data/josh.db-wal \
  --exclude /data/josh.db-shm \
  --exclude /data/locks
# 3. Internal retention management (no bucket lifecycle rule)
restic forget \
  --keep-daily 30 \
  --keep-weekly 8 \
  --keep-monthly 12 \
  --prune
# 4. Cleanup staging
rm -f /data/backups/josh-snap.db
```
- What gets backed up:
- /data/backups/josh-snap.db — the consistent SQLite snapshot
(the live /data/josh.db is excluded because its bytes are
mid-WAL-write; the staged snap is the canonical version).
- /data/corpus/<source>/bodies/{raw,markdown}/... — every raw
payload and normalized Markdown body across all sources.
Mostly-append data; restic dedupes near-perfectly across nights.
- Anything else under /data that future code creates — the
include-by-default + targeted-excludes pattern means new state
lands in the backup automatically.
- What's excluded:
- /data/josh.db, -wal, -shm — the live SQLite files. Mid-write
bytes are not safe to back up directly; we get them via the
staging snapshot.
- /data/locks/ — flock files used by the ingester for advisory
locks. Transient state; restoring stale locks would block the
next ingest run.
- Why restic over the prior sqlite3 .backup → gzip → aws s3 cp
pattern: at v1 size (~14 GB), bandwidth difference is modest. At
TB-scale projections (substrate file ~500 GB + corpus ~500 GB by
mid-2026), the difference is decisive: nightly delta uploads of
~100 MB to ~2 GB vs ~1 TB. Restic also gives us per-snapshot
metadata (timestamps, tags, host), repository-level integrity
checking (restic check), and built-in encryption.
- Why no bucket lifecycle rule: restic stores data as content-
addressed pack files. Recent snapshots reference pack files
uploaded weeks-to-months ago for unchanged portions. A bucket-level
"delete objects older than N days" rule would silently delete those
still-referenced pack files, corrupting the entire repository.
Retention MUST be inside restic via restic forget --prune, which
only deletes pack files no longer referenced by ANY snapshot.
- Retention policy: --keep-daily 30 --keep-weekly 8 --keep-monthly 12.
The three rules overlap rather than stack: the most recent 30 days
keep every nightly snapshot, the most recent 8 weeks keep one per
week, and the most recent 12 months keep one per month, for roughly
12 months of rolling restoration coverage. Restic's dedupe means the
long tail of weekly/monthly snapshots adds only the unique-blob bytes,
not full snapshot copies. At TB scale this costs maybe a few extra GB
of storage on top of the steady-state base data.
- Why standard tier: ~$0.02/GiB/mo. At the projected ~1 TB total
repo size, that's ~$20/mo of storage — small in absolute terms.
Cold tier would save ~$13/mo but adds operational complexity (per-GB
retrieval pricing, 30-day-minimum object retention interacting with
restic prune, "run integrity checks rarely because they're
expensive" caveats). Standard tier means we can run restic check
--read-data whenever we want, restore freely for testing, and
treat the bucket as a normal dependency rather than a cost center.
- Tooling on host: sqlite3 (apt), restic (official binary
release — single static Go binary, no apt package needed), jq
(apt, for the success_determiner snapshot listing).
- Credentials: loaded by the systemd unit from
/etc/josh-backup.env, a 0600 root-owned file:
```
AWS_ACCESS_KEY_ID=...       # the existing DO Spaces key (carries over from usejosh)
AWS_SECRET_ACCESS_KEY=...
RESTIC_REPOSITORY=s3:https://nyc3.digitaloceanspaces.com/josh-bucket/josh-db-restic
RESTIC_PASSWORD=...         # repo encryption password
```
Repo encryption password is a new secret added to .kamal/secrets
(encrypted via age into .kamal/secrets.age). Lose this password
and the repo is unrecoverable, so back it up to iCloud
SSH Keys and Tokens.tar.age alongside SSH keys.
The DO Spaces virtual-hosted URL
https://josh-bucket.nyc3.digitaloceanspaces.com (the form shown in
the DO console) and the path-style URL above are equivalent; restic
prefers path-style (<endpoint>/<bucket>) for reliability across
S3-compatible providers.
- Restic check cadence: standard tier means no per-GB retrieval
cost concern. Recommended cadence:
- restic check (metadata-only, fast, cheap): nightly via the
success_determiner.
- restic check --read-data (re-reads every pack file, verifies
bit-rot): monthly. Or weekly if we want to be paranoid — the cost
is just bucket egress, which DO doesn't meter aggressively. We
can tune this empirically once the pipeline is live.
- Restore path (full runbook in
https://docs.usejosh.com/operations/restore-from-backup/):
```bash
source /etc/josh-backup.env
restic snapshots # pick the snapshot ID
restic restore <id> --target /tmp/restore
# Restore SQLite (kamal stop + cp + delete WAL/SHM)
kamal app stop
cp /tmp/restore/data/backups/josh-snap.db /data/josh.db
rm -f /data/josh.db-wal /data/josh.db-shm
# Restore corpus (rsync from the snapshot tree)
rsync -aHv --delete /tmp/restore/data/corpus/ /data/corpus/
sqlite3 /data/josh.db "PRAGMA integrity_check;" # must print ok
kamal app boot
```
The WAL-deletion step matters: a stale WAL from before the restore
would corrupt the freshly-restored DB on first open.
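One detail of the restore path is easy to miss in a script: `PRAGMA integrity_check;` reports problems as output rows, so the restore needs to compare the printed text against `ok` rather than rely on the exit status alone. A minimal sketch, assuming a hypothetical `integrity_ok` helper that reads the pragma's output on stdin:

```shell
# integrity_ok
# Reads the output of "PRAGMA integrity_check;" on stdin and succeeds only
# when that output is exactly the single line "ok".
integrity_ok() {
  result=$(cat)
  [ "$result" = "ok" ]
}

# On the host this would be wired up roughly as (not executed here):
#   sqlite3 /data/josh.db "PRAGMA integrity_check;" | integrity_ok && kamal app boot
printf 'ok\n' | integrity_ok && echo "safe to boot"
```

Reading from stdin keeps the helper independent of where the database lives, so the same guard can be reused against the staged snapshot or a restore in a temp directory.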
## When the OVHcloud host arrives — activation runbook
Pre-server prep is complete (see "Status snapshot" below). Activation
is four steps (two manual setup steps, a smoke test, and a round-trip
restore), followed by a 7-day watch-and-wait:
1. Run the install script from the local Mac:
```bash
JOSH_HOST=josh bin/josh-backup/install.sh
```
The script copies the systemd unit + timer + env template to the
host, installs restic + jq + sqlite3, creates /data/backups and
/var/cache/restic, and enables the daily 03:30 UTC timer.
2. Populate /etc/josh-backup.env on the host. The install script
copies the template; you fill in the placeholders from the
decrypted .kamal/secrets:
```
RESTIC_PASSWORD        ← .kamal/secrets RESTIC_PASSWORD
AWS_ACCESS_KEY_ID      ← .kamal/secrets BACKUP_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY  ← .kamal/secrets BACKUP_SECRET_ACCESS_KEY
```
RESTIC_REPOSITORY, RESTIC_CACHE_DIR, BACKUP_STAGE_DIR, and
BACKUP_TAG_HOST are pre-filled correctly.
3. Smoke-test the backup:
```bash
ssh josh 'systemctl start josh-backup.service \
  && journalctl -u josh-backup.service -n 50 --no-pager'
```
The first run connects to the already-initialized restic repo
(init was done from local Mac during prep, so no restic init step is
needed). Snapshot 1 lands in s3://josh-bucket/josh-db-restic.
4. Verify with a round-trip restore, per the runbook at
https://docs.usejosh.com/operations/restore-from-backup/. Run
restic restore latest --target /tmp/restore, then confirm the SQLite
snap opens and the corpus tree is restored. Lifecycle flips
planned → in_progress on first successful smoke; → verified on
round-trip restore success; → shipped after 7 days of clean nightly
runs.
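Before the smoke test in step 3, it may help to confirm that the env file from step 2 is fully populated and locked down. The following is a sketch, not part of install.sh: `check_env_file` is a hypothetical helper, and it assumes unfilled template values still begin with the `...` placeholder shown in the credentials example.

```shell
# check_env_file PATH
# Fails if a required key is missing, empty, or still a "..." placeholder,
# or if the file mode is not exactly 0600.
check_env_file() {
  env_file=$1
  status=0
  for key in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY RESTIC_REPOSITORY RESTIC_PASSWORD; do
    # Take the text after KEY= on the first matching line.
    value=$(sed -n "s/^${key}=//p" "$env_file" | head -n 1)
    case "$value" in
      '' | '...'*)
        echo "unset or placeholder: $key" >&2
        status=1
        ;;
    esac
  done
  # find prints the path only when the permission bits are exactly 0600.
  if [ -z "$(find "$env_file" -perm 600)" ]; then
    echo "mode is not 0600: $env_file" >&2
    status=1
  fi
  return "$status"
}

# Demo on a throwaway file with one placeholder left in place.
sample=$(mktemp)
printf '%s\n' \
  'AWS_ACCESS_KEY_ID=example-key-id' \
  'AWS_SECRET_ACCESS_KEY=example-secret' \
  'RESTIC_REPOSITORY=s3:https://nyc3.digitaloceanspaces.com/josh-bucket/josh-db-restic' \
  'RESTIC_PASSWORD=...' > "$sample"
chmod 600 "$sample"
check_env_file "$sample" || echo "env file not ready"
rm -f "$sample"
```

On the host the call would be `check_env_file /etc/josh-backup.env`, run as root since the file is root-owned.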
## Status snapshot (pre-server prep, 2026-05-10)
- Repo initialized: s3://josh-bucket/josh-db-restic (NYC3,
standard tier), repo ID 1769e39995, 0 snapshots, integrity
check clean.
- Password stored in 3 places: .kamal/secrets.age (committed
encrypted in repo), .kamal/secrets (gitignored plaintext on
local Mac), iCloud SSH Keys and Tokens.tar.age (offsite backup
via the apple-bridges/age-encryption skill — under
josh-restic-repo-password.txt).
- Install scripts: complete in bin/josh-backup/ (script,
install.sh, systemd unit, timer, env template).
- Restore runbook: complete at https://docs.usejosh.com/operations/restore-from-backup/
(full restic-based rewrite).
## On Litestream (the path not taken)
Considered and rejected as a future migration path. See
substrate-litestream-backup for the spec-level documentation.
Short version: Litestream's value (sub-second RPO + PITR) only earns
its complexity with irreplaceable user state, which the substrate
doesn't have at any size.
Tasks
8 of 11 done.
- t1 Spec fleshed out + restic + standard-tier + whole-/data + 30-day-retention decisions made
- t2 DO Space `josh-bucket` (NYC3, standard tier) created; existing access key carries over from the deleted `usejosh` bucket
- t3 Add `RESTIC_PASSWORD` to `.kamal/secrets`, re-encrypt to `.kamal/secrets.age` (committed `f511b3e`), back up to iCloud `SSH Keys and Tokens.tar.age` under `josh-restic-repo-password.txt` (round-trip + sha256 verified)
- t4 Host script bin/josh-backup/josh-backup.sh written (sqlite3 .backup → restic backup /data --exclude ... → restic forget --keep-daily 30 --keep-weekly 8 --keep-monthly 12 --prune)
- t5 systemd unit + timer (josh-backup.service / josh-backup.timer) authored locally with ReadWritePaths=/data /var/cache/restic; install.sh + env template updated to the restic shape — INSTALL on the OVHcloud host pending provision (script ready to run)
- t6 /etc/josh-backup.env populated on the OVHcloud host (root-owned, 0600), credentials + RESTIC_REPOSITORY + RESTIC_PASSWORD loaded by the unit (template at bin/josh-backup/josh-backup.env.template ready)
- t7 `restic init` against `s3://josh-bucket/josh-db-restic/` from local Mac — repo ID `1769e39995`, structural integrity check passes (no errors), 0 snapshots ready to receive the first nightly backup from the OVHcloud host. Smoke backup of real data lands as part of t6 (host install).
- t8 Round-trip restore verified on the new host: `restic restore latest --target /tmp/restore`, opens as valid SQLite, row counts match live DB, corpus tree restored intact
- t9 Runbook https://docs.usejosh.com/operations/restore-from-backup/ rewritten for restic + whole-/data restore commands (replaces the prior `aws s3 cp` + `gunzip` flow); bucket name updated `usejosh` → `josh-bucket`
- t10 CLAUDE.md narrative mentions of Litestream cleaned up — confirmed via grep, no Litestream references in CLAUDE.md to remove (defensive task; closing as N/A)
- t11 Spec lifecycle: planned → in_progress → verified → shipped
Changelog
-
2026-05-13T00:30:00Z
planned→blocked
Pausing this spec explicitly. The original implementation shipped + ran nightly against the old DigitalOcean host (157.245.246.232); since the move to OVHcloud the spec regressed to `planned` because the runbook hasn't been re-applied on the new host. We're not re-applying it yet. Rebuilding backup infra on top of a still-moving deploy shape invites rework: we just landed the single-image consolidation (2026-05-12), renamed two Python packages (2026-05-12), and OSS-launch hygiene is in flight. Repo structure and deploy shape need to settle before we re-cut the backup pipeline.
When unblocked, the spec is recoverable as-is: the runbook and acceptance criteria below are still accurate for the restic-to-DO-Spaces approach; only the host details (IP, systemd paths) need refreshing for the new substrate host.
Specific gating items before this can move back to `in_progress`:
  1. Final repo structure decision locked (no more package renames, no `josh-foundation/` namespace pivot).
  2. Deploy shape stable (no more Kamal config reshuffles).
  3. The follow-up `/data/cache/huggingface` restic-exclude can be folded in at the same time (HF model weights are regenerable from upstream and would bloat backups).
-
2026-05-10T23:00:00Z
planned→planned
**Pre-server prep wrapped: secrets re-encrypted, password backed up offsite, activation runbook added to the spec.**
  - `.kamal/secrets.age` re-encrypted with `RESTIC_PASSWORD` (committed `f511b3e`); round-trip decrypt verified.
  - iCloud `SSH Keys and Tokens.tar.age` updated via the age-encryption skill: added `josh-restic-repo-password.txt` with full repo metadata (URL, repo ID, restore quickstart, the password value at the bottom). README updated with new "Active Tokens" section. sha256 of the password in iCloud equals sha256 of the password in `.kamal/secrets`. Safety copy of the prior iCloud archive kept as `…tar.age.bak-2026-05-10T07-57-28Z` for a few weeks.
  - Spec plan extended with a "When the OVHcloud host arrives — activation runbook" section: 4 numbered steps (install.sh → populate env → smoke → round-trip restore) so the activation is mechanical when the host comes up.
  - Status snapshot section added documenting where the password lives (3 places), what's installed locally, and what's server-blocked.
All pre-server work is done. Three remaining tasks (t6 install env file on host, t8 round-trip restore, t11 lifecycle flip) are activated by the install.sh one-liner once the host is reachable.
-
2026-05-10T22:30:00Z
planned→planned
**Restic repo initialized: pre-server prep effectively complete.**
  - **t3:** `RESTIC_PASSWORD` added to `.kamal/secrets`. (Re-encrypt to `.kamal/secrets.age` + iCloud backup of the password: one more user step.)
  - **t7:** `restic init` ran cleanly against `s3:https://nyc3.digitaloceanspaces.com/josh-bucket/josh-db-restic` from local Mac. Repo ID `1769e39995`. `restic check` shows no errors against the empty repo. `restic snapshots` returns empty list (expected). When the OVHcloud host runs its first `josh-backup.sh`, it'll connect to this same repo (same password + creds) and write its first snapshot.
Status now: 8 of 11 tasks done. Three remaining are all server-blocked (t6 install env file on host, t8 round-trip restore verification, t11 lifecycle flip).
-
2026-05-10T22:00:00Z
planned→planned
**Pre-server prep landed: 4 of 11 tasks now done.**
Local-side work that can finish before the OVHcloud host arrives:
  - **t4 (script):** `bin/josh-backup/josh-backup.sh` rewritten for the restic flow. Stages SQLite snap into `/data/backups/`, runs restic backup of /data with excludes for live SQLite + locks, runs forget --keep-daily 30 --keep-weekly 8 --keep-monthly 12 --prune, cleans up staging.
  - **t5 (units + install):** systemd unit updated (ReadWritePaths now `/data /var/cache/restic`; ProtectHome stays true since RESTIC_CACHE_DIR is explicit). install.sh updated to install restic from official binary release + jq + sqlite3 + pre-create /var/cache/restic. Files ready; install runs once the host exists.
  - **t6 (env template):** bin/josh-backup/josh-backup.env.template rewritten with RESTIC_REPOSITORY (s3:https://nyc3.digitaloceanspaces.com/josh-bucket/josh-db-restic), RESTIC_PASSWORD, RESTIC_CACHE_DIR, BACKUP_STAGE_DIR, BACKUP_TAG_HOST.
  - **t9 (runbook):** https://docs.usejosh.com/operations/restore-from-backup/ fully rewritten for restic restore commands. New sections cover whole-/data restore (DB + corpus via rsync), restic check --read-data cadence guidance, and restic-specific troubleshooting.
  - **t10 (CLAUDE.md):** confirmed no Litestream mentions to remove; closing as N/A.
Remaining pre-server: t3 (RESTIC_PASSWORD into .kamal/secrets + iCloud backup) and t7 (restic init from local Mac after t3); both are gated on the user generating + storing the password.
Server-blocked: t6 install of env file ON the host, t8 round-trip restore verification, t11 lifecycle flip.
-
2026-05-10T21:00:00Z
planned→planned
**Tier flip cold → standard, bucket flip `usejosh` → `josh-bucket`.**
Operational simplicity won. Cold tier's ~$10-15/mo savings at TB scale wasn't worth the operational caveats threaded throughout the spec: per-GB retrieval pricing math, 30-day minimum object retention interacting with `restic prune`, "run check rarely because it's expensive" footnotes throughout the runbook. User deleted the cold-tier `usejosh` bucket, created a fresh standard-tier `josh-bucket` (NYC3), confirmed the existing access key carries over.
Spec changes:
  - Retention stays at `--keep-daily 30 --keep-weekly 8 --keep-monthly 12` (no longer chosen to align with cold-tier 30-day minimum, but still good for ~14 months coverage).
  - Cold-tier specific clarifications (retrieval pricing, ghost-storage tax) dropped from `clarifications_needed`.
  - Cold tier added to `out_of_scope` with the rejection rationale.
  - Endpoint URL: `s3:https://nyc3.digitaloceanspaces.com/josh-bucket/josh-db-restic`.
  - Tasks restructured: dropped t4 (no lifecycle rule to delete on a fresh bucket) and t10 (no old `josh-db/` prefix to clean up on a deleted bucket); renumbered.
Cost reality at 1 TB: ~$20/mo standard vs ~$7/mo cold = $13/mo delta. Operational simplicity is worth that easily.
-
2026-05-10T20:30:00Z
planned→planned
**Three follow-up decisions tightening the v1 backup approach.**
  1. **Backup scope: `/data/josh.db` only → entire `/data`** (with excludes for the live SQLite files, locks dir, and staging snap). Corpus is now in scope. Restic dedupes the corpus near-perfectly (mostly append-only per-record blobs).
  2. **Retention: `--keep-daily 14 --keep-weekly 4 --keep-monthly 12` → `--keep-daily 30 --keep-weekly 8 --keep-monthly 12`.**
  3. **Litestream: deferred indefinitely, not re-evaluated.**
-
2026-05-10T20:00:00Z
planned→planned
**Architectural pivot: restic + cold-tier DO Spaces.** Block-level content-addressed dedupe drops nightly bandwidth ~99% after the first backup. (Tier later flipped back to standard; see next entry.)
-
2026-05-10T18:00:00Z
verified→planned
DigitalOcean droplet destroyed 2026-05-10. The host-side install lived on the destroyed droplet and is gone with it. Code, runbook, and destination credentials all survive in the repo / Spaces. Tasks reset; will land via `substrate-bare-metal-host` provision sequence.
-
2026-05-09T13:30:00Z
in_progress→verified
End-to-end pipeline live and round-trip verified.
**Install on the `josh` host (157.245.246.232).**
  - `/usr/local/bin/josh-backup.sh`, `/etc/systemd/system/josh-backup.{service,timer}`, `/etc/josh-backup.env` (root:root 0600).
  - AWS CLI v2.34.45 from the official installer (Ubuntu 24.04 dropped the apt `awscli` package). `sqlite3` from apt.
**DigitalOcean Spaces.** Bucket: `usejosh` (NYC3), since deleted. Lifecycle rule applied to prefix `josh-db/`.
**Acceptance criteria check.** All passed (see git history for the full runbook output).
**Decision: dropped per-night integrity_check.** First test run with integrity_check took ~57 min wall-clock and 5.4 GB RAM peak. With it removed the run is ~27 min.
-
2026-05-09T13:00:00Z
planned→in_progress
Spec fleshed out and pivoted from the original Litestream design to a nightly `sqlite3 .backup` → gzip → `aws s3 cp` flow.