Database migrations

How Josh runs schema migrations against the production SQLite file under Kamal.

Storage backend: SQLite + FTS5 + sqlite-vec. Alembic still runs the same way as it would against Postgres; only the driver and target change.

TL;DR

Use a Kamal pre-deploy hook that runs Alembic against the new image before the new container boots. The exact pattern:

#!/bin/bash -e
kamal app exec -p -q -d "$KAMAL_DESTINATION" --version "$KAMAL_VERSION" "alembic upgrade head"

This is the canonical Kamal pattern, what the Rails community canonicalized first (gist by bibstha), and what Alembic + FastAPI + Kamal projects use today. The pattern is identical for SQLite — Alembic doesn't care about the target backend at the orchestration level.

What's different from a Postgres flow

The substrate is a single SQLite file at /data/josh.db (with WAL companions), not a Postgres accessory. So:

No kamal accessory commands. No accessory exists.
The DB target is a path, not a host:port. Alembic env reads SUBSTRATE_DB_PATH=/data/josh.db from env (or a SQLAlchemy URL like sqlite+aiosqlite:////data/josh.db).
josh-core and josh-ingester both bind-mount /data from the host so the same file is visible from each container.
Migrations run from josh-core (the canonical image with Alembic + the shared package), and the file is shared via volume mount with the ingester.

Why pre-deploy, not post-deploy

Pre-deploy (recommended): the hook runs after the new image is built and pulled, but before the new app container replaces the old one. If migration fails, the deploy aborts — the old container keeps serving traffic against the old schema, no inconsistency.

Post-deploy (alternative): runs against the already-swapped new container with --reuse. Risk: brief window where new code runs against old schema. Fine for additive migrations, dangerous for breaking ones.

We standardize on pre-deploy.

Why `--version $KAMAL_VERSION`, not `--reuse`

Flag	What it does	When to use
`--version $KAMAL_VERSION`	Spins up an ephemeral container from the new image (the one Kamal just built and pulled)	Pre-deploy hooks — the new image has the new migrations
`--reuse`	Runs in the currently running app container	Manual ad-hoc commands, post-deploy hooks

In a pre-deploy hook the old container is still running. It has the old migration files. We want the new migrations from the new image. So --version $KAMAL_VERSION.

Flag breakdown

kamal app exec -p -q -d "$KAMAL_DESTINATION" --version "$KAMAL_VERSION" "alembic upgrade head"

Flag	Purpose
`-p` (alias `--primary`)	Run on primary server only. Without this the migration runs on every host, racing for the lock and producing duplicate "already applied" errors.
`-q` (alias `--quiet`)	Suppress verbose Kamal output; still prints command output.
`-d "$KAMAL_DESTINATION"`	Destination env (e.g. `production`). Optional today (single destination) but future-proof.
`--version "$KAMAL_VERSION"`	Pin to the deploy's image. Set automatically by Kamal in the hook env.

What the hook does in deploy order

1. Local: kamal deploy
2. Local: build image on ritz remote builder, push to Docker Hub
3. SSH:   pull image on josh server
4. Local: → run .kamal/hooks/pre-deploy ←
              ↳ kamal app exec --version <new> "alembic upgrade head"
              ↳ The ephemeral migration container mounts /data the same way the running app does
              ↳ Schema in /data/josh.db now matches the version the new image expects
              ↳ Old josh-core container still serving requests against the same file
                (SQLite serializes the writer; the running app's reads are fine in WAL)
              ↳ If migration fails, hook returns nonzero, deploy aborts
5. SSH:   start new josh-core container (will see the new schema)
6. Local: kamal-proxy waits for new container's /health, then swaps traffic
7. SSH:   stop old container

Setup status

Pre-deploy hook is wired and live as of 2026-05-09. Two SQLite migrations are applied in production: 0001_state_tables (ingestion state) and 0002_crs (CRS reports + FTS5 + vec0). Future deploys auto-apply migration #3 onward via the hook.

Migrations live in the shared/josh_substrate/ package — every service installs this package and has access to the same migrations. See repo-structure for the rationale.

Files in place (or to be re-authored):

shared/josh_substrate/pyproject.toml — needs deps swap: aiosqlite (driver) + sqlalchemy[asyncio] + alembic + pydantic + sqlite-vec (PyPI wheel that bundles the extension binary). Drop psycopg[binary].
shared/josh_substrate/alembic.ini — keep script_location = josh_substrate:migrations.
shared/josh_substrate/src/josh_substrate/migrations/env.py — async migration runner reading SUBSTRATE_DB_PATH from env, using sqlite+aiosqlite driver. Sets PRAGMA foreign_keys = ON and PRAGMA journal_mode = WAL on connection.
shared/josh_substrate/src/josh_substrate/migrations/versions/0001_state_tables.py — first migration: state tables in SQLite-flavored DDL.
Project-root Dockerfile — copies shared/josh_substrate and pip-installs it; copies alembic.ini to /app/alembic.ini. Every role-container ships from this image, so any role has access to alembic + migrations.

In the running container, alembic upgrade head (run from /app) reads /app/alembic.ini, follows josh_substrate:migrations, and applies migrations from inside the installed package against the SQLite file at SUBSTRATE_DB_PATH.

Pre-deploy hook lives at .kamal/hooks/pre-deploy (project root). The hook targets --roles web so alembic runs once across the role-containers (not once per role). Web is the conceptual schema owner; ingester (and embedder, when activated) pick up the new schema implicitly via the shared /data/josh.db bind mount.

First deploy on a fresh host

The pre-deploy hook breaks kamal setup on a fresh host. Reason: the hook's kamal app exec --version <new> "alembic upgrade head" internally does docker run --env-file .kamal/apps/josh/env/roles/web.env, but that env file isn't pushed to the host until kamal starts the first container — which only happens after the pre-deploy hook. Chicken and egg.

The deploy fails with:

docker: open .kamal/apps/josh/env/roles/web.env: no such file or directory
ERROR (Kamal::Cli::HookError): Hook `pre-deploy` failed

This is harmless for subsequent deploys (env files persist on the host once written), but it blocks every fresh-host bootstrap. The workaround:

# Run from project root.

# 1. Skip the hook on first setup — env files get pushed during this step,
#    container starts, healthcheck passes, but /data/josh.db is empty.
env LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 kamal setup --skip-hooks

# 2. Apply migrations manually against the now-running web container.
env LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 kamal app exec --reuse --roles web "alembic upgrade head"

# 3. Confirm.
env LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 kamal app exec --reuse --roles web "alembic current"
# Expected: e.g. "0005 (head)"

After this one-time bootstrap, all subsequent kamal deploy runs apply migrations cleanly via the pre-deploy hook. The hook's logic is correct; only the first-deploy ordering is broken.

Why not fix the hook? The hook would need to ssh to each host, check for the env file, and conditionally run the migration — which adds a non-trivial probe step on every deploy to handle a once-per-host edge case. Documenting the bootstrap workaround is the cheaper solution. Revisit if this trips up multiple operators.

Manual / ad-hoc migration runs

For now, applying migrations is a manual step after each deploy that contains schema changes. Run from project root (kamal looks for config/deploy.yml relative to cwd). Pass --roles web so the command runs against the web role's container — the conceptual schema owner.

# Run from project root.
#
# CRITICAL: Always wrap kamal commands with locale env vars on macOS — kamal v2.9.0
# hits a Ruby US-ASCII encoding error without them. Skip and you get:
#   ERROR (ArgumentError): Exception while executing on host: invalid byte sequence in US-ASCII
env LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 kamal app exec --reuse --roles web "alembic upgrade head"

# Inspect current schema version
env LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 kamal app exec --reuse --roles web "alembic current"

# Author a new migration (use script.py.mako template; edit the generated file
# under shared/josh_substrate/src/josh_substrate/migrations/versions/)
env LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 kamal app exec --reuse --roles web "alembic revision -m 'add fr_documents'"

# Roll back one revision (rarely needed; usually fix-forward)
env LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 kamal app exec --reuse --roles web "alembic downgrade -1"

Gotcha 1: chained commands inside the quoted string get parsed by your local shell, not the container's. Don't write "alembic current && alembic upgrade head" — the && runs locally and alembic isn't on your local PATH. Run two separate kamal app exec calls, or wrap in sh -c '...'.

Gotcha 2: alembic revision writes the new migration into the installed package inside the container. That file is inside the Docker image, not in your repo. You'll need to copy it out (docker cp) or — far better — author migrations by hand in shared/josh_substrate/src/josh_substrate/migrations/versions/ and rebuild the image. We may add a small developer script for this when it becomes friction.

The wired pre-deploy hook

The hook script (committed at .kamal/hooks/pre-deploy):

#!/bin/bash
set -e

service_dir=$(basename "$PWD")

if [ "$service_dir" != "josh-core" ]; then
    echo "[pre-deploy] cwd=$service_dir — skipping alembic (only josh-core owns migrations)"
    exit 0
fi

echo "[pre-deploy] [$KAMAL_VERSION] running alembic upgrade head against new image…"
kamal app exec -p -q -d "${KAMAL_DESTINATION:-production}" --version "$KAMAL_VERSION" "alembic upgrade head"
echo "[pre-deploy] [$KAMAL_VERSION] migration complete"

Why the cwd gate: .kamal is shared via symlink. kamal deploy from josh-ingester/ would also find this hook, but josh-ingester's Dockerfile doesn't COPY alembic.ini — running alembic there would fail with "no config file". The check on basename "$PWD" keeps josh-ingester deploys clean.

For local iteration on a migration without a full deploy, run Alembic against a local SQLite file:

cd shared/josh_substrate
SUBSTRATE_DB_PATH=/tmp/josh-dev.db alembic upgrade head
SUBSTRATE_DB_PATH=/tmp/josh-dev.db alembic revision --autogenerate -m "msg"

The SUBSTRATE_DB_PATH env points the Alembic env at a throwaway DB so you can iterate on migration shape before pushing to a real environment.

Failure modes

Failure	Effect	Recovery
Migration fails (SQL error)	Pre-deploy hook exits nonzero, deploy aborts before swap. Old container still serving against old schema.	Fix migration, redeploy.
Hook script not executable	Kamal silently skips it. Migration doesn't run.	`chmod +x .kamal/hooks/pre-deploy`.
Hook runs but no `-p` flag	Migration fires on every host, races. With single-host deploy this is harmless; with multi-host it produces duplicate-application errors.	Add `-p`.
`--reuse` used in pre-deploy	Old container's old code/migrations run. New schema not applied. New container will fail healthcheck or run against wrong schema.	Use `--version $KAMAL_VERSION` in pre-deploy.
Migration runs while ingester is mid-write	SQLite serializes — migration's `BEGIN IMMEDIATE` will wait for the ingester's transaction to commit, then run. With WAL it's a small pause, not a failure.	Schedule migrations during ingester quiet windows if any single migration is slow.
Bad migration leaves SQLite in mixed state	SQLite has DDL transactions for most statements; partial application is rare. If a migration hits a non-transactional statement and fails, schema may be partly applied.	Hand-write a recovery migration to align the schema with what `alembic_version` says.

v2 considerations

Multi-environment support. When we add staging, the -d "$KAMAL_DESTINATION" already targets the right env. Each destination would have its own deploy.yml.
Backups before risky migrations. Take a Litestream snapshot before any destructive migration. With Litestream, this is automatic — every migration is implicitly captured by the streaming WAL.
Long migrations. SQLite doesn't have CREATE INDEX CONCURRENTLY. Build large indexes during low-traffic windows; for very large tables, build the index on a side copy and swap the file (rare; not a v1 concern).