launchdraftp0

PR-review skill for Josh contributors

josh-pr-review-skill · updated 2026-05-10T12:00:00Z

Use the pencil to edit title, status, priority, and owner. Changing status auto-prepends a changelog entry.

The CMU sugar-rush study (https://civic.io/2026/03/16/ending-the-sugar-rush/)
argues that AI-assisted contributions should be reviewed by AI applying the
*same* project guidance the writer applied — a "closed-loop quality
system." Today, an outside contributor's PR gets eyeballed by a maintainer
who has to remember every Josh convention. A PR-review skill encodes those
conventions into a structured review pass: takes a diff (local branch or
GitHub PR URL), evaluates against AGENTS.md, the four working rules, the
conventions doc, the spec workflow rules, and the critical rules, and
emits a verdict with reasoning. Pairs with the code-quality skill (separate
spec) which evaluates whole files; this skill evaluates whole *changes*.

As a maintainer, I want a structured first-pass review of every PR before I look at it so that my eyeball time goes to judgment calls, not checking that conventions were followed.

As a contributor, I want to run the review skill on my own branch before opening a PR so that I catch issues self-serve rather than ping-pong with a reviewer.

As an AI agent finishing a multi-step change, I want a self-check against the same standards a maintainer would apply so that I don't claim "done" when a reviewer would request changes.

  1. When `/josh-pr-review` is invoked against a diff (local branch vs. main, or a GitHub PR URL), the skill shall produce a structured review covering five dimensions: (a) working-rules adherence, (b) convention compliance, (c) spec-workflow compliance, (d) test coverage for changed code, (e) inventory-page sync (if a data source's status changed).
  2. When the diff introduces a new ingester subpackage, the skill shall verify the source-module layout convention (the canonical 5-or-6-file structure under `josh-ingester/josh_ingester/sources/<name>/`).
  3. When the diff modifies any spec YAML, the skill shall verify the spec-workflow rules: status-flip dry-run, build-spec.py --lint clean, manual changelog entry appended.
  4. When the diff modifies a critical file (`shared/josh_substrate/.../migrations/`, `AGENTS.md`, `conventions.html`, `architecture.html`), the skill shall flag it as `NEEDS_DISCUSSION` for human attention regardless of other findings.
  5. The skill shall produce a single verdict line at the end: `APPROVE`, `REQUEST_CHANGES`, or `NEEDS_DISCUSSION`, with a one-sentence reason and links to the cited convention sections.
  6. The skill shall live at `.claude/skills/josh-pr-review/SKILL.md` and be discoverable via the standard skill-listing mechanism.
kindbash

Command

set -e
# 1. Skill file exists.
test -f .claude/skills/josh-pr-review/SKILL.md
# 2. References the canonical agent guidance + conventions + spec workflow.
grep -q 'AGENTS.md' .claude/skills/josh-pr-review/SKILL.md
grep -q 'https://docs.usejosh.com/operations/conventions/' .claude/skills/josh-pr-review/SKILL.md
grep -q 'https://docs.usejosh.com/operations/spec-workflow/' .claude/skills/josh-pr-review/SKILL.md
# 3. Five review dimensions are documented.
for dim in working-rules conventions spec-workflow test-coverage inventory; do
  if ! grep -qi "$dim" .claude/skills/josh-pr-review/SKILL.md; then
    echo "ERROR: skill missing review dimension: $dim" >&2
    exit 1
  fi
done
# 4. Verdict vocabulary is documented.
grep -q 'APPROVE' .claude/skills/josh-pr-review/SKILL.md
grep -q 'REQUEST_CHANGES' .claude/skills/josh-pr-review/SKILL.md
grep -q 'NEEDS_DISCUSSION' .claude/skills/josh-pr-review/SKILL.md
# 5. Eval fixtures exist for the manual eval.
test -d tests/fixtures/pr_review
test -f tests/fixtures/pr_review/good_pr.patch
ls tests/fixtures/pr_review/bad_*.patch >/dev/null

Expect

exit 0 — skill in place, references all three doc anchors, covers six dimensions, defines three verdicts, fixtures committed

The bash determiner verifies wiring. The semantic test ("does the skill actually catch real PR issues?") requires manual evaluation against the committed `.patch` fixtures. Run those evals before flipping `verified`.

  • Does the skill operate on `git diff` against main, or fetch a GitHub PR via `gh pr diff <num>`? Recommendation: support both, default to current-branch-vs-main.
  • Does the skill post review comments back to GitHub, or just emit to stdout? Recommendation: stdout-only at v1; GitHub-posting is a separate launch-readiness spec.
  • Does this depend on the `josh-code-quality-skill` spec, or duplicate its checks? Recommendation: this skill *invokes* the code-quality skill on changed files, then layers PR-level review on top. No duplication.
  • Posting review comments to GitHub directly (stdout-only at v1).
  • Auto-merging PRs that pass (manual approval is part of the value).
  • Running CI checks (CI runs separately; this skill reads CI status if the diff is from a GitHub PR).
  • File-level convention checking (delegated to the code-quality skill).

Skill scope. Five review dimensions, in order of severity:

1. Critical-file modifications — migrations, AGENTS.md,
conventions.html, architecture.html, repo-structure.html. Always
NEEDS_DISCUSSION for human attention.
2. Spec-workflow compliance — if any spec YAML changed, verify the
status-flip rules from spec-workflow.html#workflow-rules.
3. Convention compliance — invoke the code-quality skill on changed
Python files; aggregate FAILs into the PR-level verdict.
4. Working-rules adherence — heuristic check for the four rules.
E.g., did the diff add abstractions for single-use code (Rule 2)?
Did it touch unrelated code (Rule 3)? Did it add success criteria
for new features (Rule 4)?
5. Test coverage — for any new josh-ingester/josh_ingester/sources/<src>/
directory or new josh-core/josh_core/routers/ file, verify the
three-category test contract is met.
6. Inventory sync — if the diff changes a data source's status
(e.g., flips a spec from in_progress to verified), verify
https://docs.usejosh.com/josh-data-sources/ and https://docs.usejosh.com/data-status/ were updated in
the same commit.

What "good" looks like for a PR.

- Diff is surgical (Rule 3): touches only what the PR description claims to.
- New code follows conventions (the code-quality skill returns no FAILs).
- If a spec status flipped, the workflow rules were followed.
- If a data source moved status, both inventory pages updated.
- Tests added for new behaviour, matching the per-source test contract.
- Commit messages are imperative and scoped.

What "bad" looks like for a PR.

- Mixes a bug fix with unrelated cleanup (Rule 3 violation).
- Adds a Pydantic class mirroring a DB table (schema rule violation).
- New ingester subpackage missing one of the canonical files.
- Spec moved to verified without a determiner run, or shipped without
inventory-page sync.
- Commit message references "for $customer" on a public-bound file.
- Commented-out code or TODO markers added (CMU paper's quality
fingerprints).

Verdict policy.

- APPROVE — all five dimensions clean.
- REQUEST_CHANGES — at least one FAIL on (2)-(5). Reasoning lists
specifics with conventions.html / spec-workflow.html anchors.
- NEEDS_DISCUSSION — critical file touched, OR working-rules
adherence is borderline (e.g., scope creep that's defensible but worth
surfacing). The skill explicitly does not auto-approve changes to
load-bearing infrastructure.

Eval fixtures. Under tests/fixtures/pr_review/ as .patch files
(output of git format-patch):
- good_pr.patch — clean addition of a new test fixture. Should APPROVE.
- bad_scope_creep.patch — bug fix bundled with unrelated refactor.
Should REQUEST_CHANGES citing Rule 3.
- bad_inventory_drift.patch — flips a source spec to shipped without
touching inventory pages. Should REQUEST_CHANGES citing critical rules.
- discussion_migration.patch — touches migrations/versions/. Should
NEEDS_DISCUSSION regardless of correctness.

Eval procedure: invoke skill against each, confirm verdict matches.
Document transcript at tests/fixtures/pr_review/eval-transcript.md.

Maintenance. When AGENTS.md, conventions.html, or spec-workflow.html
change, update SKILL.md in the same commit. The PR-review skill's
authority comes from those three docs — drift between them and the
skill produces silent false-negatives.

0 of 7 done.

  • t1 Draft `.claude/skills/josh-pr-review/SKILL.md` with the six review dimensions, three verdicts, and the verdict policy.
  • t2 Define how the skill consumes a diff (current-branch-vs-main + `gh pr diff <num>`).
  • t3 Author the four `.patch` fixtures under `tests/fixtures/pr_review/` (one APPROVE, two REQUEST_CHANGES, one NEEDS_DISCUSSION).
  • t4 Manually invoke the skill against each fixture; tune SKILL.md until verdicts match seeded outcomes.
  • t5 Document the eval transcript at `tests/fixtures/pr_review/eval-transcript.md`.
  • t6 Update `https://docs.usejosh.com/operations/conventions/` §12 (Pre-PR checklist) to reference the review skill.
  • t7 Update `AGENTS.md` to reference both skills as the symmetric quality loop.

No history yet.

docs/spec/josh-pr-review-skill.html · generated by bin/build-spec.py