PR-review skill for Josh contributors
Header
Use the pencil to edit title, status, priority, and owner. Changing status auto-prepends a changelog entry.
Why
The CMU sugar-rush study (https://civic.io/2026/03/16/ending-the-sugar-rush/)
argues that AI-assisted contributions should be reviewed by AI applying the
*same* project guidance the writer applied — a "closed-loop quality
system." Today, an outside contributor's PR gets eyeballed by a maintainer
who has to remember every Josh convention. A PR-review skill encodes those
conventions into a structured review pass: takes a diff (local branch or
GitHub PR URL), evaluates against AGENTS.md, the four working rules, the
conventions doc, the spec workflow rules, and the critical rules, and
emits a verdict with reasoning. Pairs with the code-quality skill (separate
spec) which evaluates whole files; this skill evaluates whole *changes*.
User stories
As a maintainer, I want a structured first-pass review of every PR before I look at it so that my eyeball time goes to judgment calls, not checking that conventions were followed.
As a contributor, I want to run the review skill on my own branch before opening a PR so that I catch issues self-serve rather than ping-pong with a reviewer.
As an AI agent finishing a multi-step change, I want a self-check against the same standards a maintainer would apply so that I don't claim "done" when a reviewer would request changes.
Acceptance criteria (EARS)
- When `/josh-pr-review` is invoked against a diff (local branch vs. main, or a GitHub PR URL), the skill shall produce a structured review covering five dimensions: (a) working-rules adherence, (b) convention compliance, (c) spec-workflow compliance, (d) test coverage for changed code, (e) inventory-page sync (if a data source's status changed).
- When the diff introduces a new ingester subpackage, the skill shall verify the source-module layout convention (the canonical 5-or-6-file structure under `josh-ingester/josh_ingester/sources/<name>/`).
- When the diff modifies any spec YAML, the skill shall verify the spec-workflow rules: status-flip dry-run, build-spec.py --lint clean, manual changelog entry appended.
- When the diff modifies a critical file (`shared/josh_substrate/.../migrations/`, `AGENTS.md`, `conventions.html`, `architecture.html`), the skill shall flag it as `NEEDS_DISCUSSION` for human attention regardless of other findings.
- The skill shall produce a single verdict line at the end: `APPROVE`, `REQUEST_CHANGES`, or `NEEDS_DISCUSSION`, with a one-sentence reason and links to the cited convention sections.
- The skill shall live at `.claude/skills/josh-pr-review/SKILL.md` and be discoverable via the standard skill-listing mechanism.
Success determiner
Command
set -e
# 1. Skill file exists.
test -f .claude/skills/josh-pr-review/SKILL.md
# 2. References the canonical agent guidance + conventions + spec workflow.
grep -q 'AGENTS.md' .claude/skills/josh-pr-review/SKILL.md
grep -q 'https://docs.usejosh.com/operations/conventions/' .claude/skills/josh-pr-review/SKILL.md
grep -q 'https://docs.usejosh.com/operations/spec-workflow/' .claude/skills/josh-pr-review/SKILL.md
# 3. Five review dimensions are documented.
for dim in working-rules conventions spec-workflow test-coverage inventory; do
if ! grep -qi "$dim" .claude/skills/josh-pr-review/SKILL.md; then
echo "ERROR: skill missing review dimension: $dim" >&2
exit 1
fi
done
# 4. Verdict vocabulary is documented.
grep -q 'APPROVE' .claude/skills/josh-pr-review/SKILL.md
grep -q 'REQUEST_CHANGES' .claude/skills/josh-pr-review/SKILL.md
grep -q 'NEEDS_DISCUSSION' .claude/skills/josh-pr-review/SKILL.md
# 5. Eval fixtures exist for the manual eval.
test -d tests/fixtures/pr_review
test -f tests/fixtures/pr_review/good_pr.patch
ls tests/fixtures/pr_review/bad_*.patch >/dev/null
Expect
The bash determiner verifies wiring. The semantic test ("does the skill actually catch real PR issues?") requires manual evaluation against the committed `.patch` fixtures. Run those evals before flipping `verified`.
Clarifications needed
- Does the skill operate on `git diff` against main, or fetch a GitHub PR via `gh pr diff <num>`? Recommendation: support both, default to current-branch-vs-main.
- Does the skill post review comments back to GitHub, or just emit to stdout? Recommendation: stdout-only at v1; GitHub-posting is a separate launch-readiness spec.
- Does this depend on the `josh-code-quality-skill` spec, or duplicate its checks? Recommendation: this skill *invokes* the code-quality skill on changed files, then layers PR-level review on top. No duplication.
Out of scope
- Posting review comments to GitHub directly (stdout-only at v1).
- Auto-merging PRs that pass (manual approval is part of the value).
- Running CI checks (CI runs separately; this skill reads CI status if the diff is from a GitHub PR).
- File-level convention checking (delegated to the code-quality skill).
Dependencies
Plan
Skill scope. Five review dimensions, in order of severity:
1. Critical-file modifications — migrations, AGENTS.md,
conventions.html, architecture.html, repo-structure.html. Always
NEEDS_DISCUSSION for human attention.
2. Spec-workflow compliance — if any spec YAML changed, verify the
status-flip rules from spec-workflow.html#workflow-rules.
3. Convention compliance — invoke the code-quality skill on changed
Python files; aggregate FAILs into the PR-level verdict.
4. Working-rules adherence — heuristic check for the four rules.
E.g., did the diff add abstractions for single-use code (Rule 2)?
Did it touch unrelated code (Rule 3)? Did it add success criteria
for new features (Rule 4)?
5. Test coverage — for any new josh-ingester/josh_ingester/sources/<src>/
directory or new josh-core/josh_core/routers/ file, verify the
three-category test contract is met.
6. Inventory sync — if the diff changes a data source's status
(e.g., flips a spec from in_progress to verified), verify
https://docs.usejosh.com/josh-data-sources/ and https://docs.usejosh.com/data-status/ were updated in
the same commit.
What "good" looks like for a PR.
- Diff is surgical (Rule 3): touches only what the PR description claims to.
- New code follows conventions (the code-quality skill returns no FAILs).
- If a spec status flipped, the workflow rules were followed.
- If a data source moved status, both inventory pages updated.
- Tests added for new behaviour, matching the per-source test contract.
- Commit messages are imperative and scoped.
What "bad" looks like for a PR.
- Mixes a bug fix with unrelated cleanup (Rule 3 violation).
- Adds a Pydantic class mirroring a DB table (schema rule violation).
- New ingester subpackage missing one of the canonical files.
- Spec moved to verified without a determiner run, or shipped without
inventory-page sync.
- Commit message references "for $customer" on a public-bound file.
- Commented-out code or TODO markers added (CMU paper's quality
fingerprints).
Verdict policy.
- APPROVE — all five dimensions clean.
- REQUEST_CHANGES — at least one FAIL on (2)-(5). Reasoning lists
specifics with conventions.html / spec-workflow.html anchors.
- NEEDS_DISCUSSION — critical file touched, OR working-rules
adherence is borderline (e.g., scope creep that's defensible but worth
surfacing). The skill explicitly does not auto-approve changes to
load-bearing infrastructure.
Eval fixtures. Under tests/fixtures/pr_review/ as .patch files
(output of git format-patch):
- good_pr.patch — clean addition of a new test fixture. Should APPROVE.
- bad_scope_creep.patch — bug fix bundled with unrelated refactor.
Should REQUEST_CHANGES citing Rule 3.
- bad_inventory_drift.patch — flips a source spec to shipped without
touching inventory pages. Should REQUEST_CHANGES citing critical rules.
- discussion_migration.patch — touches migrations/versions/. Should
NEEDS_DISCUSSION regardless of correctness.
Eval procedure: invoke skill against each, confirm verdict matches.
Document transcript at tests/fixtures/pr_review/eval-transcript.md.
Maintenance. When AGENTS.md, conventions.html, or spec-workflow.html
change, update SKILL.md in the same commit. The PR-review skill's
authority comes from those three docs — drift between them and the
skill produces silent false-negatives.
Tasks
0 of 7 done.
- t1 Draft `.claude/skills/josh-pr-review/SKILL.md` with the six review dimensions, three verdicts, and the verdict policy.
- t2 Define how the skill consumes a diff (current-branch-vs-main + `gh pr diff <num>`).
- t3 Author the four `.patch` fixtures under `tests/fixtures/pr_review/` (one APPROVE, two REQUEST_CHANGES, one NEEDS_DISCUSSION).
- t4 Manually invoke the skill against each fixture; tune SKILL.md until verdicts match seeded outcomes.
- t5 Document the eval transcript at `tests/fixtures/pr_review/eval-transcript.md`.
- t6 Update `https://docs.usejosh.com/operations/conventions/` §12 (Pre-PR checklist) to reference the review skill.
- t7 Update `AGENTS.md` to reference both skills as the symmetric quality loop.
Changelog
No history yet.