Code-quality skill for Josh contributors
Header
Why
AI agents (Claude Code, Cursor, Codex) writing or reading Josh code need a
consistent way to check files against https://docs.usejosh.com/operations/conventions/.
Without a skill, every agent re-derives "what good Josh code looks like"
from prose each session — which the CMU sugar-rush research
(https://civic.io/2026/03/16/ending-the-sugar-rush/) shows is unreliable
and produces drift over time. A skill compresses the conventions doc into
a checkable contract: invoke /josh-code-quality <path>, get a structured
report grouped by convention category. Same input → same output. The skill pairs with
the PR review skill (separate spec), which evaluates whole changes.
User stories
As a contributor finishing a code change, I want a one-command check that my files match the Josh conventions so that I catch drift before opening a PR rather than during code review.
As an AI agent, I want a skill that returns a structured findings report so that I can self-correct against codified standards rather than guessing what "good" looks like.
As a maintainer reviewing an unfamiliar source, I want a quick scan against the conventions so that I know where to focus my review.
Acceptance criteria (EARS)
- When `/josh-code-quality <path>` is invoked on a Python file or directory, the skill shall produce findings grouped by convention category (style, async, schema, db, source-layout, errors, logging, tests, raw_*) and reference the specific section of `conventions.html` for each finding.
- When the target has no convention violations, the skill shall report `no findings`.
- When the target is a known-good shipping file (e.g., `josh-ingester/josh_ingester/sources/crs_reports/parse.py` after the conventions-refactor), the skill shall report `no findings`.
- When the target is a synthetic violation file (committed under `tests/fixtures/code_quality/bad_*.py` for evaluation), the skill shall identify each seeded violation with the correct convention category.
- When invoked on a non-Python file (HTML, YAML), the skill shall skip Python-specific checks but still flag universal violations (raw_* fields in API response models).
- The skill shall live at `.claude/skills/josh-code-quality/SKILL.md` and be discoverable via the standard skill-listing mechanism (visible in the user's available-skills list when working in this repo).
Success determiner
Command
set -e
# 1. Skill file exists at the expected path.
test -f .claude/skills/josh-code-quality/SKILL.md
# 2. Skill references the conventions doc as its authoritative source.
grep -q 'https://docs.usejosh.com/operations/conventions/' .claude/skills/josh-code-quality/SKILL.md
# 3. Skill defines the eight category checks the conventions doc enumerates.
for category in style async schema db source-layout errors logging tests; do
  if ! grep -qi "$category" .claude/skills/josh-code-quality/SKILL.md; then
    echo "ERROR: skill missing category: $category" >&2
    exit 1
  fi
done
# 4. Test fixtures exist for the manual eval.
test -d tests/fixtures/code_quality
test -f tests/fixtures/code_quality/good_example.py
ls tests/fixtures/code_quality/bad_*.py >/dev/null
Expect
The bash determiner only verifies that the skill is wired up correctly. Semantic correctness ("does the skill actually produce useful findings?") requires manual evaluation against the committed fixtures; see the Plan section's "What good looks like" and "What bad looks like" criteria and its "Eval procedure" paragraph. Run that eval before flipping the status to `verified`.
Clarifications needed
- Should the skill auto-fix where possible (e.g., add `from __future__ import annotations`), or only report? Recommendation in the plan: report-only at v1; auto-fix is a separate v2 spec if appetite exists.
- Should this be a Claude Code skill (Markdown SKILL.md) only, or also packaged for Cursor/Codex? Recommendation: Claude Code first, port via AGENTS.md once the cross-tool spec lands.
Out of scope
- Auto-fixing violations (report-only at v1 — Ruff already auto-fixes its findings).
- Replacing `ruff check` or `pyright` (the skill checks the Josh-specific stuff Ruff/pyright can't enforce — schema rule, source-module layout, raw_* rule, etc.).
- PR-level review (covered by the separate `josh-pr-review-skill` spec).
Dependencies
Plan
Skill scope. The skill checks what general-purpose linters can't:
Josh-specific conventions. Ruff covers style/imports/pyupgrade. Pyright
covers types. The skill covers everything else in conventions.html:
the schema rule (no row models for ingest path), source-module layout,
the raw_* rule, the error-handling layers, structured logging shape,
test category coverage.
What "good" looks like. A file passes when:
- Imports are grouped into stdlib → third-party → first-party blocks (Ruff
already enforces this; the skill only confirms it).
- All pipeline functions are async def (or are pure helpers).
- Logging calls are log.<level>("dotted.event_name", key=value) shape.
- Error stringification is f"{type(exc).__name__}: {exc}".
- Loader code calls josh_substrate.db.upsert_with_children rather than
inline INSERT SQL.
- No Pydantic class mirrors a DB table outside a router file.
- In source modules: __init__.py exposes source, files match the
canonical layout (discover.py | fetch.py | parse.py | load.py).
- No raw_* field in any HTTP response model.
What "bad" looks like. A file fails when any of the above is violated.
Each finding cites the specific conventions.html section it derives from.
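As an illustration of several "good" criteria at once, here is a minimal sketch of a compliant pipeline stage. It is not taken from the Josh codebase: `StubLogger`, `format_error`, `normalize_bill_id`, and `parse_stage` are hypothetical names, and the stub logger only stands in for whatever structured logger the conventions doc prescribes.

```python
from __future__ import annotations

import asyncio


class StubLogger:
    """Hypothetical stand-in for a structured logger; records calls in memory."""

    def __init__(self) -> None:
        self.records: list[tuple[str, str, dict]] = []

    def info(self, event: str, **fields) -> None:
        self.records.append(("info", event, fields))

    def error(self, event: str, **fields) -> None:
        self.records.append(("error", event, fields))


log = StubLogger()


def format_error(exc: BaseException) -> str:
    # Error stringification in the shape the checklist asks for.
    return f"{type(exc).__name__}: {exc}"


def normalize_bill_id(raw: str) -> str:
    # Pure helper: no I/O and no awaits, so it stays a plain def.
    return raw.strip().lower().replace(" ", "")


async def parse_stage(raw_ids: list[str]) -> list[str]:
    # Pipeline stage: async def, logging a dotted event name plus key=value fields.
    parsed: list[str] = []
    for raw in raw_ids:
        try:
            if not raw.strip():
                raise ValueError("empty bill id")
            parsed.append(normalize_bill_id(raw))
        except ValueError as exc:
            log.error("parse.bill_id_failed", raw=raw, error=format_error(exc))
    log.info("parse.completed", count=len(parsed))
    return parsed
```

Running `asyncio.run(parse_stage([" HR 1234 ", "  "]))` exercises both the success path and the error path. The f-string appears only inside the `error=` field value, never as the log message itself.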
Findings format. Markdown report:
# Code-quality findings: <path>
## §3 Async-first
- WARN crs_reports/parse.py:18 — _normalize_bill_id is async def
but does no awaits. Consider sync. (conventions.html#async)
## §5 Database access
- FAIL crs_reports/load.py:96 — inline INSERT … ON CONFLICT SQL.
Use josh_substrate.db.upsert_with_children. (conventions.html#db-rule-3)
## §11 The raw_* rule
- (no findings)
## Verdict
1 FAIL, 1 WARN — fix FAILs before opening a PR.
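The report shape above can be pinned down with a small data model. The sketch below is one possible rendering, not the skill's actual implementation: `Finding`, `SECTIONS`, and `render_report` are hypothetical names, the section list covers only the three headings shown in the sample, and the wording of the verdict when there are no FAILs is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass

DASH = " \u2014 "  # separator used in the sample report lines

# Only the sections that appear in the sample report; the full list lives in the conventions doc.
SECTIONS = ["§3 Async-first", "§5 Database access", "§11 The raw_* rule"]


@dataclass(frozen=True)
class Finding:
    section: str   # heading text, e.g. "§5 Database access"
    severity: str  # "FAIL" or "WARN"
    location: str  # "path/to/file.py:line"
    message: str
    anchor: str    # e.g. "conventions.html#db-rule-3"


def render_report(path: str, sections: list[str], findings: list[Finding]) -> str:
    lines = [f"# Code-quality findings: {path}"]
    for section in sections:
        lines.append(f"## {section}")
        matched = [item for item in findings if item.section == section]
        if not matched:
            lines.append("- (no findings)")
        for item in matched:
            lines.append(f"- {item.severity} {item.location}{DASH}{item.message} ({item.anchor})")
    fails = sum(item.severity == "FAIL" for item in findings)
    warns = sum(item.severity == "WARN" for item in findings)
    lines.append("## Verdict")
    if not findings:
        lines.append("no findings")
    else:
        summary = f"{fails} FAIL, {warns} WARN"
        if fails:
            summary += f"{DASH}fix FAILs before opening a PR."
        lines.append(summary)
    return "\n".join(lines)
```

Iterating over a fixed section list, rather than over whatever findings happen to exist, keeps the output order stable, which supports the "same input → same output" goal.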
Eval fixtures (committed). Under tests/fixtures/code_quality/:
- good_example.py — small file that passes every check. Skill reports
no findings.
- bad_db.py — has inline upsert SQL. Skill flags §5.
- bad_async.py — uses sync HTTP in an async pipeline stage. Skill flags §3.
- bad_logging.py — uses f-string log messages. Skill flags §8.
- bad_raw.py — exports a Pydantic model with raw_json field. Skill flags §11.
- bad_layout.py — file mis-named for the source-module layout. Skill flags §6.
Eval procedure: run the skill against each fixture and confirm that the findings match
the seeded violations. Document the eval transcript at tests/fixtures/code_quality/eval-transcript.md for posterity.
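The manual comparison in the eval procedure could be partly mechanized. The following sketch assumes the skill's output follows the Markdown findings format described earlier. `EXPECTED`, `flagged_sections`, and `check_fixture` are hypothetical names, and the section numbers are the ones listed for each fixture above.

```python
from __future__ import annotations

import re

# Seeded violations per fixture, as listed in the fixture descriptions.
EXPECTED = {
    "good_example.py": set(),
    "bad_db.py": {"§5"},
    "bad_async.py": {"§3"},
    "bad_logging.py": {"§8"},
    "bad_raw.py": {"§11"},
    "bad_layout.py": {"§6"},
}


def flagged_sections(report: str) -> set[str]:
    """Return the section numbers that contain at least one FAIL or WARN line."""
    flagged: set[str] = set()
    current = None
    for raw_line in report.splitlines():
        line = raw_line.strip()
        heading = re.match(r"##\s+(§\d+)\b", line)
        if heading:
            current = heading.group(1)
        elif line.startswith("## "):
            current = None  # unnumbered heading such as "## Verdict"
        elif current and re.match(r"-\s+(FAIL|WARN)\b", line):
            flagged.add(current)
    return flagged


def check_fixture(name: str, report: str) -> bool:
    """True when the report flags exactly the sections seeded in the fixture."""
    return flagged_sections(report) == EXPECTED[name]
```

A check like this only confirms that the right categories were flagged. Whether each finding's message and line number are sensible still needs a human read of the transcript.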
Maintenance. When conventions.html changes, update the skill in the
same commit. The skill's SKILL.md should reference specific anchors in the
conventions doc (#async, #db-rule-3, #raw) so updates are traceable.
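One way to keep that traceability honest is to check that every anchor the skill cites still exists in the conventions page. This is a rough sketch under two assumptions not stated in the spec: that SKILL.md cites anchors in the form `conventions.html#name`, and that the conventions page exposes those anchors as HTML `id` attributes. `IdCollector`, `referenced_anchors`, and `stale_anchors` are hypothetical names.

```python
from __future__ import annotations

import re
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect every id attribute found in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.ids: set[str] = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def referenced_anchors(skill_text: str) -> set[str]:
    # Assumes citations look like "conventions.html#db-rule-3".
    return set(re.findall(r"conventions\.html#([A-Za-z0-9_-]+)", skill_text))


def stale_anchors(skill_text: str, conventions_html: str) -> set[str]:
    """Anchors cited by the skill that no longer exist in the conventions page."""
    collector = IdCollector()
    collector.feed(conventions_html)
    return referenced_anchors(skill_text) - collector.ids
```

An empty result from `stale_anchors` means every cited anchor resolved. A non-empty result names the citations to update in the same commit as the conventions change.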
Tasks
0 of 5 done.
- t1 Draft `.claude/skills/josh-code-quality/SKILL.md` with trigger phrases, scope, and the eight category checks.
- t2 Author the six fixture files under `tests/fixtures/code_quality/` (one good, five bad — one per category likely to be violated).
- t3 Manually invoke the skill against each fixture; tune the SKILL.md until findings match seeded violations.
- t4 Document the eval transcript at `tests/fixtures/code_quality/eval-transcript.md`.
- t5 Update `https://docs.usejosh.com/operations/conventions/` to reference the skill in §12 (Pre-PR checklist).
Changelog
No history yet.