launch · draft · p0

Code-quality skill for Josh contributors

josh-code-quality-skill · updated 2026-05-10T12:00:00Z


AI agents (Claude Code, Cursor, Codex) writing or reading Josh code need a
consistent way to check files against https://docs.usejosh.com/operations/conventions/.
Without a skill, every agent re-derives "what good Josh code looks like"
from prose each session, which the CMU sugar-rush research
(https://civic.io/2026/03/16/ending-the-sugar-rush/) shows is unreliable
and produces drift over time. A skill compresses the conventions doc into
a checkable contract: invoke /josh-code-quality <path>, get a structured
report grouped by convention category. Same input → same output. It pairs with
the PR review skill (separate spec), which evaluates whole changes.

As a contributor finishing a code change, I want a one-command check that my files match the Josh conventions so that I catch drift before opening a PR rather than during code review.

As an AI agent, I want a skill that returns a structured findings report so that I can self-correct against codified standards rather than guessing what "good" looks like.

As a maintainer reviewing an unfamiliar source, I want a quick scan against the conventions so that I know where to focus my review.

  1. When `/josh-code-quality <path>` is invoked on a Python file or directory, the skill shall produce findings grouped by convention category (style, async, schema, db, source-layout, errors, logging, tests, raw_*) and reference the specific section of `conventions.html` for each finding.
  2. When the target has no convention violations, the skill shall report `no findings`.
  3. When the target is a known-good shipping file (e.g., `josh-ingester/josh_ingester/sources/crs_reports/parse.py` after the conventions-refactor), the skill shall report `no findings`.
  4. When the target is a synthetic violation file (committed under `tests/fixtures/code_quality/bad_*.py` for evaluation), the skill shall identify each seeded violation with the correct convention category.
  5. When invoked on a non-Python file (HTML, YAML), the skill shall skip Python-specific checks but still flag universal violations (raw_* fields in API response models).
  6. The skill shall live at `.claude/skills/josh-code-quality/SKILL.md` and be discoverable via the standard skill-listing mechanism (visible in the user's available-skills list when working in this repo).
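
Criterion 5 can be read as a simple selection rule. The skill itself is a Markdown prompt, so the following is only a sketch of that rule in Python. The category names come from criterion 1; the function name and the assumption that `raw_*` is the only universal category are illustrative, not part of the spec.

```python
from pathlib import PurePosixPath

# Category names as listed in criterion 1.
ALL_CATEGORIES = (
    "style", "async", "schema", "db", "source-layout",
    "errors", "logging", "tests", "raw_*",
)

# Assumption: raw_* is the only check that applies to non-Python files.
UNIVERSAL_CATEGORIES = ("raw_*",)


def applicable_categories(path: str) -> tuple[str, ...]:
    """Return the categories to check for a single file path (hypothetical helper)."""
    if PurePosixPath(path).suffix == ".py":
        return ALL_CATEGORIES
    # Non-Python targets (HTML, YAML, ...) skip the Python-specific checks.
    return UNIVERSAL_CATEGORIES
```

For a directory target, the same rule would presumably be applied per file.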
kind: bash

Command

set -e
# 1. Skill file exists at the expected path.
test -f .claude/skills/josh-code-quality/SKILL.md
# 2. Skill references the conventions doc as its authoritative source.
grep -q 'https://docs.usejosh.com/operations/conventions/' .claude/skills/josh-code-quality/SKILL.md
# 3. Skill defines the eight category checks the conventions doc enumerates.
for category in style async schema db source-layout errors logging tests; do
  if ! grep -qi "$category" .claude/skills/josh-code-quality/SKILL.md; then
    echo "ERROR: skill missing category: $category" >&2
    exit 1
  fi
done
# 4. Test fixtures exist for the manual eval.
test -d tests/fixtures/code_quality
test -f tests/fixtures/code_quality/good_example.py
ls tests/fixtures/code_quality/bad_*.py >/dev/null

Expect

exit 0 — skill file in place, references conventions doc, covers eight categories, fixtures committed

The bash determiner verifies only that the skill is wired up correctly. Semantic correctness ("does the skill actually produce useful findings?") requires manual evaluation against the committed fixtures; see the plan section's "What good looks like vs what bad looks like" subsection for the eval procedure. Run that eval before flipping the status to `verified`.

Open questions

  • Should the skill auto-fix where possible (e.g., add `from __future__ import annotations`), or only report? Recommendation in the plan: report-only at v1; auto-fix is a separate v2 spec if appetite exists.
  • Should this be a Claude Code skill (Markdown SKILL.md) only, or also packaged for Cursor/Codex? Recommendation: Claude Code first, port via AGENTS.md once the cross-tool spec lands.

Out of scope

  • Auto-fixing violations (report-only at v1; Ruff already auto-fixes its own findings).
  • Replacing `ruff check` or `pyright` (the skill checks the Josh-specific rules Ruff/pyright can't enforce: schema rule, source-module layout, raw_* rule, etc.).
  • PR-level review (covered by the separate `josh-pr-review-skill` spec).

Skill scope. The skill checks what general-purpose linters can't:
Josh-specific conventions. Ruff covers style/imports/pyupgrade. Pyright
covers types. The skill covers everything else in conventions.html:
the schema rule (no row models for the ingest path), source-module layout,
the raw_* rule, the error-handling layers, structured logging shape, and
test category coverage.

What "good" looks like. A file passes when:
- Imports are grouped into stdlib → third-party → first-party blocks (Ruff
already enforces this, but the skill confirms it).
- All pipeline functions are async def (or are pure helpers).
- Logging calls follow the log.<level>("dotted.event_name", key=value) shape.
- Errors are stringified as f"{type(exc).__name__}: {exc}".
- Loader code calls josh_substrate.db.upsert_with_children rather than
inline INSERT SQL.
- No Pydantic class mirrors a DB table outside a router file.
- In source modules: __init__.py exposes source, and files match the
canonical layout (discover.py | fetch.py | parse.py | load.py).
- No raw_* field appears in any HTTP response model.

What "bad" looks like. A file fails when any of the above is violated.
Each finding cites the specific conventions.html section it derives from.
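
A few of the "good" conventions above (async pipeline stages, pure sync helpers, the structured logging shape, and the error stringification format) can be illustrated together. This is a minimal sketch, not Josh code: `StubLog`, `normalize_id`, `parse_stage`, and the event names are hypothetical, and the stub logger stands in for whatever structured logger the project actually uses.

```python
from __future__ import annotations

import asyncio


class StubLog:
    """Hypothetical stand-in for the project's structured logger."""

    def __init__(self) -> None:
        self.events: list[tuple[str, str, dict[str, object]]] = []

    def info(self, event: str, **fields: object) -> None:
        self.events.append(("info", event, fields))

    def error(self, event: str, **fields: object) -> None:
        self.events.append(("error", event, fields))


log = StubLog()


def normalize_id(raw: str) -> str:
    """Pure helper: no awaits, so it stays a plain def."""
    return raw.strip().lower()


async def parse_stage(raw_ids: list[str]) -> list[str]:
    """Pipeline stage: async def, logging in the dotted-event, key=value shape."""
    await asyncio.sleep(0)  # placeholder for real awaited I/O
    parsed: list[str] = []
    for raw in raw_ids:
        try:
            if not raw.strip():
                raise ValueError("empty id")
            parsed.append(normalize_id(raw))
        except ValueError as exc:
            # Error stringification in the documented shape.
            log.error("parse.id_rejected", error=f"{type(exc).__name__}: {exc}")
    log.info("parse.completed", count=len(parsed))
    return parsed
```

A "bad" counterpart would, for example, build the log message with an f-string or mark `normalize_id` as `async def` without awaiting anything.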

Findings format. Markdown report:

# Code-quality findings: <path>

## §3 Async-first
- WARN crs_reports/parse.py:18 — _normalize_bill_id is async def
but does no awaits. Consider sync. (conventions.html#async)

## §5 Database access
- FAIL crs_reports/load.py:96 — inline INSERT … ON CONFLICT SQL.
Use josh_substrate.db.upsert_with_children. (conventions.html#db-rule-3)

## §11 The raw_* rule
- (no findings)

## Verdict
1 FAIL, 1 WARN — fix FAILs before opening a PR.
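
The report layout above can also be expressed as a small renderer, which may help when checking that a skill transcript follows the format. This is a sketch under assumptions: the `Finding` fields and `render_report` are hypothetical names, and the verdict wording for reports without FAILs is a guess, since the sample only shows the FAIL case.

```python
from __future__ import annotations

from dataclasses import dataclass

DASH = "\u2014"  # separator character used in the sample report


@dataclass(frozen=True)
class Finding:
    severity: str  # "FAIL" or "WARN"
    location: str  # e.g. "crs_reports/load.py:96"
    message: str
    anchor: str    # e.g. "conventions.html#db-rule-3"


def render_report(path: str, sections: dict[str, list[Finding]]) -> str:
    """Render findings grouped by section heading, in insertion order."""
    lines = [f"# Code-quality findings: {path}", ""]
    for heading, findings in sections.items():
        lines.append(f"## {heading}")
        if not findings:
            lines.append("- (no findings)")
        for item in findings:
            lines.append(
                f"- {item.severity} {item.location} {DASH} {item.message} ({item.anchor})"
            )
        lines.append("")
    everything = [item for items in sections.values() for item in items]
    fails = sum(item.severity == "FAIL" for item in everything)
    warns = sum(item.severity == "WARN" for item in everything)
    lines.append("## Verdict")
    if not everything:
        lines.append("no findings")  # assumed wording for a clean target
    else:
        verdict = f"{fails} FAIL, {warns} WARN"
        if fails:
            verdict += f" {DASH} fix FAILs before opening a PR."
        lines.append(verdict)
    return "\n".join(lines)
```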

Eval fixtures (committed). Under tests/fixtures/code_quality/:
- good_example.py — small file that passes every check. Skill reports
no findings.
- bad_db.py — has inline upsert SQL. Skill flags §5.
- bad_async.py — uses sync HTTP in an async pipeline stage. Skill flags §3.
- bad_logging.py — uses f-string log messages. Skill flags §8.
- bad_raw.py — exports a Pydantic model with raw_json field. Skill flags §11.
- bad_layout.py — file mis-named for the source-module layout. Skill flags §6.

Eval procedure: run the skill against each fixture, confirm findings match
the seeded violations. Document the eval transcript under
tests/fixtures/code_quality/eval-transcript.md for posterity.
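
The comparison step of the eval could be scripted once a report has been captured for each fixture. The sketch below assumes reports follow the findings format described earlier; the expected sections mirror the fixture list above, while `flagged_sections`, `fixture_passes`, and the exact-match criterion are illustrative choices rather than part of the spec.

```python
from __future__ import annotations

import re

# Expected flagged sections per fixture, taken from the fixture descriptions.
EXPECTED_SECTIONS: dict[str, set[str]] = {
    "good_example.py": set(),
    "bad_db.py": {"§5"},
    "bad_async.py": {"§3"},
    "bad_logging.py": {"§8"},
    "bad_raw.py": {"§11"},
    "bad_layout.py": {"§6"},
}

HEADING = re.compile(r"^## (§\d+)\b")
FINDING = re.compile(r"^- (FAIL|WARN)\b")


def flagged_sections(report: str) -> set[str]:
    """Return the section numbers that contain at least one FAIL or WARN."""
    flagged: set[str] = set()
    current: str | None = None
    for line in report.splitlines():
        heading = HEADING.match(line)
        if heading:
            current = heading.group(1)
        elif line.startswith("## "):
            current = None  # e.g. the Verdict section
        elif current and FINDING.match(line):
            flagged.add(current)
    return flagged


def fixture_passes(fixture: str, report: str) -> bool:
    """Assumption: a fixture passes when exactly the seeded sections are flagged."""
    return flagged_sections(report) == EXPECTED_SECTIONS[fixture]
```

A looser criterion (seeded sections are a subset of the flagged ones) may be more practical if fixtures trip unrelated WARNs.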

Maintenance. When conventions.html changes, update the skill in the
same commit. The skill's SKILL.md should reference specific anchors in the
conventions doc (#async, #db-rule-3, #raw) so updates are traceable.
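
One way to keep those anchors traceable is to extract them from SKILL.md and compare them with the anchors that exist in the conventions doc. The following is a sketch only: the regular expression assumes references are written either as `conventions.html#anchor` or as the docs URL ending in `conventions/#anchor`, and both function names are hypothetical.

```python
import re

# Matches "conventions.html#async" and ".../conventions/#async" style references.
ANCHOR = re.compile(r"conventions(?:\.html|/)#([A-Za-z0-9_-]+)")


def referenced_anchors(skill_text: str) -> set[str]:
    """Collect every conventions-doc anchor mentioned in the skill text."""
    return set(ANCHOR.findall(skill_text))


def missing_anchors(skill_text: str, doc_anchors: set[str]) -> set[str]:
    """Return anchors the skill cites that the conventions doc no longer defines."""
    return referenced_anchors(skill_text) - doc_anchors
```

How `doc_anchors` is obtained (for example, by parsing `id` attributes from the built conventions page) is left open here.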

0 of 5 done.

  • t1 Draft `.claude/skills/josh-code-quality/SKILL.md` with trigger phrases, scope, and the eight category checks.
  • t2 Author the six fixture files under `tests/fixtures/code_quality/` (one good, five bad — one per category likely to be violated).
  • t3 Manually invoke the skill against each fixture; tune the SKILL.md until findings match seeded violations.
  • t4 Document the eval transcript at `tests/fixtures/code_quality/eval-transcript.md`.
  • t5 Update `https://docs.usejosh.com/operations/conventions/` to reference the skill in §12 (Pre-PR checklist).

No history yet.

docs/spec/josh-code-quality-skill.html · generated by bin/build-spec.py