Claude Code¶

The Claude Code prover is the AgentProver on the ClaudeCodeHarness — Anthropic’s Claude Code CLI driving the sorrys in a sandbox with the lean-lsp-mcp server. It is the default harness: the bare agent registry spec selects it, and the shared Verifier does the final compile / sorry / axiom check. See Provers for the staging/diff lifecycle every agent harness shares.

Usage¶

from open_atp.backends.docker import DockerBackend
from open_atp.harness import ClaudeCodeHarness
from open_atp.images import DEFAULT_IMAGE
from open_atp.provers import AgentProver

backend = DockerBackend(image=DEFAULT_IMAGE)
prover = AgentProver(
    harness=ClaudeCodeHarness(model="claude-opus-4-8", effort="high"),
    backend=backend,
)

AgentProver composes a Harness; here a ClaudeCodeHarness selects the Claude Code CLI and carries its model/effort. Swap in any other harness to change CLI. Claude Code is the default, so a bare AgentProver(backend=...) already uses it.

Or by catalog name through standard_prover() / the CLI: agent:claude.

Harness details¶

stage writes a project-scope .mcp.json registering the lean-lsp MCP server. The prover then stages the AgentProver’s skills — the host-agnostic leanprover/skills, default lean-proof — under .claude/skills/ (via stage_skills). Claude Code is the only harness that also loads plugins, so they live on ClaudeCodeHarness’s plugins (default lean4, vendored from cameronfreer/lean4-skills) rather than the shared bundle; each is staged under .plugins/<name>/. The launch script (assets/scripts/claude_code_agent.sh) runs:

claude -p "$PROMPT" \
    --output-format stream-json --verbose \
    --permission-mode bypassPermissions \
    --mcp-config .mcp.json --strict-mcp-config \
    --model '<MODEL>' --effort '<EFFORT>'<PLUGIN_FLAGS>

<PLUGIN_FLAGS> expands to one --plugin-dir .plugins/<name> per mounted plugin — the only way to load a local plugin in a headless -p run, so its SessionStart hooks and subagents fire. bypassPermissions skips approval prompts (safe in the container); the prover sets IS_SANDBOX=1 so that mode runs non-interactively, and CLAUDE_CODE_FORK_SUBAGENT=1 when plugins are mounted. The stream-json event stream goes to stdout.

$PROMPT is the shared agent prover prompt baked into the AgentProver, with the task’s optional user_prompt appended under an Additional instructions heading when set:

Agent prover prompt

The working directory is a complete Lean 4 lake project. One or more `.lean`
files contain `sorry` (or `admit`) placeholders standing in for proofs that have
not been written yet. Replace every such placeholder with a real proof so the
project compiles cleanly and depends on no axioms beyond Lean's standard set.

Hard rules:
- Do not weaken, rename, restate, or delete any theorem, lemma, `def`,
  `structure`, or signature. Only fill in proof bodies (the part after `:=` /
  `by` that is currently `sorry`). Changing a statement to make it easier to
  prove is failure, not success.
- No new axioms and no `sorry`/`admit`/`native_decide`-on-false escapes. The
  finished proof must type-check honestly. The only acceptable axioms are Lean's
  standard `propext`, `Classical.choice`, and `Quot.sound`.
- Stay inside this working directory; do not read or write files outside it.
- Do not edit `lakefile.toml`/`lakefile.lean`, `lean-toolchain`, or
  `lake-manifest.json` — they pin the toolchain and dependencies and must match
  the verification environment.

Workflow:
1. Find the work: search for `sorry` across the `.lean` source files (e.g.
   `rg -n '\\bsorry\\b'`). Read each file containing one to understand the
   statement, the hypotheses, and the relevant imports.
2. Confirm the lean-lsp MCP server is live before relying on it: call
   `mcp__lean-lsp__lean_diagnostic_messages` on a file you have not yet edited.
   `success:true, items:[]` means it compiles cleanly; real errors come back as
   `items`. `success:false, items:[]` usually means imports aren't built yet —
   run `lake build` for the relevant modules first.
3. Write a proof for one `sorry` at a time. Mathlib is available; prefer library
   lemmas, `simp`, `omega`, `linarith`, `exact?`/`apply?` suggestions, and
   `aesop` over long bespoke arguments.
4. After each edit, re-check that file with
   `mcp__lean-lsp__lean_diagnostic_messages` and iterate until it is clean.
5. When a file looks done, verify it has no stubbed proofs with
   `mcp__lean-lsp__lean_verify` — the reported axioms must NOT contain `sorryAx`.
6. Repeat until no `.lean` file contains a `sorry` and the whole project builds
   (`lake build`).

Tips:
- Use the lean-lsp tools (`mcp__lean-lsp__*`) as your primary feedback loop; they
  are far faster than a full `lake build` per change. Use `lake build` to
  materialize oleans for imports and as the final whole-project check.
- If a goal looks false or unprovable from the given hypotheses, re-read the
  statement: you likely misread a binder or a coercion. Do not "fix" it by
  changing the statement — finish the proof as stated.
- Non-trivial proofs routinely take many rounds of compile-error fixing. Keep
  iterating against the diagnostics rather than guessing."""

Authentication¶

Claude Code is included with every paid Claude plan — compare plans at Choose a Claude plan and monitor consumption at Usage. Generate a long-lived OAuth token once on the host:

claude setup-token

Pass the token to the harness explicitly:

ClaudeCodeHarness(oauth_token="sk-ant-oat01-...")

or leave oauth_token unset (the default) to read it from the host CLAUDE_CODE_OAUTH_TOKEN environment variable — for example from a .env file in your project:

CLAUDE_CODE_OAUTH_TOKEN=sk-ant-oat01-...

Either way the harness forwards CLAUDE_CODE_OAUTH_TOKEN into the sandbox at run time, billing against your Claude plan rather than the API; resolution fails if neither is supplied.

Cost tracking¶

The Claude Code CLI’s JSON event stream reports per-run USD directly (total_cost_usd in the final result object), so cost_usd in HarnessRunResult is read straight from the stream along with input/output token totals.