AxProver

The AxProver prover is the AgentProver on the AxProverHarness — it drives ax-prover-base, a self-contained LangGraph Lean agent with its own proposer → builder → reviewer → memory loop, via the ax-prover prove CLI in a sandbox. ax-prover edits the target .lean file in place; AgentProver supplies the staging, snapshot / diff, key forwarding, and the shared Verifier remains the source of truth for the compile / sorry / axiom check (we do not trust ax-prover’s own reviewer). See Provers for the lifecycle every agent harness shares.

Usage

from open_atp.backends.docker import DockerBackend
from open_atp.harness import AxProverHarness
from open_atp.images import DEFAULT_IMAGE
from open_atp.provers import AgentProver

backend = DockerBackend(image=DEFAULT_IMAGE)
prover = AgentProver(
    harness=AxProverHarness(
        model="claude-opus-4-8",
        effort="high",
        max_iterations=None,  # cap ax-prover's loop (its own default is 50)
    ),
    backend=backend,
)

AxProverHarness selects the ax-prover harness and carries the AxProver-specific max_iterations knob (which lives only on this harness, not the shared base).

Or by catalog name through standard_prover() / the CLI: agent:axprover (defaults: claude-opus-4-8, effort="high"). The max_iterations field caps ax-prover’s proposer → builder → reviewer loop.

Harness details

Two things differ from the CLI harnesses:

  • Config lives in a workdir YAML, not flags. stage writes an axprover.yaml (emitted as JSON, which is valid YAML) selecting the model, effort, and optional max_iterations. ax-prover’s --config appends to its bundled default.yaml, so only the deltas are set: the model is defined under a fresh llm_configs.open_atp key that prover.prover_llm references by interpolation (a fresh key avoids OmegaConf deep-merging stale thinking settings from the default’s Claude config into ours).

  • The free-text prompt is ignored — ax-prover ships its own prompts. The launch script (assets/scripts/axprover_agent.sh) self-discovers every .lean carrying a sorry (skipping the warm .lake cache) and proves each:

    ax-prover --config axprover.yaml prove "$f" \
        --folder . --skip-build --overwrite \
        -o "ax_output.<target>.json" 2>&1 | tee "ax_prover.<target>.log" || true
    

    || true keeps one unprovable file from aborting the rest; the final Verifier pass is authoritative either way.

The model is mapped to ax-prover’s LangChain provider:model form (e.g. anthropic:claude-opus-4-8, google_genai:...), and effort maps to each provider’s reasoning knob. ax-prover’s per-call LLM retries are capped at 3 (its default of 10 000 ignores non-retryable 400s and can hang a run for hours).

Authentication

Pass the provider key to the harness explicitly:

AxProverHarness(provider_api_key="sk-...")

or leave provider_api_key unset (the default) to read it from the host environment. The provider is inferred from the model prefix, and the harness forwards the key into the sandbox under its canonical env var (ANTHROPIC_API_KEY, OPENAI_API_KEY, or GOOGLE_API_KEY), where ax-prover reads it from the process env. Resolution fails if neither the explicit key nor the host env var is set.

Cost tracking

ax-prover streams human-readable logs, not a JSON event stream, so cost is not on stdout. parse sums input_tokens / output_tokens across the per-target ax_output.<target>.json files written by the -o flag and leaves cost_usd None, so the prover converts tokens → USD via COST_PER_MTOK (as with Codex). collect_logs relocates the ax_output.*.json and ax_prover.*.log files to logs/. On an ax-prover build without the usage fields the tokens are simply absent and the run reports zero cost.