AxProver¶
The AxProver prover is the AgentProver on the
AxProverHarness — it drives
ax-prover-base, a self-contained
LangGraph Lean agent with its own proposer → builder → reviewer → memory loop, via the
ax-prover prove CLI in a sandbox. ax-prover edits the target .lean file in place;
AgentProver supplies the staging, snapshot /
diff, key forwarding, and the shared Verifier remains
the source of truth for the compile / sorry / axiom check (we do not trust ax-prover’s
own reviewer). See Provers for the lifecycle every agent harness shares.
Usage¶
from open_atp.backends.docker import DockerBackend
from open_atp.harness import AxProverHarness
from open_atp.images import DEFAULT_IMAGE
from open_atp.provers import AgentProver
backend = DockerBackend(image=DEFAULT_IMAGE)
prover = AgentProver(
harness=AxProverHarness(
model="claude-opus-4-8",
effort="high",
max_iterations=None, # cap ax-prover's loop (its own default is 50)
),
backend=backend,
)
AxProverHarness selects the ax-prover harness and
carries the AxProver-specific max_iterations knob (which lives only on this harness,
not the shared base).
Or by catalog name through standard_prover() / the CLI:
agent:axprover (defaults: claude-opus-4-8, effort="high"). The
max_iterations field caps ax-prover’s proposer → builder → reviewer loop.
Harness details¶
Two things differ from the CLI harnesses:
Config lives in a workdir YAML, not flags.
stagewrites anaxprover.yaml(emitted as JSON, which is valid YAML) selecting the model, effort, and optionalmax_iterations. ax-prover’s--configappends to its bundleddefault.yaml, so only the deltas are set: the model is defined under a freshllm_configs.open_atpkey thatprover.prover_llmreferences by interpolation (a fresh key avoids OmegaConf deep-merging stalethinkingsettings from the default’s Claude config into ours).The free-text prompt is ignored — ax-prover ships its own prompts. The launch script (
assets/scripts/axprover_agent.sh) self-discovers every.leancarrying asorry(skipping the warm.lakecache) and proves each:ax-prover --config axprover.yaml prove "$f" \ --folder . --skip-build --overwrite \ -o "ax_output.<target>.json" 2>&1 | tee "ax_prover.<target>.log" || true
|| truekeeps one unprovable file from aborting the rest; the final Verifier pass is authoritative either way.
The model is mapped to ax-prover’s LangChain provider:model form (e.g.
anthropic:claude-opus-4-8, google_genai:...), and effort maps to each provider’s
reasoning knob. ax-prover’s per-call LLM retries are capped at 3 (its default of
10 000 ignores non-retryable 400s and can hang a run for hours).
Authentication¶
Pass the provider key to the harness explicitly:
AxProverHarness(provider_api_key="sk-...")
or leave provider_api_key unset (the default) to read it from the host environment.
The provider is inferred from the model prefix, and the harness forwards the key into
the sandbox under its canonical env var (ANTHROPIC_API_KEY, OPENAI_API_KEY, or
GOOGLE_API_KEY), where ax-prover reads it from the process env. Resolution fails if
neither the explicit key nor the host env var is set.
Cost tracking¶
ax-prover streams human-readable logs, not a JSON event stream, so cost is not on
stdout. parse sums input_tokens / output_tokens across the per-target
ax_output.<target>.json files written by the -o flag and leaves cost_usd None,
so the prover converts tokens → USD via COST_PER_MTOK
(as with Codex). collect_logs relocates the ax_output.*.json and
ax_prover.*.log files to logs/. On an ax-prover build without the usage fields the
tokens are simply absent and the run reports zero cost.