AxProver¶
The AxProver prover is the AgentProver on the
AxProverHarness — it drives
ax-prover-base, a self-contained
LangGraph Lean agent with its own proposer → builder → reviewer → memory loop, via the
ax-prover prove CLI in a sandbox. ax-prover edits the target .lean file in place;
AgentProver supplies the staging, snapshot /
diff, key forwarding, and the shared Verifier remains
the source of truth for the compile / sorry / axiom check (we do not trust ax-prover’s
own reviewer). See Provers for the lifecycle every agent harness shares.
Usage¶
from open_atp.backends.docker import DockerBackend, DockerConfig
from open_atp.images import DEFAULT_IMAGE
from open_atp.provers import AgentProver, AgentProverConfig
backend = DockerBackend(DockerConfig(image=DEFAULT_IMAGE))
config = AgentProverConfig(
harness="axprover",
model="claude-opus-4-8",
effort="high",
max_iterations=None, # cap ax-prover's loop (its own default is 50)
)
prover = AgentProver(config, verification_backend=backend)
Or by registry spec through get_prover() / the CLI:
agent:axprover (defaults: claude-opus-4-8, effort="high"). The
max_iterations field is AxProver-specific and caps ax-prover’s
proposer → builder → reviewer loop.
Harness details¶
Two things differ from the CLI harnesses:
Config lives in a workdir YAML, not flags.
configure_wdwrites anaxprover.yaml(emitted as JSON, which is valid YAML) selecting the model, effort, and optionalmax_iterations. ax-prover’s--configappends to its bundleddefault.yaml, so only the deltas are set: the model is defined under a freshllm_configs.open_atpkey thatprover.prover_llmreferences by interpolation (a fresh key avoids OmegaConf deep-merging stalethinkingsettings from the default’s Claude config into ours).The free-text prompt is ignored — ax-prover ships its own prompts. The launch script (
assets/scripts/axprover_agent.sh) self-discovers every.leancarrying asorry(skipping the warm.lakecache) and proves each:ax-prover --config axprover.yaml prove "$f" \ --folder . --skip-build --overwrite \ -o "ax_output.<target>.json" 2>&1 | tee "ax_prover.<target>.log" || true
|| truekeeps one unprovable file from aborting the rest; the final Verifier pass is authoritative either way.
The model is mapped to ax-prover’s LangChain provider:model form (e.g.
anthropic:claude-opus-4-8, google_genai:...), and effort maps to each provider’s
reasoning knob. ax-prover’s per-call LLM retries are capped at 3 (its default of
10 000 ignores non-retryable 400s and can hang a run for hours).
Authentication¶
The harness forwards one of ANTHROPIC_API_KEY, OPENAI_API_KEY, or
GOOGLE_API_KEY from the host (ax-prover reads the provider key from the process
env). At least one must be set or the harness raises.
Cost tracking¶
ax-prover streams human-readable logs, not a JSON event stream, so cost is not on
stdout. parse sums input_tokens / output_tokens across the per-target
ax_output.<target>.json files written by the -o flag and leaves cost_usd None,
so the prover converts tokens → USD via COST_PER_MTOK
(as with Codex). collect_logs relocates the ax_output.*.json and
ax_prover.*.log files to logs/. On an ax-prover build without the usage fields the
tokens are simply absent and the run reports zero cost.