provers

The concrete provers and the registry/factory over them. Each prover subclasses AutomatedProver and funnels its output through the shared Verifier. The agentic provers compose an agent harness (the agent concern) with a ComputeBackend (the compute concern).

Registry

The open_atp.provers registry maps a prover name to a constructed AutomatedProver, wired to a shared sandbox image/toolchain and compute backend. A caller then drives the prover directly via prove(), which returns a ProofResult with verification and cost. This is the top-level surface re-exported from open_atp itself.

PROVERS enumerates the available provers (e.g. PROVERS.CLAUDE, PROVERS.CODEX, PROVERS.ARISTOTLE). get_prover() maps a member (or its string value) to a constructed AutomatedProver, wiring in the shared image/toolchain, the verify backend, and (for agentic provers) the agent backend.

class open_atp.provers.PROVERS(*values)[source]

The provers get_prover() accepts.

Each member’s value is the registry key. agent:<harness> members select an AgentProver harness; CLAUDE is the bare agent (config default, claude_code).

ARISTOTLE = 'aristotle'
AXPROVER = 'agent:axprover'
CLAUDE = 'agent'
CODEX = 'agent:codex'
NUMINA = 'numina'
OPENCODE = 'agent:opencode'
VIBE = 'vibe'
open_atp.provers.get_prover(name, *, verification_backend, agent_backend=None, overrides=None)[source]

Construct the prover name against the shared verify backend.

name is a PROVERS member (or its string value). The config is built from the name’s baked-in defaults + caller overrides (per-prover knobs: model, effort, max_rounds, …). The sandbox image (and the toolchain + Mathlib pins projects are checked against) comes from verification_backend’s config, not a parameter here. Agentic provers also receive agent_backend for generation (defaults to the verify backend), keeping the agent-vs-verify backend split available.

Parameters:
namePROVERS or str

The prover to build – a PROVERS member or its registry-key string (e.g. PROVERS.CLAUDE or "agent"). Raises ValueError for an unknown name.

verification_backendComputeBackend

The backend that runs the shared Verifier (the cheap final compile/sorry/axiom check).

agent_backendComputeBackend, optional

The backend agentic provers run generation on. Defaults to verification_backend, so generation and verification share a backend unless you split them (e.g. generate on Modal, verify on local Docker). Ignored by non-agentic provers (e.g. AristotleProver).

overridesMapping[str, object], optional

Per-prover config knobs layered over the name’s baked-in defaults (model, effort, max_rounds, …). Keys must be fields of the prover’s config class.

Returns:
proverAutomatedProver

The constructed prover, ready to drive via prove().

Examples

Construction is cheap and offline (the backend is wired in, not called), so the factory builds a ready-to-drive prover directly. overrides layer per-prover knobs over the name’s baked-in defaults:

>>> from open_atp import PROVERS, get_prover
>>> from open_atp.backends.docker import DockerBackend, DockerConfig
>>> from open_atp.images import DEFAULT_IMAGE
>>> backend = DockerBackend(DockerConfig(image=DEFAULT_IMAGE))
>>> prover = get_prover(
...     PROVERS.CLAUDE,
...     verification_backend=backend,
...     overrides={"model": "claude-sonnet-4-6", "effort": "low"},
... )
>>> prover.config.model
'claude-sonnet-4-6'

See each prover class for a family-specific construction example (AgentProver, NuminaProver, AristotleProver).

Drive a constructed prover with prove() (this step runs the sandbox, so it needs a working Docker backend):

from pathlib import Path
from open_atp.lean import LeanProject, ProofTask

task = ProofTask(project=LeanProject("path/to/lake/project"))
result = prover.prove(task, output_dir=Path("runs/demo"))
result.success      # compiles, sorry-free, no foreign axioms
result.cost_usd     # estimated USD, when the prover reports it
result.duration_s   # wall-clock seconds

See Running a prover for more example usage.

open_atp.provers.available_provers()[source]

The provers get_prover() accepts.

Base

The base prover abstraction. An AutomatedProver is a candidate generator; the base class owns the shared lifecycle (the public prove: generate, then verify in the sandbox) so subclasses only implement _generate.

class open_atp.provers.base.AutomatedProver(config, verification_backend)[source]

Generate candidate proofs, then verify them in a shared sandbox.

prove(task, output_dir)[source]

Full lifecycle: reject-on-mismatch, generate, verify, write the result.

output_dir is caller-chosen and is populated as output_dir/{wd,logs}/: wd is the completed lake project (the proof output) and logs is the run record (the agent stdout.txt/stderr.txt, result.json, and any harness-specific rich logs).

class open_atp.provers.base.AutomatedProverConfig(timeout_s=1800, env=<factory>)[source]

Base configuration shared by all provers.

Subclasses extend with their own knobs:

The sandbox image (its tag plus the Lean toolchain + Mathlib pins the shared verifier checks every project against) lives on the backend’s BackendConfig, not here – a prover inherits whatever image its verification backend runs.

Attributes:
timeout_sint

Wall-clock budget for the generation run, in seconds. Default 1800.

envdict[str, str]

Extra environment variables exported into the run. Default empty.

classmethod from_dict(data)[source]

Build a config from a mapping (e.g. parsed JSON), ignoring unknown keys.

The inverse of dataclasses.asdict(): any tuple-typed field is restored from the list JSON round-trips it to. Unknown keys are dropped so a serialized superset (or a sibling subclass’s extra knobs) loads cleanly. Subclasses inherit this unchanged – cls resolves to the concrete config, so its own fields are honoured.

AgentProver

class open_atp.provers.agent_prover.AgentProver(config, verification_backend, agent_backend=None)[source]

Bases: AutomatedProver

Generate proofs by driving an agent CLI harness in a compute backend.

Composes an agent harness (the agent concern) with a ComputeBackend (the compute concern): the harness edits the staged .lean files in place, then the shared Verifier does the final compile/sorry/axiom check. codex, opencode, axprover, and vibe are this prover on a different harness.

Examples

Build the config and construct the prover directly (the agent backend defaults to the verify backend):

>>> from open_atp.backends.docker import DockerBackend, DockerConfig
>>> from open_atp.provers.agent_prover import AgentProver, AgentProverConfig
>>> backend = DockerBackend(DockerConfig())
>>> config = AgentProverConfig(
...     harness="claude_code",
...     model="claude-opus-4-8",
...     effort="high",
... )
>>> prover = AgentProver(config, verification_backend=backend)
>>> prover.config.harness
'claude_code'
config
class open_atp.provers.agent_prover.AgentProverConfig(timeout_s=1800, env=<factory>, harness='claude_code', model='claude-opus-4-8', effort='high', assets='default', skills=None, plugins=None, extra_env=<factory>, agent='lean', max_turns=None, max_price=None, max_iterations=None)[source]

Bases: AutomatedProverConfig

Configuration for AgentProver.

Extends AutomatedProverConfig (timeout_s, env) with the agent-harness knobs.

Attributes:
harnessstr

Which agent CLI to drive: one of claude_code | opencode | codex | vibe. Default claude_code.

modelstr

Model id passed to the harness. Default claude-opus-4-8.

effortstr

Reasoning-effort level passed to harnesses that support it. Default high.

assetsstr

Vendored skill/prompt/MCP asset bundle to mount into the workdir. Default default.

skillslist[str], optional

Per-run override of the bundle’s skills, each a name (resolved from a vendored catalog) or a full path. None keeps the bundle’s own skills; an empty list mounts none. Applies to every harness.

pluginslist[str], optional

Per-run override of the bundle’s plugins, same resolution as skills. Claude-only; ignored by the other harnesses.

extra_envdict[str, str]

Additional environment variables forwarded into the agent sandbox. Default empty.

agentstr

Vibe-only: which vibe agent profile to drive (lean is Leanstral; lean-standin the non-Labs, model-templated stand-in). Ignored by the other harnesses. Default lean.

max_turnsint, optional

Vibe-only programmatic-run guard passed straight to vibe -p. None leaves it unset. Ignored by the other harnesses.

max_pricefloat, optional

Vibe-only programmatic-run guard passed straight to vibe -p. None leaves it unset. Ignored by the other harnesses.

max_iterationsint, optional

AxProver-only cap on ax-prover’s proposer->builder->reviewer loop. None keeps ax-prover’s default (50). Ignored by the other harnesses.

NuminaProver

class open_atp.provers.numina.NuminaProver(config, verification_backend, agent_backend=None)[source]

Bases: AgentProver

Run the Numina coordinator/subagent scaffold as an AgentProver.

A specialization of AgentProver wired to the Numina asset bundle (coordinator prompt + vendored skills + subagent prompts); generation and the shared Verifier work exactly as in the base agent prover.

Examples

Build the config and construct the prover directly:

>>> from open_atp.backends.docker import DockerBackend, DockerConfig
>>> from open_atp.provers.numina import NuminaProver, NuminaProverConfig
>>> backend = DockerBackend(DockerConfig())
>>> config = NuminaProverConfig()
>>> prover = NuminaProver(config, verification_backend=backend)
>>> prover.config.max_rounds
20
config
class open_atp.provers.numina.NuminaProverConfig(timeout_s=1800, env=<factory>, harness='claude_code', model='claude-opus-4-8', effort='high', assets='numina', skills=None, plugins=None, extra_env=<factory>, agent='lean', max_turns=None, max_price=None, max_iterations=None, max_rounds=20, max_consecutive_limits=2, helper_env_keys=('GEMINI_API_KEY', 'OPENAI_API_KEY', 'LEAN_LEANDEX_API_KEY', 'ANTHROPIC_API_KEY'), guard_statements=True, on_statement_change='error')[source]

Bases: AgentProverConfig

Configuration for NuminaProver.

Extends AgentProverConfig with the Numina coordinator’s round-loop and helper-skill knobs.

Attributes:
harnessstr

Fixed to claude_code: Numina is claude-CLI driven and not configurable.

assetsstr

Asset bundle to mount. Default numina (coordinator prompt + vendored skills + subagent prompts).

max_roundsint

Maximum number of coordinator rounds before the run stops. Default 20.

max_consecutive_limitsint

Reset (start a fresh session) after this many consecutive LIMIT rounds. Default 2.

helper_env_keystuple[str, …]

Helper-skill credentials forwarded into the sandbox when present in the host env; skills degrade/skip when their key is absent. ANTHROPIC_API_KEY backs the informal-prover skill’s Claude calls.

guard_statementsbool

Whether to snapshot the target theorems and reject runs that weaken or delete them. Default True.

on_statement_change{“error”, “warn”}

Behavior on a weakened/deleted target theorem: error stops the run and restores the originals; warn restores and continues. Default error (rejects; safe).

extra_envdict[str, str]

Additional environment variables forwarded into the agent sandbox. Default empty.

AristotleProver

class open_atp.provers.aristotle.AristotleProver(config, verification_backend)[source]

Bases: AutomatedProver

Prove by handing the whole project to Harmonic’s hosted Aristotle agent.

Generation happens over the network (submit the lake project, wait, download the result archive, unpack it over the workdir); the shared Verifier then runs the same local compile/sorry/axiom check. Network-only, so it takes just the verify backend – no agent_backend.

Examples

Build the config and construct the prover directly (network-only, so it takes just the verify backend):

>>> from open_atp.backends.docker import DockerBackend, DockerConfig
>>> from open_atp.provers.aristotle import AristotleProver, AristotleProverConfig
>>> backend = DockerBackend(DockerConfig())
>>> config = AristotleProverConfig()
>>> prover = AristotleProver(config, verification_backend=backend)
>>> prover.config.api_key_env
'ARISTOTLE_API_KEY'
config
class open_atp.provers.aristotle.AristotleProverConfig(timeout_s=1800, env=<factory>, api_key_env='ARISTOTLE_API_KEY', allow_agent_questions=False, max_connection_retries=5, max_resume_attempts=20, resume_backoff_seconds=5.0)[source]

Bases: AutomatedProverConfig

Configuration for AristotleProver.

Extends AutomatedProverConfig (timeout_s, env) with the hosted-API knobs.

Attributes:
api_key_envstr

Name of the environment variable holding the Harmonic API key. Default ARISTOTLE_API_KEY.

allow_agent_questionsbool

Whether to let the hosted agent ask clarifying questions. Off by default: this is a headless API path and a prompt for stdin would hang the run.

max_connection_retriesint

Bounds per-call retries (list/refresh/download) when a connection drops. The hosted run lives server-side, so a dropped connection is recoverable: re-fetch rather than reporting the run failed. Default 5.

max_resume_attemptsint

Bounds how many times we re-attach to the event stream when it drops mid-run. Default 20.

resume_backoff_secondsfloat

Initial sleep between retries/resumes, doubling (capped) between tries. Default 5.0.