`provers`¶

The concrete provers and the registry/factory over them. Each prover subclasses AutomatedProver and funnels its output through the shared Verifier. The agentic provers compose an agent harness (the agent concern) with a ComputeBackend (the compute concern).

Registry¶

The open_atp.provers registry maps a prover name to a constructed AutomatedProver, wired to a shared sandbox image/toolchain and compute backend. A caller then drives the prover directly via prove(), which returns a ProofResult with verification and cost. This is the top-level surface re-exported from open_atp itself.

PROVERS enumerates the available provers (e.g. PROVERS.CLAUDE, PROVERS.CODEX, PROVERS.ARISTOTLE). get_prover() maps a member (or its string value) to a constructed AutomatedProver, wiring in the shared image/toolchain, the verify backend, and (for agentic provers) the agent backend.

class open_atp.provers.PROVERS(*values)[source]¶

The provers get_prover() accepts.

Each member’s value is the registry key. agent:<harness> members select an AgentProver harness; CLAUDE is the bare agent (config default, claude_code).

ARISTOTLE = 'aristotle'¶

AXPROVER = 'agent:axprover'¶

CLAUDE = 'agent'¶

CODEX = 'agent:codex'¶

NUMINA = 'numina'¶

OPENCODE = 'agent:opencode'¶

VIBE = 'vibe'¶

open_atp.provers.get_prover(name, *, verification_backend, agent_backend=None, overrides=None)[source]¶

Construct the prover name against the shared verify backend.

name is a PROVERS member (or its string value). The config is built from the name’s baked-in defaults + caller overrides (per-prover knobs: model, effort, max_rounds, …). The sandbox image (and the toolchain + Mathlib pins projects are checked against) comes from verification_backend’s config, not a parameter here. Agentic provers also receive agent_backend for generation (defaults to the verify backend), keeping the agent-vs-verify backend split available.

Parameters:

namePROVERS or str: The prover to build – a PROVERS member or its registry-key string (e.g. PROVERS.CLAUDE or "agent"). Raises ValueError for an unknown name.
verification_backendComputeBackend: The backend that runs the shared Verifier (the cheap final compile/sorry/axiom check).
agent_backendComputeBackend, optional: The backend agentic provers run generation on. Defaults to verification_backend, so generation and verification share a backend unless you split them (e.g. generate on Modal, verify on local Docker). Ignored by non-agentic provers (e.g. AristotleProver).
overridesMapping[str, object], optional: Per-prover config knobs layered over the name’s baked-in defaults (model, effort, max_rounds, …). Keys must be fields of the prover’s config class.

Returns:

proverAutomatedProver: The constructed prover, ready to drive via prove().

Examples

Construction is cheap and offline (the backend is wired in, not called), so the factory builds a ready-to-drive prover directly. overrides layer per-prover knobs over the name’s baked-in defaults:

>>> from open_atp import PROVERS, get_prover
>>> from open_atp.backends.docker import DockerBackend, DockerConfig
>>> from open_atp.images import DEFAULT_IMAGE
>>> backend = DockerBackend(DockerConfig(image=DEFAULT_IMAGE))
>>> prover = get_prover(
...     PROVERS.CLAUDE,
...     verification_backend=backend,
...     overrides={"model": "claude-sonnet-4-6", "effort": "low"},
... )
>>> prover.config.model
'claude-sonnet-4-6'

See each prover class for a family-specific construction example (AgentProver, NuminaProver, AristotleProver).

Drive a constructed prover with prove() (this step runs the sandbox, so it needs a working Docker backend):

from pathlib import Path
from open_atp.lean import LeanProject, ProofTask

task = ProofTask(project=LeanProject("path/to/lake/project"))
result = prover.prove(task, output_dir=Path("runs/demo"))
result.success      # compiles, sorry-free, no foreign axioms
result.cost_usd     # estimated USD, when the prover reports it
result.duration_s   # wall-clock seconds

See Running a prover for more example usage.

open_atp.provers.available_provers()[source]¶: The provers get_prover() accepts.

Base¶

The base prover abstraction. An AutomatedProver is a candidate generator; the base class owns the shared lifecycle (the public prove: generate, then verify in the sandbox) so subclasses only implement _generate.

class open_atp.provers.base.AutomatedProver(config, verification_backend)[source]¶

Generate candidate proofs, then verify them in a shared sandbox.

prove(task, output_dir)[source]¶

Full lifecycle: reject-on-mismatch, generate, verify, write the result.

output_dir is caller-chosen and is populated as output_dir/{wd,logs}/: wd is the completed lake project (the proof output) and logs is the run record (the agent stdout.txt/stderr.txt, result.json, and any harness-specific rich logs).

class open_atp.provers.base.AutomatedProverConfig(timeout_s=1800, env=<factory>)[source]¶

Base configuration shared by all provers.

Subclasses extend with their own knobs:

AgentProverConfig: harness (claude/opencode/codex), effort, skills, MCP.
NuminaProverConfig: extends the agent config + max_rounds, helper API keys.
AristotleProverConfig: api key, mode, poll interval.

The sandbox image (its tag plus the Lean toolchain + Mathlib pins the shared verifier checks every project against) lives on the backend’s BackendConfig, not here – a prover inherits whatever image its verification backend runs.

Attributes:

timeout_sint: Wall-clock budget for the generation run, in seconds. Default 1800.
envdict[str, str]: Extra environment variables exported into the run. Default empty.

classmethod from_dict(data)[source]¶

Build a config from a mapping (e.g. parsed JSON), ignoring unknown keys.

The inverse of dataclasses.asdict(): any tuple-typed field is restored from the list JSON round-trips it to. Unknown keys are dropped so a serialized superset (or a sibling subclass’s extra knobs) loads cleanly. Subclasses inherit this unchanged – cls resolves to the concrete config, so its own fields are honoured.

AgentProver¶

class open_atp.provers.agent_prover.AgentProver(config, verification_backend, agent_backend=None)[source]¶

Bases: AutomatedProver

Generate proofs by driving an agent CLI harness in a compute backend.

Composes an agent harness (the agent concern) with a ComputeBackend (the compute concern): the harness edits the staged .lean files in place, then the shared Verifier does the final compile/sorry/axiom check. codex, opencode, axprover, and vibe are this prover on a different harness.

Examples

Build the config and construct the prover directly (the agent backend defaults to the verify backend):

>>> from open_atp.backends.docker import DockerBackend, DockerConfig
>>> from open_atp.provers.agent_prover import AgentProver, AgentProverConfig
>>> backend = DockerBackend(DockerConfig())
>>> config = AgentProverConfig(
...     harness="claude_code",
...     model="claude-opus-4-8",
...     effort="high",
... )
>>> prover = AgentProver(config, verification_backend=backend)
>>> prover.config.harness
'claude_code'

config¶

class open_atp.provers.agent_prover.AgentProverConfig(timeout_s=1800, env=<factory>, harness='claude_code', model='claude-opus-4-8', effort='high', assets='default', skills=None, plugins=None, extra_env=<factory>, agent='lean', max_turns=None, max_price=None, max_iterations=None)[source]¶

Bases: AutomatedProverConfig

Configuration for AgentProver.

Extends AutomatedProverConfig (timeout_s, env) with the agent-harness knobs.

Attributes:

harnessstr: Which agent CLI to drive: one of claude_code | opencode | codex | vibe. Default claude_code.
modelstr: Model id passed to the harness. Default claude-opus-4-8.
effortstr: Reasoning-effort level passed to harnesses that support it. Default high.
assetsstr: Vendored skill/prompt/MCP asset bundle to mount into the workdir. Default default.
skillslist[str], optional: Per-run override of the bundle’s skills, each a name (resolved from a vendored catalog) or a full path. None keeps the bundle’s own skills; an empty list mounts none. Applies to every harness.
pluginslist[str], optional: Per-run override of the bundle’s plugins, same resolution as skills. Claude-only; ignored by the other harnesses.
extra_envdict[str, str]: Additional environment variables forwarded into the agent sandbox. Default empty.
agentstr: Vibe-only: which vibe agent profile to drive (lean is Leanstral; lean-standin the non-Labs, model-templated stand-in). Ignored by the other harnesses. Default lean.
max_turnsint, optional: Vibe-only programmatic-run guard passed straight to vibe -p. None leaves it unset. Ignored by the other harnesses.
max_pricefloat, optional: Vibe-only programmatic-run guard passed straight to vibe -p. None leaves it unset. Ignored by the other harnesses.
max_iterationsint, optional: AxProver-only cap on ax-prover’s proposer->builder->reviewer loop. None keeps ax-prover’s default (50). Ignored by the other harnesses.

NuminaProver¶

class open_atp.provers.numina.NuminaProver(config, verification_backend, agent_backend=None)[source]¶

Bases: AgentProver

Run the Numina coordinator/subagent scaffold as an AgentProver.

A specialization of AgentProver wired to the Numina asset bundle (coordinator prompt + vendored skills + subagent prompts); generation and the shared Verifier work exactly as in the base agent prover.

Examples

Build the config and construct the prover directly:

>>> from open_atp.backends.docker import DockerBackend, DockerConfig
>>> from open_atp.provers.numina import NuminaProver, NuminaProverConfig
>>> backend = DockerBackend(DockerConfig())
>>> config = NuminaProverConfig()
>>> prover = NuminaProver(config, verification_backend=backend)
>>> prover.config.max_rounds
20

config¶

class open_atp.provers.numina.NuminaProverConfig(timeout_s=1800, env=<factory>, harness='claude_code', model='claude-opus-4-8', effort='high', assets='numina', skills=None, plugins=None, extra_env=<factory>, agent='lean', max_turns=None, max_price=None, max_iterations=None, max_rounds=20, max_consecutive_limits=2, helper_env_keys=('GEMINI_API_KEY', 'OPENAI_API_KEY', 'LEAN_LEANDEX_API_KEY', 'ANTHROPIC_API_KEY'), guard_statements=True, on_statement_change='error')[source]¶

Bases: AgentProverConfig

Configuration for NuminaProver.

Extends AgentProverConfig with the Numina coordinator’s round-loop and helper-skill knobs.

Attributes:

harnessstr: Fixed to claude_code: Numina is claude-CLI driven and not configurable.
assetsstr: Asset bundle to mount. Default numina (coordinator prompt + vendored skills + subagent prompts).
max_roundsint: Maximum number of coordinator rounds before the run stops. Default 20.
max_consecutive_limitsint: Reset (start a fresh session) after this many consecutive LIMIT rounds. Default 2.
helper_env_keystuple[str, …]: Helper-skill credentials forwarded into the sandbox when present in the host env; skills degrade/skip when their key is absent. ANTHROPIC_API_KEY backs the informal-prover skill’s Claude calls.
guard_statementsbool: Whether to snapshot the target theorems and reject runs that weaken or delete them. Default True.
on_statement_change{“error”, “warn”}: Behavior on a weakened/deleted target theorem: error stops the run and restores the originals; warn restores and continues. Default error (rejects; safe).
extra_envdict[str, str]: Additional environment variables forwarded into the agent sandbox. Default empty.

AristotleProver¶

class open_atp.provers.aristotle.AristotleProver(config, verification_backend)[source]¶

Bases: AutomatedProver

Prove by handing the whole project to Harmonic’s hosted Aristotle agent.

Generation happens over the network (submit the lake project, wait, download the result archive, unpack it over the workdir); the shared Verifier then runs the same local compile/sorry/axiom check. Network-only, so it takes just the verify backend – no agent_backend.

Examples

Build the config and construct the prover directly (network-only, so it takes just the verify backend):

>>> from open_atp.backends.docker import DockerBackend, DockerConfig
>>> from open_atp.provers.aristotle import AristotleProver, AristotleProverConfig
>>> backend = DockerBackend(DockerConfig())
>>> config = AristotleProverConfig()
>>> prover = AristotleProver(config, verification_backend=backend)
>>> prover.config.api_key_env
'ARISTOTLE_API_KEY'

config¶

class open_atp.provers.aristotle.AristotleProverConfig(timeout_s=1800, env=<factory>, api_key_env='ARISTOTLE_API_KEY', allow_agent_questions=False, max_connection_retries=5, max_resume_attempts=20, resume_backoff_seconds=5.0)[source]¶

Bases: AutomatedProverConfig

Configuration for AristotleProver.

Extends AutomatedProverConfig (timeout_s, env) with the hosted-API knobs.

Attributes:

api_key_envstr: Name of the environment variable holding the Harmonic API key. Default ARISTOTLE_API_KEY.
allow_agent_questionsbool: Whether to let the hosted agent ask clarifying questions. Off by default: this is a headless API path and a prompt for stdin would hang the run.
max_connection_retriesint: Bounds per-call retries (list/refresh/download) when a connection drops. The hosted run lives server-side, so a dropped connection is recoverable: re-fetch rather than reporting the run failed. Default 5.
max_resume_attemptsint: Bounds how many times we re-attach to the event stream when it drops mid-run. Default 20.
resume_backoff_secondsfloat: Initial sleep between retries/resumes, doubling (capped) between tries. Default 5.0.

provers¶

Registry¶

Base¶

AgentProver¶

NuminaProver¶

AristotleProver¶

`provers`¶