`provers`¶

The concrete provers and the registry/factory over them. Each prover subclasses AutomatedProver and funnels its output through the shared Verifier. The agentic provers compose an agent harness (the agent concern) with a ComputeBackend (the compute concern).

Base¶

The base prover abstraction. An AutomatedProver is a candidate generator; the base class owns the shared lifecycle (the public prove: generate, then verify in the sandbox) so subclasses only implement _generate.

class open_atp.provers.base.AutomatedProver(*, backend: ComputeBackend, timeout_s: int = 1800)[source]¶

Generate candidate proofs, then verify them in a shared sandbox.

The sandbox image (its tag plus the Lean toolchain + Mathlib pins the shared verifier checks every project against) comes from backend – a prover inherits whatever image its backend runs.

Parameters:

backendComputeBackend: The one backend for this prover. Agentic provers reuse it (via a live session) for generation, then verify in that hot sandbox; Aristotle uses it only for the final check.
timeout_sint: Wall-clock budget for the generation run, in seconds. Default 1800.

prove(task: ProofTask, output_dir: Path | str) → ProofResult[source]¶

Full lifecycle: reject-on-mismatch, generate, verify, write the result.

Parameters:

taskProofTask: The unit of work: the lake project to complete, the optional targets to focus on, and any user_prompt guidance.
output_dirpathlib.Path or str: Caller-chosen output directory, populated as output_dir/{wd,logs}/: wd is the completed lake project (the proof output) and logs is the run record (the agent stdout.txt/stderr.txt, result.json, and any harness-specific rich logs).

Returns:

ProofResult: The verification verdict and run metadata, pointing at the populated wd and logs_dir.

class open_atp.provers.base.ProofResult(prover: str, verification: VerificationReport | None, output_dir: Path, completed_files: dict[str, str]=<factory>, cost_usd: float | None = None, duration_s: float | None = None, metadata: dict[str, object]=<factory>, error: str | None = None)[source]¶

What a prover returns from AutomatedProver.prove().

The prover writes its artifacts into the caller-chosen output_dir, laid out as output_dir/{wd,logs}/: wd is the completed lake project (the proof output) and logs is the run record (the streamed agent stdout.txt, stderr.txt, result.json, and any harness-specific rich logs). This object just records where those live, plus the verification verdict and run metadata.

Attributes:

proverstr: Name of the prover that produced this result.
verificationVerificationReport or None: The shared verification of the completed project, or None when the run failed before a candidate could be verified (see error).
output_dirpathlib.Path: The run’s output directory. Holds the wd (proof project) and logs_dir (run record) subdirectories the prover populated.
completed_filesdict[str, str]: The completed .lean sources, keyed by file path relative to the project root.
cost_usdfloat, optional: Estimated USD cost of the run. None when the prover does not report cost.
duration_sfloat, optional: Wall-clock duration of the run, in seconds.
metadatadict[str, object]: Harness-specific run metadata (token counts, run summaries, …).
errorstr, optional: Set when the prover raised before producing a result (Docker down, API error, toolchain mismatch). verification is None and success is False.
wdpathlib.Path: The completed proof project (output_dir/wd).
logs_dirpathlib.Path: The run’s logs directory (output_dir/logs).
successbool: True iff verification exists and is verified.

to_dict() → dict[str, object][source]¶: JSON-ready view: inline files, verification, cost, and artifact paths.

Provers¶

The concrete candidate generators.

AgentProver¶

class open_atp.provers.agent_prover.AgentProver(*, backend: ComputeBackend, harness: Harness | None = None, skills: list[str] | None = None, timeout_s: int = 1800)[source]¶

Bases: AutomatedProver

Generate proofs by driving an agent CLI harness in a compute backend.

Composes an agent harness (the agent concern) with a ComputeBackend (the compute concern): the harness edits the staged .lean files in place, then the shared Verifier does the final compile/sorry/axiom check. codex, opencode, axprover, and vibe are this prover on a different harness.

Parameters:

backendComputeBackend: The sandbox the agent runs in. Generation reuses it via a live session and verification runs in that same hot sandbox.
harnessHarness, optional: The harness to drive and its knobs: ClaudeCodeHarness (default), CodexHarness, OpenCodeHarness, VibeHarness, or AxProverHarness. Carries model/effort plus any harness-specific knobs. Plugins are Claude-only and live on ClaudeCodeHarness.
skillslist[str], optional: Skills to mount into the agent workdir, each a name (resolved from the vendored leanprover/skills catalog) or a full path to a SKILL.md tree. Default ["lean-proof"]; an empty list mounts none. Staged into every skill-supporting harness’s location; ignored by ax-prover.
timeout_sint: Wall-clock budget for the generation run, in seconds. Default 1800.

Attributes:

prover_promptstr: The prover’s own prompt handed to the agent, before any user prompt.

Examples

Construct the prover directly, wiring up a harness and a backend:

>>> from open_atp.backends.docker import DockerBackend
>>> from open_atp.harness import CodexHarness
>>> from open_atp.provers.agent_prover import AgentProver
>>> backend = DockerBackend()
>>> prover = AgentProver(harness=CodexHarness(effort="high"), backend=backend)
>>> prover.harness.model
'gpt-5.5'

Or build the same prover from the standard catalog by name, taking its baked-in defaults (see standard_prover()):

>>> from open_atp import standard_prover
>>> prover = standard_prover("agent:codex", backend=DockerBackend())
>>> prover.name, prover.harness.name
('agent', 'codex')

Complete a task’s sorrys with prove(), here on a bundled example (this runs the agent in Docker and bills it):

>>> import tempfile
>>> from open_atp.examples import EXAMPLE, example_task
>>> task = example_task(EXAMPLE.MUL_REORDER)
>>> result = prover.prove(task, tempfile.mkdtemp())
>>> result.success
True

AgentProver’s harness is a Harness — pick the CLI and its knobs by composing one (e.g. AgentProver(harness=VibeHarness(max_turns=8), backend=...)). The per-harness classes are documented under harness.

NuminaProver¶

class open_atp.provers.numina.NuminaProver(*, backend: ComputeBackend, skills: list[str] | None = None, max_rounds: int = 20, max_consecutive_limits: int = 2, helper_env_keys: tuple[str, ...] = ('GEMINI_API_KEY', 'OPENAI_API_KEY', 'LEAN_LEANDEX_API_KEY', 'ANTHROPIC_API_KEY'), guard_statements: bool = True, on_statement_change: Literal['error', 'warn'] = 'error', timeout_s: int = 1800, env: dict[str, str] | None = None)[source]¶

Bases: AgentProver

Run the Numina coordinator/subagent scaffold as an AgentProver.

A specialization of AgentProver wired to Numina’s vendored scaffold (coordinator prompt + skills + subagent prompts, staged by _stage_numina_assets()); generation and the shared Verifier work exactly as in the base agent prover.

The harness is fixed to an internal NuminaHarness (Claude Code with no plugins, since Numina ships its own scaffold) and is not configurable – Numina is claude-CLI driven, and _stage_numina_assets() mounts the vendored scaffold straight into the known .claude/ locations.

Parameters:

backendComputeBackend: The sandbox the agent runs in. Generation reuses it via a live session and verification runs in that same hot sandbox.
skillslist[str], optional: Extra named/path skills to mount alongside Numina’s vendored scaffold. Default empty – Numina’s coordinator skill is staged from vendor/numina/skills, not this list.
max_roundsint: Maximum number of coordinator rounds before the run stops. Default 20.
max_consecutive_limitsint: Reset (start a fresh session) after this many consecutive LIMIT rounds. Default 2.
helper_env_keystuple[str, …]: Helper-skill credentials forwarded into the sandbox when present in the host env; skills degrade/skip when their key is absent. Default _DEFAULT_HELPER_ENV_KEYS.
guard_statementsbool: Whether to snapshot the target theorems and reject runs that weaken or delete them. Default True.
on_statement_change{“error”, “warn”}: Behavior on a weakened/deleted target theorem: error stops the run and restores the originals; warn restores and continues. Default error (rejects; safe).
timeout_sint: Wall-clock budget for the generation run, in seconds. Default 1800.
envdict[str, str], optional: Extra literal environment variables forwarded into the agent sandbox (Numina pins its harness, so its env knobs live here). Default empty.

Attributes:

prover_promptstr: Numina’s coordinator scaffold + round protocol, before any user prompt.

Examples

Construct the prover directly:

>>> from open_atp.backends.docker import DockerBackend
>>> from open_atp.provers.numina import NuminaProver
>>> backend = DockerBackend()
>>> prover = NuminaProver(backend=backend)
>>> prover.max_rounds
20

Or build the same prover from the standard catalog by name, taking its baked-in defaults (see standard_prover()):

>>> from open_atp import standard_prover
>>> prover = standard_prover("numina", backend=DockerBackend())
>>> prover.name
'numina'

Complete a task’s sorrys with prove(), here on a bundled example (this runs the Numina scaffold in Docker and bills it):

>>> import tempfile
>>> from open_atp.examples import EXAMPLE, example_task
>>> task = example_task(EXAMPLE.INTER_UNION_DISTRIB)
>>> result = prover.prove(task, tempfile.mkdtemp())
>>> result.success
True

AristotleProver¶

class open_atp.provers.aristotle.AristotleProver(*, backend: ComputeBackend, api_key: str | None = None, allow_agent_questions: bool = False, max_connection_retries: int = 5, max_resume_attempts: int = 20, resume_backoff_seconds: float = 5.0, timeout_s: int = 1800)[source]¶

Bases: AutomatedProver

Prove by handing the whole project to Harmonic’s hosted Aristotle agent.

Generation happens over the network (submit the lake project, wait, download the result archive, unpack it over the workdir); the shared Verifier then runs the same local compile/sorry/axiom check. Generation is network-only, so the backend is used solely for that final check – unlike the agentic provers, there is no live session to reuse.

Parameters:

backendComputeBackend: The sandbox used only for the final verify; Aristotle generates over the network, so there is no live session to reuse.
api_keystr, optional: The Harmonic API key. None (default) reads it from the host ARISTOTLE_API_KEY env var.
allow_agent_questionsbool: Whether to let the hosted agent ask clarifying questions. Off by default: this is a headless API path and a prompt for stdin would hang the run.
max_connection_retriesint: Bounds per-call retries (list/refresh/download) when a connection drops. The hosted run lives server-side, so a dropped connection is recoverable: re-fetch rather than reporting the run failed. Default 5.
max_resume_attemptsint: Bounds how many times we re-attach to the event stream when it drops mid-run. Default 20.
resume_backoff_secondsfloat: Initial sleep between retries/resumes, doubling (capped) between tries. Default 5.0.
timeout_sint: Wall-clock budget for the generation run, in seconds. Default 1800.

Attributes:

prover_promptstr: The prover’s own prompt handed to Aristotle, before any user prompt.

Examples

Construct the prover directly (network-only generation, so the backend is just the verify backend):

>>> from open_atp.backends.docker import DockerBackend
>>> from open_atp.provers.aristotle import AristotleProver
>>> backend = DockerBackend()
>>> prover = AristotleProver(backend=backend)
>>> prover.name
'aristotle'

Or build the same prover from the standard catalog by name, taking its baked-in defaults (see standard_prover()):

>>> from open_atp import standard_prover
>>> prover = standard_prover("aristotle", backend=DockerBackend())
>>> prover.name
'aristotle'

Complete a task’s sorrys with prove(), here on a bundled example (this hits the hosted Aristotle API, needing ARISTOTLE_API_KEY, and runs Docker for the verify):

>>> import tempfile
>>> from open_atp.examples import EXAMPLE, example_task
>>> task = example_task(EXAMPLE.ABS_MUL_LT)
>>> result = prover.prove(task, tempfile.mkdtemp())
>>> result.success
True

Standard catalog¶

The standard catalog names each ready-to-run default prover and builds it against a compute backend. standard_prover() maps a catalog name to a constructed AutomatedProver, wiring in the shared image/toolchain from the backend; a caller then drives it directly via prove(), which returns a ProofResult with verification and cost. Agentic provers run generation in a live session over that backend and verify in the same hot sandbox. This is the top-level surface re-exported from open_atp itself.

Names are the agent:<cli> agent provers ("agent:claude", "agent:codex", "agent:opencode", "agent:vibe", "agent:axprover") and the standalone provers ("numina", "aristotle"); standard_provers() lists them all. Each builds its class’s baked-in defaults — to customize any knob, use build_prover() with a full config dict instead.

open_atp.config.standard_prover(name: str, *, backend: ComputeBackend) → AutomatedProver[source]¶

Construct a standard default prover name against backend.

name is a STANDARD_PROVERS key – an agent:<cli> default ("agent:claude", "agent:codex", "agent:opencode", "agent:vibe", "agent:axprover") or a standalone prover ("numina", "aristotle"). The prover is built with its class’s baked-in defaults; to customize any knob (model, effort, skills, …), use build_prover() with a full config dict instead.

The sandbox image (and the toolchain + Mathlib pins projects are checked against) comes from backend, not a parameter here.

Examples

>>> from open_atp.backends.docker import DockerBackend
>>> prover = standard_prover("agent:claude", backend=DockerBackend())
>>> prover.harness.model
'claude-opus-4-8'

open_atp.config.standard_provers() → list[str][source]¶: The names standard_prover() accepts: the STANDARD_PROVERS keys.

open_atp.config.STANDARD_PROVERS: dict[str, dict[str, object]] = {'agent:axprover': {'harness': {'type': 'axprover'}, 'type': 'agent'}, 'agent:claude': {'harness': {'type': 'claude_code'}, 'type': 'agent'}, 'agent:codex': {'harness': {'type': 'codex'}, 'type': 'agent'}, 'agent:opencode': {'harness': {'type': 'opencode'}, 'type': 'agent'}, 'agent:vibe': {'harness': {'type': 'vibe'}, 'type': 'agent'}, 'aristotle': {'type': 'aristotle'}, 'numina': {'type': 'numina'}}¶: The standard catalog: a friendly name -> the canonical prover spec for that ready-to-run default. The agent:<cli> entries are all the shared AgentProver on a different harness; numina and aristotle are the standalone provers. Build one with standard_prover().

provers¶

Base¶

Provers¶

AgentProver¶

NuminaProver¶

AristotleProver¶

Standard catalog¶

`provers`¶