provers¶
The concrete provers and the registry/factory over them. Each prover subclasses
AutomatedProver and funnels its output through the
shared Verifier. The agentic provers compose an
agent harness (the agent concern) with a
ComputeBackend (the compute concern).
Base¶
The base prover abstraction. An AutomatedProver is a
candidate generator; the base class owns the shared lifecycle (the public prove:
generate, then verify in the sandbox) so subclasses only implement _generate.
- class open_atp.provers.base.AutomatedProver(*, backend: ComputeBackend, timeout_s: int = 1800)[source]¶
Generate candidate proofs, then verify them in a shared sandbox.
The sandbox image (its tag plus the Lean toolchain + Mathlib pins the shared verifier checks every project against) comes from
backend– a prover inherits whatever image its backend runs.- Parameters:
- backend
ComputeBackend The one backend for this prover. Agentic provers reuse it (via a live session) for generation, then verify in that hot sandbox; Aristotle uses it only for the final check.
- timeout_s
int Wall-clock budget for the generation run, in seconds. Default
1800.
- backend
- prove(task: ProofTask, output_dir: Path | str) ProofResult[source]¶
Full lifecycle: reject-on-mismatch, generate, verify, write the result.
- Parameters:
- task
ProofTask The unit of work: the lake project to complete, the optional
targetsto focus on, and anyuser_promptguidance.- output_dir
pathlib.Pathorstr Caller-chosen output directory, populated as
output_dir/{wd,logs}/:wdis the completed lake project (the proof output) andlogsis the run record (the agentstdout.txt/stderr.txt,result.json, and any harness-specific rich logs).
- task
- Returns:
ProofResultThe verification verdict and run metadata, pointing at the populated
wdandlogs_dir.
- class open_atp.provers.base.ProofResult(prover: str, verification: VerificationReport | None, output_dir: Path, completed_files: dict[str, str]=<factory>, cost_usd: float | None = None, duration_s: float | None = None, metadata: dict[str, object]=<factory>, error: str | None = None)[source]¶
What a prover returns from
AutomatedProver.prove().The prover writes its artifacts into the caller-chosen
output_dir, laid out asoutput_dir/{wd,logs}/:wdis the completed lake project (the proof output) andlogsis the run record (the streamed agentstdout.txt,stderr.txt,result.json, and any harness-specific rich logs). This object just records where those live, plus the verification verdict and run metadata.- Attributes:
- prover
str Name of the prover that produced this result.
- verification
VerificationReportorNone The shared verification of the completed project, or
Nonewhen the run failed before a candidate could be verified (seeerror).- output_dir
pathlib.Path The run’s output directory. Holds the
wd(proof project) andlogs_dir(run record) subdirectories the prover populated.- completed_files
dict[str,str] The completed
.leansources, keyed by file path relative to the project root.- cost_usd
float, optional Estimated USD cost of the run.
Nonewhen the prover does not report cost.- duration_s
float, optional Wall-clock duration of the run, in seconds.
- metadata
dict[str,object] Harness-specific run metadata (token counts, run summaries, …).
- error
str, optional Set when the prover raised before producing a result (Docker down, API error, toolchain mismatch).
verificationisNoneandsuccessisFalse.wdpathlib.PathThe completed proof project (
output_dir/wd).logs_dirpathlib.PathThe run’s logs directory (
output_dir/logs).- successbool
True iff
verificationexists and isverified.
- prover
Provers¶
The concrete candidate generators.
AgentProver¶
- class open_atp.provers.agent_prover.AgentProver(*, backend: ComputeBackend, harness: Harness | None = None, skills: list[str] | None = None, timeout_s: int = 1800)[source]¶
Bases:
AutomatedProverGenerate proofs by driving an agent CLI harness in a compute backend.
Composes an agent harness (the agent concern) with a
ComputeBackend(the compute concern): the harness edits the staged.leanfiles in place, then the sharedVerifierdoes the final compile/sorry/axiom check.codex,opencode,axprover, andvibeare this prover on a different harness.- Parameters:
- backend
ComputeBackend The sandbox the agent runs in. Generation reuses it via a live session and verification runs in that same hot sandbox.
- harness
Harness, optional The harness to drive and its knobs:
ClaudeCodeHarness(default),CodexHarness,OpenCodeHarness,VibeHarness, orAxProverHarness. Carriesmodel/effortplus any harness-specific knobs. Plugins are Claude-only and live onClaudeCodeHarness.- skills
list[str], optional Skills to mount into the agent workdir, each a name (resolved from the vendored
leanprover/skillscatalog) or a full path to aSKILL.mdtree. Default["lean-proof"]; an empty list mounts none. Staged into every skill-supporting harness’s location; ignored by ax-prover.- timeout_s
int Wall-clock budget for the generation run, in seconds. Default
1800.
- backend
- Attributes:
prover_promptstrThe prover’s own prompt handed to the agent, before any user prompt.
Examples
Construct the prover directly, wiring up a harness and a backend:
>>> from open_atp.backends.docker import DockerBackend >>> from open_atp.harness import CodexHarness >>> from open_atp.provers.agent_prover import AgentProver >>> backend = DockerBackend() >>> prover = AgentProver(harness=CodexHarness(effort="high"), backend=backend) >>> prover.harness.model 'gpt-5.5'
Or build the same prover from the standard catalog by name, taking its baked-in defaults (see
standard_prover()):>>> from open_atp import standard_prover >>> prover = standard_prover("agent:codex", backend=DockerBackend()) >>> prover.name, prover.harness.name ('agent', 'codex')
Complete a task’s
sorrys withprove(), here on a bundled example (this runs the agent in Docker and bills it):>>> import tempfile >>> from open_atp.examples import EXAMPLE, example_task >>> task = example_task(EXAMPLE.MUL_REORDER) >>> result = prover.prove(task, tempfile.mkdtemp()) >>> result.success True
AgentProver’s harness is a Harness — pick the
CLI and its knobs by composing one (e.g.
AgentProver(harness=VibeHarness(max_turns=8), backend=...)). The per-harness classes
are documented under harness.
NuminaProver¶
- class open_atp.provers.numina.NuminaProver(*, backend: ComputeBackend, skills: list[str] | None = None, max_rounds: int = 20, max_consecutive_limits: int = 2, helper_env_keys: tuple[str, ...] = ('GEMINI_API_KEY', 'OPENAI_API_KEY', 'LEAN_LEANDEX_API_KEY', 'ANTHROPIC_API_KEY'), guard_statements: bool = True, on_statement_change: Literal['error', 'warn'] = 'error', timeout_s: int = 1800, env: dict[str, str] | None = None)[source]¶
Bases:
AgentProverRun the Numina coordinator/subagent scaffold as an
AgentProver.A specialization of
AgentProverwired to Numina’s vendored scaffold (coordinator prompt + skills + subagent prompts, staged by_stage_numina_assets()); generation and the sharedVerifierwork exactly as in the base agent prover.The harness is fixed to an internal
NuminaHarness(Claude Code with no plugins, since Numina ships its own scaffold) and is not configurable – Numina is claude-CLI driven, and_stage_numina_assets()mounts the vendored scaffold straight into the known.claude/locations.- Parameters:
- backend
ComputeBackend The sandbox the agent runs in. Generation reuses it via a live session and verification runs in that same hot sandbox.
- skills
list[str], optional Extra named/path skills to mount alongside Numina’s vendored scaffold. Default empty – Numina’s coordinator skill is staged from
vendor/numina/skills, not this list.- max_rounds
int Maximum number of coordinator rounds before the run stops. Default
20.- max_consecutive_limits
int Reset (start a fresh session) after this many consecutive LIMIT rounds. Default
2.- helper_env_keys
tuple[str, …] Helper-skill credentials forwarded into the sandbox when present in the host env; skills degrade/skip when their key is absent. Default
_DEFAULT_HELPER_ENV_KEYS.- guard_statementsbool
Whether to snapshot the target theorems and reject runs that weaken or delete them. Default
True.- on_statement_change{“error”, “warn”}
Behavior on a weakened/deleted target theorem:
errorstops the run and restores the originals;warnrestores and continues. Defaulterror(rejects; safe).- timeout_s
int Wall-clock budget for the generation run, in seconds. Default
1800.- env
dict[str,str], optional Extra literal environment variables forwarded into the agent sandbox (Numina pins its harness, so its env knobs live here). Default empty.
- backend
- Attributes:
prover_promptstrNumina’s coordinator scaffold + round protocol, before any user prompt.
Examples
Construct the prover directly:
>>> from open_atp.backends.docker import DockerBackend >>> from open_atp.provers.numina import NuminaProver >>> backend = DockerBackend() >>> prover = NuminaProver(backend=backend) >>> prover.max_rounds 20
Or build the same prover from the standard catalog by name, taking its baked-in defaults (see
standard_prover()):>>> from open_atp import standard_prover >>> prover = standard_prover("numina", backend=DockerBackend()) >>> prover.name 'numina'
Complete a task’s
sorrys withprove(), here on a bundled example (this runs the Numina scaffold in Docker and bills it):>>> import tempfile >>> from open_atp.examples import EXAMPLE, example_task >>> task = example_task(EXAMPLE.INTER_UNION_DISTRIB) >>> result = prover.prove(task, tempfile.mkdtemp()) >>> result.success True
AristotleProver¶
- class open_atp.provers.aristotle.AristotleProver(*, backend: ComputeBackend, api_key: str | None = None, allow_agent_questions: bool = False, max_connection_retries: int = 5, max_resume_attempts: int = 20, resume_backoff_seconds: float = 5.0, timeout_s: int = 1800)[source]¶
Bases:
AutomatedProverProve by handing the whole project to Harmonic’s hosted Aristotle agent.
Generation happens over the network (submit the lake project, wait, download the result archive, unpack it over the workdir); the shared
Verifierthen runs the same local compile/sorry/axiom check. Generation is network-only, so the backend is used solely for that final check – unlike the agentic provers, there is no live session to reuse.- Parameters:
- backend
ComputeBackend The sandbox used only for the final verify; Aristotle generates over the network, so there is no live session to reuse.
- api_key
str, optional The Harmonic API key.
None(default) reads it from the hostARISTOTLE_API_KEYenv var.- allow_agent_questionsbool
Whether to let the hosted agent ask clarifying questions. Off by default: this is a headless API path and a prompt for stdin would hang the run.
- max_connection_retries
int Bounds per-call retries (list/refresh/download) when a connection drops. The hosted run lives server-side, so a dropped connection is recoverable: re-fetch rather than reporting the run failed. Default
5.- max_resume_attempts
int Bounds how many times we re-attach to the event stream when it drops mid-run. Default
20.- resume_backoff_seconds
float Initial sleep between retries/resumes, doubling (capped) between tries. Default
5.0.- timeout_s
int Wall-clock budget for the generation run, in seconds. Default
1800.
- backend
- Attributes:
prover_promptstrThe prover’s own prompt handed to Aristotle, before any user prompt.
Examples
Construct the prover directly (network-only generation, so the backend is just the verify backend):
>>> from open_atp.backends.docker import DockerBackend >>> from open_atp.provers.aristotle import AristotleProver >>> backend = DockerBackend() >>> prover = AristotleProver(backend=backend) >>> prover.name 'aristotle'
Or build the same prover from the standard catalog by name, taking its baked-in defaults (see
standard_prover()):>>> from open_atp import standard_prover >>> prover = standard_prover("aristotle", backend=DockerBackend()) >>> prover.name 'aristotle'
Complete a task’s
sorrys withprove(), here on a bundled example (this hits the hosted Aristotle API, needingARISTOTLE_API_KEY, and runs Docker for the verify):>>> import tempfile >>> from open_atp.examples import EXAMPLE, example_task >>> task = example_task(EXAMPLE.ABS_MUL_LT) >>> result = prover.prove(task, tempfile.mkdtemp()) >>> result.success True
Standard catalog¶
The standard catalog names each ready-to-run default prover and builds it against a
compute backend. standard_prover() maps a catalog name to a
constructed AutomatedProver, wiring in the shared
image/toolchain from the backend; a caller then drives it directly via
prove(), which returns a
ProofResult with verification and cost. Agentic provers
run generation in a live session over that backend and verify in the same hot sandbox.
This is the top-level surface re-exported from open_atp itself.
Names are the agent:<cli> agent provers ("agent:claude", "agent:codex",
"agent:opencode", "agent:vibe", "agent:axprover") and the standalone provers
("numina", "aristotle"); standard_provers() lists them all.
Each builds its class’s baked-in defaults — to customize any knob, use
build_prover() with a full config dict instead.
- open_atp.config.standard_prover(name: str, *, backend: ComputeBackend) AutomatedProver[source]¶
Construct a standard default prover
nameagainstbackend.nameis aSTANDARD_PROVERSkey – anagent:<cli>default ("agent:claude","agent:codex","agent:opencode","agent:vibe","agent:axprover") or a standalone prover ("numina","aristotle"). The prover is built with its class’s baked-in defaults; to customize any knob (model, effort, skills, …), usebuild_prover()with a full config dict instead.The sandbox image (and the toolchain + Mathlib pins projects are checked against) comes from
backend, not a parameter here.Examples
>>> from open_atp.backends.docker import DockerBackend >>> prover = standard_prover("agent:claude", backend=DockerBackend()) >>> prover.harness.model 'claude-opus-4-8'
- open_atp.config.standard_provers() list[str][source]¶
The names
standard_prover()accepts: theSTANDARD_PROVERSkeys.
- open_atp.config.STANDARD_PROVERS: dict[str, dict[str, object]] = {'agent:axprover': {'harness': {'type': 'axprover'}, 'type': 'agent'}, 'agent:claude': {'harness': {'type': 'claude_code'}, 'type': 'agent'}, 'agent:codex': {'harness': {'type': 'codex'}, 'type': 'agent'}, 'agent:opencode': {'harness': {'type': 'opencode'}, 'type': 'agent'}, 'agent:vibe': {'harness': {'type': 'vibe'}, 'type': 'agent'}, 'aristotle': {'type': 'aristotle'}, 'numina': {'type': 'numina'}}¶
The standard catalog: a friendly name -> the canonical
proverspec for that ready-to-run default. Theagent:<cli>entries are all the sharedAgentProveron a different harness;numinaandaristotleare the standalone provers. Build one withstandard_prover().