NuminaProver¶
The NuminaProver is a configured variant of the
AgentProver. Structurally, Numina is
“Claude Code + a specific skills/prompts/search toolkit, run in a multi-round loop in
a sandbox”, so rather than re-implement it, open-atp extends AgentProver pinned
to the claude_code harness with Numina’s vendored assets and adds the two genuinely
different behaviours:
a round-continuation loop — re-invoke the agent while it reports it hit a limit rather than completing; and
the statement tracker — guard against the agent deleting or weakening the theorems it was asked to prove.
Numina’s helper skills call out to Leandex / Gemini / GPT, so its config carries
those API keys
(helper_env_keys) to forward
into the sandbox.
The discussion_partner skill (Gemini/GPT) appends a per-call token-usage record
to a workdir ledger (.claude/helper_usage.jsonl); after the run prove() prices
it via the cost table and folds it into
cost_usd, so the reported cost includes discussion-partner spend rather than only
the Claude agent. The split is preserved in metadata (agent_cost_usd,
helper_cost_usd, helper_breakdown). Helper models absent from the price table
are billed at 0 but surfaced in helper_unpriced_models so the gap is visible —
the gpt-5.4-pro / gemini-3.1-pro-preview defaults carry estimated prices
that should be verified against the provider pricing pages.
Warning
The NuminaProver is currently a stub:
prove() raises NotImplementedError.
The config surface below is stable, but the round loop and statement tracker are
still being implemented.
Configuration¶
from open_atp.provers.numina import NuminaProverConfig
config = NuminaProverConfig(
max_rounds=20,
guard_statements=True,
)
The harness is fixed to claude_code and assets to numina. See
NuminaProverConfig in the provers
reference for the full set of fields.