harness¶
The open_atp.harness package is the agent concern composed by the
AgentProver: for one agent CLI it owns the
launch script, credential forwarding, asset staging, and token/cost parsing. The
compute concern (where the command runs, with Lean+Mathlib) lives in the injected
ComputeBackend.
See the per-harness prover pages under Provers for credential setup.
Base¶
- class open_atp.harness.base.Harness(model, effort='medium', assets=None)[source]¶
Base class for an agent CLI harness.
- collect_logs(wd, logs_dir)[source]¶
Move this harness’s rich log files out of
wdintologs_dir.The streamed event JSONL the prover captures from stdout is the agent’s transcript for every CLI harness, so the default does nothing. Harnesses that also drop a richer record inside the workdir (Vibe’s session log, ax-prover’s per-target logs) override this to relocate those files – so
download_wdstays the proof project anddownload_logscarries the full record. Called afterparse()(which may read those files for cost), so moving them is safe.
- property command¶
Bash command the backend runs to launch the agent.
The backend has already
cd’d into the workdir and symlinked.lake; we export$PROMPTfrom the written prompt file (the launch scripts reference it) and run the rendered script.
- configure_wd(wd, prompt)[source]¶
Populate
wdwith the launch script, prompt, MCP config, and skills.
- classmethod from_config(config, *, assets=None)[source]¶
Build a harness from an
AgentProverConfig.The default reads only the shared
model/effortknobs; harnesses with extra config (e.g.VibeHarness’s agent/turn/price) override this.
- parse(lines)[source]¶
Parse the agent’s streamed JSON lines into a
HarnessRunResult.
- class open_atp.harness.base.HarnessRunResult(input_tokens=0, output_tokens=0, stop_reason=None, cost_usd=None, subtype=None, result_text=None)[source]¶
Token totals and cost parsed from an agent’s streamed output.
- class open_atp.harness.base.AuthSpec(env=<factory>, home_dirs=<factory>)[source]¶
Compute-agnostic description of the credentials a harness needs.
- Attributes:
- env:
Host environment-variable names to forward into the sandbox.
- home_dirs:
Host directories to expose under the sandbox’s
$HOME, as(host_dir, dest_basename)pairs (e.g.(~/.codex, ".codex")).
Harnesses¶
Each harness adapts one agent CLI. They are registered in
HARNESSES, keyed by name and selected through
AgentProverConfig’s harness field.
- class open_atp.harness.claude_code.ClaudeCodeHarness(model, effort='medium', assets=None)[source]¶
Bases:
HarnessClaude Code CLI, authenticated by a long-lived
CLAUDE_CODE_OAUTH_TOKEN.- PLUGINS_DIR = '.plugins'¶
Where plugin dirs are staged in the workdir (the launch script’s
--plugin-dirflags reference this, so the two must agree).
- class open_atp.harness.codex.CodexHarness(model, effort='medium', assets=None)[source]¶
Bases:
HarnessCodex CLI, authenticated by a mounted
auth.jsoncredential.
- class open_atp.harness.opencode.OpenCodeHarness(model, effort='medium', provider=None, assets=None)[source]¶
Bases:
HarnessOpenCode CLI, authenticated by a provider API key forwarded from the host.
- class open_atp.harness.vibe.VibeHarness(model, effort='medium', *, agent='lean', max_turns=None, max_price=None, assets=None)[source]¶
Bases:
HarnessMistral Vibe CLI driving its builtin
leanagent (Leanstral) in a sandbox.Vibe’s
leanagent is Leanstral:vibe -p ... --agent leanpins the model toleanstralvia the builtin agent profile (there is no--modelflag). The bareleanprofile is Labs-gated, so thelean-standinstand-in (vendored underassets/vibe/) runs the same Lean scaffold on a non-Labs model until Labs access is enabled. The selected profile is named byagent. Since vibe has no--modelflag, the stand-in profile templates<<MODEL>>and the harness substitutesmodelinto it atconfigure_wd()time – so the model is a knob (default Magistral) just like the other harnesses’--model.Two things differ from the other harnesses:
VIBE_HOME is workdir-local.
vibe_agent.shexportsVIBE_HOME=$PWD/.vibeso vibe’s config (which un-gates the builtinleanagent), the vendored stand-in agent, and the per-session log all live under the workdir and sync back out with it.Cost comes from the session log, not stdout.
--output streamingcarries only conversation messages – no token/cost totals. Those live in vibe’s per-sessionmeta.json;parse()reads it from the synced-back log dir.
- VIBE_HOME_DIR = '.vibe'¶
Workdir-relative VIBE_HOME (matches the export in
vibe_agent.sh).
- collect_logs(wd, logs_dir)[source]¶
Move this harness’s rich log files out of
wdintologs_dir.The streamed event JSONL the prover captures from stdout is the agent’s transcript for every CLI harness, so the default does nothing. Harnesses that also drop a richer record inside the workdir (Vibe’s session log, ax-prover’s per-target logs) override this to relocate those files – so
download_wdstays the proof project anddownload_logscarries the full record. Called afterparse()(which may read those files for cost), so moving them is safe.
- classmethod from_config(config, *, assets=None)[source]¶
Build a harness from an
AgentProverConfig.The default reads only the shared
model/effortknobs; harnesses with extra config (e.g.VibeHarness’s agent/turn/price) override this.
- class open_atp.harness.axprover.AxProverHarness(model, effort='medium', *, max_iterations=None, assets=None)[source]¶
Bases:
Harnessax-prover-base (LangGraph Lean agent), driven by
ax-prover provein-sandbox.ax-prover is a self-contained proving agent (its own proposer->builder->reviewer->memory loop) that edits the target
.leanfile in place. It slots in as a harness rather than a standalone prover becauseAgentProver.prove()already supplies everything around the edges (staging, snapshot/diff, sandbox run, key forwarding) and the sharedVerifier– not ax-prover’s own reviewer – remains the source of truth for compile/sorry/axiom.Two things differ from the CLI harnesses (it mirrors
VibeHarnesshere):Config lives in a workdir YAML, not flags.
configure_wd()writesaxprover.yamlselecting the model/effort/iterations; it layers on top of ax-prover’s bundleddefault.yaml(auto-prepended by the CLI), so it only needs to override the deltas.Cost is not on stdout. ax-prover streams human-readable logs; token usage comes from its
-oJSON (the per-targetax_output.<target>.jsonfiles the launch script writes), which carriesinput_tokens/output_tokensalongside{success, error, summary}as of the pinned fork commit (seeAX_PROVER_REFin__main__.py).parse()sums those across every target and leavescost_usdNoneso the prover converts tokens->USD via the fallback table, exactly likeCodexHarness. (On an ax-prover build without those fields the tokens are simply absent and the run reports zero cost.)
- collect_logs(wd, logs_dir)[source]¶
Move this harness’s rich log files out of
wdintologs_dir.The streamed event JSONL the prover captures from stdout is the agent’s transcript for every CLI harness, so the default does nothing. Harnesses that also drop a richer record inside the workdir (Vibe’s session log, ax-prover’s per-target logs) override this to relocate those files – so
download_wdstays the proof project anddownload_logscarries the full record. Called afterparse()(which may read those files for cost), so moving them is safe.
- open_atp.harness.HARNESSES = {'axprover': <class 'open_atp.harness.axprover.AxProverHarness'>, 'claude_code': <class 'open_atp.harness.claude_code.ClaudeCodeHarness'>, 'codex': <class 'open_atp.harness.codex.CodexHarness'>, 'opencode': <class 'open_atp.harness.opencode.OpenCodeHarness'>, 'vibe': <class 'open_atp.harness.vibe.VibeHarness'>}¶
Harness registry selected by
AgentProverConfig.harness.
Asset bundles¶
An AssetBundle is the selectable set of skills,
default prompt, and extra directories mounted into the agent workdir (selected via
AgentProverConfig.assets).
- class open_atp.harness.bundles.AssetBundle(name, skills=(), plugins=(), prompt_file=None, extra_dirs=(), skills_dir=None)[source]¶
A selectable set of agent assets mounted into the workdir.
- Attributes:
- name:
Bundle identifier matching
AgentProverConfig.assets.- skills:
Individual skill source dirs (each a
<name>/SKILL.mdtree). Each is copied to<dest>/<dir-name>in every harness’s skill location.- plugins:
Claude Code plugin source dirs (each a
.claude-plugin/plugin.jsontree). Mounted and--plugin-dir-loaded by the Claude harness only; ignored by the others.- prompt_file:
Optional default system prompt for the bundle, used when the task carries no explicit
instructions.- extra_dirs:
Additional
(src_dir, dest_relative_to_workdir)trees to copy in (e.g. Numina’s coordinator/subagent prompts under.claude/prompts).- skills_dir:
Legacy whole-directory mount: its contents are copied to the skill location root (so a top-level
SKILL.mdlands at<dest>/SKILL.md). Used by the Numina bundle, which mounts one root-level skill plus helper subdirs (cli/etc.). Preferskillsfor ordinary skills.
- open_atp.harness.bundles.resolve_bundle(name)[source]¶
Resolve an
assetsname to itsAssetBundle.
- open_atp.harness.bundles.BUNDLES = {'default': <function _default_bundle>, 'numina': <function _numina_bundle>}¶
Asset-bundle registry selected by
AgentProverConfig.assets.
- open_atp.harness.bundles.DEFAULT_BUNDLE = AssetBundle(name='default', skills=(PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/open-atp/envs/latest/lib/python3.12/site-packages/open_atp/vendor/leanprover-skills/skills/lean-proof'),), plugins=(PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/open-atp/envs/latest/lib/python3.12/site-packages/open_atp/vendor/lean4-skills/plugins/lean4'),), prompt_file=None, extra_dirs=(), skills_dir=None)¶
Eagerly-resolved default (the common case), so callers have a ready bundle.
Pricing¶
- open_atp.harness.cost.compute_cost_usd(model, input_tokens, output_tokens)[source]¶
Estimate the USD cost of a run from token counts.
Returns
Nonewhenmodelis absent fromCOST_PER_MTOK.
- open_atp.harness.cost.COST_PER_MTOK = {'claude-fable-5': (10.0, 50.0), 'claude-haiku-4-5': (1.0, 5.0), 'claude-opus-4-6': (5.0, 25.0), 'claude-opus-4-7': (5.0, 25.0), 'claude-opus-4-8': (5.0, 25.0), 'claude-sonnet-4-5': (3.0, 15.0), 'claude-sonnet-4-6': (3.0, 15.0), 'deepseek-v4-flash': (0.14, 0.28), 'deepseek-v4-pro': (1.74, 3.48), 'gemini-3.1-pro-preview': (2.0, 12.0), 'gpt-4.1': (2.0, 8.0), 'gpt-4o': (2.5, 10.0), 'gpt-4o-mini': (0.15, 0.6), 'gpt-5.4': (2.5, 15.0), 'gpt-5.4-mini': (0.75, 4.5), 'gpt-5.4-nano': (0.2, 1.25), 'gpt-5.4-pro': (15.0, 120.0), 'gpt-5.5': (5.0, 30.0)}¶
Cost per million tokens, as
(input, output).