harness

The open_atp.harness package is the agent concern composed by the AgentProver: for one agent CLI it owns the launch script, credential forwarding, asset staging, and token/cost parsing. The compute concern (where the command runs, with Lean+Mathlib) lives in the injected ComputeBackend.

See the per-harness prover pages under Provers for credential setup.

Base

class open_atp.harness.base.Harness(model, effort='medium', assets=None)[source]

Base class for an agent CLI harness.

auth_spec()[source]

Credentials to forward into the sandbox for this harness.

collect_logs(wd, logs_dir)[source]

Move this harness’s rich log files out of wd into logs_dir.

The streamed event JSONL the prover captures from stdout is the agent’s transcript for every CLI harness, so the default does nothing. Harnesses that also drop a richer record inside the workdir (Vibe’s session log, ax-prover’s per-target logs) override this to relocate those files – so download_wd stays the proof project and download_logs carries the full record. Called after parse() (which may read those files for cost), so moving them is safe.

property command

Bash command the backend runs to launch the agent.

The backend has already cd’d into the workdir and symlinked .lake; we export $PROMPT from the written prompt file (the launch scripts reference it) and run the rendered script.

configure_wd(wd, prompt)[source]

Populate wd with the launch script, prompt, MCP config, and skills.

classmethod from_config(config, *, assets=None)[source]

Build a harness from an AgentProverConfig.

The default reads only the shared model/effort knobs; harnesses with extra config (e.g. VibeHarness’s agent/turn/price) override this.

parse(lines)[source]

Parse the agent’s streamed JSON lines into a HarnessRunResult.

static_env()[source]

Non-secret env vars to set for this harness (e.g. IS_SANDBOX).

class open_atp.harness.base.HarnessRunResult(input_tokens=0, output_tokens=0, stop_reason=None, cost_usd=None, subtype=None, result_text=None)[source]

Token totals and cost parsed from an agent’s streamed output.

class open_atp.harness.base.AuthSpec(env=<factory>, home_dirs=<factory>)[source]

Compute-agnostic description of the credentials a harness needs.

Attributes:
env:

Host environment-variable names to forward into the sandbox.

home_dirs:

Host directories to expose under the sandbox’s $HOME, as (host_dir, dest_basename) pairs (e.g. (~/.codex, ".codex")).

Harnesses

Each harness adapts one agent CLI. They are registered in HARNESSES, keyed by name and selected through AgentProverConfig’s harness field.

class open_atp.harness.claude_code.ClaudeCodeHarness(model, effort='medium', assets=None)[source]

Bases: Harness

Claude Code CLI, authenticated by a long-lived CLAUDE_CODE_OAUTH_TOKEN.

PLUGINS_DIR = '.plugins'

Where plugin dirs are staged in the workdir (the launch script’s --plugin-dir flags reference this, so the two must agree).

auth_spec()[source]

Credentials to forward into the sandbox for this harness.

static_env()[source]

Non-secret env vars to set for this harness (e.g. IS_SANDBOX).

class open_atp.harness.codex.CodexHarness(model, effort='medium', assets=None)[source]

Bases: Harness

Codex CLI, authenticated by a mounted auth.json credential.

auth_spec()[source]

Credentials to forward into the sandbox for this harness.

class open_atp.harness.opencode.OpenCodeHarness(model, effort='medium', provider=None, assets=None)[source]

Bases: Harness

OpenCode CLI, authenticated by a provider API key forwarded from the host.

auth_spec()[source]

Credentials to forward into the sandbox for this harness.

class open_atp.harness.vibe.VibeHarness(model, effort='medium', *, agent='lean', max_turns=None, max_price=None, assets=None)[source]

Bases: Harness

Mistral Vibe CLI driving its builtin lean agent (Leanstral) in a sandbox.

Vibe’s lean agent is Leanstral: vibe -p ... --agent lean pins the model to leanstral via the builtin agent profile (there is no --model flag). The bare lean profile is Labs-gated, so the lean-standin stand-in (vendored under assets/vibe/) runs the same Lean scaffold on a non-Labs model until Labs access is enabled. The selected profile is named by agent. Since vibe has no --model flag, the stand-in profile templates <<MODEL>> and the harness substitutes model into it at configure_wd() time – so the model is a knob (default Magistral) just like the other harnesses’ --model.

Two things differ from the other harnesses:

  • VIBE_HOME is workdir-local. vibe_agent.sh exports VIBE_HOME=$PWD/.vibe so vibe’s config (which un-gates the builtin lean agent), the vendored stand-in agent, and the per-session log all live under the workdir and sync back out with it.

  • Cost comes from the session log, not stdout. --output streaming carries only conversation messages – no token/cost totals. Those live in vibe’s per-session meta.json; parse() reads it from the synced-back log dir.

VIBE_HOME_DIR = '.vibe'

Workdir-relative VIBE_HOME (matches the export in vibe_agent.sh).

auth_spec()[source]

Credentials to forward into the sandbox for this harness.

collect_logs(wd, logs_dir)[source]

Move this harness’s rich log files out of wd into logs_dir.

The streamed event JSONL the prover captures from stdout is the agent’s transcript for every CLI harness, so the default does nothing. Harnesses that also drop a richer record inside the workdir (Vibe’s session log, ax-prover’s per-target logs) override this to relocate those files – so download_wd stays the proof project and download_logs carries the full record. Called after parse() (which may read those files for cost), so moving them is safe.

classmethod from_config(config, *, assets=None)[source]

Build a harness from an AgentProverConfig.

The default reads only the shared model/effort knobs; harnesses with extra config (e.g. VibeHarness’s agent/turn/price) override this.

parse(lines)[source]

Parse the agent’s streamed JSON lines into a HarnessRunResult.

class open_atp.harness.axprover.AxProverHarness(model, effort='medium', *, max_iterations=None, assets=None)[source]

Bases: Harness

ax-prover-base (LangGraph Lean agent), driven by ax-prover prove in-sandbox.

ax-prover is a self-contained proving agent (its own proposer->builder->reviewer->memory loop) that edits the target .lean file in place. It slots in as a harness rather than a standalone prover because AgentProver.prove() already supplies everything around the edges (staging, snapshot/diff, sandbox run, key forwarding) and the shared Verifier – not ax-prover’s own reviewer – remains the source of truth for compile/sorry/axiom.

Two things differ from the CLI harnesses (it mirrors VibeHarness here):

  • Config lives in a workdir YAML, not flags. configure_wd() writes axprover.yaml selecting the model/effort/iterations; it layers on top of ax-prover’s bundled default.yaml (auto-prepended by the CLI), so it only needs to override the deltas.

  • Cost is not on stdout. ax-prover streams human-readable logs; token usage comes from its -o JSON (the per-target ax_output.<target>.json files the launch script writes), which carries input_tokens/output_tokens alongside {success, error, summary} as of the pinned fork commit (see AX_PROVER_REF in __main__.py). parse() sums those across every target and leaves cost_usd None so the prover converts tokens->USD via the fallback table, exactly like CodexHarness. (On an ax-prover build without those fields the tokens are simply absent and the run reports zero cost.)

auth_spec()[source]

Credentials to forward into the sandbox for this harness.

collect_logs(wd, logs_dir)[source]

Move this harness’s rich log files out of wd into logs_dir.

The streamed event JSONL the prover captures from stdout is the agent’s transcript for every CLI harness, so the default does nothing. Harnesses that also drop a richer record inside the workdir (Vibe’s session log, ax-prover’s per-target logs) override this to relocate those files – so download_wd stays the proof project and download_logs carries the full record. Called after parse() (which may read those files for cost), so moving them is safe.

classmethod from_config(config, *, assets=None)[source]

Build a harness from an AgentProverConfig.

The default reads only the shared model/effort knobs; harnesses with extra config (e.g. VibeHarness’s agent/turn/price) override this.

parse(lines)[source]

Parse the agent’s streamed JSON lines into a HarnessRunResult.

open_atp.harness.HARNESSES = {'axprover': <class 'open_atp.harness.axprover.AxProverHarness'>, 'claude_code': <class 'open_atp.harness.claude_code.ClaudeCodeHarness'>, 'codex': <class 'open_atp.harness.codex.CodexHarness'>, 'opencode': <class 'open_atp.harness.opencode.OpenCodeHarness'>, 'vibe': <class 'open_atp.harness.vibe.VibeHarness'>}

Harness registry selected by AgentProverConfig.harness.

Asset bundles

An AssetBundle is the selectable set of skills, default prompt, and extra directories mounted into the agent workdir (selected via AgentProverConfig.assets).

class open_atp.harness.bundles.AssetBundle(name, skills=(), plugins=(), prompt_file=None, extra_dirs=(), skills_dir=None)[source]

A selectable set of agent assets mounted into the workdir.

Attributes:
name:

Bundle identifier matching AgentProverConfig.assets.

skills:

Individual skill source dirs (each a <name>/SKILL.md tree). Each is copied to <dest>/<dir-name> in every harness’s skill location.

plugins:

Claude Code plugin source dirs (each a .claude-plugin/plugin.json tree). Mounted and --plugin-dir-loaded by the Claude harness only; ignored by the others.

prompt_file:

Optional default system prompt for the bundle, used when the task carries no explicit instructions.

extra_dirs:

Additional (src_dir, dest_relative_to_workdir) trees to copy in (e.g. Numina’s coordinator/subagent prompts under .claude/prompts).

skills_dir:

Legacy whole-directory mount: its contents are copied to the skill location root (so a top-level SKILL.md lands at <dest>/SKILL.md). Used by the Numina bundle, which mounts one root-level skill plus helper subdirs (cli/ etc.). Prefer skills for ordinary skills.

open_atp.harness.bundles.resolve_bundle(name)[source]

Resolve an assets name to its AssetBundle.

open_atp.harness.bundles.BUNDLES = {'default': <function _default_bundle>, 'numina': <function _numina_bundle>}

Asset-bundle registry selected by AgentProverConfig.assets.

open_atp.harness.bundles.DEFAULT_BUNDLE = AssetBundle(name='default', skills=(PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/open-atp/envs/latest/lib/python3.12/site-packages/open_atp/vendor/leanprover-skills/skills/lean-proof'),), plugins=(PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/open-atp/envs/latest/lib/python3.12/site-packages/open_atp/vendor/lean4-skills/plugins/lean4'),), prompt_file=None, extra_dirs=(), skills_dir=None)

Eagerly-resolved default (the common case), so callers have a ready bundle.

Pricing

open_atp.harness.cost.compute_cost_usd(model, input_tokens, output_tokens)[source]

Estimate the USD cost of a run from token counts.

Returns None when model is absent from COST_PER_MTOK.

open_atp.harness.cost.COST_PER_MTOK = {'claude-fable-5': (10.0, 50.0), 'claude-haiku-4-5': (1.0, 5.0), 'claude-opus-4-6': (5.0, 25.0), 'claude-opus-4-7': (5.0, 25.0), 'claude-opus-4-8': (5.0, 25.0), 'claude-sonnet-4-5': (3.0, 15.0), 'claude-sonnet-4-6': (3.0, 15.0), 'deepseek-v4-flash': (0.14, 0.28), 'deepseek-v4-pro': (1.74, 3.48), 'gemini-3.1-pro-preview': (2.0, 12.0), 'gpt-4.1': (2.0, 8.0), 'gpt-4o': (2.5, 10.0), 'gpt-4o-mini': (0.15, 0.6), 'gpt-5.4': (2.5, 15.0), 'gpt-5.4-mini': (0.75, 4.5), 'gpt-5.4-nano': (0.2, 1.25), 'gpt-5.4-pro': (15.0, 120.0), 'gpt-5.5': (5.0, 30.0)}

Cost per million tokens, as (input, output).