Vibe / Leanstral¶
The Vibe prover is the AgentProver on the
VibeHarness — it drives Mistral
Vibe’s builtin lean agent in a sandbox.
Vibe’s lean agent is Leanstral (vibe -p ... --agent lean pins the model to
leanstral; there is no --model flag). The shared
Verifier does the final compile / sorry / axiom
check. See Provers for the lifecycle every agent harness shares.
The Leanstral stand-in¶
The bare lean profile is Labs-gated: the real model labs-leanstral-2603 returns a
403 until a Mistral org admin enables Labs. Until then, the harness runs the same
Lean scaffold on a non-Labs reasoning model any La Plateforme key can reach
(default Magistral) through a vendored lean-standin agent profile. Since Vibe has no
--model flag, the harness templates the configured model into the stand-in profile
(<<MODEL>>) at configure_wd time — so the model is an ordinary knob. Repoint
agent to lean (and the model to labs-leanstral-2603) once Labs is enabled.
Usage¶
from open_atp.backends.docker import DockerBackend, DockerConfig
from open_atp.images import DEFAULT_IMAGE
from open_atp.provers import AgentProver, AgentProverConfig
backend = DockerBackend(DockerConfig(image=DEFAULT_IMAGE))
config = AgentProverConfig(
harness="vibe",
agent="lean-standin", # "lean" once Labs is enabled
model="magistral-medium-latest",
max_turns=None, # passed to `vibe -p --max-turns`
max_price=None, # passed to `vibe -p --max-price`
)
prover = AgentProver(config, verification_backend=backend)
Or by registry spec through get_prover() / the CLI: vibe
(defaults: agent="lean-standin", model="magistral-medium-latest"). Swap the model
with overrides={"model": "devstral-medium-latest"}. The agent, max_turns, and
max_price fields are Vibe-specific.
Harness details¶
configure_wd pins a workdir-local VIBE_HOME (.vibe/) so Vibe’s config, the
vendored stand-in profile, and the per-session log all live under the workdir and sync
back out with it. The written .vibe/config.toml:
sets
installed_agents = ["lean"]to un-gate the builtinleanagent;sets
bypass_tool_permissions = true— the only way to un-gate mutating tools (edit,write_file) invibe -pprogrammatic mode, which has no approval callback (without it the agent’s edits are silently skipped);enables
[session_logging](where cost/token totals are recorded); andwires the lean-lsp MCP server with
tool_timeout_sec = 180(the seconds-valued mirror of the OpenCode 180 s fix, for the cold firstlean_diagnostic_messagescall that loads the full Mathlib import closure).
The stand-in profile is written to .vibe/agents/<agent>.toml, and the bundle’s
skills — the host-agnostic leanprover/skills
— are mounted under .vibe/skills (Vibe’s user skills dir, which loads
regardless of project-folder trust). The launch script
(assets/scripts/vibe_agent.sh) runs:
export VIBE_HOME="$PWD/.vibe"
vibe -p "$PROMPT" --agent <AGENT> --output streaming --workdir "$PWD" <EXTRA>
<EXTRA> appends --max-turns / --max-price when set. The --output streaming
NDJSON message stream (one message per line) goes to stdout.
$PROMPT is the task’s instructions when set, otherwise the shared default agent
prompt baked into the AgentProver:
Default agent prompt
The working directory is a complete Lean 4 lake project. One or more `.lean`
files contain `sorry` (or `admit`) placeholders standing in for proofs that have
not been written yet. Replace every such placeholder with a real proof so the
project compiles cleanly and depends on no axioms beyond Lean's standard set.
Hard rules:
- Do not weaken, rename, restate, or delete any theorem, lemma, `def`,
`structure`, or signature. Only fill in proof bodies (the part after `:=` /
`by` that is currently `sorry`). Changing a statement to make it easier to
prove is failure, not success.
- No new axioms and no `sorry`/`admit`/`native_decide`-on-false escapes. The
finished proof must type-check honestly. The only acceptable axioms are Lean's
standard `propext`, `Classical.choice`, and `Quot.sound`.
- Stay inside this working directory; do not read or write files outside it.
- Do not edit `lakefile.toml`/`lakefile.lean`, `lean-toolchain`, or
`lake-manifest.json` — they pin the toolchain and dependencies and must match
the verification environment.
Workflow:
1. Find the work: search for `sorry` across the `.lean` source files (e.g.
`rg -n '\\bsorry\\b'`). Read each file containing one to understand the
statement, the hypotheses, and the relevant imports.
2. Confirm the lean-lsp MCP server is live before relying on it: call
`mcp__lean-lsp__lean_diagnostic_messages` on a file you have not yet edited.
`success:true, items:[]` means it compiles cleanly; real errors come back as
`items`. `success:false, items:[]` usually means imports aren't built yet —
run `lake build` for the relevant modules first.
3. Write a proof for one `sorry` at a time. Mathlib is available; prefer library
lemmas, `simp`, `omega`, `linarith`, `exact?`/`apply?` suggestions, and
`aesop` over long bespoke arguments.
4. After each edit, re-check that file with
`mcp__lean-lsp__lean_diagnostic_messages` and iterate until it is clean.
5. When a file looks done, verify it has no stubbed proofs with
`mcp__lean-lsp__lean_verify` — the reported axioms must NOT contain `sorryAx`.
6. Repeat until no `.lean` file contains a `sorry` and the whole project builds
(`lake build`).
Tips:
- Use the lean-lsp tools (`mcp__lean-lsp__*`) as your primary feedback loop; they
are far faster than a full `lake build` per change. Use `lake build` to
materialize oleans for imports and as the final whole-project check.
- If a goal looks false or unprovable from the given hypotheses, re-read the
statement: you likely misread a binder or a coercion. Do not "fix" it by
changing the statement — finish the proof as stated.
- Non-trivial proofs routinely take many rounds of compile-error fixing. Keep
iterating against the diagnostics rather than guessing."""
Authentication¶
The harness forwards MISTRAL_API_KEY (a Mistral La Plateforme key) from the host;
the lean agent’s provider reads it from the process env. It must be set or the harness
raises.
Cost tracking¶
The streaming output carries only conversation messages — no token/cost totals. Those
live in Vibe’s per-session meta.json; parse reads session_cost,
session_prompt_tokens, and session_completion_tokens from its stats to populate
cost_usd and the token totals in
HarnessRunResult. collect_logs relocates the
.vibe/logs tree to logs/vibe-session.