Vibe / Leanstral

The Vibe prover is the AgentProver on the VibeHarness — it drives Mistral Vibe’s builtin lean agent in a sandbox. Vibe’s lean agent is Leanstral (vibe -p ... --agent lean pins the model to leanstral; there is no --model flag). The shared Verifier does the final compile / sorry / axiom check. See Provers for the lifecycle every agent harness shares.

The Leanstral stand-in

The bare lean profile is Labs-gated: the real model labs-leanstral-2603 returns a 403 until a Mistral org admin enables Labs. Until then, the harness runs the same Lean scaffold on a non-Labs reasoning model any La Plateforme key can reach (default Magistral) through a vendored lean-standin agent profile. Since Vibe has no --model flag, the harness templates the configured model into the stand-in profile (<<MODEL>>) at stage time — so the model is an ordinary knob. Repoint agent to lean (and the model to labs-leanstral-2603) once Labs is enabled.

Usage

from open_atp.backends.docker import DockerBackend
from open_atp.harness import VibeHarness
from open_atp.images import DEFAULT_IMAGE
from open_atp.provers import AgentProver

backend = DockerBackend(image=DEFAULT_IMAGE)
prover = AgentProver(
    harness=VibeHarness(
        agent="lean-standin",            # "lean" once Labs is enabled
        model="magistral-medium-latest",
        max_turns=None,                  # passed to `vibe -p --max-turns`
        max_price=None,                  # passed to `vibe -p --max-price`
    ),
    backend=backend,
)

VibeHarness selects the Vibe CLI and carries the Vibe-specific agent, max_turns, and max_price knobs (which live only on this harness, not the shared base).

Or by catalog name through standard_prover() / the CLI: agent:vibe (defaults: agent="lean-standin", model="magistral-medium-latest"). standard_prover returns this default prover only; to swap the model, construct VibeHarness(model="devstral-medium-latest") and pass it to AgentProver directly (as above).

Harness details

stage pins a workdir-local VIBE_HOME (.vibe/) so Vibe’s config, the vendored stand-in profile, and the per-session log all live under the workdir and sync back out with it. The written .vibe/config.toml:

  • sets installed_agents = ["lean"] to un-gate the builtin lean agent;

  • sets bypass_tool_permissions = true — the only way to un-gate mutating tools (edit, write_file) in vibe -p programmatic mode, which has no approval callback (without it the agent’s edits are silently skipped);

  • enables [session_logging] (where cost/token totals are recorded); and

  • wires the lean-lsp MCP server with tool_timeout_sec = 180 (the seconds-valued mirror of the OpenCode 180 s fix, for the cold first lean_diagnostic_messages call that loads the full Mathlib import closure).

The stand-in profile is written to .vibe/agents/<agent>.toml, and the prover stages the AgentProver’s skills — the host-agnostic leanprover/skills — under .vibe/skills (Vibe’s user skills dir, which loads regardless of project-folder trust). The launch script (assets/scripts/vibe_agent.sh) runs:

export VIBE_HOME="$PWD/.vibe"
vibe -p "$PROMPT" --agent <AGENT> --output streaming --workdir "$PWD" <EXTRA>

<EXTRA> appends --max-turns / --max-price when set. The --output streaming NDJSON message stream (one message per line) goes to stdout.

$PROMPT is the shared agent prover prompt baked into the AgentProver, with the task’s optional user_prompt appended under an Additional instructions heading when set:

Agent prover prompt
The working directory is a complete Lean 4 lake project. One or more `.lean`
files contain `sorry` (or `admit`) placeholders standing in for proofs that have
not been written yet. Replace every such placeholder with a real proof so the
project compiles cleanly and depends on no axioms beyond Lean's standard set.

Hard rules:
- Do not weaken, rename, restate, or delete any theorem, lemma, `def`,
  `structure`, or signature. Only fill in proof bodies (the part after `:=` /
  `by` that is currently `sorry`). Changing a statement to make it easier to
  prove is failure, not success.
- No new axioms and no `sorry`/`admit`/`native_decide`-on-false escapes. The
  finished proof must type-check honestly. The only acceptable axioms are Lean's
  standard `propext`, `Classical.choice`, and `Quot.sound`.
- Stay inside this working directory; do not read or write files outside it.
- Do not edit `lakefile.toml`/`lakefile.lean`, `lean-toolchain`, or
  `lake-manifest.json` — they pin the toolchain and dependencies and must match
  the verification environment.

Workflow:
1. Find the work: search for `sorry` across the `.lean` source files (e.g.
   `rg -n '\\bsorry\\b'`). Read each file containing one to understand the
   statement, the hypotheses, and the relevant imports.
2. Confirm the lean-lsp MCP server is live before relying on it: call
   `mcp__lean-lsp__lean_diagnostic_messages` on a file you have not yet edited.
   `success:true, items:[]` means it compiles cleanly; real errors come back as
   `items`. `success:false, items:[]` usually means imports aren't built yet —
   run `lake build` for the relevant modules first.
3. Write a proof for one `sorry` at a time. Mathlib is available; prefer library
   lemmas, `simp`, `omega`, `linarith`, `exact?`/`apply?` suggestions, and
   `aesop` over long bespoke arguments.
4. After each edit, re-check that file with
   `mcp__lean-lsp__lean_diagnostic_messages` and iterate until it is clean.
5. When a file looks done, verify it has no stubbed proofs with
   `mcp__lean-lsp__lean_verify` — the reported axioms must NOT contain `sorryAx`.
6. Repeat until no `.lean` file contains a `sorry` and the whole project builds
   (`lake build`).

Tips:
- Use the lean-lsp tools (`mcp__lean-lsp__*`) as your primary feedback loop; they
  are far faster than a full `lake build` per change. Use `lake build` to
  materialize oleans for imports and as the final whole-project check.
- If a goal looks false or unprovable from the given hypotheses, re-read the
  statement: you likely misread a binder or a coercion. Do not "fix" it by
  changing the statement — finish the proof as stated.
- Non-trivial proofs routinely take many rounds of compile-error fixing. Keep
  iterating against the diagnostics rather than guessing."""

Authentication

Pass the Mistral La Plateforme key to the harness explicitly:

VibeHarness(mistral_api_key="msk-...")

or leave mistral_api_key unset (the default) to read it from the host environment. Either way the harness forwards it into the sandbox as MISTRAL_API_KEY, where the lean agent’s provider reads it from the process env. Resolution fails if neither the explicit key nor the host env var is set.

Cost tracking

The streaming output carries only conversation messages — no token/cost totals. Those live in Vibe’s per-session meta.json; parse reads session_cost, session_prompt_tokens, and session_completion_tokens from its stats to populate cost_usd and the token totals in HarnessRunResult. collect_logs relocates the .vibe/logs tree to logs/vibe-session.