Vibe / Leanstral

The Vibe prover is the AgentProver on the VibeHarness — it drives Mistral Vibe’s builtin lean agent in a sandbox. Vibe’s lean agent is Leanstral (vibe -p ... --agent lean pins the model to leanstral; there is no --model flag). The shared Verifier does the final compile / sorry / axiom check. See Provers for the lifecycle every agent harness shares.

The Leanstral stand-in

The bare lean profile is Labs-gated: the real model labs-leanstral-2603 returns a 403 until a Mistral org admin enables Labs. Until then, the harness runs the same Lean scaffold on a non-Labs reasoning model any La Plateforme key can reach (default Magistral) through a vendored lean-standin agent profile. Since Vibe has no --model flag, the harness templates the configured model into the stand-in profile (<<MODEL>>) at configure_wd time — so the model is an ordinary knob. Repoint agent to lean (and the model to labs-leanstral-2603) once Labs is enabled.

Usage

from open_atp.backends.docker import DockerBackend, DockerConfig
from open_atp.images import DEFAULT_IMAGE
from open_atp.provers import AgentProver, AgentProverConfig

backend = DockerBackend(DockerConfig(image=DEFAULT_IMAGE))
config = AgentProverConfig(
    harness="vibe",
    agent="lean-standin",            # "lean" once Labs is enabled
    model="magistral-medium-latest",
    max_turns=None,                  # passed to `vibe -p --max-turns`
    max_price=None,                  # passed to `vibe -p --max-price`
)
prover = AgentProver(config, verification_backend=backend)

Or by registry spec through get_prover() / the CLI: vibe (defaults: agent="lean-standin", model="magistral-medium-latest"). Swap the model with overrides={"model": "devstral-medium-latest"}. The agent, max_turns, and max_price fields are Vibe-specific.

Harness details

configure_wd pins a workdir-local VIBE_HOME (.vibe/) so Vibe’s config, the vendored stand-in profile, and the per-session log all live under the workdir and sync back out with it. The written .vibe/config.toml:

  • sets installed_agents = ["lean"] to un-gate the builtin lean agent;

  • sets bypass_tool_permissions = true — the only way to un-gate mutating tools (edit, write_file) in vibe -p programmatic mode, which has no approval callback (without it the agent’s edits are silently skipped);

  • enables [session_logging] (where cost/token totals are recorded); and

  • wires the lean-lsp MCP server with tool_timeout_sec = 180 (the seconds-valued mirror of the OpenCode 180 s fix, for the cold first lean_diagnostic_messages call that loads the full Mathlib import closure).

The stand-in profile is written to .vibe/agents/<agent>.toml, and the bundle’s skills — the host-agnostic leanprover/skills — are mounted under .vibe/skills (Vibe’s user skills dir, which loads regardless of project-folder trust). The launch script (assets/scripts/vibe_agent.sh) runs:

export VIBE_HOME="$PWD/.vibe"
vibe -p "$PROMPT" --agent <AGENT> --output streaming --workdir "$PWD" <EXTRA>

<EXTRA> appends --max-turns / --max-price when set. The --output streaming NDJSON message stream (one message per line) goes to stdout.

$PROMPT is the task’s instructions when set, otherwise the shared default agent prompt baked into the AgentProver:

Default agent prompt
The working directory is a complete Lean 4 lake project. One or more `.lean`
files contain `sorry` (or `admit`) placeholders standing in for proofs that have
not been written yet. Replace every such placeholder with a real proof so the
project compiles cleanly and depends on no axioms beyond Lean's standard set.

Hard rules:
- Do not weaken, rename, restate, or delete any theorem, lemma, `def`,
  `structure`, or signature. Only fill in proof bodies (the part after `:=` /
  `by` that is currently `sorry`). Changing a statement to make it easier to
  prove is failure, not success.
- No new axioms and no `sorry`/`admit`/`native_decide`-on-false escapes. The
  finished proof must type-check honestly. The only acceptable axioms are Lean's
  standard `propext`, `Classical.choice`, and `Quot.sound`.
- Stay inside this working directory; do not read or write files outside it.
- Do not edit `lakefile.toml`/`lakefile.lean`, `lean-toolchain`, or
  `lake-manifest.json` — they pin the toolchain and dependencies and must match
  the verification environment.

Workflow:
1. Find the work: search for `sorry` across the `.lean` source files (e.g.
   `rg -n '\\bsorry\\b'`). Read each file containing one to understand the
   statement, the hypotheses, and the relevant imports.
2. Confirm the lean-lsp MCP server is live before relying on it: call
   `mcp__lean-lsp__lean_diagnostic_messages` on a file you have not yet edited.
   `success:true, items:[]` means it compiles cleanly; real errors come back as
   `items`. `success:false, items:[]` usually means imports aren't built yet —
   run `lake build` for the relevant modules first.
3. Write a proof for one `sorry` at a time. Mathlib is available; prefer library
   lemmas, `simp`, `omega`, `linarith`, `exact?`/`apply?` suggestions, and
   `aesop` over long bespoke arguments.
4. After each edit, re-check that file with
   `mcp__lean-lsp__lean_diagnostic_messages` and iterate until it is clean.
5. When a file looks done, verify it has no stubbed proofs with
   `mcp__lean-lsp__lean_verify` — the reported axioms must NOT contain `sorryAx`.
6. Repeat until no `.lean` file contains a `sorry` and the whole project builds
   (`lake build`).

Tips:
- Use the lean-lsp tools (`mcp__lean-lsp__*`) as your primary feedback loop; they
  are far faster than a full `lake build` per change. Use `lake build` to
  materialize oleans for imports and as the final whole-project check.
- If a goal looks false or unprovable from the given hypotheses, re-read the
  statement: you likely misread a binder or a coercion. Do not "fix" it by
  changing the statement — finish the proof as stated.
- Non-trivial proofs routinely take many rounds of compile-error fixing. Keep
  iterating against the diagnostics rather than guessing."""

Authentication

The harness forwards MISTRAL_API_KEY (a Mistral La Plateforme key) from the host; the lean agent’s provider reads it from the process env. It must be set or the harness raises.

Cost tracking

The streaming output carries only conversation messages — no token/cost totals. Those live in Vibe’s per-session meta.json; parse reads session_cost, session_prompt_tokens, and session_completion_tokens from its stats to populate cost_usd and the token totals in HarnessRunResult. collect_logs relocates the .vibe/logs tree to logs/vibe-session.