# Building Multi-Turn Environments

> Create interactive environments with turn-by-turn model responses

Multi-turn environments enable back-and-forth interaction between the model and the environment. They suit games, simulations, debugging tasks, and any scenario where the model needs several attempts or receives feedback after each action.

## Overview

`MultiTurnEnv` implements the core rollout loop used by all Verifiers environments (even `SingleTurnEnv` is just a `MultiTurnEnv` with `max_turns=1`). Each rollout follows this pattern:

1. **Initialize state** — `setup_state()` prepares per-rollout resources
2. **Loop until done:**
   * Get prompt messages (initial prompt or previous conversation + environment response)
   * Get model response
   * Check stop conditions — exit if any `@vf.stop` method returns `True`
3. **Render completion** — assemble final conversation into `state["completion"]`
4. **Cleanup** — run all `@vf.cleanup` methods

## The Rollout Loop

Here's the core structure of a multi-turn rollout:

```python theme={null}
class MultiTurnEnv(vf.Environment):
    async def rollout(self, input, client, model, sampling_args):
        state = await self.init_state(input, client, model, sampling_args)
        
        try:
            state = await self.setup_state(state)  # 1. Initialize
            
            while not await self.is_completed(state):  # 2. Loop
                prompt_messages = await self.get_prompt_messages(state)
                response = await self.get_model_response(state, prompt_messages)
                await self.add_model_response(state, prompt_messages, response)
            
            await self.render_completion(state)  # 3. Finalize
            return state
        finally:
            await self._cleanup(state)  # 4. Cleanup
```

To build a custom multi-turn environment, you override specific methods:

* `env_response()` — **Required**. Define how the environment responds after each model turn
* `setup_state()` — Optional. Initialize per-rollout resources
* `@vf.stop` methods — Optional. Define custom stop conditions
* `@vf.cleanup` methods — Optional. Cleanup resources after each rollout

## Building a Custom Environment

Let's build a simple number guessing game:

<Steps>
  ### Define the Environment Class

  ```python theme={null}
  import verifiers as vf
  import random

  class NumberGuessingEnv(vf.MultiTurnEnv):
      def __init__(self, max_turns: int = 10, **kwargs):
          super().__init__(max_turns=max_turns, **kwargs)
  ```

  ### Initialize Per-Rollout State

  ```python theme={null}
  class NumberGuessingEnv(vf.MultiTurnEnv):
      async def setup_state(self, state: vf.State) -> vf.State:
          # Pick a random number for this rollout
          state["target_number"] = random.randint(1, 100)
          state["attempts"] = 0
          return await super().setup_state(state)
  ```

  ### Implement Environment Response

  The `env_response()` method defines what happens after each model turn:

  ```python theme={null}
  class NumberGuessingEnv(vf.MultiTurnEnv):
      async def env_response(self, messages: vf.Messages, state: vf.State) -> vf.Messages:
          """Process the guess and return feedback."""
          # Extract the guess from the model's response. This simple parser
          # expects a bare integer; any other reply gets a re-prompt.
          last_message = messages[-1]["content"] or ""
          
          try:
              guess = int(last_message.strip())
          except ValueError:
              return [{"role": "user", "content": "Please provide a number."}]
          
          state["attempts"] += 1
          target = state["target_number"]
          
          if guess == target:
              state["won"] = True
              return [{"role": "user", "content": f"Correct! The number was {target}."}]
          elif guess < target:
              return [{"role": "user", "content": "Too low. Try again."}]
          else:
              return [{"role": "user", "content": "Too high. Try again."}]
  ```

  ### Add Stop Conditions

  Define when the rollout should end:

  ```python theme={null}
  class NumberGuessingEnv(vf.MultiTurnEnv):
      @vf.stop
      async def game_won(self, state: vf.State) -> bool:
          return state.get("won", False)
  ```

  Built-in stop conditions:

  * `has_error` — stops if `state["error"]` is set
  * `max_turns_reached` — stops after `max_turns` iterations
  * `prompt_too_long` — stops if prompt exceeds model context

  ### Create Dataset and Rubric

  ```python theme={null}
  from datasets import Dataset

  def load_environment():
      # Each row is one game instance
      dataset = Dataset.from_list([
          {"prompt": [{"role": "user", "content": "Guess a number between 1 and 100. Reply with only the number."}]}
          for _ in range(100)
      ])
      
      # Reward function
      async def won_game(state) -> float:
          return 1.0 if state.get("won", False) else 0.0
      
      async def efficiency_bonus(state) -> float:
          if not state.get("won", False):
              return 0.0
          attempts = state.get("attempts", 10)
          return max(0.0, 1.0 - (attempts / 10))  # Bonus for fewer attempts
      
      rubric = vf.Rubric(
          funcs=[won_game, efficiency_bonus],
          weights=[1.0, 0.5]
      )
      
      return NumberGuessingEnv(dataset=dataset, rubric=rubric, max_turns=10)
  ```
</Steps>
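The game logic above can be checked without a model or the Verifiers runtime. The sketch below is plain Python, not part of the library: `feedback_for`, `play_binary_search`, and `total_reward` are hypothetical helpers that mirror the feedback strings and reward functions from the steps above, and `total_reward` assumes the rubric combines its functions as a weighted sum.

```python
from typing import Optional


def feedback_for(guess: int, target: int) -> str:
    """Mirror the feedback strings used in env_response() above."""
    if guess == target:
        return f"Correct! The number was {target}."
    if guess < target:
        return "Too low. Try again."
    return "Too high. Try again."


def play_binary_search(target: int, max_turns: int = 10) -> Optional[int]:
    """Return the attempts a binary-search player needs, or None on a loss."""
    low, high = 1, 100
    for attempt in range(1, max_turns + 1):
        guess = (low + high) // 2
        reply = feedback_for(guess, target)
        if reply.startswith("Correct"):
            return attempt
        if reply.startswith("Too low"):
            low = guess + 1
        else:
            high = guess - 1
    return None


def total_reward(attempts: Optional[int]) -> float:
    """Assumed weighted sum: won_game (weight 1.0) + efficiency_bonus (weight 0.5)."""
    if attempts is None:
        return 0.0
    won = 1.0
    bonus = max(0.0, 1.0 - attempts / 10)
    return 1.0 * won + 0.5 * bonus


results = [play_binary_search(t) for t in range(1, 101)]
worst_case = max(r for r in results if r is not None)
print(worst_case)                # 7
print(total_reward(worst_case))  # roughly 1.15 under the weighted-sum assumption
```

A binary-search player wins every game in at most 7 guesses, so `max_turns=10` leaves room for a few malformed replies. With 7 attempts the bonus is 1 - 7/10 = 0.3, which gives 1.0 + 0.5 × 0.3 = 1.15 if the weights are applied as a sum.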

## Real Example: Wordle

Let's examine the `wordle` environment from the repository:

```python environments/wordle/wordle.py theme={null}
import re
import verifiers as vf
from verifiers.envs.integrations.textarena_env import TextArenaEnv

DEFAULT_SYSTEM_PROMPT = """You are a competitive game player. \
Make sure you read the game instructions carefully, and always follow the required format.

In each turn, think step-by-step, then give your guess inside <guess>...</guess> tags."""

def wordle_feedback_fn(observation: str) -> str:
    """Extract just the latest feedback from the game state."""
    latest_observation = observation.split("[GAME]")[-1].strip()
    if "Feedback:" in latest_observation:
        return latest_observation.split("Feedback:")[-1]
    else:
        return latest_observation

def correct_answer(parser, completion, answer, **kwargs) -> float:
    """Whether the guess is *exactly* correct."""
    guess = parser.parse_answer(completion)
    return 1.0 if guess == "[" + answer + "]" else 0.0

def length_bonus(parser, completion, answer, **kwargs) -> float:
    """Bonus for shorter correct solutions."""
    assistant_messages = parser.get_assistant_messages(completion)
    guesses = [x for x in assistant_messages if re.search(r"<guess>.*</guess>", x["content"])]
    is_correct = correct_answer(parser, completion, answer, **kwargs)
    return is_correct / (len(guesses) or 1)

def load_environment(
    num_train_examples: int = 2000,
    num_eval_examples: int = 20,
    system_prompt: str = DEFAULT_SYSTEM_PROMPT,
    seed: int = 0,
    **kwargs,
):
    parser = vf.XMLParser(fields=["guess"], answer_field="guess")
    
    rubric = vf.Rubric(parser=parser)
    rubric.add_reward_func(correct_answer)
    rubric.add_reward_func(length_bonus)
    
    return TextArenaEnv(
        game="Wordle-v0",
        num_train_examples=num_train_examples,
        num_eval_examples=num_eval_examples,
        feedback_fn=wordle_feedback_fn,
        seed=seed,
        system_prompt=system_prompt,
        parser=parser,
        rubric=rubric,
        **kwargs,
    )
```

Key features:

* Wraps a TextArena game environment
* Uses `XMLParser` to extract guesses from structured output
* Custom `feedback_fn` cleans up the game state for the model
* Multiple reward functions: correctness + efficiency bonus
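
To make the `feedback_fn` behavior concrete, the snippet below runs `wordle_feedback_fn` (copied from the listing above) on a made-up observation string. The observation format is an assumption for illustration only; real TextArena observations may differ in detail.

```python
def wordle_feedback_fn(observation: str) -> str:
    """Extract just the latest feedback from the game state (copied from above)."""
    latest_observation = observation.split("[GAME]")[-1].strip()
    if "Feedback:" in latest_observation:
        return latest_observation.split("Feedback:")[-1]
    else:
        return latest_observation


# Hypothetical observation: two [GAME] segments, the last one carrying feedback.
observation = (
    "[GAME] You are playing Wordle. Guess the 5-letter word.\n"
    "[GAME] You submitted [crane].\n"
    "Feedback:\n"
    "C R A N E\n"
    "X X G X Y"
)

feedback = wordle_feedback_fn(observation)
print(feedback.strip())

# Without a "Feedback:" marker, the whole last segment is returned.
print(wordle_feedback_fn("[GAME] Invalid move, try again."))
```

The function keeps only the text after the last `Feedback:` marker in the final `[GAME]` segment. Note that the returned text is not stripped, so it can start with a newline.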

## Advanced Patterns

### Custom Stop Conditions

Control when rollouts end with `@vf.stop` decorators:

```python theme={null}
class MyGameEnv(vf.MultiTurnEnv):
    @vf.stop
    async def game_won(self, state: vf.State) -> bool:
        return state.get("won", False)
    
    @vf.stop
    async def game_lost(self, state: vf.State) -> bool:
        return state.get("lives", 3) <= 0
    
    @vf.stop(priority=10)  # Check this first
    async def answer_submitted(self, state: vf.State) -> bool:
        completion = state.get("completion", [])
        if not completion:
            return False
        return "FINAL ANSWER:" in completion[-1].get("content", "")
```

Priority ordering (higher runs first) lets you check cheap conditions before expensive ones.
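
The ordering idea can be illustrated in plain Python. This sketch does not use Verifiers internals: `run_stop_checks` and the `(priority, check)` tuples are hypothetical stand-ins that show how evaluating higher-priority conditions first, and stopping at the first `True`, avoids running later and more expensive checks.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

StopCheck = Callable[[Dict], Awaitable[bool]]
calls: List[str] = []  # records which checks actually ran


async def answer_submitted(state: Dict) -> bool:
    calls.append("answer_submitted")
    return state.get("answered", False)


async def expensive_check(state: Dict) -> bool:
    calls.append("expensive_check")
    await asyncio.sleep(0)  # stands in for slow work
    return False


async def run_stop_checks(state: Dict, checks: List[Tuple[int, StopCheck]]) -> bool:
    """Evaluate checks from highest to lowest priority, short-circuiting on True."""
    for _, check in sorted(checks, key=lambda item: item[0], reverse=True):
        if await check(state):
            return True
    return False


checks = [(0, expensive_check), (10, answer_submitted)]
stopped = asyncio.run(run_stop_checks({"answered": True}, checks))
print(stopped, calls)  # True ['answer_submitted']
```

Because `answer_submitted` has the higher priority and returns `True`, `expensive_check` never runs in this toy example.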

### Early Termination from env\_response

Signal completion directly from the environment response:

```python theme={null}
class MyGameEnv(vf.MultiTurnEnv):
    async def env_response(self, messages: vf.Messages, state: vf.State) -> vf.Messages:
        # `check_game_over` and `process_turn` are placeholders for your game logic
        if check_game_over(state):
            final_message = [
                {"role": "user", "content": f"Game over! Final score: {state['score']}"}
            ]
            state["final_env_response"] = final_message
            return final_message
        
        # Normal game continues...
        return process_turn(messages, state)
```

Setting `state["final_env_response"]` tells the rollout loop that no further model response is needed, so the rollout ends without another model call.

### Cleanup and Resource Management

Use decorators for proper resource cleanup:

```python theme={null}
class MyGameEnv(vf.MultiTurnEnv):
    @vf.cleanup
    async def save_game_log(self, state: vf.State):
        """Called after each rollout completes."""
        await log_game_result(state["game_id"], state["score"])
    
    @vf.teardown
    async def close_connections(self):
        """Called once when environment shuts down."""
        await self.db_connection.close()
```

<Warning>
  **Important:** Cleanup methods should be idempotent (safe to call multiple times) and handle errors gracefully. This ensures correct behavior when rollouts are cancelled or interrupted.
</Warning>
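
One way to make a cleanup step idempotent is to remove the resource from the state before releasing it, so a second call finds nothing to do. The sketch below is plain Python with a hypothetical `FakeSandbox` resource and hypothetical `sandbox` and `cleanup_errors` state keys; in a real environment the body of `release_sandbox` would live in a `@vf.cleanup` method.

```python
import asyncio
from typing import Dict


class FakeSandbox:
    """Hypothetical resource that counts how often it is closed."""

    def __init__(self) -> None:
        self.close_calls = 0

    async def close(self) -> None:
        self.close_calls += 1


async def release_sandbox(state: Dict) -> None:
    """Idempotent cleanup: safe to call repeatedly or when setup never ran."""
    sandbox = state.pop("sandbox", None)  # a second call sees None and returns
    if sandbox is None:
        return
    try:
        await sandbox.close()
    except Exception as exc:  # record the failure instead of raising from cleanup
        state.setdefault("cleanup_errors", []).append(repr(exc))


async def demo() -> int:
    sandbox = FakeSandbox()
    state = {"sandbox": sandbox}
    await release_sandbox(state)
    await release_sandbox(state)  # no-op: the resource was already released
    await release_sandbox({})     # setup never ran: also a no-op
    return sandbox.close_calls


close_calls = asyncio.run(demo())
print(close_calls)  # 1
```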

### Custom Message Assembly

Override `get_prompt_messages()` for non-linear conversations:

```python theme={null}
class MyGameEnv(vf.MultiTurnEnv):
    async def get_prompt_messages(self, state: vf.State) -> vf.Messages:
        if len(state["trajectory"]) == 0:
            # First turn: return initial prompt
            return state["prompt"]
        
        # Subsequent turns: reconstruct conversation with game state
        messages = []
        messages.append({"role": "system", "content": self.system_prompt})
        
        for turn in state["trajectory"]:
            messages.extend(turn["completion"])
        
        # Add environment response
        env_response = await self.env_response(messages, state)
        messages.extend(env_response)
        
        return messages
```

### Trajectory Tracking

Add metadata to each turn:

```python theme={null}
class MyGameEnv(vf.MultiTurnEnv):
    async def add_trajectory_step(self, state: vf.State, trajectory_step):
        """Add custom metadata to each turn."""
        trajectory_step["extras"]["board_state"] = state["board"].copy()
        trajectory_step["extras"]["valid_moves"] = state["valid_moves"]
        await super().add_trajectory_step(state, trajectory_step)
```

## Error Handling

Verifiers provides a hierarchy of error types under `vf.Error`:

```python theme={null}
vf.ModelError           # Model interaction errors
vf.OverlongPromptError  # Prompt exceeds context length
vf.ToolError            # Tool-related errors
vf.InfraError           # Infrastructure errors (e.g., sandbox)
```

When a `vf.Error` is raised during a rollout:

1. It's caught automatically
2. Stored in `state["error"]`
3. The built-in `has_error` stop condition triggers
4. The rollout terminates gracefully

Example:

```python theme={null}
class MyGameEnv(vf.MultiTurnEnv):
    async def env_response(self, messages: vf.Messages, state: vf.State) -> vf.Messages:
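        try:
            # `self.external_api` and `ExternalAPIError` are placeholders for
            # whatever client and exception type your integration uses
            result = await self.external_api.call(messages)
            return [{"role": "user", "content": result}]
        except ExternalAPIError as e:
            # Re-raise as a vf.Error subclass so the rollout stops via has_error
            raise vf.InfraError(f"API call failed: {e}") from e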
```
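
The catch-and-store flow described above can be sketched without the library. Everything here is a stand-in: `Error` and `InfraError` are local classes named after their `vf` counterparts, and `toy_rollout` is a hypothetical loop, not the Verifiers implementation.

```python
import asyncio
from typing import Dict


class Error(Exception):
    """Local stand-in for vf.Error."""


class InfraError(Error):
    """Local stand-in for vf.InfraError."""


async def flaky_env_response(turn: int) -> str:
    if turn == 2:
        # Wrap the low-level failure so the loop can recognize it.
        try:
            raise ConnectionError("backend unreachable")
        except ConnectionError as exc:
            raise InfraError(f"API call failed: {exc}") from exc
    return f"feedback for turn {turn}"


async def toy_rollout(max_turns: int = 5) -> Dict:
    state: Dict = {"error": None, "turns": []}
    # The loop stops when an error is stored or the turn budget runs out.
    while state["error"] is None and len(state["turns"]) < max_turns:
        turn = len(state["turns"])
        try:
            state["turns"].append(await flaky_env_response(turn))
        except Error as exc:
            state["error"] = exc  # stored instead of propagating
    return state


state = asyncio.run(toy_rollout())
print(len(state["turns"]), type(state["error"]).__name__)  # 2 InfraError
```

Using `raise ... from exc` keeps the original exception available as `__cause__`, which helps when inspecting a failed rollout afterwards.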

## Monitor Rubrics

Track environment-specific metrics automatically:

```python theme={null}
class MyMonitorRubric(vf.Rubric):
    def __init__(self):
        super().__init__()
        self.add_metric(self.average_score)
        self.add_metric(self.total_moves)
    
    async def average_score(self, state: vf.State) -> float:
        turns = len(state["trajectory"])
        total_score = state.get("score", 0)
        return total_score / max(turns, 1)
    
    async def total_moves(self, state: vf.State) -> float:
        return float(len(state["trajectory"]))

class MyGameEnv(vf.MultiTurnEnv):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.add_rubric(MyMonitorRubric())
```

`MultiTurnEnv` automatically tracks `num_turns` for all multi-turn environments.

## Testing Your Environment

<Steps>
  ### Install and run a quick test

  ```bash theme={null}
  prime env install my-game-env
  prime eval run my-game-env -m gpt-4.1-mini -n 5 -r 3
  ```

  ### Check metrics

  Example output (exact values depend on the model and the sampled rollouts):

  ```
  Loading environment: my-game-env
  Running 5 examples × 3 rollouts = 15 total rollouts
  Progress: ████████████████████ 15/15 (100%)

  Results:
    Reward: 0.67 ± 0.21
    won_game: 0.67 ± 0.47
    efficiency_bonus: 0.23 ± 0.18
    num_turns: 6.2 ± 2.1
  ```

  ### Debug with verbose mode

  ```bash theme={null}
  prime eval run my-game-env -m gpt-4.1-mini -n 2 -v
  ```

  Verbose mode shows detailed logs, including:

  * Model requests and responses
  * Environment responses
  * State updates
  * Stop condition checks

  ### Save detailed results

  ```bash theme={null}
  prime eval run my-game-env -m gpt-4.1-mini -n 10 -s -C "attempts,won,target_number"
  ```

  This saves results to `./environments/my_game_env/outputs/evals/`, including the custom state columns passed via `-C`.
</Steps>

## Common Pitfalls

<Warning>
  **Don't override `rollout()`** — The base implementation handles the core loop correctly. Override specific methods like `env_response()`, `setup_state()`, and stop conditions instead.
</Warning>

<Warning>
  **Return new messages, don't mutate** — `env_response()` should return a list of *new* messages to append, not modify existing messages.
</Warning>
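
The difference is easy to see with plain lists of message dicts. In this sketch, `env_response_good` and `env_response_bad` are hypothetical standalone functions (not methods on a real environment) that contrast returning new messages with mutating the input history.

```python
import copy
from typing import Dict, List

Message = Dict[str, str]


def env_response_good(messages: List[Message]) -> List[Message]:
    """Return only the new message; the caller appends it to the conversation."""
    return [{"role": "user", "content": "Too low. Try again."}]


def env_response_bad(messages: List[Message]) -> List[Message]:
    """Anti-pattern: edits history in place and returns the whole conversation."""
    messages[-1]["content"] += " (too low)"
    messages.append({"role": "user", "content": "Too low. Try again."})
    return messages


history = [
    {"role": "user", "content": "Guess a number between 1 and 100."},
    {"role": "assistant", "content": "42"},
]
snapshot = copy.deepcopy(history)

new_messages = env_response_good(history)
print(history == snapshot, len(new_messages))  # True 1

env_response_bad(history)
print(history == snapshot)  # False
```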

<Warning>
  **Make cleanup idempotent** — Cleanup methods may be called multiple times or when resources are in unexpected states. Handle errors gracefully.
</Warning>

## Next Steps

* **Add tools**: Give your environment access to external functions → [Tool Environments Guide](/guides/tool-environments)
* **Custom patterns**: Advanced multi-turn patterns → [Custom Environments Guide](/guides/custom-environments)
* **Training**: Use your environment for RL training → [Training Guide](/guides/training)
