# Custom Multi-Turn Patterns

> Advanced environment customization for complex interaction patterns

When the standard environment types don't fit your use case, subclass `MultiTurnEnv` and override its hook methods to control each stage of the rollout loop. This guide covers advanced patterns for building custom environments with complex interaction logic.

## When to Customize

Use custom multi-turn environments when you need:

* **Complex game logic** — Board games, simulations, strategy games
* **Non-linear conversations** — State-dependent message assembly
* **Custom feedback loops** — Environment responses based on intermediate state
* **Specialized stop conditions** — Domain-specific termination logic
* **Advanced state management** — Complex per-rollout initialization and cleanup

## The Rollout Loop

Understanding the rollout loop is essential for customization:

```python theme={null}
class MultiTurnEnv(vf.Environment):
    @final
    async def rollout(self, input, client, model, sampling_args) -> State:
        state = await self.init_state(input, client, model, sampling_args)
        
        try:
            try:
                state = await self.setup_state(state)  # 1. Initialize
            except vf.Error as e:
                state["error"] = e
            
            # 2. Main loop
            while not await self.is_completed(state):
                try:
                    prompt_messages = await self.get_prompt_messages(state)
                    if state.get("final_env_response") is not None:
                        continue
                    response = await self.get_model_response(state, prompt_messages)
                    await self.add_model_response(state, prompt_messages, response)
                except vf.Error as e:
                    state["error"] = e
            
            # 3. Finalize
            await self.render_completion(state)
            return state
        finally:
            await self._cleanup(state)  # 4. Cleanup
```

<Warning>
  **Never override `rollout()`** — It's marked `@final` for a reason. Override specific methods instead:

  * `setup_state()` — Per-rollout initialization
  * `env_response()` — Environment feedback after each turn
  * `get_prompt_messages()` — Custom message assembly
  * `render_completion()` — Final conversation rendering
  * `add_trajectory_step()` — Trajectory metadata
</Warning>
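To make the control flow concrete, here is a dependency-free sketch that mimics the loop with a scripted "model". `ToyEnv` and all of its methods are hypothetical stand-ins written for this illustration; they only mirror the shape of the loop shown above and are not the verifiers implementation.

```python
import asyncio


class ToyEnv:
    """Stand-in for a multi-turn environment; not the verifiers API."""

    max_turns = 5

    async def setup_state(self, state: dict) -> dict:
        state["calls"].append("setup_state")
        return state

    async def is_completed(self, state: dict) -> bool:
        return (
            state.get("final_env_response") is not None
            or len(state["trajectory"]) >= self.max_turns
        )

    async def env_response(self, messages: list, state: dict) -> list:
        state["calls"].append("env_response")
        done = len(state["trajectory"]) == 2  # end the game after two model turns
        reply = [{"role": "user", "content": "game over" if done else "your turn"}]
        if done:
            state["final_env_response"] = reply
        return reply

    async def get_prompt_messages(self, state: dict) -> list:
        if not state["trajectory"]:
            return list(state["prompt"])
        last = state["trajectory"][-1]
        messages = last["prompt"] + last["completion"]
        return messages + await self.env_response(messages, state)

    async def get_model_response(self, prompt: list) -> list:
        return [{"role": "assistant", "content": f"move {len(prompt)}"}]

    async def rollout(self, prompt: list) -> dict:
        state = {"prompt": prompt, "trajectory": [], "calls": []}
        try:
            state = await self.setup_state(state)
            while not await self.is_completed(state):
                messages = await self.get_prompt_messages(state)
                if state.get("final_env_response") is not None:
                    continue  # skip the model call; the loop condition ends the rollout
                completion = await self.get_model_response(messages)
                state["calls"].append("model")
                state["trajectory"].append({"prompt": messages, "completion": completion})
            state["calls"].append("render_completion")
            return state
        finally:
            state["calls"].append("cleanup")  # runs even if something above raised


state = asyncio.run(ToyEnv().rollout([{"role": "user", "content": "start"}]))
print(state["calls"])
```

In this sketch the environment responds inside `get_prompt_messages()`, so once `final_env_response` is set the loop skips the model call and exits on the next completion check.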

## Core Methods to Override

### env\_response(): Required

Defines how the environment responds after each model turn:

```python theme={null}
class MyGameEnv(vf.MultiTurnEnv):
    async def env_response(
        self, 
        messages: vf.Messages, 
        state: vf.State
    ) -> vf.Messages:
        """Generate environment response after each model turn."""
        # Parse the model's action from its latest message
        parsed = self.parser.parse(messages[-1]["content"])
        action = parsed.action
        
        # Update game state
        state["board"] = apply_action(state["board"], action)
        
        # Check win condition
        if check_win(state["board"]):
            state["won"] = True
            return [{"role": "user", "content": "You won!"}]
        
        # Generate feedback
        feedback = generate_feedback(state["board"])
        return [{"role": "user", "content": feedback}]
```

Return value: List of **new** messages to append (don't mutate existing messages).
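The helpers used in the example (`apply_action`, `check_win`, `generate_feedback`) are placeholders, not verifiers functions. As a rough sketch, assuming a hypothetical game where the board is a row of cells and an action switches one cell on, they might look like this:

```python
def apply_action(board: list[int], action: int) -> list[int]:
    """Return a new board with cell `action` switched on; the input is left untouched."""
    if not 0 <= action < len(board):
        raise ValueError(f"cell {action} is outside 0..{len(board) - 1}")
    new_board = list(board)
    new_board[action] = 1
    return new_board


def check_win(board: list[int]) -> bool:
    """The (hypothetical) game is won once every cell is on."""
    return all(cell == 1 for cell in board)


def generate_feedback(board: list[int]) -> str:
    """Describe the board and the cells that are still off."""
    remaining = [i for i, cell in enumerate(board) if cell == 0]
    return f"Board: {''.join(map(str, board))}. Cells still off: {remaining}"


board = [0, 0, 0]
after_first = apply_action(board, 1)
print(generate_feedback(after_first))
```

Returning a new board instead of editing the old one keeps each step easy to test in isolation, in the same spirit as returning new messages rather than mutating existing ones.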

### setup\_state(): Optional

Initialize per-rollout resources:

```python theme={null}
class MyGameEnv(vf.MultiTurnEnv):
    async def setup_state(self, state: vf.State) -> vf.State:
        """Initialize game state for this rollout."""
        # Initialize game board
        state["board"] = self.create_empty_board()
        state["score"] = 0
        state["moves"] = []
        
        # Setup external resources
        state["game_session"] = await self.game_api.create_session()
        
        return await super().setup_state(state)
```

<Tip>
  Always call `await super().setup_state(state)` at the end to ensure parent class initialization runs.
</Tip>

### get\_prompt\_messages(): Optional

Customize how messages are assembled for each turn:

```python theme={null}
class MyGameEnv(vf.MultiTurnEnv):
    async def get_prompt_messages(self, state: vf.State) -> vf.Messages:
        """Assemble messages with current game state."""
        if len(state["trajectory"]) == 0:
            # First turn
            return [
                {"role": "system", "content": self.system_prompt},
                {"role": "user", "content": self.format_initial_board(state["board"])}
            ]
        
        # Subsequent turns: the last turn's prompt + completion is the full history so far
        last_turn = state["trajectory"][-1]
        messages = list(last_turn["prompt"]) + list(last_turn["completion"])
        
        # Append the environment response for the current board
        env_response = await self.env_response(messages, state)
        messages.extend(env_response)
        
        return messages
```

### render\_completion(): Optional

Customize how the final conversation is assembled:

```python theme={null}
class MyGameEnv(vf.MultiTurnEnv):
    async def render_completion(self, state: vf.State):
        """Assemble final completion with game summary."""
        if len(state["trajectory"]) == 0:
            state["completion"] = []
            return
        
        # Get last turn's messages
        last_prompt = state["trajectory"][-1]["prompt"]
        last_completion = state["trajectory"][-1]["completion"]
        
        # Build full conversation
        full_conversation = last_prompt + last_completion
        
        # Add final summary if game ended
        if state.get("final_env_response"):
            full_conversation.extend(state["final_env_response"])
        
        # Extract completion (everything after initial prompt)
        state["completion"] = full_conversation[len(state["prompt"]):]
```
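The slicing step can be tried on plain lists of message dicts. The sketch below is a standalone function written for illustration (`assemble_completion` is a hypothetical name, not a verifiers API), and it assumes the last turn's prompt begins with the original prompt.

```python
def assemble_completion(prompt, last_prompt, last_completion, final_env_response=None):
    """Return every message that came after the initial prompt."""
    full_conversation = last_prompt + last_completion  # `+` builds a new list
    if final_env_response:
        full_conversation = full_conversation + final_env_response
    return full_conversation[len(prompt):]


prompt = [{"role": "user", "content": "Let's play."}]
last_prompt = prompt + [
    {"role": "assistant", "content": "move 1"},
    {"role": "user", "content": "board after move 1"},
]
last_completion = [{"role": "assistant", "content": "move 2"}]
final_env_response = [{"role": "user", "content": "Game over!"}]

completion = assemble_completion(prompt, last_prompt, last_completion, final_env_response)
print([m["role"] for m in completion])
```

If a custom `get_prompt_messages()` changes the start of the conversation (for example by injecting a different system message), slicing by `len(prompt)` no longer lines up and the offset needs to be adjusted.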

### add\_trajectory\_step(): Optional

Add metadata to each turn:

```python theme={null}
from verifiers.types import TrajectoryStep

class MyGameEnv(vf.MultiTurnEnv):
    async def add_trajectory_step(
        self, 
        state: vf.State, 
        trajectory_step: TrajectoryStep
    ):
        """Enrich trajectory with game-specific metadata."""
        # Add game state snapshot
        trajectory_step["extras"]["board_state"] = state["board"].copy()
        trajectory_step["extras"]["valid_moves"] = get_valid_moves(state["board"])
        trajectory_step["extras"]["score"] = state["score"]
        
        # Set intermediate reward (optional)
        if state.get("won"):
            trajectory_step["reward"] = 1.0
        elif state.get("lost"):
            trajectory_step["reward"] = 0.0
        
        await super().add_trajectory_step(state, trajectory_step)
```
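The `.copy()` call matters because the board keeps changing after the step is recorded. A small standard-library illustration of the difference between storing a reference and storing a snapshot, using a nested list as a stand-in for the board:

```python
import copy

board = [[0, 0], [0, 0]]

# Storing the board itself keeps a reference to the live object.
aliased_step = {"extras": {"board_state": board}}
# Storing a deep copy freezes the board as it looked at this turn.
snapshot_step = {"extras": {"board_state": copy.deepcopy(board)}}

board[0][0] = 1  # the game moves on

print(aliased_step["extras"]["board_state"])
print(snapshot_step["extras"]["board_state"])
```

For nested lists, `list.copy()` is shallow, so `copy.deepcopy` is the safer choice; for a numeric NumPy array, `ndarray.copy()` duplicates the data.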

## Stop Conditions

Define when rollouts should terminate using the `@vf.stop` decorator:

### Basic Stop Conditions

```python theme={null}
import time

class MyGameEnv(vf.MultiTurnEnv):
    @vf.stop
    async def game_won(self, state: vf.State) -> bool:
        return state.get("won", False)
    
    @vf.stop
    async def game_lost(self, state: vf.State) -> bool:
        return state.get("lives", 3) <= 0
    
    @vf.stop
    async def timeout(self, state: vf.State) -> bool:
        # Assumes setup_state() stored state["start_time"] = time.time()
        elapsed = time.time() - state["start_time"]
        return elapsed > self.max_seconds
```

Built-in stop conditions (always available):

* `has_error` — stops if `state["error"]` is set
* `max_turns_reached` — stops after `max_turns` iterations
* `prompt_too_long` — stops if prompt exceeds model context
* `has_final_env_response` — stops if early termination signaled

### Priority-Based Execution

Control evaluation order with priorities (higher runs first):

```python theme={null}
class MyGameEnv(vf.MultiTurnEnv):
    @vf.stop(priority=100)  # Check error first
    async def fatal_error(self, state: vf.State) -> bool:
        return state.get("fatal_error") is not None
    
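    @vf.stop(priority=10)  # Then check cheap conditions
    async def answer_keyword(self, state: vf.State) -> bool:
        # state["completion"] is only assembled by render_completion() after
        # the loop ends, so inspect the latest trajectory step instead
        trajectory = state.get("trajectory", [])
        if not trajectory or not trajectory[-1]["completion"]:
            return False
        last_message = trajectory[-1]["completion"][-1]
        return "FINAL ANSWER:" in (last_message.get("content") or "")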
    
    @vf.stop(priority=-10)  # Finally check expensive conditions
    async def validated_answer(self, state: vf.State) -> bool:
        return await self.validator_api.check_answer(state)
```
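The payoff of ordering by priority is that expensive checks can be skipped. The sketch below is a toy model of that idea, not the library's implementation: the `stop` decorator here is a hypothetical stand-in, and it assumes evaluation ends at the first condition that returns `True` (the text above only states that higher priorities run first).

```python
import asyncio


def stop(priority: int = 0):
    """Toy decorator that tags a coroutine function with a priority."""
    def mark(func):
        func.stop_priority = priority
        return func
    return mark


checked = []  # records which conditions were actually evaluated


@stop(priority=100)
async def fatal_error(state):
    checked.append("fatal_error")
    return state.get("fatal_error") is not None


@stop(priority=10)
async def answer_keyword(state):
    checked.append("answer_keyword")
    return "FINAL ANSWER:" in state.get("last_reply", "")


@stop(priority=-10)
async def validated_answer(state):
    checked.append("validated_answer")  # imagine a slow API call here
    return True


async def is_completed(state, conditions):
    # Highest priority first; return as soon as one condition fires.
    for condition in sorted(conditions, key=lambda f: f.stop_priority, reverse=True):
        if await condition(state):
            return True
    return False


conditions = [validated_answer, answer_keyword, fatal_error]
done = asyncio.run(is_completed({"last_reply": "FINAL ANSWER: 42"}, conditions))
print(done, checked)
```

Under that assumption, the cheap keyword check fires first and the slow validator is never called.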

### Early Termination from env\_response

Signal completion directly from the environment:

```python theme={null}
class MyGameEnv(vf.MultiTurnEnv):
    async def env_response(
        self, 
        messages: vf.Messages, 
        state: vf.State
    ) -> vf.Messages:
        # Process move
        result = process_move(messages, state)
        
        # Check if game ended
        if result.game_over:
            final_message = [
                {"role": "user", "content": f"Game over! Score: {state['score']}"}
            ]
            state["final_env_response"] = final_message
            return final_message
        
        # Game continues
        return [{"role": "user", "content": result.feedback}]
```

Setting `state["final_env_response"]` triggers the `has_final_env_response` stop condition.

## Resource Management

### Cleanup: Per-Rollout

Use `@vf.cleanup` for per-rollout resource cleanup:

```python theme={null}
class MyGameEnv(vf.MultiTurnEnv):
    @vf.cleanup
    async def save_game_log(self, state: vf.State):
        """Save game results after each rollout."""
        try:
            await self.db.insert_game_result({
                "game_id": state["game_id"],
                "score": state.get("score", 0),
                "won": state.get("won", False),
                "moves": state.get("moves", []),
            })
        except Exception as e:
            self.logger.error(f"Failed to save game log: {e}")
    
    @vf.cleanup
    async def close_game_session(self, state: vf.State):
        """Close API session."""
        if "game_session" in state:
            try:
                await self.game_api.close_session(state["game_session"])
            except Exception as e:
                self.logger.warning(f"Failed to close session: {e}")
```

### Teardown: Environment Shutdown

Use `@vf.teardown` for environment-level cleanup:

```python theme={null}
class MyGameEnv(vf.MultiTurnEnv):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.db_connection = None
    
    async def setup_state(self, state: vf.State) -> vf.State:
        # Initialize DB connection lazily
        if self.db_connection is None:
            self.db_connection = await connect_to_db()
        return await super().setup_state(state)
    
    @vf.teardown
    async def close_database(self):
        """Close database connection when environment shuts down."""
        if self.db_connection:
            await self.db_connection.close()
            self.logger.info("Database connection closed")
```

<Warning>
  **Idempotency is critical** — Cleanup methods may be called multiple times or when resources are in unexpected states. Always:

  * Check that a resource exists before cleaning it up
  * Wrap cleanup calls in try/except blocks
  * Log errors instead of raising them
</Warning>
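One way to get idempotent cleanup is to remove the resource handle from the state before releasing it, so a second call finds nothing to do. The following is a minimal sketch using only the standard library; `FakeGameAPI` is a hypothetical client invented for the demonstration.

```python
import asyncio
import logging

logger = logging.getLogger("toy_env")


class FakeGameAPI:
    """Hypothetical client that refuses to close the same session twice."""

    def __init__(self):
        self.open_sessions = {"s1"}
        self.close_calls = 0

    async def close_session(self, session_id: str) -> None:
        self.close_calls += 1
        if session_id not in self.open_sessions:
            raise RuntimeError(f"session {session_id} already closed")
        self.open_sessions.remove(session_id)


async def close_game_session(api: FakeGameAPI, state: dict) -> None:
    # pop() removes the handle up front, so a repeated call finds nothing to do
    session_id = state.pop("game_session", None)
    if session_id is None:
        return
    try:
        await api.close_session(session_id)
    except Exception as exc:  # log and swallow: cleanup should not raise
        logger.warning("Failed to close session %s: %s", session_id, exc)


async def main():
    api = FakeGameAPI()
    state = {"game_session": "s1"}
    await close_game_session(api, state)
    await close_game_session(api, state)  # second call is a no-op
    return api, state


api, state = asyncio.run(main())
print(api.close_calls, api.open_sessions, state)
```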

## Error Handling

Verifiers provides structured error handling:

### Error Hierarchy

```python theme={null}
vf.Error                    # Base class
├── vf.ModelError          # Model interaction issues
│   └── vf.EmptyModelResponseError
├── vf.OverlongPromptError # Prompt exceeds context
├── vf.ToolError           # Tool-related errors
│   ├── vf.ToolParseError  # Failed to parse tool call
│   └── vf.ToolCallError   # Tool execution failed
└── vf.InfraError          # Infrastructure failures
    ├── vf.SandboxError
    └── vf.TunnelError
```

### Raising Errors

```python theme={null}
class MyGameEnv(vf.MultiTurnEnv):
    async def env_response(
        self, 
        messages: vf.Messages, 
        state: vf.State
    ) -> vf.Messages:
        # Extract the move from the model's latest message
        move = self.parser.parse_answer(messages)
        try:
            result = await self.game_api.make_move(state["game_id"], move)
            return [{"role": "user", "content": result.feedback}]
        except GameAPITimeout as e:
            # Infrastructure error - rollout will stop
            raise vf.InfraError(f"Game API timeout: {e}") from e
        except InvalidMoveError as e:
            # Invalid move - let model recover
            return [{"role": "user", "content": f"Invalid move: {e}"}]
```

When a `vf.Error` is raised inside the rollout loop:

1. The rollout loop catches it automatically
2. The error is stored in `state["error"]`
3. The built-in `has_error` stop condition triggers
4. The rollout terminates gracefully; `render_completion()` and cleanup still run
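The same flow can be reproduced without the library. In this sketch, `Error` and `InfraError` are local stand-ins for `vf.Error` and `vf.InfraError`, and the `while` condition plays the role of the `has_error` and `max_turns_reached` checks.

```python
import asyncio


class Error(Exception):
    """Stand-in for vf.Error."""


class InfraError(Error):
    """Stand-in for vf.InfraError."""


async def env_step(turn: int) -> str:
    if turn == 2:
        raise InfraError("game API timeout")
    return f"feedback for turn {turn}"


async def rollout(max_turns: int = 5) -> dict:
    state = {"turns": [], "error": None}
    # Stop when an error has been recorded or the turn budget is used up.
    while state["error"] is None and len(state["turns"]) < max_turns:
        try:
            state["turns"].append(await env_step(len(state["turns"])))
        except Error as exc:
            state["error"] = exc  # recorded on the state, not re-raised
    return state


state = asyncio.run(rollout())
print(state["turns"], repr(state["error"]))
```

Exceptions that do not derive from the base error class are not matched by the `except` clause, so in this sketch they would propagate to the caller instead of being stored on the state.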

## Complete Example: Tic-Tac-Toe

Here's a complete custom environment:

```python theme={null}
import verifiers as vf
from datasets import Dataset
import numpy as np

class TicTacToeEnv(vf.MultiTurnEnv):
    def __init__(self, **kwargs):
        super().__init__(max_turns=9, **kwargs)
    
    async def setup_state(self, state: vf.State) -> vf.State:
        """Initialize empty board."""
        state["board"] = np.zeros((3, 3), dtype=int)
        state["current_player"] = 1  # 1 = X (model), -1 = O (environment)
        state["winner"] = None
        return await super().setup_state(state)
    
    async def env_response(
        self, 
        messages: vf.Messages, 
        state: vf.State
    ) -> vf.Messages:
        """Process model's move and make counter-move."""
        # Parse the model's move from its latest message
        last_msg = messages[-1]["content"]
        parsed = self.parser.parse(last_msg)
        
        try:
            row, col = int(parsed.row), int(parsed.col)
            if not (0 <= row < 3 and 0 <= col < 3):
                return [{"role": "user", "content": "Invalid position. Use 0-2 for row and col."}]
            if state["board"][row, col] != 0:
                return [{"role": "user", "content": "That position is taken. Try again."}]
        except (TypeError, ValueError, AttributeError):
            # TypeError covers a missing tag, where the parsed field is None
            return [{"role": "user", "content": "Invalid format. Use <row>0</row><col>0</col>."}]
        
        # Apply model's move
        state["board"][row, col] = 1
        
        # Check win/draw
        if self.check_winner(state["board"]) == 1:
            state["winner"] = "model"
            return [{"role": "user", "content": f"You win!\n{self.render_board(state['board'])}"}]
        
        if np.all(state["board"] != 0):
            state["winner"] = "draw"
            return [{"role": "user", "content": f"Draw!\n{self.render_board(state['board'])}"}]
        
        # Environment's move (simple strategy)
        env_row, env_col = self.make_env_move(state["board"])
        state["board"][env_row, env_col] = -1
        
        # Check if environment won
        if self.check_winner(state["board"]) == -1:
            state["winner"] = "environment"
            return [{"role": "user", "content": f"I win!\n{self.render_board(state['board'])}"}]
        
        # Game continues
        return [{"role": "user", "content": self.render_board(state["board"])}]
    
    @vf.stop
    async def game_ended(self, state: vf.State) -> bool:
        return state.get("winner") is not None
    
    def check_winner(self, board):
        """Check for winner. Returns 1 (X wins), -1 (O wins), or 0 (no winner)."""
        # Check rows, cols, diagonals
        for i in range(3):
            if abs(board[i, :].sum()) == 3:
                return board[i, 0]
            if abs(board[:, i].sum()) == 3:
                return board[0, i]
        if abs(board.diagonal().sum()) == 3:
            return board[0, 0]
        if abs(np.fliplr(board).diagonal().sum()) == 3:
            return board[0, 2]
        return 0
    
    def make_env_move(self, board):
        """Simple strategy: take center, then corners, then edges."""
        # Take center if available
        if board[1, 1] == 0:
            return 1, 1
        # Take corners
        for r, c in [(0, 0), (0, 2), (2, 0), (2, 2)]:
            if board[r, c] == 0:
                return r, c
        # Take edges
        for r, c in [(0, 1), (1, 0), (1, 2), (2, 1)]:
            if board[r, c] == 0:
                return r, c
    
    def render_board(self, board):
        """Render board as string."""
        symbols = {0: ".", 1: "X", -1: "O"}
        lines = []
        for row in board:
            lines.append(" ".join(symbols[cell] for cell in row))
        return "\n".join(lines)

# Load environment
def load_environment():
    dataset = Dataset.from_list([
        {"prompt": [{"role": "user", "content": "Let's play tic-tac-toe. You are X. Make your move using <row>0</row><col>0</col> format."}]}
        for _ in range(100)
    ])
    
    parser = vf.XMLParser(["row", "col"])
    
    async def model_won(state) -> float:
        return 1.0 if state.get("winner") == "model" else 0.0
    
    async def draw_bonus(state) -> float:
        return 0.5 if state.get("winner") == "draw" else 0.0
    
    rubric = vf.Rubric(
        funcs=[model_won, draw_bonus],
        weights=[1.0, 1.0],
        parser=parser,
    )
    
    return TicTacToeEnv(dataset=dataset, parser=parser, rubric=rubric)
```
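To see which rewards the two rubric functions produce, the arithmetic can be worked through by hand. This sketch assumes the rubric combines its functions as a weighted sum; `combined_reward` is a hypothetical helper written for the illustration, not a verifiers function.

```python
def combined_reward(winner, weights=(1.0, 1.0)):
    """Weighted sum of the two reward functions from the example."""
    model_won = 1.0 if winner == "model" else 0.0
    draw_bonus = 0.5 if winner == "draw" else 0.0
    return weights[0] * model_won + weights[1] * draw_bonus


# None stands for a rollout that ended without a winner (e.g. max_turns reached)
for outcome in ("model", "draw", "environment", None):
    print(outcome, combined_reward(outcome))
```

Under that assumption a win scores 1.0, a draw 0.5, and a loss or an unfinished game 0.0.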

## Testing Custom Environments

<Steps>
  ### Unit test individual methods

  ```python theme={null}
  import numpy as np
  import pytest
  import verifiers as vf

  @pytest.mark.asyncio
  async def test_env_response():
      # load_environment() comes from the Tic-Tac-Toe example above,
      # so the environment is built with the XMLParser it expects
      env = load_environment()
      state = {"board": np.zeros((3, 3), dtype=int), "winner": None}
      messages = [{"role": "assistant", "content": "<row>0</row><col>0</col>"}]
      
      response = await env.env_response(messages, state)
      assert len(response) == 1
      assert state["board"][0, 0] == 1
  ```

  ### Test with small evaluation

  ```bash theme={null}
  prime eval run tic-tac-toe -m gpt-4.1-mini -n 3 -r 2 -v
  ```

  ### Check state tracking

  ```bash theme={null}
  prime eval run tic-tac-toe -m gpt-4.1-mini -n 5 -s \
    -C "board,winner,current_player"
  ```

  Inspect saved results to verify state is tracked correctly.
</Steps>

## Best Practices

<Tip>
  **Start simple** — Build a minimal working version first, then add complexity incrementally.
</Tip>

<Tip>
  **Test stop conditions** — Ensure rollouts don't run forever. Add timeout conditions as a safety net.
</Tip>

<Tip>
  **Log liberally** — Use `self.logger` to log state transitions, decisions, and errors during development.
</Tip>

<Warning>
  **Don't mutate messages** — Always return new message lists from `env_response()`, never modify in place.
</Warning>

<Warning>
  **Handle all error cases** — Assume the model will send malformed responses. Validate and provide clear feedback.
</Warning>
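As an illustration of defensive input handling, here is a standard-library sketch of a move parser that never raises and always yields either a move or a feedback message. It uses a regular expression as a stand-in for the environment's parser; `parse_move` and `MOVE_PATTERN` are hypothetical names.

```python
import re

MOVE_PATTERN = re.compile(r"<row>\s*(-?\d+)\s*</row>\s*<col>\s*(-?\d+)\s*</col>")


def parse_move(text, size: int = 3):
    """Return ((row, col), None) on success or (None, feedback) on failure."""
    match = MOVE_PATTERN.search(text or "")  # tolerate None or empty content
    if match is None:
        return None, "Invalid format. Use <row>0</row><col>0</col>."
    row, col = int(match.group(1)), int(match.group(2))
    if not (0 <= row < size and 0 <= col < size):
        return None, f"Invalid position. Use 0-{size - 1} for row and col."
    return (row, col), None


print(parse_move("I pick <row>1</row><col>2</col>"))
print(parse_move("middle please"))
print(parse_move("<row>3</row><col>0</col>"))
```

Returning the feedback string alongside the result lets `env_response()` forward it to the model directly, so a malformed reply costs one turn instead of ending the rollout.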

## Next Steps

* **Evaluation**: Comprehensive testing strategies → [Evaluation Guide](/guides/evaluation)
* **Training**: Use custom environments for RL → [Training Guide](/guides/training)
* **Integration**: Connect to external systems → [Tool Environments Guide](/guides/tool-environments)
