# Environments

> Understanding environment types and how they orchestrate RL training and evaluation

## Overview

Environments are the core abstraction in Verifiers that define how language models interact with tasks. Each environment orchestrates the full lifecycle of a **rollout**: loading data, managing model interactions, executing tools or game logic, and computing rewards.

```python theme={null}
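import verifiers as vf

def load_environment():
    # `dataset` and `rubric` are defined elsewhere in the module;
    # the SingleTurnEnv section below shows one way to build both.
    return vf.SingleTurnEnv(
        dataset=dataset,
        rubric=rubric,
        system_prompt="You are a helpful assistant."
    )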
```

## Environment Hierarchy

All environments inherit from the abstract `Environment` base class and implement a `rollout()` method. The class hierarchy provides progressively more specialized interaction patterns:

```
Environment (abstract base)
├── SingleTurnEnv (single response Q&A)
└── MultiTurnEnv (multi-turn interactions)
    ├── ToolEnv (stateless tool calling)
    │   ├── StatefulToolEnv (tools with per-rollout state)
    │   │   ├── SandboxEnv (containerized bash execution)
    │   │   │   └── PythonEnv (persistent Python REPL)
    │   │   └── CliAgentEnv (custom agent code in sandboxes)
    │   └── MCPEnv (MCP server integration)
    └── Custom environments (games, simulations, etc.)
```

## Environment Types

### SingleTurnEnv

The simplest environment for single-response tasks where the model generates one completion per prompt.

```python theme={null}
import verifiers as vf
from datasets import Dataset

dataset = Dataset.from_list([
    {"prompt": [{"role": "user", "content": "What is 2+2?"}], "answer": "4"},
    {"prompt": [{"role": "user", "content": "What is 3*5?"}], "answer": "15"},
])

async def correct_answer(completion, answer) -> float:
    response = completion[-1]["content"]
    return 1.0 if answer in response else 0.0

rubric = vf.Rubric(funcs=[correct_answer])
env = vf.SingleTurnEnv(dataset=dataset, rubric=rubric)
```
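For intuition, the sketch below spells out roughly what one single-turn rollout involves: build the prompt, make exactly one model call, and score the completion with the reward function. It is plain Python rather than the Verifiers API; `fake_model` and `single_turn_rollout` are hypothetical names, and the model is stubbed with canned replies.

```python
from __future__ import annotations

import asyncio

EXAMPLES = [
    {"prompt": [{"role": "user", "content": "What is 2+2?"}], "answer": "4"},
    {"prompt": [{"role": "user", "content": "What is 3*5?"}], "answer": "15"},
]

async def fake_model(messages: list[dict]) -> str:
    """Hypothetical stand-in for a model call: returns a canned reply per question."""
    canned = {"What is 2+2?": "The answer is 4.", "What is 3*5?": "I believe it is 16."}
    return canned[messages[-1]["content"]]

async def correct_answer(completion: list[dict], answer: str) -> float:
    response = completion[-1]["content"]
    return 1.0 if answer in response else 0.0

async def single_turn_rollout(example: dict, system_prompt: str | None = None) -> dict:
    # Optional system message, then the example's own prompt messages.
    prompt = [{"role": "system", "content": system_prompt}] if system_prompt else []
    prompt += example["prompt"]
    # Exactly one model call per rollout, with no environment feedback loop.
    completion = [{"role": "assistant", "content": await fake_model(prompt)}]
    reward = await correct_answer(completion, example["answer"])
    return {"prompt": prompt, "completion": completion, "reward": reward}

async def main() -> list[dict]:
    return await asyncio.gather(
        *(single_turn_rollout(ex, system_prompt="You are a helpful assistant.") for ex in EXAMPLES)
    )

results = asyncio.run(main())
print([r["reward"] for r in results])  # [1.0, 0.0]
```

The second reply scores 0.0 because "15" does not appear in it. Substring matching is lenient, though: a reply of "14" would still earn credit for the answer "4", so stricter parsing is often worth the extra code.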

**Key characteristics:**

* One model response per rollout
* No environment feedback loop
* Suited to Q&A, classification, and other single-completion tasks

### MultiTurnEnv

Enables multi-turn interactions where the environment responds after each model turn. Subclasses must implement `env_response()`.

```python theme={null}
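import verifiers as vf

class MyGameEnv(vf.MultiTurnEnv):
    async def env_response(self, messages: vf.Messages, state: vf.State) -> vf.Messages:
        """Generate environment feedback after each model turn."""
        # Assumes a parser with an `action` field (e.g. a vf.XMLParser) was passed to the constructor
        parsed = self.parser.parse(messages[-1]["content"])
        action = parsed.action
        # `process_action` is a game-specific helper you define on this subclass
        result = self.process_action(action, state)
        return [{"role": "user", "content": result}]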
```

**Built-in stop conditions:**

* `has_error` - Stops on any `vf.Error` in `state["error"]`
* `prompt_too_long` - Stops if prompt exceeds model context length
* `max_turns_reached` - Stops after `max_turns` iterations
* `has_final_env_response` - Stops when `state["final_env_response"]` is set
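Each built-in condition is essentially a predicate over the rollout state. The sketch below models that idea in plain Python, checking the conditions in the order listed and reporting the first one that fires. Only `error` and `final_env_response` come from the list above; the state keys `turn`, `max_turns`, `prompt_tokens`, and `max_context_tokens` are hypothetical.

```python
from __future__ import annotations

from typing import Callable

State = dict  # simplified: a rollout state is a plain dict in this sketch

def has_error(state: State) -> bool:
    return state.get("error") is not None

def prompt_too_long(state: State) -> bool:
    # Hypothetical keys: current prompt length and the model's context limit, in tokens.
    limit = state.get("max_context_tokens")
    return limit is not None and state.get("prompt_tokens", 0) > limit

def max_turns_reached(state: State) -> bool:
    max_turns = state.get("max_turns", -1)  # -1 means unlimited
    return max_turns != -1 and state.get("turn", 0) >= max_turns

def has_final_env_response(state: State) -> bool:
    return state.get("final_env_response") is not None

STOP_CONDITIONS: list[Callable[[State], bool]] = [
    has_error, prompt_too_long, max_turns_reached, has_final_env_response,
]

def first_stop_reason(state: State) -> str | None:
    """Name of the first condition that fires, or None if the rollout continues."""
    for condition in STOP_CONDITIONS:
        if condition(state):
            return condition.__name__
    return None

print(first_stop_reason({"turn": 3, "max_turns": 3}))  # max_turns_reached
print(first_stop_reason({"turn": 50}))                 # None
```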

**Constructor parameters:**

```python theme={null}
MultiTurnEnv(
    dataset: Dataset,
    rubric: Rubric,
    max_turns: int = -1,  # -1 means unlimited
    **kwargs
)
```
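The loop below is a simplified, plain-Python model of how a multi-turn rollout alternates model turns with `env_response()` calls, and of how `max_turns=-1` leaves the number of turns unbounded. The number-guessing game, the scripted "model", and the function names are hypothetical; the real `MultiTurnEnv` also checks its other stop conditions and calls an actual model.

```python
import asyncio
from typing import Iterator

async def env_response(messages: list[dict], state: dict) -> list[dict]:
    """Environment feedback after each model turn, for a number-guessing game."""
    guess = int(messages[-1]["content"])
    if guess == state["secret"]:
        reply = [{"role": "user", "content": "Correct!"}]
        state["final_env_response"] = reply  # ends the rollout after this message
        return reply
    hint = "higher" if guess < state["secret"] else "lower"
    return [{"role": "user", "content": f"Try {hint}."}]

async def rollout(guesses: Iterator[int], secret: int, max_turns: int = -1) -> dict:
    state = {"secret": secret, "turn": 0, "final_env_response": None}
    messages = [{"role": "user", "content": "Guess a number from 1 to 10."}]
    while True:
        # Scripted "model": takes the next guess instead of calling an LLM.
        messages.append({"role": "assistant", "content": str(next(guesses))})
        state["turn"] += 1
        messages.extend(await env_response(messages, state))
        if state["final_env_response"] is not None:
            break
        if max_turns != -1 and state["turn"] >= max_turns:  # -1 means unlimited
            break
    state["messages"] = messages
    return state

solved = asyncio.run(rollout(iter([5, 8, 7]), secret=7))
capped = asyncio.run(rollout(iter([1, 2, 3, 4]), secret=9, max_turns=2))
print(solved["turn"], capped["turn"])  # 3 2
```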

### ToolEnv

Adds tool calling capabilities with stateless Python functions. Tools are automatically converted to OpenAI-compatible schemas.

```python theme={null}
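import verifiers as vf

async def calculate(expression: str) -> str:
    """Evaluate a mathematical expression.

    Args:
        expression: A mathematical expression to evaluate (e.g. "2 + 2 * 3")

    Returns:
        The result of the evaluation.
    """
    try:
        # Builtins are disabled to narrow what model-written input can reach.
        # eval() is still not a sandbox; prefer a dedicated expression parser for untrusted input.
        result = eval(expression, {"__builtins__": {}}, {})
        return str(result)
    except Exception as e:
        return f"Error: {e}"

env = vf.ToolEnv(
    dataset=dataset,
    tools=[calculate],
    rubric=rubric,
    max_turns=10
)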
```

**Tool schema extraction:**

* Function name → tool name
* Type hints → parameter types
* Docstring → tool description and parameter descriptions
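As an approximation of that mapping, the sketch below derives an OpenAI-style function schema from a Python function using only the standard library (`inspect` and `typing`). `function_to_schema` is a hypothetical helper, not the converter Verifiers uses, and its docstring parsing assumes the Google-style `Args:` layout shown in `calculate`.

```python
import inspect
import typing

# Rough Python-to-JSON-Schema type mapping (an assumption of this sketch).
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean", list: "array", dict: "object"}

def parse_arg_docs(doc: str) -> dict[str, str]:
    """Collect 'name: description' lines from a Google-style 'Args:' section."""
    descriptions: dict[str, str] = {}
    in_args = False
    for line in doc.splitlines():
        stripped = line.strip()
        if stripped == "Args:":
            in_args = True
        elif in_args and stripped.endswith(":") and " " not in stripped:
            break  # next section header, e.g. "Returns:"
        elif in_args and ":" in stripped:
            name, desc = stripped.split(":", 1)
            descriptions[name.strip()] = desc.strip()
    return descriptions

def function_to_schema(fn) -> dict:
    doc = inspect.getdoc(fn) or ""
    hints = typing.get_type_hints(fn)
    arg_docs = parse_arg_docs(doc)
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        prop = {"type": JSON_TYPES.get(hints.get(name), "string")}  # type hint -> parameter type
        if name in arg_docs:
            prop["description"] = arg_docs[name]  # docstring -> parameter description
        properties[name] = prop
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,  # function name -> tool name
            "description": doc.split("\n\n")[0].strip(),  # first docstring paragraph
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }

async def calculate(expression: str) -> str:
    """Evaluate a mathematical expression.

    Args:
        expression: A mathematical expression to evaluate (e.g. "2 + 2 * 3")

    Returns:
        The result of the evaluation.
    """
    return expression  # body is irrelevant to schema extraction

schema = function_to_schema(calculate)
print(schema["function"]["parameters"])
```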

**Stop behavior:**

* Stops when model responds without tool calls (built-in `no_tools_called` condition)
* Configurable error handling via `stop_errors` parameter
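In OpenAI-style chat messages, tool requests appear in an assistant message's `tool_calls` field, so a "no tools called" check can be as small as the hypothetical predicate below. It sketches the idea only and is not Verifiers' own implementation.

```python
def no_tools_called(messages: list[dict]) -> bool:
    """True when the latest message is an assistant reply that requests no tools."""
    last = messages[-1]
    return last.get("role") == "assistant" and not last.get("tool_calls")

tool_turn = [{
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "calculate", "arguments": '{"expression": "2 + 2"}'},
    }],
}]
final_turn = [{"role": "assistant", "content": "The answer is 4."}]
print(no_tools_called(tool_turn), no_tools_called(final_turn))  # False True
```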

### StatefulToolEnv

For tools that require per-rollout state (sandbox IDs, database connections, session handles).

```python theme={null}
import verifiers as vf

class MySandboxEnv(vf.StatefulToolEnv):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Register the tool; `session_id` is hidden from the model-facing schema
        self.add_tool(self.run_code, args_to_skip=["session_id"])

    async def setup_state(self, state, **kwargs):
        # Initialize per-rollout resources (`create_session` is your own helper)
        state["session_id"] = await create_session()
        return await super().setup_state(state, **kwargs)

    def update_tool_args(self, tool_name, tool_args, messages, state, **kwargs):
        # Inject per-rollout state into the model's tool-call arguments
        if tool_name == "run_code":
            tool_args["session_id"] = state["session_id"]
        return tool_args

    async def run_code(self, code: str, session_id: str) -> str:
        """Execute code in the sandbox."""
        # `execute_in_session` is your own helper for the session backend
        return await execute_in_session(session_id, code)

**Pattern:**

1. Add tools with `args_to_skip` for hidden parameters
2. Initialize state in `setup_state()`
3. Inject state values in `update_tool_args()`
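The three-step pattern can be modelled without any framework. `HiddenArgToolbox` below is a hypothetical class that mimics the roles of `add_tool(..., args_to_skip=...)` and `update_tool_args()`: the skipped parameter never appears in what the model sees, and its value is filled in from per-rollout state at call time.

```python
import asyncio
import inspect

class HiddenArgToolbox:
    """Hypothetical registry mimicking the args_to_skip / update_tool_args pattern."""

    def __init__(self):
        self.tools = {}
        self.visible_params = {}

    def add_tool(self, fn, args_to_skip=()):
        self.tools[fn.__name__] = fn
        # Only these parameters would appear in the schema the model sees.
        self.visible_params[fn.__name__] = [
            name for name in inspect.signature(fn).parameters if name not in args_to_skip
        ]

    def update_tool_args(self, tool_name: str, tool_args: dict, state: dict) -> dict:
        # Inject per-rollout state the model never supplies itself.
        if tool_name == "run_code":
            tool_args = {**tool_args, "session_id": state["session_id"]}
        return tool_args

    async def call(self, tool_name: str, model_args: dict, state: dict) -> str:
        args = self.update_tool_args(tool_name, dict(model_args), state)
        return await self.tools[tool_name](**args)

async def run_code(code: str, session_id: str) -> str:
    return f"[{session_id}] ran: {code}"

async def demo():
    box = HiddenArgToolbox()
    box.add_tool(run_code, args_to_skip=["session_id"])
    state = {"session_id": "sess-123"}  # in an environment, created in setup_state()
    output = await box.call("run_code", {"code": "print(1)"}, state)
    return box.visible_params["run_code"], output

visible, output = asyncio.run(demo())
print(visible)  # ['code']
print(output)   # [sess-123] ran: print(1)
```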

### SandboxEnv

Provides containerized bash execution using Prime Intellect's Sandboxes.

```python theme={null}
env = vf.SandboxEnv(
    dataset=dataset,
    rubric=rubric,
    sandbox_name="my-sandbox",
    docker_image="python:3.11-slim",
    start_command="tail -f /dev/null",
    cpu_cores=2,
    memory_gb=4,
    disk_size_gb=10,
    timeout_minutes=60,
    timeout_per_command_seconds=30,
    environment_vars={"API_KEY": "..."},
    labels=["experiment-1", "math-tasks"],  # optional categorization
)
```

**Built-in tool:**

* `bash(command: str)` - Execute shell commands in the sandbox

**Lifecycle:**

* Sandboxes are created in `setup_state()` (per rollout)
* Destroyed in cleanup handlers after each rollout
* Put all setup logic in `start_command`; the sandbox itself is not awaited until its first use
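The per-rollout lifecycle amounts to "create on setup, always destroy on cleanup". The sketch below shows that guarantee with `try`/`finally`; `FakeSandboxClient` and `run_rollout` are hypothetical stand-ins, not the Prime Intellect sandbox API.

```python
import asyncio

class FakeSandboxClient:
    """Hypothetical sandbox API that just tracks which sandboxes are alive."""

    def __init__(self):
        self.live: set = set()
        self.created = 0

    async def create(self, start_command: str) -> str:
        self.created += 1
        sandbox_id = f"sbx-{self.created}"
        self.live.add(sandbox_id)
        return sandbox_id

    async def delete(self, sandbox_id: str) -> None:
        self.live.discard(sandbox_id)

async def run_rollout(client: FakeSandboxClient, fail: bool) -> str:
    # Setup: one fresh sandbox per rollout.
    sandbox_id = await client.create(start_command="tail -f /dev/null")
    try:
        if fail:
            raise RuntimeError("tool call crashed")
        return f"{sandbox_id}: ok"
    finally:
        # Cleanup: runs whether the rollout returned or raised.
        await client.delete(sandbox_id)

async def main():
    client = FakeSandboxClient()
    results = await asyncio.gather(
        run_rollout(client, fail=False),
        run_rollout(client, fail=True),
        return_exceptions=True,
    )
    return client, results

client, results = asyncio.run(main())
print(results[0], type(results[1]).__name__, client.live)  # sbx-1: ok RuntimeError set()
```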

### PythonEnv

Extends `SandboxEnv` with a persistent Python REPL.

```python theme={null}
env = vf.PythonEnv(
    dataset=dataset,
    rubric=rubric,
    packages=["numpy", "pandas"],  # auto-installed in sandbox
)
```

**Built-in tool:**

* `python(code: str)` - Execute Python code in the persistent REPL

### MCPEnv

Integrates with MCP (Model Context Protocol) servers.

```python theme={null}
mcp_servers = [
    {
        "name": "fetch",
        "command": "uvx",
        "args": ["mcp-server-fetch"],
    },
]

env = vf.MCPEnv(
    mcp_servers=mcp_servers,
    dataset=dataset,
    rubric=rubric,
)
```

**Features:**

* Automatically discovers and exposes MCP server tools
* Manages server lifecycle
* Supports multiple concurrent MCP servers

## Base Environment Class

### Constructor Parameters

All environments accept these common parameters:

```python theme={null}
Environment(
    dataset: Dataset | DatasetBuilder | None = None,
    eval_dataset: Dataset | DatasetBuilder | None = None,
    system_prompt: str | None = None,
    few_shot: Messages | None = None,
    parser: Parser | None = None,
    rubric: Rubric | None = None,
    sampling_args: SamplingArgs | None = None,
    max_workers: int = 512,
    env_id: str | None = None,
    env_args: dict | None = None,
    max_seq_len: int | None = None,
    score_rollouts: bool = True,
    pass_threshold: float = 0.5,
)
```

**Key parameters:**

* `dataset` / `eval_dataset` - Training and evaluation datasets (can be `DatasetBuilder` for lazy loading)
* `system_prompt` - Prepended to all prompts as a system message
* `few_shot` - Example messages inserted after system prompt
* `parser` - For extracting structured output (e.g., `vf.XMLParser`)
* `rubric` - Reward functions and scoring logic
* `sampling_args` - Default generation parameters (temperature, top\_p, etc.)
* `max_seq_len` - Maximum sequence length for tokenization
* `score_rollouts` - Whether to score rollouts (disable for pure generation)
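Following the descriptions of `system_prompt` and `few_shot` above, the final message list for an example is the system message, then the few-shot messages, then the example's own prompt. `build_prompt` is a hypothetical helper that sketches that ordering.

```python
from __future__ import annotations

def build_prompt(
    prompt: list[dict],
    system_prompt: str | None = None,
    few_shot: list[dict] | None = None,
) -> list[dict]:
    messages: list[dict] = []
    if system_prompt is not None:
        messages.append({"role": "system", "content": system_prompt})  # prepended as a system message
    messages.extend(few_shot or [])  # examples go right after the system prompt
    messages.extend(prompt)  # the example's own messages come last
    return messages

few_shot = [
    {"role": "user", "content": "What is 1+1?"},
    {"role": "assistant", "content": "2"},
]
msgs = build_prompt(
    [{"role": "user", "content": "What is 2+2?"}],
    system_prompt="Answer with a number only.",
    few_shot=few_shot,
)
print([m["role"] for m in msgs])  # ['system', 'user', 'assistant', 'user']
```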

### Core Methods

#### Generation

```python theme={null}
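from pathlib import Path

from openai import AsyncOpenAI

client = AsyncOpenAI()  # any OpenAI-compatible endpoint; reads OPENAI_API_KEY by default

# Asynchronous generation (call from async code, e.g. a notebook or a coroutine)
results = await env.generate(
    inputs=dataset,
    client=client,
    model="gpt-4",
    sampling_args={"temperature": 0.7},
    max_concurrent=10,
    save_results=True,
    results_path=Path("./results")
)

# Synchronous wrapper
results = env.generate_sync(inputs=dataset, client=client, model="gpt-4")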
```

**Returns:** `GenerateOutputs` with `outputs` (list of `RolloutOutput`) and `metadata`
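A common way to honor a limit like `max_concurrent` is an `asyncio.Semaphore` that caps how many rollouts are in flight at once. The sketch below assumes that is what the parameter controls and is not taken from the Verifiers source; `run_with_limit` and `fake_rollout` are hypothetical.

```python
import asyncio

async def run_with_limit(inputs, rollout_fn, max_concurrent: int) -> list:
    """Run rollout_fn on every input, with at most max_concurrent calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(item):
        async with semaphore:
            return await rollout_fn(item)

    # gather() preserves input order in its results.
    return await asyncio.gather(*(guarded(item) for item in inputs))

active = 0
peak = 0

async def fake_rollout(x: int) -> int:
    """Hypothetical rollout that tracks how many calls overlap."""
    global active, peak
    active += 1
    peak = max(peak, active)
    await asyncio.sleep(0.01)  # simulate waiting on the model API
    active -= 1
    return x * x

outputs = asyncio.run(run_with_limit(range(25), fake_rollout, max_concurrent=10))
print(len(outputs), peak)  # 25 10
```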

#### Evaluation

```python theme={null}
# Evaluate on eval_dataset
results = await env.evaluate(
    client=client,
    model="gpt-4",
    num_examples=100,
    rollouts_per_example=4,
    save_results=True
)

# Synchronous wrapper
results = env.evaluate_sync(client=client, model="gpt-4", num_examples=10)
```
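With `rollouts_per_example=4`, each example is sampled four times, so `num_examples=100` should yield 400 rollouts. The sketch below shows one way to summarize such results per example and overall. It assumes a rollout "passes" when its reward is at least `pass_threshold` (the constructor default is 0.5); that reading of the parameter is an assumption, and the rewards are made up.

```python
from collections import defaultdict
from statistics import mean

def summarize(rollouts: list[tuple[int, float]], pass_threshold: float = 0.5) -> dict:
    """Summarize (example_id, reward) pairs from an evaluation run."""
    per_example = defaultdict(list)
    for example_id, reward in rollouts:
        per_example[example_id].append(reward)
    rewards = [reward for _, reward in rollouts]
    return {
        "num_rollouts": len(rollouts),
        "mean_reward": mean(rewards),
        "per_example_mean": {ex: mean(rs) for ex, rs in per_example.items()},
        # Assumption: a rollout passes when its reward is >= pass_threshold.
        "pass_rate": sum(r >= pass_threshold for r in rewards) / len(rewards),
    }

# 2 examples x 4 rollouts each; rewards are made up for illustration.
rollouts = [(0, 1.0), (0, 1.0), (0, 0.0), (0, 1.0), (1, 0.0), (1, 0.5), (1, 0.0), (1, 0.0)]
print(summarize(rollouts))
```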

#### Dataset Access

```python theme={null}
# Get datasets (triggers lazy loading if using DatasetBuilder)
train_ds = env.get_dataset(n=100, seed=42)
eval_ds = env.get_eval_dataset(n=50)
```

## Environment Groups

`EnvGroup` combines multiple environments for multi-task training:

```python theme={null}
math_env = vf.SingleTurnEnv(dataset=math_data, rubric=math_rubric)
code_env = vf.ToolEnv(dataset=code_data, tools=[execute_code], rubric=code_rubric)
# MultiTurnEnv requires env_response(), so use a concrete subclass such as MyGameEnv above
reasoning_env = MyGameEnv(dataset=reasoning_data, rubric=reasoning_rubric)

combined = vf.EnvGroup(
    envs=[math_env, code_env, reasoning_env],
    env_names=["math", "code", "reasoning"],  # optional
)
```

**Behavior:**

* Concatenates all sub-environment datasets
* Routes each rollout to the appropriate environment via `task` column
* Aggregates metrics across all environments
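The concatenate-and-route behavior can be sketched with plain lists of dicts: tag every row with the name of the environment it came from, then dispatch on that tag. `concat_with_task` and `route` are hypothetical helpers, and the lambdas stand in for real environments.

```python
def concat_with_task(datasets: dict[str, list[dict]]) -> list[dict]:
    """Concatenate per-environment rows, tagging each with a `task` column."""
    combined = []
    for task_name, rows in datasets.items():
        combined.extend({**row, "task": task_name} for row in rows)
    return combined

def route(row: dict, envs: dict) -> str:
    """Send a row to the environment registered under its `task` name."""
    return envs[row["task"]](row)

envs = {
    "math": lambda row: f"math env handles {row['question']!r}",
    "code": lambda row: f"code env handles {row['question']!r}",
}
combined = concat_with_task({
    "math": [{"question": "2+2"}],
    "code": [{"question": "reverse a list"}],
})
for row in combined:
    print(route(row, envs))
```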

<Note>
  Environment groups are particularly useful for curriculum learning and multi-task RL training where you want to train a single model across diverse task types.
</Note>

## Advanced Customization

### Custom Stop Conditions

Define custom termination logic with the `@vf.stop` decorator:

```python theme={null}
class MyEnv(vf.MultiTurnEnv):
    @vf.stop(priority=10)  # Higher priority runs first
    async def answer_submitted(self, state: vf.State) -> bool:
        completion = state.get("completion", [])
        if not completion:
            return False
        return "FINAL ANSWER:" in completion[-1].get("content", "")
```
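To illustrate what "higher priority runs first" means, the sketch below implements a toy version of the idea: a decorator tags methods with a priority, and the environment checks them in descending priority order, returning the first that fires. The `stop` decorator and `BaseEnv` here are hypothetical stand-ins, not `vf.stop` itself.

```python
import asyncio

def stop(priority: int = 0):
    """Toy decorator: tag a method as a stop condition with a priority."""
    def decorator(fn):
        fn.stop_priority = priority
        return fn
    return decorator

class BaseEnv:
    def stop_conditions(self) -> list:
        cls = type(self)
        tagged = [
            getattr(self, name)  # bound method
            for name in dir(cls)
            if hasattr(getattr(cls, name, None), "stop_priority")
        ]
        # Higher priority first; sorted() is stable, so ties keep dir()'s alphabetical order.
        return sorted(tagged, key=lambda method: method.stop_priority, reverse=True)

    async def is_completed(self, state: dict):
        for condition in self.stop_conditions():
            if await condition(state):
                return condition.__name__
        return None

class MyEnv(BaseEnv):
    @stop(priority=10)
    async def answer_submitted(self, state: dict) -> bool:
        completion = state.get("completion", [])
        if not completion:
            return False
        return "FINAL ANSWER:" in completion[-1].get("content", "")

    @stop()  # default priority 0, so it is checked after answer_submitted
    async def too_many_turns(self, state: dict) -> bool:
        return state.get("turn", 0) >= 5

env = MyEnv()
print([m.__name__ for m in env.stop_conditions()])  # ['answer_submitted', 'too_many_turns']
print(asyncio.run(env.is_completed({"turn": 7, "completion": [{"content": "FINAL ANSWER: 4"}]})))  # answer_submitted
```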

### Resource Management

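Override `setup_state()` for per-rollout setup, and use the `@vf.cleanup` and `@vf.teardown` decorators for per-rollout cleanup and environment-level teardown: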

```python theme={null}
class MyEnv(vf.MultiTurnEnv):
    # env_response() omitted for brevity; `create_game`, `save_log`,
    # and `self.db` are placeholders for your own resources.
    async def setup_state(self, state: vf.State, **kwargs) -> vf.State:
        """Per-rollout initialization."""
        state["game_id"] = await create_game()
        return await super().setup_state(state, **kwargs)

    @vf.cleanup
    async def save_game_log(self, state: vf.State):
        """Per-rollout cleanup."""
        await save_log(state["game_id"])

    @vf.teardown
    async def close_connections(self):
        """Environment-level teardown."""
        await self.db.close()
```

<Warning>
  Cleanup methods must be idempotent (safe to call multiple times) and handle errors gracefully to ensure cleanup completes even when resources are in unexpected states.
</Warning>
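A minimal sketch of a cleanup handler that meets both requirements: it records in the state that cleanup already ran, so a second call is a no-op, and it logs rather than raises when the resource is already gone. `release_game` and `cleanup_game` are hypothetical; in an environment the handler would be a method decorated with `@vf.cleanup`.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

async def release_game(game_id: str, registry: dict) -> None:
    """Stand-in for an external API call that fails if the game is already gone."""
    if game_id not in registry:
        raise KeyError(f"unknown game {game_id}")
    del registry[game_id]

async def cleanup_game(state: dict, registry: dict) -> None:
    game_id = state.get("game_id")
    if game_id is None or state.get("game_released"):
        return  # nothing to do: setup never ran, or cleanup already ran
    try:
        await release_game(game_id, registry)
    except Exception as exc:
        # Log and continue so the remaining cleanup steps still run.
        logger.warning("failed to release game %s: %s", game_id, exc)
    finally:
        state["game_released"] = True

async def demo():
    registry = {"g1": "running"}
    state = {"game_id": "g1"}
    await cleanup_game(state, registry)
    await cleanup_game(state, registry)  # second call is a no-op
    orphan = {"game_id": "g2"}  # resource already gone: error is logged, not raised
    await cleanup_game(orphan, registry)
    await cleanup_game({}, registry)  # setup never ran
    return registry, state, orphan

registry, state, orphan = asyncio.run(demo())
```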

### Signaling Early Termination

Set `state["final_env_response"]` inside `env_response()` to end the rollout after that message, without requesting another model response:

```python theme={null}
async def env_response(self, messages: vf.Messages, state: vf.State) -> vf.Messages:
    if check_game_over(state):
        final_msg = [{"role": "user", "content": f"Game over! Score: {state['score']}"}]
        state["final_env_response"] = final_msg
        return final_msg
    # Normal response logic...
```

## Integration Examples

### TextArena Integration

Wrapper for text-based game environments:

```python theme={null}
env = vf.TextArenaEnv(
    game_name="rock_paper_scissors",
    num_players=2,
    dataset=dataset,
    rubric=rubric
)
```

### ReasoningGym Integration

Procedural reasoning tasks:

```python theme={null}
from verifiers.envs.integrations import DatasetSpec

env = vf.ReasoningGymEnv(
    dataset_spec=DatasetSpec(
        name="sorting",
        num_samples=100,
        difficulty="hard"
    ),
    rubric=rubric
)
```

### Browser Automation

Browserbase integration with DOM or vision-based control:

```python theme={null}
# DOM mode (natural language browser control)
env = vf.BrowserEnv(
    mode="dom",
    dataset=dataset,
    rubric=rubric
)

# CUA mode (coordinate-based vision control)
env = vf.BrowserEnv(
    mode="cua",
    use_sandbox=True,  # auto-deploy CUA server in sandbox
    dataset=dataset,
    rubric=rubric
)
```

<Info>
  See the [Integrations and Experimental Environments](/docs/environments#integrations-and-experimental-environments) section in the main environments guide for more details on third-party integrations.
</Info>
