> ## Documentation Index
> Fetch the complete documentation index at: https://mintlify.com/primeintellect-ai/verifiers/llms.txt
> Use this file to discover all available pages before exploring further.

# Environment

> Base class for all RL environments

# Environment

Base abstract class for creating RL environments to train and evaluate LLMs.

## Overview

The `Environment` class provides the core infrastructure for:

* Managing datasets (training and evaluation)
* Running rollouts with LLM clients
* Scoring rollouts with rubrics
* Handling state lifecycle and cleanup
* Token usage tracking

All custom environments must inherit from this class and implement the `rollout()` method.

## Inheritance Hierarchy

```
Environment (abstract)
├── SingleTurnEnv
├── MultiTurnEnv
│   ├── ToolEnv
│   │   └── StatefulToolEnv
│   └── [Custom MultiTurn Environments]
└── EnvGroup
```

## Constructor

```python theme={null}
Environment(
    dataset: Dataset | DatasetBuilder | None = None,
    eval_dataset: Dataset | DatasetBuilder | None = None,
    system_prompt: str | None = None,
    few_shot: Messages | None = None,
    parser: Parser | None = None,
    rubric: Rubric | None = None,
    sampling_args: SamplingArgs | None = None,
    message_type: MessageType | object = _MESSAGE_TYPE_UNSET,
    tool_defs: list[Tool] | None = None,
    max_workers: int = 512,
    env_id: str | None = None,
    env_args: dict | None = None,
    map_kwargs: dict = {},
    max_seq_len: int | None = None,
    score_rollouts: bool = True,
    pass_threshold: float = 0.5,
    **kwargs
)
```

### Parameters

<ParamField path="dataset" type="Dataset | DatasetBuilder | None">
  Training dataset or a callable that returns a dataset. Either `dataset` or `eval_dataset` must be provided.
</ParamField>

<ParamField path="eval_dataset" type="Dataset | DatasetBuilder | None">
  Evaluation dataset or a callable that returns a dataset.
</ParamField>

<ParamField path="system_prompt" type="str | None">
  System prompt to prepend to all conversations.
</ParamField>

<ParamField path="few_shot" type="Messages | None">
  Few-shot examples to include in prompts.
</ParamField>

<ParamField path="parser" type="Parser | None">
  Parser for extracting structured data from completions. Defaults to `Parser()`.
</ParamField>

<ParamField path="rubric" type="Rubric | None">
  Rubric for scoring rollouts. Defaults to `Rubric()`.
</ParamField>

<ParamField path="sampling_args" type="SamplingArgs | None">
  Default sampling arguments for generation (temperature, top\_p, etc.).
</ParamField>

<ParamField path="tool_defs" type="list[Tool] | None">
  Provider-agnostic tool definitions in `vf.Tool` format.
</ParamField>

<ParamField path="max_workers" type="int" default="512">
  Maximum number of worker threads for synchronous execution.
</ParamField>

<ParamField path="env_id" type="str | None">
  Unique identifier for this environment.
</ParamField>

<ParamField path="env_args" type="dict | None">
  Additional environment-specific arguments.
</ParamField>

<ParamField path="map_kwargs" type="dict" default="{}">
  Keyword arguments to pass to HuggingFace dataset `.map()` operations.
</ParamField>

<ParamField path="max_seq_len" type="int | None">
  Maximum sequence length for tokenization and truncation.
</ParamField>

<ParamField path="score_rollouts" type="bool" default="True">
  Whether to score rollouts using the rubric.
</ParamField>

<ParamField path="pass_threshold" type="float" default="0.5">
  Reward threshold for considering a rollout as "passed".
</ParamField>

## Core Methods

### rollout

```python theme={null}
async def rollout(
    input: RolloutInput,
    client: Client,
    model: str,
    sampling_args: SamplingArgs | None = None
) -> State
```

Run a single rollout for a given input. **Must be implemented by subclasses.**

<ParamField path="input" type="RolloutInput">
  Input data from the dataset containing prompt, answer, etc.
</ParamField>

<ParamField path="client" type="Client">
  LLM client for making API calls.
</ParamField>

<ParamField path="model" type="str">
  Model identifier (e.g., "gpt-4", "claude-3-5-sonnet").
</ParamField>

<ParamField path="sampling_args" type="SamplingArgs | None">
  Optional sampling arguments to override defaults.
</ParamField>

**Returns:** `State` - Final state after rollout completion.

### get\_model\_response

```python theme={null}
async def get_model_response(
    state: State,
    prompt: Messages | str,
    client: Client | None = None,
    model: str | None = None,
    tool_defs: list[Tool] | None = None,
    sampling_args: SamplingArgs | None = None
) -> Response
```

Get model response for a given prompt (chat or completion).

<ParamField path="state" type="State">
  Current rollout state.
</ParamField>

<ParamField path="prompt" type="Messages | str">
  Prompt as messages or string.
</ParamField>

<ParamField path="client" type="Client | None">
  Client to use (defaults to `state["client"]`).
</ParamField>

<ParamField path="model" type="str | None">
  Model to use (defaults to `state["model"]`).
</ParamField>

<ParamField path="tool_defs" type="list[Tool] | None">
  Tools available for this request (defaults to `state["tool_defs"]`).
</ParamField>

<ParamField path="sampling_args" type="SamplingArgs | None">
  Sampling arguments (defaults to `state["sampling_args"]`).
</ParamField>

**Returns:** `Response` - Model response with message, usage, etc.

### init\_state

```python theme={null}
async def init_state(
    input: RolloutInput,
    client: Client | ClientConfig,
    model: str,
    sampling_args: SamplingArgs | None = None
) -> State
```

Create initial state from dataset input. Called automatically at the start of each rollout.

<ParamField path="input" type="RolloutInput">
  Input data from the dataset.
</ParamField>

<ParamField path="client" type="Client | ClientConfig">
  Client or client configuration.
</ParamField>

<ParamField path="model" type="str">
  Model identifier.
</ParamField>

<ParamField path="sampling_args" type="SamplingArgs | None">
  Sampling arguments.
</ParamField>

**Returns:** `State` - Initialized state with input fields, client, model, etc.

## Dataset Methods

### build\_dataset

```python theme={null}
def build_dataset() -> Dataset | None
```

Build and cache the training dataset from source if needed.

**Returns:** `Dataset | None` - Built dataset or None if no source.

### build\_eval\_dataset

```python theme={null}
def build_eval_dataset() -> Dataset | None
```

Build and cache the evaluation dataset from source if needed.

**Returns:** `Dataset | None` - Built dataset or None if no source.

### get\_dataset

```python theme={null}
def get_dataset(n: int = -1, seed: int | None = None) -> Dataset
```

Get the training dataset, optionally shuffled and limited.

<ParamField path="n" type="int" default="-1">
  Maximum number of examples to return. -1 returns all.
</ParamField>

<ParamField path="seed" type="int | None">
  Random seed for shuffling.
</ParamField>

**Returns:** `Dataset` - Training dataset.

### get\_eval\_dataset

```python theme={null}
def get_eval_dataset(n: int = -1, seed: int | None = None) -> Dataset
```

Get the evaluation dataset, optionally shuffled and limited. Falls back to training dataset if no eval dataset exists.

<ParamField path="n" type="int" default="-1">
  Maximum number of examples to return. -1 returns all.
</ParamField>

<ParamField path="seed" type="int | None">
  Random seed for shuffling.
</ParamField>

**Returns:** `Dataset` - Evaluation dataset.

## Generation & Evaluation

### generate

```python theme={null}
async def generate(
    inputs: Dataset | List[RolloutInput],
    client: Client | ClientConfig,
    model: str,
    sampling_args: SamplingArgs | None = None,
    max_concurrent: int = -1,
    results_path: Path | None = None,
    state_columns: list[str] | None = None,
    save_results: bool = False,
    push_to_hf_hub: bool = False,
    hf_hub_dataset_name: str | None = None,
    independent_scoring: bool = False,
    max_retries: int = 0,
    on_start: StartCallback | None = None,
    on_progress: ProgressCallback | list[ProgressCallback] | None = None,
    on_log: LogCallback | None = None
) -> GenerateOutputs
```

Generate rollouts for a set of inputs.

<ParamField path="inputs" type="Dataset | List[RolloutInput]">
  Input examples to generate rollouts for.
</ParamField>

<ParamField path="client" type="Client | ClientConfig">
  LLM client or client configuration.
</ParamField>

<ParamField path="model" type="str">
  Model identifier.
</ParamField>

<ParamField path="sampling_args" type="SamplingArgs | None">
  Sampling arguments to override defaults.
</ParamField>

<ParamField path="max_concurrent" type="int" default="-1">
  Maximum concurrent rollouts. -1 for unlimited.
</ParamField>

<ParamField path="results_path" type="Path | None">
  Path to save/resume results.
</ParamField>

<ParamField path="state_columns" type="list[str] | None">
  Additional state fields to include in outputs.
</ParamField>

<ParamField path="save_results" type="bool" default="False">
  Whether to save results to disk.
</ParamField>

<ParamField path="push_to_hf_hub" type="bool" default="False">
  Whether to push results to HuggingFace Hub.
</ParamField>

<ParamField path="hf_hub_dataset_name" type="str | None">
  Dataset name for HuggingFace Hub.
</ParamField>

<ParamField path="independent_scoring" type="bool" default="False">
  Score rollouts independently vs. in groups.
</ParamField>

<ParamField path="max_retries" type="int" default="0">
  Maximum retries for failed rollouts.
</ParamField>

<ParamField path="on_start" type="StartCallback | None">
  Callback when generation starts.
</ParamField>

<ParamField path="on_progress" type="ProgressCallback | list[ProgressCallback] | None">
  Progress callback(s). None uses default tqdm progress bar.
</ParamField>

<ParamField path="on_log" type="LogCallback | None">
  Logging callback.
</ParamField>

**Returns:** `GenerateOutputs` - Dictionary with `outputs` and `metadata` keys.

### generate\_sync

```python theme={null}
def generate_sync(
    inputs: Dataset | List[RolloutInput],
    client: Client | ClientConfig,
    **kwargs
) -> GenerateOutputs
```

Synchronous wrapper for `generate()`. Handles event loop creation.

### evaluate

```python theme={null}
async def evaluate(
    client: Client | ClientConfig,
    model: str,
    sampling_args: SamplingArgs | None = None,
    num_examples: int = -1,
    rollouts_per_example: int = 1,
    max_concurrent: int = -1,
    results_path: Path | None = None,
    state_columns: list[str] | None = None,
    save_results: bool = False,
    push_to_hf_hub: bool = False,
    hf_hub_dataset_name: str | None = None,
    independent_scoring: bool = False,
    max_retries: int = 0,
    on_start: StartCallback | None = None,
    on_progress: ProgressCallback | list[ProgressCallback] | None = None,
    on_log: LogCallback | None = None,
    **kwargs
) -> GenerateOutputs
```

Evaluate model on the environment's evaluation dataset.

<ParamField path="client" type="Client | ClientConfig">
  LLM client or client configuration.
</ParamField>

<ParamField path="model" type="str">
  Model identifier.
</ParamField>

<ParamField path="num_examples" type="int" default="-1">
  Number of examples to evaluate. -1 for all.
</ParamField>

<ParamField path="rollouts_per_example" type="int" default="1">
  Number of rollouts to generate per example.
</ParamField>

Other parameters are the same as `generate()`.

**Returns:** `GenerateOutputs` - Dictionary with `outputs` and `metadata` keys.

### evaluate\_sync

```python theme={null}
def evaluate_sync(
    client: Client | ClientConfig,
    model: str,
    sampling_args: SamplingArgs | None = None,
    num_examples: int = -1,
    rollouts_per_example: int = 1,
    max_concurrent: int = -1,
    results_path: Path | None = None,
    state_columns: list[str] | None = None,
    save_results: bool = False,
    push_to_hf_hub: bool = False,
    hf_hub_dataset_name: str | None = None,
    independent_scoring: bool = False,
    max_retries: int = 0
) -> GenerateOutputs
```

Synchronous wrapper for `evaluate()`.

## Token Usage Tracking

### get\_state\_usage

```python theme={null}
def get_state_usage(state: State) -> TokenUsage | None
```

Get token usage statistics for a state.

<ParamField path="state" type="State">
  Rollout state.
</ParamField>

**Returns:** `TokenUsage | None` - Dictionary with `input_tokens` and `output_tokens` keys, or None.

### increment\_state\_usage

```python theme={null}
def increment_state_usage(
    state: State,
    input_tokens: int | float = 0,
    output_tokens: int | float = 0
) -> None
```

Manually increment token usage for a state.

### increment\_state\_usage\_from\_response

```python theme={null}
def increment_state_usage_from_response(
    state: State,
    response: object
) -> None
```

Extract and increment token usage from a response object.

## State Lifecycle

### is\_completed

```python theme={null}
async def is_completed(state: State, **kwargs) -> bool
```

Check all stop conditions. Sets `state["is_completed"] = True` if any condition is met.

<ParamField path="state" type="State">
  Current rollout state.
</ParamField>

**Returns:** `bool` - True if any stop condition is met.

## Configuration

### set\_kwargs

```python theme={null}
def set_kwargs(**kwargs) -> None
```

Set environment attributes using setter methods when available.

### add\_rubric

```python theme={null}
def add_rubric(rubric: Rubric) -> None
```

Add a rubric to the environment. Creates a `RubricGroup` if a rubric already exists.

### set\_max\_seq\_len

```python theme={null}
def set_max_seq_len(max_seq_len: int | None) -> None
```

Set the maximum sequence length.

### set\_score\_rollouts

```python theme={null}
def set_score_rollouts(score_rollouts: bool) -> None
```

Set whether to score rollouts.

## Server Methods

### start\_server

```python theme={null}
async def start_server(
    address: str | None = None,
    extra_env_kwargs: dict[str, Any] | None = None,
    log_level: str | None = None,
    log_file: str | None = None,
    log_file_level: str | None = None,
    health_check_interval: float = 1.0,
    startup_timeout: float = 600.0,
    recovery_timeout: float = 600.0
) -> None
```

<Warning>
  This method is subject to change. External users should avoid depending on it directly.
</Warning>

Start a ZMQ server process for distributed rollout execution.

### stop\_server

```python theme={null}
async def stop_server() -> None
```

<Warning>
  This method is subject to change. External users should avoid depending on it directly.
</Warning>

Stop the ZMQ server process.

## Static Methods

### make\_dataset

```python theme={null}
@staticmethod
def make_dataset(...) -> Dataset
```

Utility for creating HuggingFace datasets. See `verifiers.utils.save_utils.make_dataset` for details.

## Example Usage

```python theme={null}
import verifiers as vf
from datasets import load_dataset

# Create a simple environment
class MyEnv(vf.Environment):
    async def rollout(
        self,
        input: vf.RolloutInput,
        client: vf.Client,
        model: str,
        sampling_args: vf.SamplingArgs | None = None,
    ) -> vf.State:
        state = await self.init_state(input, client, model, sampling_args)
        
        # Get model response
        response = await self.get_model_response(
            state,
            prompt=state["prompt"]
        )
        
        # Store completion
        state["completion"] = response.message
        state["is_completed"] = True
        
        return state

# Load environment with dataset
def load_environment():
    dataset = load_dataset("gsm8k", "main", split="train")
    
    def reward_fn(answer: str, completion: vf.Messages) -> float:
        # Custom reward logic
        return 1.0 if answer in str(completion) else 0.0
    
    return MyEnv(
        dataset=dataset,
        rubric=vf.Rubric(reward_fn),
        system_prompt="You are a helpful assistant."
    )

# Evaluate
env = load_environment()
results = await env.evaluate(
    client=vf.ClientConfig(
        provider="openai",
        api_key="sk-..."
    ),
    model="gpt-4",
    num_examples=10
)

print(f"Average reward: {results['metadata']['avg_reward']}")
```

## See Also

* [SingleTurnEnv](/api/single-turn-env) - Single-turn Q\&A environments
* [MultiTurnEnv](/api/multi-turn-env) - Multi-turn interactive environments
* [ToolEnv](/api/tool-env) - Tool-calling environments
* [EnvGroup](/api/env-group) - Mixture of multiple environments
