# Building Single-Turn Environments

> Create simple question-answering environments with the Verifiers SDK

Single-turn environments are the simplest type of environment in Verifiers, designed for tasks where the model provides a single response to each prompt. They're ideal for Q&A tasks, math problems, text transformations, and other one-shot challenges.

## Overview

`SingleTurnEnv` is a specialized version of `MultiTurnEnv` with `max_turns=1`. Each rollout follows this simple pattern:

1. Send the prompt to the model
2. Receive a single response (the completion)
3. Score the response using reward functions

No multi-turn interaction, no tools, no complex state management—just prompt, response, and reward.
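
The three steps above can be sketched in plain Python. This only illustrates the control flow and is not how Verifiers implements it: `fake_model`, `run_rollout`, and the result dictionary are hypothetical stand-ins so the example runs without an API key.

```python
import asyncio

async def fake_model(prompt: list) -> list:
    """Hypothetical stand-in for a model call; always answers '4'."""
    return [{"role": "assistant", "content": "The answer is 4."}]

async def correct_answer(completion, answer) -> float:
    response = completion[-1]["content"]
    return 1.0 if answer in response else 0.0

async def run_rollout(row: dict) -> dict:
    # Steps 1 and 2: send the prompt and receive a single completion
    completion = await fake_model(row["prompt"])
    # Step 3: score the completion with a reward function
    reward = await correct_answer(completion, row["answer"])
    return {"completion": completion, "reward": reward}

row = {"prompt": [{"role": "user", "content": "What is 2+2?"}], "answer": "4"}
result = asyncio.run(run_rollout(row))
print(result["reward"])  # 1.0
```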

## Your First Environment

Here's a minimal single-turn environment for math problems:

```python theme={null}
import verifiers as vf
from datasets import Dataset

def load_environment():
    # Define your task data
    dataset = Dataset.from_list([
        {"prompt": [{"role": "user", "content": "What is 2+2?"}], "answer": "4"},
        {"prompt": [{"role": "user", "content": "What is 3*5?"}], "answer": "15"},
    ])
    
    # Define your reward function
    async def correct_answer(completion, answer) -> float:
        response = completion[-1]["content"]
        return 1.0 if answer in response else 0.0
    
    # Create rubric and environment
    rubric = vf.Rubric(funcs=[correct_answer])
    return vf.SingleTurnEnv(dataset=dataset, rubric=rubric)
```
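
Substring matching is deliberately permissive: `"4" in "The answer is 14"` is also true. A stricter variant, sketched here under the assumption that answers are plain numbers, compares the last number in the response instead. `last_number` and `exact_number_match` are hypothetical helper names, not part of Verifiers.

```python
import re
from typing import Optional

def last_number(text: str) -> Optional[str]:
    """Return the last integer or decimal in text, or None if there is none."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text)
    return matches[-1] if matches else None

def exact_number_match(completion, answer) -> float:
    response = completion[-1]["content"]
    return 1.0 if last_number(response) == answer.strip() else 0.0

wrong = [{"role": "assistant", "content": "The answer is 14"}]
right = [{"role": "assistant", "content": "2+2 = 4"}]

print("4" in wrong[-1]["content"])     # True (substring false positive)
print(exact_number_match(wrong, "4"))  # 0.0
print(exact_number_match(right, "4"))  # 1.0
```

The trade-off is that this version assumes the final number in the response is the intended answer, which may not hold for every task.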

<Steps>
  ### Initialize Your Environment

  Create a new environment project:

  ```bash theme={null}
  prime env init my-math-env
  cd environments/my_math_env
  ```

  ### Build Your Dataset

  You can build datasets in several ways:

  <Tabs>
    <Tab title="Direct Prompts">
      ```python theme={null}
      from datasets import Dataset

      dataset = Dataset.from_list([
          {
              "prompt": [{"role": "user", "content": "What is 2+2?"}],
              "answer": "4"
          },
      ])
      ```

      The `prompt` field contains a list of messages ready to send to the model.
    </Tab>

    <Tab title="Question Column">
      ```python theme={null}
      dataset = Dataset.from_list([
          {"question": "What is 2+2?", "answer": "4"},
      ])
      ```

      The environment automatically wraps `question` strings in a user message.
    </Tab>

    <Tab title="From Hugging Face">
      ```python theme={null}
      from datasets import load_dataset

      dataset = load_dataset("gsm8k", "main", split="train")
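      dataset = dataset.map(lambda x: {
          "question": x["question"],
          # GSM8K answers end with "#### <final answer>"; keep only that part
          "answer": x["answer"].split("####")[-1].strip(),
      })
      ```

      Load an existing dataset and map it to the expected `question` and `answer` columns.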
    </Tab>
  </Tabs>

  ### Define Reward Functions

  Reward functions score model responses. Each function declares the data it needs through its argument names:

  ```python theme={null}
  async def correct_answer(completion, answer) -> float:
      """Check if the answer appears in the response."""
      response = completion[-1]["content"]
      return 1.0 if answer in response else 0.0
  ```

  Available arguments:

  * `completion` — model's output (list of messages)
  * `prompt` — input messages
  * `answer` — from dataset row
  * `info` — structured metadata from dataset
  * `state` — full rollout state

  ### Create Your Environment

  Combine everything in `load_environment()`:

  ```python theme={null}
  import verifiers as vf
  from datasets import load_dataset

  def load_environment():
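      dataset = load_dataset("gsm8k", "main", split="train")
      # GSM8K answers end with "#### <final answer>"; keep only the final answer
      dataset = dataset.map(
          lambda x: {"answer": x["answer"].split("####")[-1].strip()}
      )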
      
      async def correct_answer(completion, answer) -> float:
          response = completion[-1]["content"]
          return 1.0 if answer in response else 0.0
      
      rubric = vf.Rubric(funcs=[correct_answer])
      
      return vf.SingleTurnEnv(
          dataset=dataset,
          system_prompt="You are a helpful math tutor.",
          rubric=rubric,
      )
  ```

  ### Install and Test

  Install your environment and run a quick evaluation:

  ```bash theme={null}
  prime env install my-math-env
  prime eval run my-math-env -m gpt-4.1-mini -n 5
  ```

  Expected output:

  ```
  Running evaluation on my-math-env with gpt-4.1-mini
  Progress: 5/5 examples, 15/15 rollouts
  Reward: 0.87 ± 0.12
  ```
</Steps>
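
Requesting data by argument name, as described in the Define Reward Functions step, can be pictured as signature inspection. The sketch below is an assumption about the general mechanism, not the Verifiers implementation; `call_with_requested_args` is a hypothetical helper.

```python
import asyncio
import inspect

async def call_with_requested_args(func, available: dict) -> float:
    """Pass func only the keyword arguments that its signature names."""
    params = inspect.signature(func).parameters
    kwargs = {name: available[name] for name in params if name in available}
    return await func(**kwargs)

async def correct_answer(completion, answer) -> float:
    response = completion[-1]["content"]
    return 1.0 if answer in response else 0.0

async def short_response(completion) -> float:
    return 1.0 if len(completion[-1]["content"]) < 500 else 0.5

# Everything a rollout could offer; each function receives only what it asks for
available = {
    "prompt": [{"role": "user", "content": "What is 3*5?"}],
    "completion": [{"role": "assistant", "content": "3*5 = 15"}],
    "answer": "15",
    "info": {},
    "state": {},
}

print(asyncio.run(call_with_requested_args(correct_answer, available)))  # 1.0
print(asyncio.run(call_with_requested_args(short_response, available)))  # 1.0
```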

## Real Example: Text Reversal

Let's examine the `reverse-text` environment from the repository:

```python environments/reverse_text/reverse_text.py theme={null}
from datasets import load_dataset
import verifiers as vf

def load_environment(
    dataset_name: str = "PrimeIntellect/Reverse-Text-RL",
    dataset_split: str = "train",
    system_prompt: str | None = "Reverse the text character-by-character. Put your answer in <reversed_text> tags.",
) -> vf.Environment:
    train_dataset = load_dataset(dataset_name, split=dataset_split).map(
        lambda x: {
            "question": x["prompt"],
            "answer": x["prompt"][::-1],
            "info": {},
            "task": "reverse-text",
        }
    )
    train_dataset = train_dataset.remove_columns(["prompt"])

    parser = vf.XMLParser(["reversed_text"], answer_field="reversed_text")

    def lcs_reward_func(completion, answer, **kwargs) -> float:
        """LCS ratio of the reversed prompt and the parsed completion."""
        from difflib import SequenceMatcher
        response = parser.parse_answer(completion) or ""
        return SequenceMatcher(None, response, answer).ratio()

    rubric = vf.Rubric(funcs=[lcs_reward_func], weights=[1.0])

    return vf.SingleTurnEnv(
        dataset=train_dataset,
        system_prompt=system_prompt,
        parser=parser,
        rubric=rubric,
    )
```

Key features:

* Uses `XMLParser` to extract structured output from `<reversed_text>` tags
* Computes a continuous reward between 0 and 1 from the `SequenceMatcher` similarity ratio (described as an LCS ratio in the code)
* Allows customization via `system_prompt` parameter
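
To see why this reward is continuous, the sketch below reproduces the scoring step with the standard library only. The regex-based `extract_tag` is a hypothetical stand-in for `XMLParser.parse_answer`, which may handle edge cases differently.

```python
import re
from difflib import SequenceMatcher

def extract_tag(text: str, tag: str) -> str:
    """Return the stripped content of the first <tag>...</tag> pair, or ""."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
    return match.group(1).strip() if match else ""

def similarity(response: str, answer: str) -> float:
    return SequenceMatcher(None, response, answer).ratio()

answer = "hello world"[::-1]  # "dlrow olleh"
exact = extract_tag("<reversed_text>dlrow olleh</reversed_text>", "reversed_text")
partial = extract_tag("<reversed_text>dlrow olle</reversed_text>", "reversed_text")
missing = extract_tag("dlrow olleh", "reversed_text")  # no tags, so this is ""

print(similarity(exact, answer))                # 1.0
print(similarity(missing, answer))              # 0.0
print(0.0 < similarity(partial, answer) < 1.0)  # True
```

A nearly correct reversal earns partial credit, while a response without the tags scores zero because nothing is parsed.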

## Advanced Patterns

### Multiple Reward Functions

Combine multiple scoring criteria with custom weights:

```python theme={null}
async def check_keywords(completion, info) -> float:
    """Check for required keywords."""
    response = completion[-1]["content"]
    keywords = info["required_keywords"]
    if not keywords:
        return 1.0  # nothing is required, and this avoids division by zero
    found = sum(1 for kw in keywords if kw.lower() in response.lower())
    return found / len(keywords)

async def length_reward(completion) -> float:
    """Reward concise responses."""
    response = completion[-1]["content"]
    return 1.0 if len(response) < 500 else 0.5

rubric = vf.Rubric(
    funcs=[check_keywords, length_reward],
    weights=[1.0, 0.1]  # keyword match is primary, length is secondary
)
```

The final reward is the weighted sum: `reward = 1.0 * check_keywords + 0.1 * length_reward`
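
As a worked example of the formula, the sketch below scores one response by hand. It reuses the two reward functions above and computes the weighted sum directly rather than through `vf.Rubric`; the sample response and keywords are made up for illustration.

```python
import asyncio

async def check_keywords(completion, info) -> float:
    response = completion[-1]["content"].lower()
    keywords = info["required_keywords"]
    found = sum(1 for kw in keywords if kw.lower() in response)
    return found / len(keywords)

async def length_reward(completion) -> float:
    return 1.0 if len(completion[-1]["content"]) < 500 else 0.5

async def weighted_reward(completion, info) -> float:
    scores = [
        await check_keywords(completion, info),
        await length_reward(completion),
    ]
    weights = [1.0, 0.1]
    return sum(w * s for w, s in zip(weights, scores))

completion = [{"role": "assistant", "content": "Gradient descent minimizes a loss."}]
info = {"required_keywords": ["gradient", "learning rate"]}

# check_keywords = 0.5 (one of two keywords found), length_reward = 1.0
# reward = 1.0 * 0.5 + 0.1 * 1.0
print(round(asyncio.run(weighted_reward(completion, info)), 2))  # 0.6
```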

### Parsing Structured Output

Use parsers to extract specific fields from model responses:

<CodeGroup>
  ```python XML Parser theme={null}
  parser = vf.XMLParser(["reasoning", "answer"], answer_field="answer")

  async def correct_with_reasoning(completion, answer, parser) -> float:
      # parse_answer returns the text of the answer field (or None if missing)
      parsed_answer = parser.parse_answer(completion) or ""
      return 1.0 if answer in parsed_answer else 0.0

  rubric = vf.Rubric(funcs=[correct_with_reasoning], parser=parser)
  vf_env = vf.SingleTurnEnv(dataset=dataset, parser=parser, rubric=rubric)
  ```

  ```python Custom Extract Function theme={null}
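  def extract_boxed(completion):
      r"""Extract content from \boxed{...}."""
      import re
      # Handle both raw text and a list of messages
      text = completion if isinstance(completion, str) else completion[-1]["content"]
      match = re.search(r'\\boxed\{(.+?)\}', text)
      return match.group(1) if match else ""

  parser = vf.Parser(extract_fn=extract_boxed)
  rubric = vf.Rubric(funcs=[my_reward], parser=parser)  # my_reward: your own reward function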
  ```
</CodeGroup>
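
One caveat with the non-greedy regex above: it stops at the first `}`, so nested braces such as `\boxed{\frac{1}{2}}` are truncated. A brace-counting extractor, sketched below, avoids this. `extract_boxed_balanced` is a hypothetical helper that operates on plain text.

```python
import re

def extract_boxed_balanced(text: str) -> str:
    r"""Return the content of the last \boxed{...}, honoring nested braces."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return ""
    depth = 1  # we are inside the opening brace of \boxed{
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return ""  # braces never closed

text = r"So the result is \boxed{\frac{1}{2}}"
print(re.search(r"\\boxed\{(.+?)\}", text).group(1))  # \frac{1
print(extract_boxed_balanced(text))                   # \frac{1}{2}
```

If you adopt something like this as an `extract_fn`, check what input type the parser passes to it in your version of Verifiers.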

### Lazy Dataset Loading

For large datasets, defer loading until first access:

```python theme={null}
from datasets import load_dataset
import verifiers as vf

def get_dataset_builder(split: str = "train", seed: int = 42):
    """Returns a builder that lazily loads the dataset."""
    def build():
        ds = load_dataset("my-dataset", split=split)
        ds = ds.shuffle(seed=seed)
        return ds
    return build

def load_environment():
    dataset_builder = get_dataset_builder(split="train")
    eval_builder = get_dataset_builder(split="test")
    
    return vf.SingleTurnEnv(
        dataset=dataset_builder,      # built on first access
        eval_dataset=eval_builder,    # built on first access
        rubric=rubric,                # a rubric defined as in the earlier examples
    )
```

Benefits:

* Avoid loading large datasets during environment initialization
* Better performance when running multiple replicas
* Parameterize dataset creation (splits, shuffling, filtering)
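
The deferred-loading idea can be illustrated without any dataset library. In this sketch, `LazyDataset` is a hypothetical wrapper (not a Verifiers class) that calls the builder on first access and caches the result; how `SingleTurnEnv` handles builders internally may differ.

```python
from typing import Callable, Optional

class LazyDataset:
    """Hypothetical wrapper: build on first access, then reuse the result."""

    def __init__(self, builder: Callable[[], list]):
        self._builder = builder
        self._data: Optional[list] = None

    def get(self) -> list:
        if self._data is None:
            self._data = self._builder()
        return self._data

calls = {"n": 0}  # counts how often the expensive load runs

def get_dataset_builder(split: str = "train"):
    def build() -> list:
        calls["n"] += 1
        return [{"question": f"{split} question {i}", "answer": str(i)} for i in range(3)]
    return build

lazy = LazyDataset(get_dataset_builder(split="train"))
print(calls["n"])       # 0: nothing is loaded at construction time
print(len(lazy.get()))  # 3: first access triggers the build
lazy.get()
print(calls["n"])       # 1: later accesses reuse the cached data
```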

### Metrics and Observability

Track additional metrics without affecting the reward:

```python theme={null}
async def response_length(completion) -> float:
    return float(len(completion[-1]["content"]))

async def has_reasoning(completion) -> float:
    content = completion[-1]["content"]
    return 1.0 if "because" in content.lower() else 0.0

rubric = vf.Rubric(funcs=[correct_answer])  # only this affects reward
rubric.add_metric(response_length)          # weight=0 (tracking only)
rubric.add_metric(has_reasoning)            # weight=0 (tracking only)
```

All metrics appear in evaluation results:

```json theme={null}
{
  "reward": 0.8,
  "correct_answer": 0.8,
  "response_length": 127.3,
  "has_reasoning": 0.6
}
```
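
The numbers in the JSON above read as averages over rollouts. Assuming that interpretation, the sketch below shows how per-rollout scores could be aggregated, with zero-weight metrics recorded but excluded from the reward. `aggregate` and the sample scores are hypothetical.

```python
from statistics import mean

# Per-rollout scores (made-up values for illustration)
rollouts = [
    {"correct_answer": 1.0, "response_length": 120.0, "has_reasoning": 1.0},
    {"correct_answer": 0.0, "response_length": 80.0, "has_reasoning": 0.0},
    {"correct_answer": 1.0, "response_length": 100.0, "has_reasoning": 1.0},
    {"correct_answer": 1.0, "response_length": 140.0, "has_reasoning": 0.0},
]
# Metrics carry weight 0, so they never change the reward
weights = {"correct_answer": 1.0, "response_length": 0.0, "has_reasoning": 0.0}

def aggregate(rollouts: list, weights: dict) -> dict:
    """Average each score and the weighted reward across rollouts."""
    summary = {name: mean(r[name] for r in rollouts) for name in weights}
    summary["reward"] = mean(
        sum(weights[name] * r[name] for name in weights) for r in rollouts
    )
    return summary

summary = aggregate(rollouts, weights)
print(summary["reward"] == summary["correct_answer"])  # True
```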

### Evaluation Datasets

Provide separate train and evaluation datasets:

```python theme={null}
from datasets import load_dataset
import verifiers as vf

def load_environment():
    train_dataset = load_dataset("my-dataset", split="train")
    eval_dataset = load_dataset("my-dataset", split="test")

    return vf.SingleTurnEnv(
        dataset=train_dataset,
        eval_dataset=eval_dataset,
        rubric=rubric,  # a rubric defined as in the earlier examples
    )
```

When you run `prime eval run`, the evaluation dataset is used automatically.

## Common Patterns

### Math Verification

Use symbolic math checking with the built-in `MathRubric`:

```python theme={null}
import verifiers as vf

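def extract_boxed_answer(completion):
    import re
    # Handle both raw text and a list of messages
    text = completion if isinstance(completion, str) else completion[-1]["content"]
    match = re.search(r'\\boxed\{(.+?)\}', text)
    return match.group(1) if match else ""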

parser = vf.Parser(extract_fn=extract_boxed_answer)
math_rubric = vf.MathRubric(parser=parser)  # Uses math-verify library

vf_env = vf.SingleTurnEnv(
    dataset=dataset,
    system_prompt="Solve the problem and put your answer in \\boxed{}.",
    parser=parser,
    rubric=math_rubric,
)
```

### LLM-as-Judge

Use another LLM to score responses:

```python theme={null}
import verifiers as vf

judge_rubric = vf.JudgeRubric(
    judge_model="gpt-4.1-mini",
    judge_prompt="""Is this response correct?
    
    Question: {question}
    Ground truth: {answer}
    Response: {response}
    
    Answer 'yes' or 'no'."""
)

async def judge_reward(prompt, completion, answer, judge) -> float:
    verdict = await judge(prompt, completion, answer)
    return 1.0 if "yes" in verdict.lower() else 0.0

judge_rubric.add_reward_func(judge_reward)

vf_env = vf.SingleTurnEnv(dataset=dataset, rubric=judge_rubric)
```
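
The `{question}`, `{answer}`, and `{response}` placeholders are filled in per rollout before the judge is called. The sketch below shows that substitution with `str.format`, plus a slightly stricter verdict check than a substring test, which would also accept words like "yesterday". `build_judge_prompt` and `parse_verdict` are hypothetical helpers, and the template handling is an assumption rather than the `JudgeRubric` internals.

```python
import re

JUDGE_PROMPT = """Is this response correct?

Question: {question}
Ground truth: {answer}
Response: {response}

Answer 'yes' or 'no'."""

def build_judge_prompt(question: str, answer: str, response: str) -> str:
    return JUDGE_PROMPT.format(question=question, answer=answer, response=response)

def parse_verdict(verdict: str) -> float:
    """Score 1.0 only if the first word of the verdict is 'yes'."""
    match = re.match(r"\W*(\w+)", verdict.lower())
    return 1.0 if match and match.group(1) == "yes" else 0.0

print(build_judge_prompt("What is 2+2?", "4", "It is 4."))
print(parse_verdict("Yes, the response is correct."))      # 1.0
print(parse_verdict("No. Yesterday's answer was right."))  # 0.0
```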

### Combining Multiple Rubrics

Use `RubricGroup` to combine different scoring approaches:

```python theme={null}
# Symbolic math verification
math_rubric = vf.MathRubric()

# LLM judge for reasoning quality
# (judge_reasoning_quality is a reward function you define, like judge_reward above)
judge_rubric = vf.JudgeRubric(judge_model="gpt-4.1-mini")
judge_rubric.add_reward_func(judge_reasoning_quality, weight=0.5)

# Combine both
rubric = vf.RubricGroup([math_rubric, judge_rubric])

vf_env = vf.SingleTurnEnv(dataset=dataset, rubric=rubric)
```

The final reward is the sum of the rewards from both rubrics: `reward = math_rubric.reward + judge_rubric.reward`

## Testing Your Environment

After implementing your environment:

<Steps>
  ### Install locally

  ```bash theme={null}
  prime env install my-env
  ```

  ### Run a quick evaluation

  ```bash theme={null}
  prime eval run my-env -m gpt-4.1-mini -n 10 -r 3
  ```

  This runs 10 examples with 3 rollouts each (30 total rollouts).

  ### Check the output

  Expected output:

  ```
  Loading environment: my-env
  Running 10 examples × 3 rollouts = 30 total rollouts
  Progress: ████████████████████ 30/30 (100%)

  Results:
    Reward: 0.73 ± 0.15
    correct_answer: 0.73 ± 0.15
    response_length: 142.3 ± 45.2
  ```

  ### Save and inspect results

  ```bash theme={null}
  prime eval run my-env -m gpt-4.1-mini -n 10 -s
  ```

  Results are saved to `./environments/my_env/outputs/evals/my-env--gpt-4.1-mini/{run_id}/`:

  * `results.jsonl` - detailed rollout data
  * `metadata.json` - configuration and metrics
</Steps>

## Next Steps

* **Multi-turn environments**: Add turn-by-turn interaction → [Multi-Turn Guide](/guides/multi-turn)
* **Tool use**: Give your agent access to tools → [Tool Environments Guide](/guides/tool-environments)
* **Training**: Use your environment for RL training → [Training Guide](/guides/training)
