# TextArena Integration

> Wrap TextArena game environments for use with Verifiers

The `TextArenaEnv` integration wraps [TextArena](https://github.com/LeonGuertler/TextArena) text-based game environments for multi-turn interaction with language models.

TextArena provides competitive and collaborative text-based games designed for LLM evaluation.

## Features

* **Text-based games** - Wordle, 20 Questions, Poker, and more
* **Multi-turn interaction** - Games require multiple model responses
* **Efficient memory sharing** - Optimized for parallel rollouts
* **Custom feedback** - Transform game observations for better prompting
* **XML formatting** - Built-in parser for structured responses

## Installation

Install with TextArena support:

```bash theme={null}
uv add 'verifiers[ta]'
```

This installs:

* `textarena` - TextArena game library
* `nltk` - Natural language processing (for word games)

## Quick Start

<Steps>
  <Step title="Create environment">
    Create a basic Wordle environment:

    ```python theme={null}
    import verifiers as vf
    from verifiers.envs.integrations.textarena_env import TextArenaEnv

    def load_environment():
        return TextArenaEnv(
            game="Wordle-v0",
            num_train_examples=1000,
            num_eval_examples=100,
            seed=0,
        )
    ```
  </Step>

  <Step title="Evaluate">
    Run an evaluation:

    ```bash theme={null}
    prime eval run my-wordle-env -m openai/gpt-4.1-mini -n 20
    ```
  </Step>
</Steps>

## Available Games

TextArena provides several game types:

### Word Games

* `Wordle-v0` - Classic Wordle game
* `WordChain-v0` - Word association chains
* `Scrabble-v0` - Scrabble with simplified rules

### Logic Games

* `TwentyQuestions-v0` - Guess the object
* `Mastermind-v0` - Code-breaking game

### Strategy Games

* `Chess-v0` - Text-based chess
* `Go-v0` - Text-based Go
* `Poker-v0` - Texas Hold'em

See the [TextArena repository](https://github.com/LeonGuertler/TextArena) for the full list.

## Configuration

### Basic Configuration

```python theme={null}
env = TextArenaEnv(
    game="Wordle-v0",
    num_train_examples=1000,
    num_eval_examples=200,
    seed=0,
)
```

### Custom Parser

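By default, `TextArenaEnv` uses an `XMLParser` with `<think>` and `<guess>` fields. To use different tags, pass your own parser, and make sure your system prompt tells the model which tags to use: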

```python theme={null}
custom_parser = vf.XMLParser(
    fields=["reasoning", "action"],
    answer_field="action"
)

env = TextArenaEnv(
    game="Wordle-v0",
    parser=custom_parser,
    num_train_examples=1000,
)
```

### Custom System Prompt

```python theme={null}
env = TextArenaEnv(
    game="Wordle-v0",
    system_prompt="You are an expert Wordle player. Make strategic guesses based on the feedback.",
    num_train_examples=1000,
)
```

### Custom Feedback Function

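Many TextArena games return the full game state as each observation, but you usually want to show the model only what changed since its last move. Use `feedback_fn` to transform each observation string before it is sent to the model: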

```python theme={null}
def format_feedback(observation: str) -> str:
    """Extract only the latest feedback from full game state."""
    lines = observation.split("\n")
    # Find the most recent guess feedback
    for line in reversed(lines):
        if "Feedback:" in line:
            return line
    return observation

env = TextArenaEnv(
    game="Wordle-v0",
    feedback_fn=format_feedback,
    num_train_examples=1000,
)
```

<Note>
  Verifiers only appends to the message history; it can't overwrite earlier messages. Because TextArena games often return the full game state rather than turn-level diffs, `feedback_fn` is useful for rendering clean, incremental feedback.
</Note>
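Because the right filter depends on each game's observation format, it can help to build feedback functions from a small factory and check them against a sample observation before training. The sketch below is illustrative: `make_feedback_fn` is a hypothetical helper, not part of Verifiers or TextArena, and the sample observation is made up. Copy a real observation from a local game run instead, since formats differ between games.

```python
from typing import Callable, Tuple


def make_feedback_fn(markers: Tuple[str, ...], keep: int = 1) -> Callable[[str], str]:
    """Build a feedback function that keeps the last `keep` lines containing any marker.

    If no line matches, the full observation is returned unchanged, so the
    model never receives an empty message.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")

    def feedback_fn(observation: str) -> str:
        matches = [
            line
            for line in observation.splitlines()
            if any(marker in line for marker in markers)
        ]
        return "\n".join(matches[-keep:]) if matches else observation

    return feedback_fn


# Hypothetical full-state observation (real TextArena formats vary by game).
observation = "\n".join([
    "You are playing Wordle.",
    "Guess 1: CRANE",
    "Feedback: X X Y X X",
    "Guess 2: SALTY",
    "Feedback: G Y X Y X",
])

latest_feedback = make_feedback_fn(("Feedback:",))
latest_turn = make_feedback_fn(("Guess", "Feedback"), keep=2)

print(latest_feedback(observation))  # the last "Feedback:" line only
print(latest_turn(observation))      # the last guess line and its feedback line
```

`latest_feedback` behaves like `format_feedback` above, and `latest_turn` like `render_wordle_feedback` in the full example below; either can be passed as `feedback_fn`.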

### Custom Rubric

By default, `TextArenaEnv` scores rollouts with the game's built-in reward. Pass a custom `rubric` to override it:

```python theme={null}
async def win_bonus(state: vf.State) -> float:
    """Reward wins, with more reward for winning in fewer turns."""
    if state.get("reward", 0) > 0.9:  # treat a game reward above 0.9 as a win
        turns = max(len(state.get("trajectory", [])), 1)  # guard against division by zero
        return 1.0 / turns  # 1.0 for a one-turn win, 0.5 for two turns, and so on
    return 0.0

rubric = vf.Rubric(funcs=[win_bonus])

env = TextArenaEnv(
    game="Wordle-v0",
    rubric=rubric,
    num_train_examples=1000,
)
```
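Reward functions are plain async functions, so you can unit-test one against a hand-built state before launching a run. This sketch uses a plain `dict` in place of `vf.State` and assumes, as `win_bonus` does, that the state exposes `reward` and `trajectory` keys. `efficiency_bonus` is a hypothetical alternative that decays linearly with the number of turns instead of as `1/turns`, using Wordle's default limit of 6 turns.

```python
import asyncio

MAX_TURNS = 6  # Wordle's default turn limit


async def efficiency_bonus(state: dict) -> float:
    """Hypothetical reward: 1.0 for a first-turn win, falling linearly to 1/6 at turn 6."""
    if state.get("reward", 0) <= 0.9:  # same win threshold as win_bonus
        return 0.0
    turns = max(len(state.get("trajectory", [])), 1)  # avoid a zero turn count
    return max(MAX_TURNS - turns + 1, 0) / MAX_TURNS


async def main() -> None:
    # Hand-built states standing in for real rollouts.
    cases = {
        "one_turn_win": {"reward": 1.0, "trajectory": ["turn 1"]},
        "three_turn_win": {"reward": 1.0, "trajectory": ["turn 1", "turn 2", "turn 3"]},
        "loss": {"reward": 0.0, "trajectory": ["turn"] * 6},
    }
    for name, state in cases.items():
        print(name, await efficiency_bonus(state))


asyncio.run(main())
```

Compared with `1/turns`, which halves the reward for a second-turn win, the linear schedule spreads the penalty evenly across turns (5/6 for two turns, 4/6 for three).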

## Full Example

```python theme={null}
import verifiers as vf
from verifiers.envs.integrations.textarena_env import TextArenaEnv

def render_wordle_feedback(observation: str) -> str:
    """Format Wordle feedback for better readability."""
    lines = observation.split("\n")
    feedback_lines = []
    
    for line in lines:
        if "Guess" in line or "Feedback" in line:
            feedback_lines.append(line)
    
    if not feedback_lines:
        return observation
    
    # Return only the most recent guess and feedback
    return "\n".join(feedback_lines[-2:])

def load_environment(
    game: str = "Wordle-v0",
    num_train_examples: int = 1000,
    num_eval_examples: int = 100,
    seed: int = 0,
) -> vf.Environment:
    """Load a TextArena environment.
    
    Args:
        game: TextArena game ID
        num_train_examples: Number of training examples
        num_eval_examples: Number of eval examples  
        seed: Random seed for word selection
    """
    parser = vf.XMLParser(
        fields=["think", "guess"],
        answer_field="guess"
    )
    
    return TextArenaEnv(
        game=game,
        num_train_examples=num_train_examples,
        num_eval_examples=num_eval_examples,
        parser=parser,
        system_prompt="You are playing Wordle. Think through your strategy, then make a guess.",
        feedback_fn=render_wordle_feedback,
        seed=seed,
    )
```

## Expected Format

With the default parser, the model should reason inside `<think>` and put its move inside `<guess>`:

```xml theme={null}
<think>
Based on the feedback:
- 'A' is in the word but in the wrong position
- 'E' is not in the word
- 'S' is in the word and in the correct position

I'll try "STAIN" next.
</think>

<guess>
STAIN
</guess>
```
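Before a long run, it is worth checking that responses in this format yield the move you expect. The sketch below is a regex stand-in for that check, not `vf.XMLParser`: it takes the last `<guess>` block and returns `None` when the tag is missing, which may differ from how the real parser treats multiple or malformed tags.

```python
import re
from typing import Optional

# Matches <guess>...</guess>, ignoring whitespace and newlines around the move.
GUESS_RE = re.compile(r"<guess>\s*(.*?)\s*</guess>", re.DOTALL)


def extract_guess(response: str) -> Optional[str]:
    """Return the last <guess> value in a response, or None if the tag is missing."""
    matches = GUESS_RE.findall(response)
    return matches[-1] if matches else None


response = """<think>
'S' is correct in position 1, so try STAIN.
</think>

<guess>
STAIN
</guess>"""

print(extract_guess(response))
print(extract_guess("I think the answer is STAIN."))
```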

## Performance Optimization

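`TextArenaEnv` reduces memory use during parallel rollouts by building a shared memo from the wrapped TextArena environment (`ta_env` below):

```python theme={null}
shared_memo = TextArenaEnv.build_shared_memo(ta_env)
```

The memo lets environment copies share immutable data, such as English dictionary word lists, instead of duplicating it, which saves about 38MB and 120ms per rollout. `TextArenaEnv` does this automatically, so you don't need to call `build_shared_memo` yourself.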
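The name `build_shared_memo` suggests the standard Python technique for this: pre-seeding the `memo` dictionary of `copy.deepcopy` so that selected objects are reused rather than copied. The sketch below shows that technique in isolation under that assumption; `WordGame` and this `build_shared_memo` are hypothetical stand-ins, not the `TextArenaEnv` implementation.

```python
import copy


class WordGame:
    """Hypothetical stand-in for a TextArena environment."""

    def __init__(self) -> None:
        # Large data that rollouts only read, so every copy can share it.
        self.word_list = [f"word{i}" for i in range(100_000)]
        # Per-rollout state that each copy must own.
        self.guesses: list = []


def build_shared_memo(env: WordGame) -> dict:
    """Map id(obj) -> obj so deepcopy returns obj itself instead of copying it."""
    return {id(env.word_list): env.word_list}


template = WordGame()
shared_memo = build_shared_memo(template)

# Pass a fresh copy of the memo each time: deepcopy records every object it
# copies in the memo it is given, so reusing one dict would hand later calls
# the copies made for earlier rollouts.
rollout_a = copy.deepcopy(template, dict(shared_memo))
rollout_b = copy.deepcopy(template, dict(shared_memo))

rollout_a.guesses.append("CRANE")
print(rollout_a.word_list is template.word_list)  # shared, not copied
print(rollout_b.guesses)  # per-rollout state stays separate
```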

## Game-Specific Notes

### Wordle

* Words are randomly selected from the TextArena word list
* Default max turns: 6
* Reward is based on number of guesses (fewer is better)
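When writing a `feedback_fn` or reading Wordle observations, it helps to remember how standard Wordle marks a guess, especially with repeated letters. The sketch below implements the common rule (`G` for the right letter in the right spot, `Y` for a letter that appears elsewhere in the word, `X` for absent). It is not TextArena's code, and TextArena's own feedback symbols and wording may differ.

```python
from collections import Counter


def score_guess(guess: str, secret: str) -> str:
    """Standard Wordle marking: G = right letter and spot, Y = elsewhere in word, X = absent.

    A repeated letter is marked Y only as many times as it remains unmatched
    in the secret after exact (G) matches are removed.
    """
    guess, secret = guess.upper(), secret.upper()
    result = ["X"] * len(guess)
    # Secret letters not consumed by exact matches, available for Y marks.
    remaining = Counter(s for g, s in zip(guess, secret) if g != s)
    for i, (g, s) in enumerate(zip(guess, secret)):
        if g == s:
            result[i] = "G"
        elif remaining[g] > 0:
            result[i] = "Y"
            remaining[g] -= 1
    return "".join(result)


print(score_guess("STAIN", "SAINT"))
print(score_guess("SPEED", "ABIDE"))
```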

### TwentyQuestions

* Model asks yes/no questions to guess the object
* Limited to 20 questions
* Reward for a correct guess within the question limit

### Chess

* Moves use UCI notation, giving the start and end squares (e.g., `e2e4`)
* Game state includes a board representation
* Reward is based on the game outcome
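Because a model can produce text that is not a move at all, a cheap syntactic check on the parsed move can help when debugging parsing or writing a format reward. The sketch below accepts UCI strings such as `e2e4` or `e7e8q` (with an optional promotion piece). `looks_like_uci` is a hypothetical helper, and it does not check legality, which requires the board position.

```python
import re

# From-square, to-square, then an optional promotion piece (q, r, b, n).
UCI_MOVE = re.compile(r"[a-h][1-8][a-h][1-8][qrbn]?")


def looks_like_uci(move: str) -> bool:
    """Return True if `move` is syntactically a UCI move; legality is not checked."""
    return UCI_MOVE.fullmatch(move.strip().lower()) is not None


for candidate in ["e2e4", "e7e8q", "Nf3", "e9e4", " g1f3 "]:
    print(repr(candidate), looks_like_uci(candidate))
```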

## Metrics

| Metric          | Meaning                                |
| --------------- | -------------------------------------- |
| `reward`        | Game reward (task-specific)            |
| `num_turns`     | Number of turns taken                  |
| `format_reward` | XML format compliance (if parser used) |

## Best Practices

<Note>
  When wrapping new TextArena games, investigate the source code to understand the observation format. Many games return full state rather than turn-level diffs.
</Note>

* **Use `feedback_fn`** - Transform full-state observations into incremental feedback
* **Test locally first** - Play a few games manually to gauge difficulty
* **Validate parsing** - Check that your parser extracts the right fields
* **Write game-specific prompts** - Explaining the rules and the expected move format improves performance
* **Keep seeds consistent** - Use the same seed across runs for reproducible experiments

## Troubleshooting

### NLTK Download Errors

TextArena uses NLTK data for word games, and the environment downloads it automatically. If you still see NLTK download or lookup errors, download the resources manually:

```python theme={null}
import nltk
nltk.download('words', quiet=True)
nltk.download('averaged_perceptron_tagger_eng', quiet=True)
```

### Invalid Moves

If the model makes invalid moves (e.g., non-existent words in Wordle):

* Improve the system prompt with game rules
* Add examples of valid moves in few-shot prompts
* Use a more capable model

### Memory Issues

For large-scale parallel rollouts:

* The environment automatically shares immutable data
* If you still run into memory issues, reduce `num_train_examples`
* Consider running evaluation in batches

## Examples

See the [gem-wordle](https://github.com/PrimeIntellect-ai/verifiers/tree/main/environments/gem_wordle) example in the Verifiers repository for a complete implementation.

## Further Reading

* [TextArena Repository](https://github.com/LeonGuertler/TextArena)
* [TextArena Paper](https://arxiv.org/abs/2410.05895)