> ## Documentation Index
> Fetch the complete documentation index at: https://mintlify.com/primeintellect-ai/verifiers/llms.txt
> Use this file to discover all available pages before exploring further.

# prime eval tui

> View evaluation results in an interactive terminal UI

## Overview

The `prime eval tui` command launches an interactive terminal user interface for browsing evaluation results. It automatically discovers and displays all evaluation runs from your workspace.

## Usage

```bash theme={null}
prime eval tui [OPTIONS]
```

## Options

No required options. The TUI will auto-discover results from standard locations.

## What It Shows

The TUI provides a hierarchical browser:

1. **Environment selection** - All environments with completed evaluations
2. **Model selection** - All models evaluated for that environment
3. **Run selection** - All evaluation runs for that environment + model combo
4. **Rollout viewer** - Individual prompts, completions, and metrics

## Discovery

Results are discovered from:

* `./outputs/evals/` - Global output directory
* `./environments/*/outputs/evals/` - Per-environment output directories

Each run must have both:

* `results.jsonl` - Rollout data
* `metadata.json` - Evaluation metadata

## Navigation

### Environment Selection Screen

```
┌─ Select Environment ─────────────────────────────────┐
│ gsm8k - Models: 2, Runs: 5                          │
│ math-python - Models: 1, Runs: 3                    │
│ alphabet-sort - Models: 2, Runs: 4                  │
└──────────────────────────────────────────────────────┘

q: Quit  Enter: Select
```

**Keys:**

* `↑`/`↓` or `j`/`k` - Navigate
* `Enter` - Select environment
* `q` - Quit

### Model Selection Screen

```
┌─ Environment: gsm8k ─────────────────────────────────┐
│ Select Model                                         │
│ openai/gpt-4.1-mini - Runs: 3                       │
│ anthropic/claude-sonnet-4 - Runs: 2                 │
└──────────────────────────────────────────────────────┘

q: Quit  b: Back  Enter: Select
```

**Keys:**

* `↑`/`↓` - Navigate
* `Enter` - Select model
* `b` or `Backspace` - Go back
* `q` - Quit

### Run Selection Screen

```
┌─ Environment: gsm8k ─────────────────────────────────┐
│ Model: openai/gpt-4.1-mini                          │
│ Select Run                                           │
│ abc123 - 2026-03-03 14:23 | Reward: 0.867           │
│ def456 - 2026-03-02 10:15 | Reward: 0.823           │
│ ghi789 - 2026-03-01 16:47 | Reward: 0.891           │
└──────────────────────────────────────────────────────┘

┌─ Run Details ────────────────────────────────────────┐
│ Run ID: abc123                                       │
│ Environment: gsm8k                                   │
│ Model: openai/gpt-4.1-mini                          │
│ Avg reward: 0.867                                    │
│ Runtime: 2m 34.5s                                    │
└──────────────────────────────────────────────────────┘

q: Quit  b: Back  Enter: Select
```

**Keys:**

* `↑`/`↓` - Navigate runs
* `Enter` - View rollout details
* `b` or `Backspace` - Go back
* `q` - Quit

### Rollout Viewer

The main screen shows prompts, completions, and metrics side-by-side:

```
┌─ Metadata ───────────────────────────────────────────┐
│ Environment: gsm8k      Record: 1/30                 │
│ Model: openai/gpt-4.1-mini                          │
│ Run ID: abc123          Examples: 10                 │
│ Date: 2026-03-03 14:23  Rollouts/ex: 3              │
└──────────────────────────────────────────────────────┘

┌─ Prompt ─────────────┬─ Completion ──────────────────┐
│ user: What is 2+2?   │ assistant: The answer is 4.   │
│                      │                                │
│                      │ tool call: calculate          │
│                      │ {"expression": "2+2"}         │
│                      │                                │
│                      │ tool result: 4                │
│                      │                                │
│                      │ assistant: The result is 4.   │
└──────────────────────┴────────────────────────────────┘

┌─ Details ────────────────────────────────────────────┐
│ Reward: 1.000                                        │
│ Answer: 4                                            │
│ Info: {"calculation": "2+2=4"}                      │
└──────────────────────────────────────────────────────┘

q: Quit  b: Back  ←/→: Prev/Next  s: Search  c: Copy
```

**Keys:**

* `←`/`→` or `h`/`l` - Navigate between rollouts
* `s` - Search prompt/completion text
* `c` - Enter copy mode
* `b` or `Backspace` - Go back to run list
* `q` - Quit
* `d` - Toggle dark/light theme

### Search Mode

Press `s` to search within prompts and completions:

```
┌─ Search (regex, case-insensitive) ───────────────────┐
│ [calculate               ]                           │
└──────────────────────────────────────────────────────┘

┌─ Prompt results (0) ────┬─ Completion results (2) ───┐
│                          │   245 | tool call: calculate│
│                          │   312 | assistant: The calcu│
└──────────────────────────┴────────────────────────────┘

Esc: Close  Enter: Select  ←/→: Switch column
```

**Keys:**

* Type to search (regex supported)
* `↑`/`↓` - Navigate results
* `←`/`→` - Switch between prompt and completion results
* `Enter` - Jump to selected match
* `Esc` - Close search

Matches are highlighted for 3 seconds after selection.

### Copy Mode

Press `c` to enter copy mode:

```
┌─ Copy Mode ──────────────────────────────────────────┐
│ Tab: switch columns                                  │
│ Highlight text with mouse drag or Shift+Arrow       │
│ Esc: close                                           │
└──────────────────────────────────────────────────────┘

┌─ Prompt ─────────────┬─ Completion ──────────────────┐
│ user: What is 2+2?   │ assistant: The answer is 4.   │
│                      │                                │
│ [Text is selectable] │ [Text is selectable]          │
└──────────────────────┴────────────────────────────────┘

q: Quit  Tab: Next column  c: Copy  Esc: Close
```

**Keys:**

* `Tab` - Switch between prompt and completion
* Mouse drag or `Shift+Arrow` - Select text
* `c` - Copy selected text to clipboard
* `Esc` - Exit copy mode
* `q` - Quit

## Display Features

### Message Formatting

Messages are formatted with role-based styling:

* **user messages** - Standard text
* **assistant messages** - `assistant:` prefix in bold
* **tool calls** - `tool call:` prefix with function name and arguments
* **tool results** - `tool result:` prefix in dimmed style
* **errors** - Red text with `error:` prefix

### Metrics Display

The details panel shows:

* **Reward** - Scalar reward from rubric (formatted to 3 decimals)
* **Answer** - Ground truth answer from task
* **Info** - Additional environment-specific data (formatted as JSON)
* **Task** - Full task data if available

### Lazy Loading

Results are loaded lazily for performance:

* File handles opened on-demand
* Lines read as needed
* Metadata count used when available
* Caching for already-read records

This allows the TUI to handle evaluations with thousands of rollouts efficiently.

## Themes

Toggle between dark and light themes with `d`:

* **black-warm** (default) - Dark theme with warm accent colors
* **white-warm** - Light theme with matching warm tones

## Examples

### Basic Usage

```bash theme={null}
# Run an evaluation
prime eval run gsm8k -m gpt-4.1-mini -n 10 -s

# Launch TUI
prime eval tui
```

### View Specific Results

The TUI automatically finds all results, so just launch it:

```bash theme={null}
prime eval tui
```

Navigate to your environment → model → run.

### Search for Patterns

1. Launch TUI and navigate to a run
2. Press `s` to open search
3. Type a regex pattern (e.g., `error|failed`)
4. Navigate results with arrow keys
5. Press `Enter` to jump to a match

### Copy Completions

1. Navigate to a rollout
2. Press `c` to enter copy mode
3. Tab to completion column
4. Select text with mouse or Shift+Arrow
5. Press `c` to copy to clipboard

## File Locations

Results are saved by `prime eval run --save-results` to:

```
./outputs/evals/
└── gsm8k--openai--gpt-4.1-mini/
    └── abc123/
        ├── results.jsonl
        └── metadata.json
```

Or per-environment:

```
./environments/gsm8k/outputs/evals/
└── gsm8k--openai--gpt-4.1-mini/
    └── abc123/
        ├── results.jsonl
        └── metadata.json
```

The TUI scans both locations.

## Performance

The TUI is optimized for large evaluations:

* **Lazy file reading** - Only loads visible data
* **Incremental parsing** - Reads JSONL line-by-line
* **Metadata caching** - Avoids re-parsing metadata files
* **Efficient rendering** - Textual's virtual DOM

Evaluations with 10,000+ rollouts are handled smoothly.

## Troubleshooting

### No Evaluations Found

```
┌─ Select Environment ─────────────────────────────────┐
│ No completed evals found                            │
└──────────────────────────────────────────────────────┘
```

**Solution:** Run an evaluation with `--save-results`:

```bash theme={null}
prime eval run gsm8k -m gpt-4.1-mini -n 5 -s
prime eval tui
```

### Corrupted Results

If results.jsonl is malformed, that rollout will show as `{}`.

**Solution:** Check the file manually:

```bash theme={null}
cat outputs/evals/gsm8k--openai--gpt-4.1-mini/abc123/results.jsonl | jq
```

### Terminal Size

If the TUI appears cramped, resize your terminal:

```bash theme={null}
resize -s 40 160  # 40 rows, 160 columns
prime eval tui
```

Recommended minimum: 24 rows × 80 columns

### Search Not Working

Search uses regex with case-insensitive matching. Test your pattern:

```bash theme={null}
echo "test string" | grep -iE 'pattern'
```

If the pattern is invalid, an error appears below the search box.

## Keyboard Reference

### Global

* `q` - Quit application
* `d` - Toggle dark/light theme

### Navigation Screens

* `↑`/`↓` or `j`/`k` - Move selection
* `Enter` - Select item
* `b` or `Backspace` - Go back one screen

### Rollout Viewer

* `←`/`→` or `h`/`l` - Previous/next rollout
* `s` - Open search
* `c` - Enter copy mode
* `b` or `Backspace` - Return to run list

### Search Mode

* Type - Enter search pattern
* `↑`/`↓` - Navigate results
* `←`/`→` - Switch prompt/completion
* `Enter` - Jump to selected match
* `Esc` - Close search

### Copy Mode

* `Tab` or `Shift+Tab` - Switch column
* Mouse drag - Select text
* `Shift+Arrow` - Select text (keyboard)
* `c` - Copy to clipboard
* `Esc` - Exit copy mode

## Tips

* Use search (`s`) to quickly find errors or specific patterns
* Copy mode (`c`) allows extracting full completions for analysis
* Results persist across runs - view historical evaluations anytime
* The TUI works great with tmux/screen for remote evaluation monitoring
* Use `--state-columns` when running evals to save additional fields visible in the TUI