# HarborEnv

> CliAgentEnv subclass for loading Harbor-format tasks

A specialized environment for running Harbor-format benchmark tasks with automatic task loading, sandbox management, and test execution.

<Warning>
  HarborEnv is experimental and subject to breaking changes. The API may change in future releases.
</Warning>

## Overview

`HarborEnv` extends `CliAgentEnv` to provide first-class support for [Harbor](https://github.com/evals/harbor)-format evaluation tasks. It automatically:

* Loads task specifications from `task.toml` and `instruction.md`
* Manages Docker-based sandboxes per task
* Uploads task assets and test suites
* Executes verification tests and computes rewards

## Inheritance

```
Environment
└── MultiTurnEnv
    └── CliAgentEnv
        └── HarborEnv
```

## Constructor

```python theme={null}
HarborEnv(
    run_command: str,
    dataset_path: str | Path,
    tasks: list[str] | None = None,
    agent_workdir: str = "/app",
    docker_image: str = "python:3.11-slim",
    **kwargs
)
```

<ParamField path="run_command" type="str" required>
  Command to execute the agent inside the sandbox (e.g., `"python agent.py"`).
</ParamField>

<ParamField path="dataset_path" type="str | Path" required>
  Path to directory containing Harbor task folders. Each task folder must contain `task.toml` and `instruction.md`.
</ParamField>

<ParamField path="tasks" type="list[str] | None" default="None">
  Specific task names to load. If `None`, loads all tasks found in `dataset_path`.
</ParamField>

<ParamField path="agent_workdir" type="str" default="/app">
  Working directory for the agent inside the sandbox. Exposed to the agent as the `AGENT_WORKDIR` environment variable.
</ParamField>

<ParamField path="docker_image" type="str" default="python:3.11-slim">
  Default Docker image for sandboxes. Can be overridden per-task via `task.toml`.
</ParamField>

<ParamField path="**kwargs">
  Additional arguments passed to `CliAgentEnv` (timeout, resources, etc.). See [CliAgentEnv](/api/experimental/cli-agent-env) for details.
</ParamField>

## Key Methods

### load\_harbor\_dataset

```python theme={null}
def load_harbor_dataset(self) -> Dataset
```

Loads Harbor tasks from the dataset directory into a HuggingFace Dataset.

**Returns**: Dataset with columns:

* `example_id`: Sequential task ID
* `task`: Task name (directory name)
* `prompt`: Formatted instruction as messages
* `info`: Task metadata including `task_dir`, `docker_image`, and `config`

### get\_docker\_image

```python theme={null}
async def get_docker_image(self, state: vf.State) -> str
```

Resolves the Docker image for a task from `task.toml` or falls back to the default.

<ParamField path="state" type="vf.State">
  Rollout state containing task info.
</ParamField>

**Returns**: Docker image string
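
The fallback logic can be sketched as a plain function. This is a minimal illustration, assuming the parsed `task.toml` is available under the `config` key of the task's `info` (as listed for `load_harbor_dataset`); `resolve_docker_image` is a hypothetical name, not part of the library.

```python
def resolve_docker_image(info, default_image="python:3.11-slim"):
    """Hypothetical helper: prefer the task's own image, else the default."""
    config = info.get("config") or {}
    image = config.get("environment", {}).get("docker_image")
    # Treat a missing or empty value as "not set" and fall back.
    return image or default_image
```

For example, `resolve_docker_image({"config": {"environment": {"docker_image": "ubuntu:22.04"}}})` returns the task's image, while an `info` without an `[environment]` table returns the default.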

### build\_env\_vars

```python theme={null}
async def build_env_vars(self, state: vf.State) -> dict[str, str]
```

Builds environment variables with Harbor-specific additions:

* `HARBOR_TASK_NAME`: Current task name
* `HARBOR_TASK_DIR`: Path to task assets (`/task`)
* `HARBOR_INSTRUCTION_PATH`: Path to instruction file
* `AGENT_WORKDIR`: Agent working directory

### compute\_reward

```python theme={null}
async def compute_reward(self, state: vf.State) -> float
```

Executes the task's Harbor test suite (`tests/test.sh`) and reads the reward from the first of these files that is available:

1. `/logs/verifier/reward.txt` (preferred)
2. `/logs/verifier/reward.json` (fallback)

**Returns**: Reward value between 0.0 and 1.0
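
The file precedence and the 0.0 fallback can be illustrated with a small parser. This is a sketch under assumptions: `read_reward` is a hypothetical helper that reads from a local directory, whereas HarborEnv reads the files from the sandbox, and treating a malformed `reward.txt` as 0.0 (rather than trying `reward.json` next) is a choice made here, not confirmed behavior.

```python
import json
from pathlib import Path


def read_reward(log_dir):
    """Hypothetical helper: read a reward from reward.txt, then reward.json."""
    log_dir = Path(log_dir)
    txt_path = log_dir / "reward.txt"
    json_path = log_dir / "reward.json"
    try:
        if txt_path.is_file():
            # Preferred format: a single float.
            return float(txt_path.read_text().strip())
        if json_path.is_file():
            # Fallback format: {"reward": <float>}.
            return float(json.loads(json_path.read_text())["reward"])
    except (ValueError, KeyError, TypeError):
        # Malformed float or JSON, or a missing "reward" key.
        pass
    return 0.0
```

The sketch does not clamp the value to the 0.0 to 1.0 range; a test script is expected to write a value in that range.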

## Harbor Task Structure

Each task directory must follow this structure:

```
dataset_path/
├── task_name_1/
│   ├── task.toml          # Task configuration
│   ├── instruction.md     # Task description for agent
│   ├── solution/          # Reference implementation (uploaded after agent runs)
│   └── tests/
│       └── test.sh        # Verification script
├── task_name_2/
│   ├── ...
```

### task.toml Format

```toml theme={null}
[environment]
docker_image = "python:3.11-slim"  # Optional: override default image

# Additional task metadata...
```

### Test Script Requirements

The `tests/test.sh` script must:

1. Execute verification logic
2. Write reward to `/logs/verifier/reward.txt` (single float) or `/logs/verifier/reward.json` (`{"reward": 0.85}`)
3. Exit with status 0 (execution errors are logged rather than raised, so they do not abort scoring)

## Example Usage

```python theme={null}
import asyncio
from pathlib import Path

import verifiers as vf


def load_environment():
    return vf.HarborEnv(
        run_command="python /app/agent.py",
        dataset_path=Path("./harbor_tasks"),
        tasks=["task_1", "task_2"],  # Optional: filter specific tasks
        agent_workdir="/app",
        docker_image="python:3.11",
        max_turns=20,
        timeout_seconds=300,
    )


async def main():
    # Run evaluation; `await` must be used inside a coroutine
    env = load_environment()
    results = await env.evaluate(
        client=vf.ClientConfig(api_key="..."),
        model="gpt-4",
        num_examples=10,
    )
    print(f"Average reward: {results['metadata']['avg_reward']}")


asyncio.run(main())
```

## Asset Upload Strategy

<Note>
  HarborEnv implements a two-phase upload strategy to prevent test contamination:

  1. **Pre-agent**: Uploads only `instruction.md` and `task.toml`
  2. **Post-agent**: Uploads `solution/` and `tests/` directories before running verification
</Note>

This ensures agents cannot access oracle solutions or test implementations during task execution.
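
The split between the two phases can be sketched as a selection over the task directory. This is an illustration only: `plan_uploads` and the phase names `"pre_agent"` and `"post_agent"` are hypothetical, and the actual upload calls are left out.

```python
from pathlib import Path

# File groups taken from the two phases described above.
PRE_AGENT = ("instruction.md", "task.toml")
POST_AGENT = ("solution", "tests")


def plan_uploads(task_dir, phase):
    """Hypothetical helper: list the existing task paths to upload in a phase."""
    names = {"pre_agent": PRE_AGENT, "post_agent": POST_AGENT}[phase]
    task_dir = Path(task_dir)
    # Skip entries that a task does not provide (e.g. no solution/ folder).
    return [task_dir / name for name in names if (task_dir / name).exists()]
```

Keeping the two groups as separate constants makes it easy to check that nothing from `solution/` or `tests/` can appear in the pre-agent phase.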

## Environment Variables Available to Agent

```bash theme={null}
OPENAI_BASE_URL=<interception_url>  # For API interception
HARBOR_TASK_NAME=<task_name>
HARBOR_TASK_DIR=/task
HARBOR_INSTRUCTION_PATH=/task/instruction.md
AGENT_WORKDIR=/app
OPENAI_MODEL=<model_name>  # If model is set in state
```
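
An agent script can pick these values up at startup. The sketch below is one way to do that, assuming the variable names listed above; `read_task_context` is a hypothetical helper, and the defaults for missing variables are illustrative.

```python
import os
from pathlib import Path


def read_task_context(environ=os.environ):
    """Collect the Harbor-related settings an agent script might need."""
    instruction_path = Path(environ.get("HARBOR_INSTRUCTION_PATH", "/task/instruction.md"))
    return {
        "task_name": environ.get("HARBOR_TASK_NAME", ""),
        "workdir": environ.get("AGENT_WORKDIR", "/app"),
        "base_url": environ.get("OPENAI_BASE_URL"),
        "model": environ.get("OPENAI_MODEL"),  # None when no model is set
        # Fall back to an empty instruction if the file is not present.
        "instruction": instruction_path.read_text() if instruction_path.is_file() else "",
    }
```

Passing the environment as an argument keeps the function easy to test outside a sandbox with a plain dictionary.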

## Custom Agent Setup

```python theme={null}
import verifiers as vf
from pathlib import Path

class CustomHarborEnv(vf.HarborEnv):
    async def post_sandbox_setup(self, state: vf.State) -> None:
        """Upload custom agent code after sandbox creation."""
        await super().post_sandbox_setup(state)  # Upload Harbor assets
        
        sandbox_id = state["sandbox_id"]
        
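        # Upload agent code
        await self.sandbox_client.upload_file(
            sandbox_id,
            "/app/agent.py",
            "./my_agent.py"
        )
        
        # Upload the dependency list so the install step below can find it
        # (the local path "./requirements.txt" is illustrative)
        await self.sandbox_client.upload_file(
            sandbox_id,
            "/app/requirements.txt",
            "./requirements.txt"
        )
        
        # Install dependencies
        await self.sandbox_client.execute_command(
            sandbox_id,
            "pip install -r /app/requirements.txt",
            working_dir="/app"
        )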

def load_environment():
    return CustomHarborEnv(
        run_command="python /app/agent.py",
        dataset_path=Path("./harbor_tasks"),
    )
```

## Error Handling

Reward computation fails gracefully:

* Test execution errors are logged and yield a reward of 0.0
* Missing reward files return 0.0
* Invalid JSON/float formats return 0.0
* Infrastructure errors set `state["error"]` and skip scoring

## State Keys

HarborEnv adds the following state keys:

<ParamField path="harbor_config" type="dict">
  Parsed `task.toml` configuration
</ParamField>

<ParamField path="harbor_task_dir" type="str">
  Local path to task directory
</ParamField>

<ParamField path="reward" type="float">
  Computed reward from test execution
</ParamField>

## See Also

* [CliAgentEnv](/api/experimental/cli-agent-env) - Parent class for custom agent environments
* [SandboxEnv](/api/sandbox-env) - Base sandbox management
* Harbor benchmark repository for task format details
