# Hosted Training

> Train models on Prime Intellect's managed infrastructure

Hosted Training, available in the [Prime Intellect Lab platform](https://app.primeintellect.ai/dashboard/training), trains models with `prime-rl` on managed infrastructure, so you do not need to provision or operate your own.

Hosted Training supports LoRA for RL training and can be used with any environment built with Verifiers.

## Features

* **Zero infrastructure management** - No need to provision GPUs or manage servers
* **Automatic scaling** - Training infrastructure scales based on your config
* **LoRA training** - Parameter-efficient fine-tuning
* **Any Verifiers environment** - Train on Hub environments or your own
* **Weights & Biases integration** - Automatic logging and experiment tracking

## Getting Started

<Info>
  Hosted Training is currently in Private Beta. For access, please fill out [this form](https://form.typeform.com/to/iYn9UliG).
</Info>

<Steps>
  <Step title="Set up your workspace">
    Download example configuration files:

    ```bash theme={null}
    prime lab setup
    ```

    This creates:

    ```
    configs/
    ├── endpoints.toml      # API endpoint configuration
    ├── rl/                 # Hosted Training configs
    │   ├── alphabet-sort.toml
    │   ├── gsm8k.toml
    │   ├── math-python.toml
    │   ├── reverse-text.toml
    │   ├── wiki-search.toml
    │   └── wordle.toml
    ├── eval/               # Evaluation configs
    └── gepa/               # Prompt optimization configs
    ```
  </Step>

  <Step title="Configure your training">
    Edit one of the example configs or create your own. Example for `alphabet-sort`:

    ```toml theme={null}
    model = "Qwen/Qwen3-30B-A3B-Instruct-2507"
    max_steps = 500
    batch_size = 256
    rollouts_per_example = 8

    [sampling]
    max_tokens = 512

    [[env]]
    id = "primeintellect/alphabet-sort"
    args = { min_turns = 3, max_turns = 5, power_per_turn = false }

    [wandb]
    project = "alphabet-sort"
    name = "qwen3-30b-i-alphabet-sort"
    ```
  </Step>

  <Step title="Submit your training job">
    Submit to Hosted Training via the Prime CLI:

    ```bash theme={null}
    prime train submit configs/rl/alphabet-sort.toml
    ```

    Alternatively, submit the job from the web interface at [app.primeintellect.ai/dashboard/training](https://app.primeintellect.ai/dashboard/training).
  </Step>

  <Step title="Monitor your training">
    View training progress:

    * In the Prime Intellect dashboard
    * In Weights & Biases (if configured)
    * Via the CLI: `prime train status <job-id>`
  </Step>
</Steps>

## Supported Models

We currently support the following models for Hosted Training:

* `Qwen/Qwen3-4B-Instruct-2507`
* `Qwen/Qwen3-4B-Thinking-2507`
* `Qwen/Qwen3-30B-A3B-Instruct-2507`
* `Qwen/Qwen3-30B-A3B-Thinking-2507`
* `Qwen/Qwen3-235B-A22B-Instruct-2507`
* `Qwen/Qwen3-235B-A22B-Thinking-2507`
* `PrimeIntellect/INTELLECT-3`

<Note>
  Additional models can be supported upon request. Contact support if you need a specific model.
</Note>

## Configuration Reference

### Basic Configuration

```toml theme={null}
model = "Qwen/Qwen3-4B-Instruct-2507"
max_steps = 500
batch_size = 256
rollouts_per_example = 8
learning_rate = 1e-5
```
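
As a rough guide to how these values interact, the sketch below assumes that `batch_size` counts rollouts per training step and that each prompt is sampled `rollouts_per_example` times. This page does not confirm that interpretation, so treat it as an assumption; the helper name `examples_per_step` is hypothetical.

```python
def examples_per_step(batch_size: int, rollouts_per_example: int) -> int:
    """Distinct prompts per step, ASSUMING batch_size counts rollouts."""
    if rollouts_per_example <= 0:
        raise ValueError("rollouts_per_example must be positive")
    if batch_size % rollouts_per_example != 0:
        raise ValueError("batch_size should be a multiple of rollouts_per_example")
    return batch_size // rollouts_per_example


if __name__ == "__main__":
    # Values taken from the basic configuration above.
    print(examples_per_step(256, 8))  # 32
```

Under this assumption, raising `rollouts_per_example` without raising `batch_size` means fewer distinct prompts are seen per step.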

### Environment Configuration

Train on environments from the Environments Hub:

```toml theme={null}
[[env]]
id = "primeintellect/math-python"
args = { max_turns = 10, difficulty = "hard" }
```

Or train on your own local environment:

```toml theme={null}
[[env]]
id = "my-custom-env"  # from ./environments/my_custom_env
args = { num_examples = 1000 }
```

Or train on multiple environments at once by adding one `[[env]]` table per environment, each with a `weight`:

```toml theme={null}
[[env]]
id = "primeintellect/math-python"
weight = 0.6

[[env]]
id = "primeintellect/gsm8k"
weight = 0.4
```
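
The following sketch shows one way to sanity-check an environment mix before submitting. It assumes that `weight` is a relative sampling proportion and that a missing weight counts as 1.0; neither is stated on this page. The function name `env_mix` is hypothetical.

```python
def env_mix(envs: list[dict]) -> dict[str, float]:
    """Return each environment's share of the mix, normalised to sum to 1.

    ASSUMPTION: `weight` is a relative proportion; a missing weight counts as 1.0.
    """
    weights = {env["id"]: float(env.get("weight", 1.0)) for env in envs}
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("environment weights must sum to a positive number")
    return {env_id: w / total for env_id, w in weights.items()}


if __name__ == "__main__":
    # Mirrors the two [[env]] tables in the config above.
    mix = env_mix([
        {"id": "primeintellect/math-python", "weight": 0.6},
        {"id": "primeintellect/gsm8k", "weight": 0.4},
    ])
    for env_id, share in mix.items():
        print(f"{env_id}: {share:.0%}")
```

Normalising first makes it easy to spot a typo such as `weight = 6` where `0.6` was intended.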

### Sampling Configuration

```toml theme={null}
[sampling]
max_tokens = 512
temperature = 0.7
top_p = 0.9
stop = ["<|endoftext|>", "</s>"]
```
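
For readers unfamiliar with `temperature` and `top_p`, here is a minimal sketch of the standard technique (temperature scaling followed by nucleus filtering). It illustrates the general idea only; the inference backend used by Hosted Training may implement it differently, and the function name and toy logits are hypothetical.

```python
import math


def top_p_filter(logits: dict[str, float], temperature: float, top_p: float) -> dict[str, float]:
    """Return the renormalised token distribution kept by temperature + top-p."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    # Temperature scaling, then a numerically stable softmax.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    norm = sum(exps.values())
    ranked = sorted(((e / norm, tok) for tok, e in exps.items()), reverse=True)
    # Keep the smallest set of top tokens whose cumulative mass reaches top_p.
    kept, cumulative = {}, 0.0
    for prob, tok in ranked:
        kept[tok] = prob
        cumulative += prob
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: prob / total for tok, prob in kept.items()}


if __name__ == "__main__":
    toy_logits = {"a": 2.0, "b": 1.0, "c": -3.0}  # hypothetical values
    print(top_p_filter(toy_logits, temperature=0.7, top_p=0.9))
```

Lower temperatures concentrate probability on the top tokens, so fewer tokens survive the `top_p` cut.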

### LoRA Configuration

LoRA is enabled by default for Hosted Training:

```toml theme={null}
[lora]
enabled = true
r = 64
alpha = 16
dropout = 0.05
target_modules = ["q_proj", "v_proj", "k_proj", "o_proj"]
```
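
To give a feel for what `r` and `alpha` control, the sketch below uses the standard LoRA formulation, in which the low-rank update is scaled by `alpha / r` and each adapted linear layer gains `r * (d_in + d_out)` trainable parameters. Whether Hosted Training applies exactly this scaling is not stated here, and the 4096 x 4096 layer size is a hypothetical value chosen only for illustration.

```python
def lora_scale(r: int, alpha: float) -> float:
    """Standard LoRA scaling factor applied to the low-rank update (alpha / r)."""
    if r <= 0:
        raise ValueError("r must be positive")
    return alpha / r


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_in x d_out linear layer.

    A has shape (r, d_in) and B has shape (d_out, r), so the count is r * (d_in + d_out).
    """
    return r * (d_in + d_out)


if __name__ == "__main__":
    # r and alpha come from the config above; the layer size is hypothetical.
    print(lora_scale(r=64, alpha=16))     # 0.25
    print(lora_params(4096, 4096, r=64))  # 524288
    print(4096 * 4096)                    # 16777216 weights in the full layer
```

In this illustration the adapter trains about 3% as many parameters as the full projection matrix.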

### Weights & Biases Integration

```toml theme={null}
[wandb]
project = "my-project"
name = "my-training-run"
entity = "my-team"  # optional
```

Set your W\&B API key in the Prime Intellect dashboard under Settings > Environment Variables.

### Environment Variables

If your environment requires API keys or secrets, configure them via:

1. **Dashboard**: Settings > Environment Variables
2. **Config file**:

```toml theme={null}
env_file = ["secrets.env"]
```

Then create `secrets.env`:

```bash theme={null}
OPENAI_API_KEY=sk-...
BROWSERBASE_API_KEY=...
```

<Note>
  Environment variables set in the dashboard take precedence over those in `env_file`.
</Note>
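
A small local check can catch a missing key before a job is submitted. The sketch below assumes the env file uses plain `KEY=VALUE` lines with optional `#` comments; the exact syntax accepted by the platform is not specified on this page, and all function names are hypothetical.

```python
def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and `#` comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"not a KEY=VALUE line: {raw!r}")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str, required: list[str]) -> list[str]:
    """Return required keys that are absent or empty in the env file text."""
    values = parse_env_file(text)
    return [key for key in required if not values.get(key)]


def effective_env(env_file_vars: dict[str, str], dashboard_vars: dict[str, str]) -> dict[str, str]:
    """Merge variables so that dashboard values win, as described in the note above."""
    return {**env_file_vars, **dashboard_vars}


if __name__ == "__main__":
    sample = "OPENAI_API_KEY=sk-example\n# comment\nBROWSERBASE_API_KEY=\n"
    print(missing_keys(sample, ["OPENAI_API_KEY", "BROWSERBASE_API_KEY"]))
    # ['BROWSERBASE_API_KEY']
```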

## Training Examples

### GSM8K Math Training

```toml theme={null}
model = "Qwen/Qwen3-4B-Instruct-2507"
max_steps = 1000
batch_size = 256
rollouts_per_example = 8

[sampling]
max_tokens = 1024
temperature = 0.7

[[env]]
id = "primeintellect/gsm8k"
args = { max_turns = 1 }

[wandb]
project = "gsm8k-training"
```

### Multi-Turn Wiki Search

```toml theme={null}
model = "Qwen/Qwen3-30B-A3B-Instruct-2507"
max_steps = 500
batch_size = 128
rollouts_per_example = 8

[sampling]
max_tokens = 2048

[[env]]
id = "primeintellect/wiki-search"
args = { max_turns = 10, num_questions = 5 }

[wandb]
project = "wiki-search"
name = "qwen3-30b-wiki"
```

### Multi-Environment Training

```toml theme={null}
model = "Qwen/Qwen3-4B-Instruct-2507"
max_steps = 2000
batch_size = 256

[[env]]
id = "primeintellect/math-python"
weight = 0.4

[[env]]
id = "primeintellect/gsm8k"
weight = 0.3

[[env]]
id = "primeintellect/wiki-search"
weight = 0.3

[wandb]
project = "multi-task-training"
```

## Downloading Checkpoints

After training completes, download your trained model:

```bash theme={null}
prime train download <job-id> --output ./checkpoints/my-model
```

This downloads the final checkpoint and LoRA adapter (if applicable).

## Best Practices

<Note>
  Before submitting a training job, validate your environment locally:

  ```bash theme={null}
  prime eval run my-env -m openai/gpt-4.1-mini -n 10
  ```

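  Ensure the baseline reward is between 5% and 80%. Below 5%, the task is likely too hard for the model to learn from; above 80%, it is likely too easy.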
</Note>
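
The 5% to 80% guideline can be turned into a quick check on evaluation results. This sketch assumes per-rollout rewards are normalised to the range 0 to 1 and that you have collected them into a list yourself; how `prime eval run` reports rewards is not shown here, and `baseline_verdict` is a hypothetical helper.

```python
def baseline_verdict(rewards: list[float], low: float = 0.05, high: float = 0.80) -> str:
    """Classify a baseline by its mean reward, ASSUMING rewards lie in [0, 1]."""
    if not rewards:
        raise ValueError("need at least one reward")
    mean = sum(rewards) / len(rewards)
    if not 0.0 <= mean <= 1.0:
        raise ValueError("rewards are expected to be normalised to [0, 1]")
    if mean < low:
        return "too hard"
    if mean > high:
        return "too easy"
    return "ok"


if __name__ == "__main__":
    # Hypothetical rewards from a 10-example baseline run.
    print(baseline_verdict([1, 0, 0, 1, 0, 0, 0, 1, 0, 0]))  # ok (mean 0.3)
```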

### Hyperparameter Guidelines

**For faster training:**

* Use smaller models (4B-30B)
* Increase learning rate (1e-5 to 1e-4)
* Decrease `rollouts_per_example` (4-8)

**For more stable training:**

* Use larger models (30B+)
* Increase `rollouts_per_example` (16-32)
* Increase `batch_size` (512+)
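
One way to see why more rollouts per example can stabilise training: with a group-relative advantage estimate, a prompt whose rollouts all receive the same reward contributes no learning signal. The sketch below assumes binary rewards, independent rollouts, and a group-relative scheme; this page does not confirm which algorithm Hosted Training uses, so read it as an illustration rather than a description of the service.

```python
def mixed_group_probability(p_success: float, rollouts_per_example: int) -> float:
    """Probability that a group contains both successes and failures.

    ASSUMES binary rewards and independent rollouts with success rate p_success.
    """
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must be in [0, 1]")
    if rollouts_per_example < 1:
        raise ValueError("rollouts_per_example must be at least 1")
    n = rollouts_per_example
    return 1.0 - p_success ** n - (1.0 - p_success) ** n


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        # 5% is the lower bound of the recommended baseline reward range.
        print(n, round(mixed_group_probability(0.05, n), 3))
```

At a fixed success rate, larger groups are more likely to contain at least one success and one failure, which is also why very low or very high baseline rewards tend to train poorly.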

### Cost Optimization

* Use LoRA instead of full fine-tuning
* Start with smaller models and scale up if needed
* Use `max_steps` to limit training duration
* Monitor W\&B to stop training when performance plateaus
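
Deciding when performance has plateaued can be made less subjective with a simple rule. The sketch below compares the mean reward of the most recent window of steps against the window before it; the window size, threshold, and function name are hypothetical choices, and it assumes you have exported a per-step reward series yourself.

```python
def has_plateaued(rewards: list[float], window: int = 50, min_gain: float = 0.01) -> bool:
    """True if the latest window's mean beats the previous window's by less than min_gain."""
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(rewards) < 2 * window:
        return False  # not enough history to judge
    recent = sum(rewards[-window:]) / window
    previous = sum(rewards[-2 * window:-window]) / window
    return recent - previous < min_gain


if __name__ == "__main__":
    improving = [step / 100 for step in range(100)]
    flat = [0.5] * 100
    print(has_plateaued(improving))  # False
    print(has_plateaued(flat))       # True
```

Reward curves in RL are noisy, so a wider window trades responsiveness for fewer false alarms.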

## Troubleshooting

### Training Not Starting

* Check that your config is valid TOML
* Ensure your environment is published to the Environments Hub (if using a Hub environment)
* Verify all required API keys are set
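
The first check can be run locally before submitting. The sketch below parses a config with Python's standard `tomllib` module (Python 3.11+, with the third-party `tomli` package as a fallback) and looks for the keys that appear in the examples on this page. Which keys the service actually requires is an assumption, and `check_config` is a hypothetical helper.

```python
try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:  # older interpreters: pip install tomli
    import tomli as tomllib


def check_config(text: str) -> list[str]:
    """Return a list of problems found; an empty list means these basic checks pass."""
    try:
        cfg = tomllib.loads(text)
    except tomllib.TOMLDecodeError as exc:
        return [f"invalid TOML: {exc}"]
    problems = []
    # ASSUMPTION: `model` and at least one [[env]] with an `id` are expected,
    # based only on the example configs in this page.
    if not isinstance(cfg.get("model"), str):
        problems.append("`model` should be a string")
    envs = cfg.get("env")
    if not isinstance(envs, list) or not envs:
        problems.append("at least one [[env]] table is expected")
    else:
        for index, env in enumerate(envs, start=1):
            if not isinstance(env.get("id"), str):
                problems.append(f"[[env]] #{index} is missing a string `id`")
    return problems


if __name__ == "__main__":
    sample = 'model = "Qwen/Qwen3-4B-Instruct-2507"\n\n[[env]]\nid = "primeintellect/gsm8k"\n'
    print(check_config(sample) or "no problems found")
```

Note that writing `[env]` instead of `[[env]]` produces a single table rather than an array of tables, which this check reports.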

### Training Failed

* Check job logs: `prime train logs <job-id>`
* Common issues:
  * Missing environment dependencies
  * Invalid environment arguments
  * Missing API keys for environment

### Poor Training Performance

* Task may be too hard for the model (baseline reward \< 5%)
* Task may be too easy (baseline reward > 80%)
* Learning rate may be too high (causing instability)
* Try enabling online difficulty filtering in advanced settings

## Support

For help with Hosted Training:

* Email: [support@primeintellect.ai](mailto:support@primeintellect.ai)
* Discord: [discord.gg/primeintellect](https://discord.gg/primeintellect)
* Documentation: [docs.primeintellect.ai](https://docs.primeintellect.ai)
