# GymEnv

> Universal runner for Gym-compatible environments

A universal adapter for running OpenAI Gym-compatible environments with language models.

GymEnv is experimental and subject to breaking changes. The API may change in future releases.

## Overview

`GymEnv` bridges the gap between Gym's step-based API and Verifiers' message-based rollout system. It:

* Converts Gym observations to text prompts
* Parses model completions into actions
* Manages episode lifecycle (reset/step/done)
* Computes episodic rewards

## Inheritance

```
Environment
└── MultiTurnEnv
    └── GymEnv
```

## Constructor

```python
GymEnv(
    env_cls: type[StepResetEnv],
    env_kwargs: dict[str, Any] | None = None,
    action_parser: Callable[[str], Any] | None = None,
    obs_to_text: Callable[[Any], str] | None = None,
    num_train_episodes: int = 1000,
    num_eval_episodes: int = 20,
    max_episode_steps: int | None = None,
    seed: int = 0,
    system_prompt: str | None = None,
    few_shot: list[dict[str, Any]] | None = None,
    parser: vf.Parser | None = None,
    rubric: Rubric | None = None,
    message_type: MessageType = "chat",
)
```

* `env_cls`: Gym environment class with `reset(seed)` and `step(action)` methods.
* `env_kwargs`: Keyword arguments passed to the `env_cls()` constructor.
* `action_parser`: Function to parse model output into an action. Defaults to identity (string actions).
* `obs_to_text`: Function to convert observations to text. Defaults to `str(obs)`.
* `num_train_episodes`: Number of episodes in the training dataset.
* `num_eval_episodes`: Number of episodes in the eval dataset.
* `max_episode_steps`: Maximum steps per episode. If `None`, uses 1000.
* `seed`: Random seed for episode generation.
* `system_prompt`: System prompt explaining the task.
* `rubric`: Custom rubric for scoring. Defaults to `EpisodicSumRubric()`.

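
To make the roles of `action_parser`, `obs_to_text`, and `max_episode_steps` concrete, here is a minimal sketch of the observe, prompt, parse, step loop that an adapter like this has to perform. It is an illustration under stated assumptions, not GymEnv's actual implementation: `CountdownEnv`, `run_episode`, and `scripted_policy` are hypothetical names, the "model" is a plain function, and the environment is assumed to return the 5-tuple form from `step`.

```python
from typing import Any, Callable


class CountdownEnv:
    """Hypothetical toy environment: answer 'down' until the counter reaches zero."""

    def reset(self, seed: int) -> tuple[int, dict]:
        # The seed only varies the starting value (3 or 4).
        self.counter = 3 + seed % 2
        return self.counter, {}

    def step(self, action: str) -> tuple[int, float, bool, bool, dict]:
        reward = 0.0
        if action == "down":
            self.counter -= 1
            reward = 1.0
        done = self.counter <= 0
        return self.counter, reward, done, False, {}


def run_episode(
    env: Any,
    policy: Callable[[str], str],
    action_parser: Callable[[str], Any] = lambda text: text,  # identity default
    obs_to_text: Callable[[Any], str] = str,                  # str(obs) default
    max_episode_steps: int = 1000,
    seed: int = 0,
) -> float:
    """Run one episode and return the sum of step rewards (illustrative only)."""
    obs, _info = env.reset(seed)
    total = 0.0
    for _ in range(max_episode_steps):
        completion = policy(obs_to_text(obs))   # observation -> prompt -> completion
        action = action_parser(completion)      # completion -> action
        obs, reward, done, truncated, _info = env.step(action)
        total += reward
        if done or truncated:
            break
    return total


def scripted_policy(prompt: str) -> str:
    """Stand-in for a language model: always answers with noisy text."""
    return "  DOWN  "


total = run_episode(
    CountdownEnv(),
    scripted_policy,
    action_parser=lambda text: text.strip().lower(),
    seed=0,
)
print(total)
```

With the identity default for `action_parser`, the noisy completion never matches `"down"`, so the episode only ends once `max_episode_steps` is reached. That is the usual reason to supply a parser that normalizes model output.
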
## StepResetEnv Protocol

Gym environments must implement:

```python
from typing import Any, Protocol


class StepResetEnv(Protocol):
    def reset(self, seed: int) -> Any | tuple[Any, dict]:
        """Reset environment with seed. Returns obs or (obs, info)."""
        ...

    def step(
        self, action: Any
    ) -> tuple[Any, float, bool, dict] | tuple[Any, float, bool, bool, dict]:
        """Take action and return new state.

        Returns (obs, reward, done, info) or (obs, reward, done, truncated, info).
        """
        ...
```

Supports both old (4-tuple) and new (5-tuple) Gym APIs.

## Example Usage

### CartPole Example

```python
import asyncio

import verifiers as vf
import gymnasium as gym


def load_environment():
    def action_parser(text: str) -> int:
        """Parse '0' or '1' from model output."""
        text = text.strip().lower()
        if '0' in text:
            return 0
        elif '1' in text:
            return 1
        else:
            raise ValueError(f"Invalid action: {text}")

    def obs_to_text(obs) -> str:
        """Format CartPole observation."""
        cart_pos, cart_vel, pole_angle, pole_vel = obs
        return f"""CartPole State:
- Cart position: {cart_pos:.3f}
- Cart velocity: {cart_vel:.3f}
- Pole angle: {pole_angle:.3f}
- Pole velocity: {pole_vel:.3f}

Choose action (0=left, 1=right):"""

    return vf.GymEnv(
        env_cls=gym.make,
        env_kwargs={"id": "CartPole-v1"},
        action_parser=action_parser,
        obs_to_text=obs_to_text,
        num_train_episodes=100,
        num_eval_episodes=10,
        max_episode_steps=500,
        system_prompt="You are controlling a CartPole. Balance the pole by moving left (0) or right (1).",
    )


async def main():
    # Run evaluation (await must be used inside an async function)
    env = load_environment()
    results = await env.evaluate(
        client=vf.ClientConfig(api_key="..."),
        model="gpt-4",
        num_examples=10,
    )
    print(f"Average episode reward: {results['metadata']['avg_reward']}")


asyncio.run(main())
```

### Custom Text Game

```python
import verifiers as vf


class TextAdventure:
    """Simple text-based game."""

    def __init__(self):
        self.location = "start"
        self.inventory = []
        self.steps = 0

    def reset(self, seed: int):
        self.location = "start"
        self.inventory = []
        self.steps = 0
        return "You are in a dark room. Exits: north, south", {}

    def step(self, action: str):
        self.steps += 1
        action = action.lower().strip()

        if action == "north":
            self.location = "treasure_room"
            obs = "You found treasure! You win!"
            reward = 1.0
            done = True
        elif action == "south":
            obs = "You fell into a pit. Game over."
            reward = 0.0
            done = True
        else:
            obs = f"Invalid action '{action}'. Try: north, south"
            reward = 0.0
            done = False

        # Truncate the episode after 10 steps
        truncated = self.steps >= 10
        return obs, reward, done, truncated, {}


def load_environment():
    return vf.GymEnv(
        env_cls=TextAdventure,
        num_train_episodes=50,
        num_eval_episodes=10,
        max_episode_steps=10,
        system_prompt="Navigate the dungeon by typing commands.",
    )
```

### With Custom Parser

```python
import verifiers as vf
import gymnasium as gym
import re


def load_environment():
    def action_parser(text: str) -> int:
        """Extract numeric action from verbose output."""
        # Model might say "I choose action 2"
        match = re.search(r'\b([0-3])\b', text)
        if match:
            return int(match.group(1))
        raise ValueError(f"No valid action in: {text}")

    return vf.GymEnv(
        env_cls=gym.make,
        env_kwargs={"id": "LunarLander-v2"},
        action_parser=action_parser,
        num_train_episodes=100,
        max_episode_steps=1000,
    )
```

## Built-in Rubric

### EpisodicSumRubric

Default rubric that sums step rewards:

```python
class EpisodicSumRubric(Rubric):
    def __init__(self, weight: float = 1.0, **kwargs):
        super().__init__(funcs=[sum_step_rewards], weights=[weight], **kwargs)
```

Accesses `state["trajectory"]` to sum per-step rewards from the environment.

## Key Methods

### gym\_to\_hf

```python
def gym_to_hf(self) -> tuple[Dataset, Dataset | None]
```

Generates HuggingFace datasets by running `reset()` on each episode:

```python
# Each row:
{
    "question": obs_to_text(initial_obs),
    "answer": str(seed),  # Stored for reproducibility
}
```

### obs\_to\_text

```python
def obs_to_text(self, obs: Any) -> str
```

Converts observation to text.
Override for custom formatting:

```python
class CustomGymEnv(vf.GymEnv):
    def obs_to_text(self, obs):
        # Custom observation rendering
        return f"Custom format: {obs}"
```

### env\_response

```python
async def env_response(
    self,
    messages: vf.Messages,
    state: State,
    **kwargs
) -> vf.Messages | str
```

Executes `env.step(action)` and returns the observation as a user message.

## State Keys

GymEnv adds the following to the rollout state:

* The active Gym environment instance (created per rollout).
* `gym_done`: Whether the episode has terminated.
* The step-level reward from `env.step()`.
* The info dict returned by `env.step()`.

## Error Handling

When `action_parser` raises, GymEnv:

* Sets `state["gym_done"] = True`
* Returns the error message to the model
* Assigns 0.0 reward to that step

```python
# If action_parser raises:
"Action Parsing Error: Invalid action: 'foo'"
```

## Stop Conditions

An episode ends when:

1. Gym returns `done=True` or `truncated=True`
2. `max_episode_steps` is reached
3. Action parsing fails

## Advanced: Custom Reward

```python
import verifiers as vf
import gymnasium as gym


def custom_reward(state: vf.State) -> float:
    """Bonus for efficiency."""
    total_reward = sum(
        step.get("reward", 0.0) for step in state["trajectory"]
    )
    num_steps = len(state["trajectory"])
    # Bonus shrinks linearly and reaches zero at 100 steps
    efficiency_bonus = max(0, 1.0 - num_steps / 100)
    return total_reward + efficiency_bonus


def load_environment():
    return vf.GymEnv(
        env_cls=gym.make,
        env_kwargs={"id": "CartPole-v1"},
        # Same funcs/weights form that EpisodicSumRubric passes to Rubric
        rubric=vf.Rubric(funcs=[custom_reward], weights=[1.0]),
    )
```

## Limitations

* **Text-only**: Observations must be convertible to text (no vision support)
* **Synchronous**: Gym envs are not async
* **Single episode per rollout**: Each rollout is one episode

## When to Use

Use GymEnv for:

* Existing Gym environments
* Sequential decision-making tasks
* Reinforcement learning benchmarks
* Text-based games

For tool-based tasks, use [ToolEnv](/api/tool-env) instead.

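
When deciding whether an existing environment class is a candidate for GymEnv, a quick structural check against the `reset`/`step` shape can help. The sketch below is a hypothetical helper pair, not part of Verifiers: `looks_like_step_reset_env` only checks that both methods exist, and `normalize_step` shows one way the old 4-tuple and new 5-tuple `step` results could be brought to a common form. How GymEnv does this internally is not documented here.

```python
from typing import Any


def looks_like_step_reset_env(env_cls: type) -> bool:
    """Hypothetical check: does the class expose callable reset and step?"""
    return callable(getattr(env_cls, "reset", None)) and callable(
        getattr(env_cls, "step", None)
    )


def normalize_step(result: tuple) -> tuple[Any, float, bool, bool, dict]:
    """Hypothetical helper: map 4- or 5-tuple step results to a 5-tuple."""
    if len(result) == 5:
        obs, reward, done, truncated, info = result
    elif len(result) == 4:
        obs, reward, done, info = result
        truncated = False  # the old API has no separate truncation flag
    else:
        raise ValueError(f"Expected a 4- or 5-tuple from step(), got {len(result)} items")
    return obs, float(reward), bool(done), bool(truncated), info


class OldStyleEnv:
    """Toy environment using the old 4-tuple step API."""

    def reset(self, seed: int) -> str:
        return "start"

    def step(self, action: str) -> tuple[str, int, bool, dict]:
        return "end", 1, True, {}


print(looks_like_step_reset_env(OldStyleEnv))
print(normalize_step(OldStyleEnv().step("go")))
```

The check is purely structural. It says nothing about argument conventions such as whether `reset` accepts the seed positionally, so a passing result is a starting point, not a guarantee of compatibility.
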
## See Also

* [MultiTurnEnv](/api/multi-turn-env) - Parent class
* [Rubric](/api/rubric) - Custom reward functions
* [Gymnasium Documentation](https://gymnasium.farama.org/) - Gym API reference